In particular, we should not assume ControllerRefs are necessarily set.
However, we can still use ControllerRefs that do exist to avoid
interfering with controllers that do use it.
Automatic merge from submit-queue
Update npd to the official v0.3.0 release.
Update npd to the official release v0.3.0.
This also fixes a npd bug https://github.com/kubernetes/node-problem-detector/pull/98.
@dchen1107 @kubernetes/node-problem-detector-reviewers
Automatic merge from submit-queue
Add guards for StatefulSet and AppArmor upgrade testing
This PR adds automated upgrade infrastructure to allow test suites to know what versions and node images are going to be testing and whether or not they should be skipped. It also adds a guard to prevent StatefulSets from being tested with versions prior to 1.5.0, and a guard to prevent AppArmor from running on distros other than gci and ubuntu.
Automatic merge from submit-queue (batch tested with PRs 43180, 42928)
Fix waitForScheduler in scheduer predicates e2e tests
**What this PR does / why we need it**: Fixes waitForScheduler in e2e to resolve flaky tests in scheduler_predicates.go
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42691
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43162, 43157)
Use beta default class annotation for default storageclass tests.
**What this PR does / why we need it**:
The default storageclasses are still installed with the beta annotation, so the test should explicitly use the beta annotation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43150
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add saad-ali and marun to test/OWNERS
/assign @saad-ali @marun
Also ensure that approvers are in the reviewer list, and sort both lists.
Automatic merge from submit-queue (batch tested with PRs 40964, 42967, 43091, 43115)
Add process debug information to summary test
Print out the processes in each system cgroup when the Summary API test fails, to help debug https://github.com/kubernetes/kubernetes/issues/40607
/cc @yujuhong @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 40964, 42967, 43091, 43115)
fixes dswp flake
Sometimes a pod may not appear in desired state
of world immediately, we poll before failing.
It only adds additional 30s to tests in worst case.
Fixes#42990
cc @jingxu97
Automatic merge from submit-queue
Guarantee watch before action in e2e event observer helper function.
**What this PR does / why we need it**:
Adds a missing synchronization barrier to an e2e event observation helper function.
- This change should guarantee that in observeEventAfterAction,
the action is only executed after the informer begins watching
the event stream.
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-scheduling-pr-reviews @bsalamat
Automatic merge from submit-queue (batch tested with PRs 40404, 43134, 43117)
Fix ES cluster logging test
Fix#37324
Test was broken because fluentd-gcp now parses golang and fluentd-es doesn't
Automatic merge from submit-queue
Fix Deployment upgrade test.
**What this PR does / why we need it**:
When the upgrade test operates on Deployments in a pre-1.6 cluster (i.e. during the Setup phase), it needs to use the v1.5 deployment/util logic. In particular, the v1.5 logic does not filter children to only those with a matching ControllerRef.
**Which issue this PR fixes**:
Fixes#42738
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 43106, 43110)
Wait for garbagecollector to be synced in test
Fix#42952
Without the `cache.WaitForCacheSync` in the test, it's possible for the GC to get a merged event of RC's creation and its update (update to deletionTimestamp != 0), before GC gets the creation event of the pod, so it's possible the GC will handle the foreground deletion of the RC before it adds the Pod to the dependency graph, thus the race.
With the `cache.WaitForCacheSync` in the test, because GC runs a single thread to process graph changes, it's guaranteed the Pod will be added to the dependency graph before GC handles the foreground deletion of the RC.
Note that this pull fixes the race in the test. The race described in the first point of #26120 still exists.
Automatic merge from submit-queue
Retry calls to ReadFileViaContainer in PD tests
**What this PR does / why we need it**:
kubectl exec occasionally fails to return a valid output string. It seems to be an issue with docker #34256. This PR retries the 'kubectl exec' call to workaround the issue. This should fix the flaky PD test issues.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#28283
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)
Add a timeout to allow replacement pod to become ready
Hopefully fixes https://github.com/kubernetes/kubernetes/issues/37259
```
I0314 04:26:02.562] Mar 14 04:26:02.562: INFO: Pod my-hostname-net-1bgrj still exists
I0314 04:26:22.491] Mar 14 04:26:22.491: INFO: Waiting for pod my-hostname-net-1bgrj to disappear
I0314 04:26:22.496] Mar 14 04:26:22.495: INFO: Pod my-hostname-net-1bgrj no longer exists
I0314 04:26:22.496] STEP: verifying whether the pod from the unreachable node is recreated
I0314 04:26:22.498] Mar 14 04:26:22.498: INFO: Pod name my-hostname-net: Found 3 pods out of 3
I0314 04:26:22.499] STEP: ensuring each pod is running
I0314 04:26:22.499] STEP: trying to dial each unique pod
I0314 04:26:22.579] Mar 14 04:26:22.579: INFO: Controller my-hostname-net: Got expected result from replica 1 [my-hostname-net-5jrdb]: "my-hostname-net-5jrdb", 1 of 3 required successes so far
I0314 04:26:22.642] Mar 14 04:26:22.642: INFO: Controller my-hostname-net: Got expected result from replica 2 [my-hostname-net-mjf3c]: "my-hostname-net-mjf3c", 2 of 3 required successes so far
I0314 04:31:22.645] Mar 14 04:31:22.644: INFO: Controller my-hostname-net: Failed to Get from replica 3 [my-hostname-net-rf46s]: Get https://35.184.87.178/api/v1/namespaces/e2e-tests-network-partition-s5gqt/pods/my-hostname-net-rf46s/proxy/: context deadline exceeded
```
The issue appears to be that we have a race between the pod being "running + ready" and being accessible via the APIServer proxy.
cc @kow3ns @bowei @davidopp
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)
Move e2e sched event predicates to new file.
**What this PR does / why we need it**:
Small e2e test refactor for scheduler. Moves scheduler event predicates out of opaque_resource.go for reuse elsewhere.
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-scheduling-pr-reviews @timothysc @bsalamat
When the upgrade test operates on Deployments in a pre-1.6 cluster
(i.e. during the Setup phase), it needs to use the v1.5 deployment/util
logic. In particular, the v1.5 logic does not filter children to only
those with a matching ControllerRef.
Automatic merge from submit-queue (batch tested with PRs 43018, 42713)
Log instead of fail on GLBCs tendency to leak resources
**What this PR does / why we need it**:
Stops upgrade tests from flaking because the GLBC does not cleanup all resources due to a race condition.
**Which issue this PR fixes**: fixes#38569
**Special notes for your reviewer**:
To be reviewed by @mml
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
Add e2e test for Deployment controllerRef orphaning and adoption
Follow up #42908
@enisoc @kubernetes/sig-apps-bugs @kargakis
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
Initial breakout of scheduling e2es to help assist in assignment and refactoring
**What this PR does / why we need it**:
This PR segregates the scheduling specific e2es to isolate the library which will assist both in refactoring but also auto-assignment of issues.
**Which issue this PR fixes**
xref: https://github.com/kubernetes/kubernetes/issues/42691#issuecomment-285563265
**Special notes for your reviewer**:
All this change does is shuffle code around and quarantine. Behavioral, and other cleanup changes, will be in follow on PRs. As of today, the e2es are a monolith and there is massive symbol pollution, this 1st step allows us to segregate the e2es and tease apart the dependency mess.
**Release note**:
```
NONE
```
/cc @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-testing-pr-reviews @marun @skriss
/cc @gmarek - same trick for load + density, etc.
Automatic merge from submit-queue (batch tested with PRs 43034, 43066)
Allow StatefulSet controller to PATCH Pods.
**What this PR does / why we need it**:
StatefulSet now needs the PATCH permission on Pods since it calls into ControllerRefManager to adopt and release. This adds the permission and the missing e2e test that should have caught this.
**Which issue this PR fixes**:
**Special notes for your reviewer**:
This is based on #42925.
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)
[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs
Fixes#42412
**Background**
Support for multiple GPUs is an experimental feature in v1.6.
Container restarts were handled incorrectly which resulted in stranding of GPUs
Kubelet is incorrectly using runtime cache to track running pods which can result in race conditions (as it did in other parts of kubelet). This can result in same GPU being assigned to multiple pods.
**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.
**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.
**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever` in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.
cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews
Replaces #42560
Automatic merge from submit-queue
Allow DaemonSet controller to PATCH pods, and add more steps and logs in DaemonSet pods adoption e2e test
DaemonSet pods adoption failed because DS controller aren't allowed to patch pods when claiming pods.
[Edit] This PR fixes#42908 by modifying RBAC to allow DaemonSet controllers to patch pods, as well as adding more logs and steps to the original e2e test to make debugging easier.
Tested locally with a local cluster and GCE cluster.
@kargakis @lukaszo @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42940, 42906, 42970, 42848)
Move node and event observer helpers to e2e/common
**What this PR does / why we need it**:
Moves existing test helper functions in OIR e2e tests to `test/e2e/common`. These functions wrap informers to help test writers to observe events instead of long-polling for status updates.
For usage examples, see `test/e2e/opaque_resource.go`.
cc @kubernetes/sig-scheduling-misc
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add fabianofranz as approver for test/e2e/kubectl.go
Adding myself as approver for `kubectl` end-to-end tests.
```release-note
NONE
```