Automatic merge from submit-queue (batch tested with PRs 51311, 52575, 53169). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix a scheduler flaky e2e test
**What this PR does / why we need it**:
Makes a scheduler e2e test that verifies the resource limit predicate more robust.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#53066
**Release note**:
```release-note
NONE
```
@kubernetes/sig-scheduling-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 50280, 52529, 53093, 53108, 53168). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve PVC ref volume metric test robustness
This test has been flaking. The current working theory is that
volume stats collection didn't run in time to grab the metrics
from the newly created pod.
Made the following changes:
- Added more logs to help debug future failures
- Poll metrics a few additional times before failing the test
fixes#53150
This test has been flaking. The current working theory is that
volume stats collection didn't run in time to grab the metrics
from the newly created pod.
Made the following changes:
- Added more logs to help debug future failures
- Poll metrics a few additional times before failing the test
Automatic merge from submit-queue (batch tested with PRs 52630, 53110, 53136, 53075). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix host network flake tests
**What this PR does / why we need it**:
Fix flaky test "Security Context when creating a pod in the host network namespace should listen on same port in the host network containers".
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#53091
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
federation: simplify deepcopy calls
We have static DeepCopy now without the possibility of errors. Makes the code much simpler.
Automatic merge from submit-queue (batch tested with PRs 50988, 50509, 52660, 52663, 52250). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added device plugin e2e kubelet failure test
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
**What this PR does / why we need it**:
This is part of issue #52859 (fixes#52859)
This PR adds a e2e_node test for the device plugin.
Specifically it implements testing of failure handling by the device plugin components in case Kubelet restart / crashes.
I might try to refactor the GPU tests in a later PR.
**Special notes for your reviewer**:
@jiayingz @vishh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52990, 53064, 52686, 52221, 53069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Allow kubelet metrics tests to run on gke
**What this PR does / why we need it**:
On GKE, you can still access kubelet metrics, so allow the kubelet metrics test.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
NONE
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve HT detection
**What this PR does / why we need it**:
Fix Cpu Manager e2e node tests that fail due to hard-coded `Thread(s) per core` char position.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52988
Automatic merge from submit-queue (batch tested with PRs 52721, 53057, 52493, 52998, 52896). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move deployment collision avoidance e2e test to integration
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: ref #52113
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
In autoscaling tests, add waiting for new pool to become ready
This adds missing timeout when adding a node pool in GKE scale to 0 test and improves logging error when enabling autoscaling.
Automatic merge from submit-queue (batch tested with PRs 51648, 53030, 53009). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fixed intermitant e2e aggregator test on GKE.
**What this PR does / why we need it**: Issue was caused by another test cleaning up its namespace.
This caused the namespace controller to try to clean up that namespace.
This involves deleting all flunders under that namespace.
However the sample-apiserver was not honoring the namespace filter.
So the flunders for the test would randomly disappear.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50945
**Special notes for your reviewer**: Requires we fix the container image to contain this fix to work.
**Release note**:
```release-note NONE
```
Fixes issues/50945.
Issue was caused by another test cleaning up its namespace.
This caused the namespace controller to try to clean up that namespace.
This involves deleting all flunders under that namespace.
However the sample-apiserver was not honoring the namespace filter.
So the flunders for the test would randomly disappear.
Fixed image path to pick up newly built fixes from this PR.
Automatic merge from submit-queue (batch tested with PRs 51759, 53001, 52806). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix broken statefulset e2e test
**What this PR does / why we need it**:
Fixes the CockroachDB statefulset e2e test.
This was broken back in #43637 when the logic in
`(*StatefulSetTester).CreateStatefulSet` switched from using
`generated.ReadOrDie` to read the entire service.yaml file and pass it
to kubectl to using `manifest.SvcFromManifest`, which assumes that the
file contains only a single service.
To fix the test, just remove the second service, which isn't needed to test the Statefulset functionality.
**Which issue this PR fixes**:
Fixes#52750
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51067, 52319, 52803, 52961, 51972). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add support for skeleton in GetSigner
Adding support for skeleton to GetSigner to be able to run
e2e tests against a bare metal multinode cluster.
Closes#35613
Automatic merge from submit-queue (batch tested with PRs 51067, 52319, 52803, 52961, 51972). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Move prometheus metrics for docker operations into dockershim
Automatic merge from submit-queue (batch tested with PRs 52960, 52373). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Refactor eviction tests
fixes: #52203
We have a bunch of eviction tests, which each break independently, and take a large amount of time to fix.
This refactors these tests to share the core eviction testing logic. Each tests needs only to set kubelet flags, and specify which pods to run.
I decided to omit the memory eviction tests because they work. Best not to disturb them.
A large portion of the code changes are the renaming of inode_eviction_test.go -> eviction_test.go
This should probably wait until after https://github.com/kubernetes/kubernetes/pull/50392
/assign @mtaufen @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 52905, 52766). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Refactor parsing cluster autoscaler status, add logging error
Minor improvements to autoscaling test suite and e2e framework.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix GCE LB resource cleanup for service e2e tests.
**What this PR does / why we need it**: Fix GCE LB resource cleanup logic.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52347
**Special notes for your reviewer**:
/assign @shyamjvs @nicksardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52880, 52855, 52761, 52885, 52929). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Don't need to check useAnnotation in dns e2e test
**What this PR does / why we need it**:
hostname/subdomain annotations were removed in #44137. This PR removes the check.
Also, `var dnsServiceLabelSelector` is not used anymore.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
ref: https://github.com/kubernetes/kubernetes/pull/44137
**Special notes for your reviewer**:
/cc @bowei @MrHohn
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52880, 52855, 52761, 52885, 52929). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Remove cloud provider rackspace
**What this PR does / why we need it**:
For now, we have to implement functions in both `rackspace` and `openstack` packages if we want to add function for cinder, for example [resize for cinder](https://github.com/kubernetes/kubernetes/pull/51498). Since openstack has implemented all the functions rackspace has, and rackspace is considered deprecated for a long time, [rackspace deprecated](https://github.com/rackspace/gophercloud/issues/592) ,
after talking with @mikedanese and @jamiehannaford offline , i sent this PR to remove `rackspace` in favor of `openstack`
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52854
**Special notes for your reviewer**:
**Release note**:
```release-note
The Rackspace cloud provider has been removed after a long deprecation period. It was deprecated because it duplicates a lot of the OpenStack logic and can no longer be maintained. Please use the OpenStack cloud provider instead.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Allow dns e2e test case for ExternalName to run on aws
**What this PR does / why we need it**:
#52840 uses allocated clusterIP instead of hard-coded one. So we don't need to care about the clusterIP range of the CI job config. Let it run on pull-kubernetes-e2e-kops-aws
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47224
**Special notes for your reviewer**:
ref: https://github.com/kubernetes/test-infra/pull/4462
/cc @bowei @MrHohn @justinsb
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
bazel: build/test almost everything
**What this PR does / why we need it**: Miscellaneous cleanups and bug fixes. The main motivating idea here was to make `bazel build //...` and `bazel test //...` mostly work. (There's a few reasons these still don't work, but we're a lot closer.)
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @BenTheElder @mikedanese @spxtr
Automatic merge from submit-queue (batch tested with PRs 52469, 52574, 52330, 52689, 52829). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fixing E2E Test - After restarting kubelet test expects node's status to be NotReady
**What this PR does / why we need it**:
This PR is fixing the e2e tests involves restarting the kubelets. After the kubelet is restarted, test expect the desired state to be NotReady.
After restarting the kubelet we should wait for some time and then check nodes status to be Ready.
Node should not be checked for NotReady state, after restarting kubelet.
**Which issue this PR fixes**
fixes # https://github.com/vmware/kubernetes/issues/285
**Special notes for your reviewer**:
@BaluDontu @rohitjogvmw @tusharnt
Test logs before fix
-----
STEP: Restarting kubelet
Sep 15 11:26:32.768: INFO: Attempting sudo systemctl restart kubelet
Sep 15 11:26:33.001: INFO: ssh root@10.162.22.205:22: command: sudo systemctl restart kubelet
Sep 15 11:26:33.001: INFO: ssh root@10.162.22.205:22: stdout: ""
Sep 15 11:26:33.001: INFO: ssh root@10.162.22.205:22: stderr: ""
Sep 15 11:26:33.001: INFO: ssh root@10.162.22.205:22: exit code: 0
Sep 15 11:26:33.002: INFO: Waiting up to 1m0s for node kubernetes-node2 condition Ready to be false
Sep 15 11:26:33.012: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:35.023: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:37.032: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:39.041: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:41.051: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:43.061: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:45.070: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:47.080: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:49.093: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:51.105: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:53.117: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:55.128: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:57.140: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:26:59.151: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:01.158: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:03.167: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:05.180: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:07.188: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:09.210: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:11.221: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:13.231: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:15.240: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:17.249: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:19.263: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:21.272: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:23.283: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:25.309: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:27.317: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:29.327: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:31.342: INFO: Condition Ready of node kubernetes-node2 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status
Sep 15 11:27:33.343: INFO: Node kubernetes-node2 didn't reach desired Ready condition status (false) within 1m0s
Sep 15 11:27:33.343: INFO: Node kubernetes-node2 failed to enter NotReady state
[AfterEach] [sig-storage] PersistentVolumes:vsphere
Test logs after fix
-----
STEP: Restarting kubelet
Sep 18 15:40:49.066: INFO: Checking if sudo command is present
Sep 18 15:40:49.342: INFO: Checking if systemctl command is present
Sep 18 15:40:49.573: INFO: Attempting `sudo systemctl status kubelet | grep 'Main PID'`
Sep 18 15:40:49.733: INFO: ssh root@10.162.16.97:22: command: sudo systemctl status kubelet | grep 'Main PID'
Sep 18 15:40:49.733: INFO: ssh root@10.162.16.97:22: stdout: " Main PID: 19715 (docker)\n"
Sep 18 15:40:49.733: INFO: ssh root@10.162.16.97:22: stderr: ""
Sep 18 15:40:49.733: INFO: ssh root@10.162.16.97:22: exit code: 0
Sep 18 15:40:49.733: INFO: Attempting `sudo systemctl restart kubelet`
Sep 18 15:40:49.986: INFO: ssh root@10.162.16.97:22: command: sudo systemctl restart kubelet
Sep 18 15:40:49.986: INFO: ssh root@10.162.16.97:22: stdout: ""
Sep 18 15:40:49.986: INFO: ssh root@10.162.16.97:22: stderr: ""
Sep 18 15:40:49.986: INFO: ssh root@10.162.16.97:22: exit code: 0
Sep 18 15:40:49.988: INFO: Attempting `sudo systemctl status kubelet | grep 'Main PID'`
Sep 18 15:40:50.158: INFO: ssh root@10.162.16.97:22: command: sudo systemctl status kubelet | grep 'Main PID'
Sep 18 15:40:50.158: INFO: ssh root@10.162.16.97:22: stdout: " Main PID: 25021 (docker)\n"
Sep 18 15:40:50.158: INFO: ssh root@10.162.16.97:22: stderr: ""
Sep 18 15:40:50.158: INFO: ssh root@10.162.16.97:22: exit code: 0
Sep 18 15:40:50.158: INFO: Noticed that kubelet PID is changed. Waiting for 30 Seconds for Kubelet to come back
Sep 18 15:41:20.159: INFO: Waiting up to 1m0s for node kubernetes-node4 condition Ready to be true
STEP: Testing that written file is accessible.
Sep 18 15:41:20.191: INFO: Running '/Users/divyenp/github/vmware/kubernetes/_output/dockerized/bin/darwin/amd64/kubectl --server=https://10.162.0.45 --kubeconfig=/Users/divyenp/.kube/config exec --namespace=e2e-tests-pv-9j8j0 pvc-tester-3t9ds -- /bin/sh -c cat /mnt/_SUCCESS'
Sep 18 15:41:20.855: INFO: stderr: ""
Sep 18 15:41:20.855: INFO:
Sep 18 15:41:20.855: INFO: Volume mount detected on pod pvc-tester-3t9ds and written file /mnt/_SUCCESS is readable post-restart.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51902, 52718, 52687, 52137, 52697). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Multi-arch allowPrivilegeEscalation tests
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52698
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 52485, 52443, 52597, 52450, 51971). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Removing PrometheusPushGateway --prom-push-gateway flag from e2e tests.
**What this PR does / why we need it**: Removing obsolete PrometheusPushGateway --prom-push-gateway flag from e2e tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45947
**Special notes for your reviewer**:
**Release note**:
```release-note
Removing `--prom-push-gateway` flag from e2e tests
```
Automatic merge from submit-queue (batch tested with PRs 50392, 52108, 52083, 52134, 51526). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
e2e: minor changes to network/service testing utils
Add more logging to help debug. Also refactor several functions to improve
reusability.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
inode eviction tests fill a constant number of inodes
Issue: #52203
inode eviction tests pass often on some OS distributions, and almost never on others. See [these testgrid tests](https://k8s-testgrid.appspot.com/sig-node#kubelet-flaky-gce-e2e&include-filter-by-regex=Inode)
These differences are most likely because different images have fewer or greater inode capacity, and thus percentage based rules (e.g. inodesFree<50%) make the test more stressful for some OS distributions than others.
This changes the test to require that a constant number of inodes are consumed, regardless of the number of inodes in the filesystem, by setting the new threshold to:
nodefs.inodesFree<(current_inodes_free - 200k)
so that after pods consume 200k inodes, they will be evicted. It requires querying the summary API until we successfully determine the current number of free Inodes.
Automatic merge from submit-queue (batch tested with PRs 51929, 52015, 51906, 52069, 51542). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
move specialDefaultResourcePrefixes out of vendor/k8s.io/apiserver
just a clean-up, fixes TODO: move out of this package, it is not generic
@sttts PTAL
/assign @sttts
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
should use time.Since instead of time.Now().Sub
**What this PR does / why we need it**:
should use time.Since instead of time.Now().Sub
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
```
NONE
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Debug for issues #50945
Aggregator e2e test is intermittantly failing on GKE but not GCE.
Adding the following debugging for help trace issue.
Make sure we always use the same rest client.
Randomly generate the flunder resource name to detect parallel tests.
Print endpoints for sample-system in case multiple instances.
Print original and new pods in case the pod has been restarted.
**What this PR does / why we need it**: Adds debugging for aggregator e2e test to track down GKE flakiness.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50945
**Special notes for your reviewer**: This is primarily additional debugging information.
**Release note**:
```release-note NONE
```
Aggregator e2e test is intermittantly failing on GKE but not GCE.
Adding the following debugging for help trace issue.
Make sure we always use the same rest client.
Randomly generate the flunder resource name to detect parallel tests.
Print endpoints for sample-system in case multiple instances.
Print original and new pods in case the pod has been restarted.
Fixed import list.
Remove rand seed.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Don't specify clusterIP in dns e2e test
**What this PR does / why we need it**:
Different upgrade tests may configure different service clusterIP ranges. If we specify the clusterIP in dns e2e test, it will succeed in one upgrade test but fail in another. This PR doesn't specify clusterIP. It just uses the allocated clusterIP.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50274
**Special notes for your reviewer**:
Hope this can really fixes that issue.
/cc @thockin @MrHohn
**Release note**:
```release-note
NONE
```
This was broken back in #43637 when the logic in
`(*StatefulSetTester).CreateStatefulSet` switched from using
`generated.ReadOrDie` to read the entire service.yaml file and pass it
to kubectl to using `manifest.SvcFromManifest`, which assumes that the
file contains only a single service.
Fixes#52750
Automatic merge from submit-queue (batch tested with PRs 52843, 52710, 52821, 52844). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
improve retrying logic when checking CA status
This should reduce the flake rate in cluster size autoscaling test suite.
Automatic merge from submit-queue (batch tested with PRs 48406, 52819). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fixed nil dereference in dynamic provisioning e2e tests
**What this PR does / why we need it**: Fixed nil dereference in dynamic provisioning e2e tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52815
**Release note**:
```release-note-none
NONE
```
/sig storage
/assign @saad-ali
/cc @wongma7
/release-note-none
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Retry if possible while creating latency pods in density test
Saw the [last run](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/37) of density test on 5k-node fail due to it:
```
Expected error:
<*errors.StatusError | 0xc44f2fd7a0>: {
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
Status: "Failure",
Message: "timeout",
Reason: "",
Details: nil,
Code: 500,
},
}
timeout
not to have occurred
```
cc @kubernetes/sig-scalability-misc
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Support kubernetes-anywhere provider
**What this PR does / why we need it**:
Implements a new `kubernetes-anywhere` provider to allow upgrade testing in the e2e binary. This is the final step to allow https://github.com/kubernetes/test-infra/pull/4495 and https://github.com/kubernetes/kubernetes-anywhere/pull/450.
**Which issue this PR fixes**:
https://github.com/kubernetes/kubeadm/issues/311
**Special notes for your reviewer**:
Some questions I had
- Does the `--provider` flag specified [here](dbbf6261e0/jobs/config.json (L8587)) get sent to the flag defined [here](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/test_context.go#L219)? Or should I add another `--provider` flag inside `--upgrade_args` like this: `--upgrade_args=... --provider=kubernetes-anywhere`?
- Is it necessary to add waiting logic after the `make` command, or will it implicitly handle that by itself?
Some other points:
- I chose `sed` to manipulate the current kubernetes-anywhere `.config` rather than duplicating another [`anywhere.go`](https://github.com/kubernetes/test-infra/blob/master/kubetest/anywhere.go). One suggestion was to use `jq` but since the config on disk is not serialized to JSON yet, I'm not sure how that'd work.
- Since I don't have a GCE/GKE account or vCenter, I can't actually verify the e2e binary works. I've managed to build it, but if somebody could quickly run a smoke test, I'd appreciate it. This is my first poke around test-infra and e2e, so there might be some plumbing missing
/cc @jessicaochen @luxas @pipejakob @roberthbailey
Automatic merge from submit-queue (batch tested with PRs 52500, 52533). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add mount options e2e test
**What this PR does / why we need it**: A test for newly added StorageClass.mountOptions and PV.mountOptions: provision a pv using a class with its storageclass.mountoptions set, and the end result should be that the mount options can be seen from the mounter.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: Fixes#52138
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
In autoscaling integration test, use allocatable instead of capacity for node memory
This makes the remaining cluster autoscaling test (integration test of HPA and CA working together to scale up the cluster) use node allocatable resources when computing how much memory we need to consume in order to trigger scale up/prevent scale down. Follow up to #52650 as that one is already merging.
cc @wasylkowski
Automatic merge from submit-queue (batch tested with PRs 51337, 47080, 52646, 52635, 52666). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix: update system spec to support Docker 17.03
Docker 17.03 is 1.13 with bug fixes so they are of the same minor version release. We've validated them both in https://github.com/kubernetes/kubernetes/issues/42926. This PR changes the system spec to support Docker 17.03.
**This should be in 1.8.**
**Release note**:
```
Kubernetes 1.8 supports docker version 17.03.x.
```
/assign @Random-Liu
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
In cluster size autoscaling tests, use allocatable instead of capacity for node memory
This makes cluster size autoscaling e2e tests use node allocatable resources when computing how much memory we need to consume in order to trigger scale up/prevent scale down. It should fix failing tests in GKE.
Automatic merge from submit-queue (batch tested with PRs 52350, 52659). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add e2e test for storageclass.reclaimpolicy
**What this PR does / why we need it**: Adds another dynamic provisioning test where the storageclass.reclaimpolicy == retain. Have to manually delete the PV at the end of the test.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: https://github.com/kubernetes/kubernetes/issues/52138
**Special notes for your reviewer**: I have not tested it but it's ready for review, I will comment and edit this when i've verified it actually works.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51796, 52223)
Add bsalamat to sig-scheduling-maintainers
**What this PR does / why we need it**:
Adds bsalamat to sig-scheduling-maintainers.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # N/A
**Release note**:
```release-note
NONE
```
@kubernetes/sig-scheduling-pr-reviews @davidopp @timothysc @k82cn @wojtek-t
Automatic merge from submit-queue
Bugfix: Fix e2e Flaky Apps/Job BackoffLimit test
This fix is linked to the PR #51153 that introduce the `JobSpec.BackoffLimit`.
Previously the Timeout used in the test was too aggressive and generates flaky test execution. Now it used the default `framework.JobTimeout` used in others tests.
**What this PR does / why we need it**:
This PR should fix flaky "[sig-apps] Job should exceed backoffLimit" test, due to a too short timeout duration.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes#51153
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)
Improve apiserver metrics reporting
Normalize "WATCHLIST" to "WATCH", add "scope" to the other metrics (listing 50k pods is != listing pods in a namespace), and add a new scope "resource" to cover individual resource calls.
This roughly aligns metrics with our ACL model (technically resource scope is GET, but POST to a subresource and POST to a namespace are not the same thing).
```release-note
WATCHLIST calls are now reported as WATCH verbs in prometheus for the apiserver_request_* series. A new "scope" label is added to all apiserver_request_* values that is either 'cluster', 'resource', or 'namespace' depending on which level the query is performed at.
```
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)
Add more tests for pod preemption
**What this PR does / why we need it**:
Adds more e2e and integration tests for pod preemption.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
This PR is based on #50949. Only the last commit is new.
**Release note**:
```release-note
NONE
```
ref/ #47604
@kubernetes/sig-scheduling-pr-reviews @davidopp
Automatic merge from submit-queue
[fluentd-gcp addon] Remove some e2e tests out of blocking suites
Fixes https://github.com/kubernetes/kubernetes/issues/52433
Some Stackdriver Logging e2e tests are broken in release-blocking suites:
- Due to the change in Docker 1.13, on some systems logs are automatically split by 16K chunks. This PR removes an e2e test that assumes otherwise
- In large clusters, it's not possible to ingest system logs from all nodes
Since it's not a Kubernetes problem per se, mitigating this by removing these tests from blocking suites.
Automatic merge from submit-queue
Fix failing autoscaling test in GKE
This should fix `[sig-autoscaling] Cluster size autoscaling [Slow] should increase cluster size if pending pods are small and there is another node pool that is not autoscaled [Feature:ClusterSizeAutoscalingScaleUp]` by getting a list of nodes from GKE nodepool in a different way (filtering nodes by labels.) Currently, gcloud command used for it is failing, as we only have GKE node pool name in the test and not the actual MIG name.
Automatic merge from submit-queue (batch tested with PRs 52376, 52439, 52382, 52358, 52372)
Remove the conversion of client config
It was needed because the clientset code in client-go was a copy of the clientset code in Kubernetes.. client-go is authoritative now, so we can remove the nasty copy.
This fix is linked to the PR #51153 that introduce the
JobSpec.BackoffLimit.
Previously the Timeout used in the test was too agressive and generates
flaky test execution. Now it used the default framework.JobTimeout used
in others tests.
Automatic merge from submit-queue
Make CPU constraint for l7-lb-controller in density test scale with #nodes
Just noticed that we changed the memory last time, but didn't change cpu. From the last run:
```
Sep 13 04:25:03.360: INFO: Unexpected error occurred: Container l7-lb-controller-v0.9.6-gce-scale-cluster-master/l7-lb-controller is using 0.642709233/0.15 CPU
```
Automatic merge from submit-queue (batch tested with PRs 52316, 52289, 52375)
Extends GPUDevicePlugin e2e test to exercise device plugin restarts.
**What this PR does / why we need it**:
This is part of issue #52189 but does not fix it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 52316, 52289, 52375)
[fluentd-gcp addon] Trim too long log entries due to Stackdriver limitations
Stackdriver doesn't support log entries bigger than 100KB, so by default fluentd plugin just drops such entries. To avoid that and increase the visibility of this problem it's suggested to trim long lines instead.
/cc @igorpeshansky
```release-note
[fluentd-gcp addon] Fluentd will trim lines exceeding 100KB instead of dropping them.
```
Automatic merge from submit-queue
Version gates the ephemeral storage e2e test
Version gates the ephemeral storage e2e test.
**Release note**:
```
NONE
```
@kubernetes/sig-testing-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)
StatefulSet: Deflake e2e RunHostCmd more.
It turns out that at some points while the Node is recovering from a reboot, we get a different kind of error ("unable to upgrade connection"). Since we can't distinguish these transient errors from an error encountered after successfully executing the remote command, let's just retry all errors for 5min. If this doesn't work, I'm gonna blame it on sig-node.
ref #48031
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)
Port Guestbook tests to mutiarch
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52232
**Special notes for your reviewer**:
**Release note**:
```NONE
NONE
```
Automatic merge from submit-queue
Node e2e tests for the CPU Manager.
**What this PR does / why we need it**:
- Adds node e2e tests for the CPU Manager implementation in https://github.com/kubernetes/kubernetes/pull/49186.
**Special notes for your reviewer**:
- Previous PR in this series: #51180
- Only `test/e2e_node/cpu_manager_test.go` must be reviewed as a part of this PR (i.e., the last commit). Rest of the comments belong in #51357 and #51180.
- The tests have been on run on `n1-standard-n4` and `n1-standard-n2` instances on GCE.
To run this node e2e test, use the following command:
```sh
make test-e2e-node TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS="CPU Manager" SKIP="" PARALLELISM=1
```
CC @ConnorDoyle @sjenning
It turns out that at some points while the Node is recovering from a
reboot, we get a different kind of error ("unable to upgrade
connection"). Since we can't distinguish these transient errors from an
error encountered after successfully executing the remote command,
let's just retry all errors for 5min. If this doesn't work, I'm gonna
blame it on sig-node.
Automatic merge from submit-queue (batch tested with PRs 52007, 52196, 52169, 52263, 52291)
Summary tests should expect rss usage now
**What this PR does / why we need it**:
Fixes summary test to expect rss usage now.
Previously, cAdvisor reported rss and not total_rss, but that has now been fixed in most recent version of cAdvisor now in the project.
See: https://github.com/kubernetes/kubernetes/pull/43399#issuecomment-287858599
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50289, 52106)
Fix AppArmor test at scale
**What this PR does / why we need it**:
The AppArmor test only runs on a single node, but previously was loading the necessary profiles to every node. This caused unnecessary churn in very large clusters, so this PR updates the test to only load the profiles to a single node, and ensure the test pod is run on that node (using pod affinity).
**Which issue this PR fixes**: fixes#51791
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52227, 52120)
Fix discovery restmapper finding resources in non-preferred versions
Fixes: #52219
Also reverts behavioral changes to tests that version-qualified cronjobs to work around this issue.
The discovery rest mapper was only populating the priority rest mapper's search list with preferred groupversions.
That meant that if a resource existed in multiple non-preferred versions, AND did not exist in the preferred version (like cronjob, which only exists in v1beta2.batch and v2alpha1.batch, but not v1.batch), the priority restmapper would not find it in its group/version priority list, and would return an error.
```release-note
Fixed an issue looking up cronjobs when they existed in more than one API version
```
Automatic merge from submit-queue
Extend nvidia-gpus e2e test to include a device plugin based test
**What this PR does / why we need it**:
This is needed to verify device plugin feature.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/features/issues/368
**Special notes for your reviewer**:
Related test_infra PR: https://github.com/kubernetes/test-infra/pull/4265
**Release note**:
Add an e2e test for nvidia gpu device plugin
Automatic merge from submit-queue (batch tested with PRs 52047, 52063, 51528)
Improve dynamic kubelet config e2e node test and fix bugs
Rather than just changing the config once to see if dynamic kubelet
config at-least-sort-of-works, this extends the test to check that the
Kubelet reports the expected Node condition and the expected configuration
values after several possible state transitions.
Additionally, this adds a stress test that changes the configuration 100
times. It is possible for resource leaks across Kubelet restarts to
eventually prevent the Kubelet from restarting. For example, this test
revealed that cAdvisor's leaking journalctl processes (see:
https://github.com/google/cadvisor/issues/1725) could break dynamic
kubelet config. This test will help reveal these problems earlier.
This commit also makes better use of const strings and fixes a few bugs
that the new testing turned up.
Related issue: #50217
I had been sitting on this until the cAdvisor fix merged in #51751, as these tests fail without that fix.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add pod preemption to the scheduler
**What this PR does / why we need it**:
This is the last of a series of PRs to add priority-based preemption to the scheduler. This PR connects the preemption logic to the scheduler workflow.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48646
**Special notes for your reviewer**:
This PR includes other PRs which are under review (#50805, #50405, #50190). All the new code is located in 43627afdf9.
**Release note**:
```release-note
Add priority-based preemption to the scheduler.
```
ref/ #47604
/assign @davidopp
@kubernetes/sig-scheduling-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 51900, 51782, 52030)
apiservers: stratify versioned informer construction
The versioned share informer factory has been part of the GenericApiServer config,
but its construction depended on other fields of that config (e.g. the loopback
client config). Hence, the order of changes to the config mattered.
This PR stratifies this by moving the SharedInformerFactory from the generic Config
to the CompleteConfig struct. Hence, it is only filled during completion when it is
guaranteed that the loopback client config is set.
While doing this, the CompletedConfig construction is made more type-safe again,
i.e. the use of SkipCompletion() is considereably reduced. This is archieved by
splitting the derived apiserver Configs into the GenericConfig and the ExtraConfig
part. Then the completion is structural again because CompleteConfig is again
of the same structure: generic CompletedConfig and local completed ExtraConfig.
Fixes#50661.
Automatic merge from submit-queue
fix format of forbidden messages
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51813
**Special notes for your reviewer**:
/assign @deads2k @liggitt
**Release note**:
```release-note
None
```
Automatic merge from submit-queue
Multiarch support for pets images
**What this PR does / why we need it**:
This PR is for multiarch support for pets image
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52133
**Special notes for your reviewer**:
Copied over the `contrib/pets/peer-finder` as this one is heavily used in many docker images under `test/images`. After this PR I'll submit the PR in contrib project to remove it.
**Release note**:
```NONE
```
Automatic merge from submit-queue
Pipe in upgrade image target for kube-proxy migration tests
**What this PR does / why we need it**:
https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-upgrade-kube-proxy-ds&width=20
and
https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-downgrade-kube-proxy-ds&width=20
are still failing.
Reproduced it locally and found node image is being default to debian during upgrade (it was gci before upgrade) because we don't pass in `gci` via `--upgrade--target`. And for some reasons (haven't figured out yet), the upgraded node uses debian image with gci startupscripts...
This PR pipes in `--upgrade-target` for kube-proxy migration tests, hopefully in conjunction with https://github.com/kubernetes/test-infra/pull/4447 it will bring the tests back to normal.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
Sorry for bothering again.
/assign @krousey
**Release note**:
```release-note
NONE
```
Rather than just changing the config once to see if dynamic kubelet
config at-least-sort-of-works, this extends the test to check that the
Kubelet reports the expected Node condition and the expected configuration
values after several possible state transitions.
Additionally, this adds a stress test that changes the configuration 100
times. It is possible for resource leaks across Kubelet restarts to
eventually prevent the Kubelet from restarting. For example, this test
revealed that cAdvisor's leaking journalctl processes (see:
https://github.com/google/cadvisor/issues/1725) could break dynamic
kubelet config. This test will help reveal these problems earlier.
This commit also makes better use of const strings and fixes a few bugs
that the new testing turned up.
Related issue: #50217
Automatic merge from submit-queue (batch tested with PRs 52097, 52054)
Move paused deployment e2e tests to integration
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #52113
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
StatefulSet: Deflake e2e RunHostCmd.
The initial retry up to 20s was giving up too soon. I'm seeing this test flake because the Node rebooted and it takes ~2min to recover. Now StatefulSet RunHostCmd calls will use the same 5min timeout as with other Pod state checks.
ref #48031
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)
Enable CRI-O stats from cAdvisor
**What this PR does / why we need it**:
cAdvisor may support multiple container runtimes (docker, rkt, cri-o, systemd, etc.)
As long as the kubelet continues to run cAdvisor, runtimes with native cAdvisor support may not want to run multiple monitoring agents to avoid performance regression in production. Pending kubelet running a more light-weight monitoring solution, this PR allows remote runtimes to have their stats pulled from cAdvisor when cAdvisor is registered stats provider by introspection of the runtime endpoint.
See issue https://github.com/kubernetes/kubernetes/issues/51798
**Special notes for your reviewer**:
cAdvisor will be bumped to pick up https://github.com/google/cadvisor/pull/1741
At that time, CRI-O will support fetching stats from cAdvisor.
**Release note**:
```release-note
NONE
```
The initial retry up to 20s was giving up too soon.
I'm seeing this test flake because the Node rebooted and it takes ~2min
to recover.
Now StatefulSet RunHostCmd calls will use the same 5min timeout as with
other Pod state checks.
Automatic merge from submit-queue (batch tested with PRs 51956, 50708)
Move autoscaling/v2 from alpha1 to beta1
This graduates autoscaling/v2alpha1 to autoscaling/v2beta1. The move is more-or-less just a straightforward rename.
Part of kubernetes/features#117
```release-note
v2 of the autoscaling API group, including improvements to the HorizontalPodAutoscaler, has moved from alpha1 to beta1.
```
Automatic merge from submit-queue (batch tested with PRs 51839, 51987)
Disable rbac/v1alpha1, settings/v1alpha1, and scheduling/v1alpha1 by default
**What this PR does / why we need it**: Disables alpha features which were previously enabled by default. Also changes tests which relied on these alpha features being enabled by default.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47691
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixed a bug where some alpha features were enabled by default.
The feature is still Alpha and at times, the IP address previously used
by the load balancer in the test will not completely freed even after
the load balancer is long gone. In this case, the test URL with the IP
would return a 404 response. Tolerate this error and retry until the new
load balancer is fully established.
Charge object count when object is created, no matter if the object is
initialized or not.
Charge the remaining quota when the object is initialized.
Also, checking initializer.Pending and initializer.Result when
determining if an object is initialized. We didn't need to check them
because before 51082, having 0 pending initializer and nil
initializers.Result is invalid.
Automatic merge from submit-queue (batch tested with PRs 51733, 51838)
Decouple kube-proxy upgrade/downgrade tests from upgradeTests
**What this PR does / why we need it**:
Fixes the failing kube-proxy migration CI jobs:
- https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-upgrade-kube-proxy-ds
- https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-downgrade-kube-proxy-ds
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51729
**Special notes for your reviewer**:
/assign @krousey @nicksardo
Could you please take a look post code-freeze (I believe it is fixing things)? Thanks!
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51733, 51838)
Relax update validation of uninitialized pod
Split from https://github.com/kubernetes/kubernetes/pull/50344
Fix https://github.com/kubernetes/kubernetes/issues/47837
* Let the podStrategy to only call `validation.ValidatePod()` if the old pod is not initialized, so fields are mutable.
* Let the podStatusStrategy refuse updates if the old pod is not initialized.
cc @smarterclayton
```release-note
Pod spec is mutable when the pod is uninitialized. The apiserver requires the pod spec to be valid even if it's uninitialized. Updating the status field of uninitialized pods is invalid.
```
Automatic merge from submit-queue (batch tested with PRs 51984, 51351, 51873, 51795, 51634)
Revert to using isolated PID namespaces in Docker
**What this PR does / why we need it**: Reverts to the previous docker default of using isolated PID namespaces for containers in a pod. There exist container images that expect always to be PID 1 which we want to support unmodified in 1.8.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48937
**Special notes for your reviewer**:
**Release note**:
```release-note
Sharing a PID namespace between containers in a pod is disabled by default in 1.8. To enable for a node, use the --docker-disable-shared-pid=false kubelet flag. Note that PID namespace sharing requires docker >= 1.13.1.
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Enabling aggregator functionality on kubemark, gce
Enabling full functionality aggregator functionality in kubemark tests.
This includes configuring it to work in gce (we seem to assume gce in our kubemark tests)
It also includes setting up the relevant security and auth config.
**What this PR does / why we need it**: Configure aggregator properly on kubemark tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48428
**Special notes for your reviewer**:
**Release note**:
```release-note NONE
```
Enabling full functionality aggregator functionality in kubemark tests.
This includes configuring it to work in gce (we seem to assume gce in our kubemark tests)
It also includes setting up the relevant security and auth config.
Removing unneeded reference to CA key for MHBauer.
Fixed to pull the "parsed" values for the certs.
Fix from shyamjvs.
Automatic merge from submit-queue (batch tested with PRs 51915, 51294, 51562, 51911)
Remove OutOfDisk from controllers
This is one of the working items for #48843 for 1.8.
This changes the scheduler and daemonset controllers to no longer respect the OutOfDisk condition. The kubelet has not published OutOfDisk=True since 1.5.
This still preserves the Toleration for the OutOfDisk condition, as (I think?) this is required for backwards compatibility. I added TODOs to remove this in 1.10.
Automatic merge from submit-queue (batch tested with PRs 51833, 51936)
Changed volume IO e2e test to verify file hash instead of content.
**What this PR does / why we need it**: The existing way of verifying file content takes too much memory, causing processes to be OOM killed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/51717
**Release note**:
```release-note
NONE
```
/sig storage
/release-note-none
/assign @jeffvance @rootfs
/cc @msau42
Automatic merge from submit-queue (batch tested with PRs 51845, 51868, 51864)
Update sys spec to support docker 1.11-1.13 and overlay2.
Fixes https://github.com/kubernetes/kubernetes/issues/32536.
Update docker spec to:
1) Support overlay2;
2) Support docker version 1.11-1.13.
@dchen1107 @yguo0905 @luxas
/cc @kubernetes/sig-node-pr-reviews
```release-note
Kubernetes 1.8 supports docker version 1.11.x, 1.12.x and 1.13.x. And also supports overlay2.
```
Automatic merge from submit-queue
Enable batch/v1beta1.CronJobs by default
This PR re-applies the cronjobs->beta back (https://github.com/kubernetes/kubernetes/pull/51720) with the fix from @shyamjvs.
Fixes#51692
@apelisse @dchen1107 @smarterclayton ptal
@janetkuo @erictune fyi
Automatic merge from submit-queue
Remove deprecated init-container in annotations
fixes#50655fixes#51816closes#41004fixes#51816
Builds on #50654 and drops the initContainer annotations on conversion to prevent bypassing API server validation/security and targeting version-skewed kubelets that still honor the annotations
```release-note
The deprecated alpha and beta initContainer annotations are no longer supported. Init containers must be specified using the initContainers field in the pod spec.
```
Automatic merge from submit-queue
Job failure policy controller support
**What this PR does / why we need it**:
Start implementing the support of the "Backoff policy and failed pod limit" in the ```JobController``` defined in https://github.com/kubernetes/community/pull/583.
This PR depends on a previous PR #48075 that updates the K8s API types.
TODO:
* [X] Implement ```JobSpec.BackoffLimit``` support
* [x] Rebase when #48075 has been merged.
* [X] Implement end2end tests
implements https://github.com/kubernetes/community/pull/583
**Special notes for your reviewer**:
**Release note**:
```release-note
Add backoff policy and failed pod limit for a job
```
Automatic merge from submit-queue (batch tested with PRs 51805, 51725, 50925, 51474, 51638)
Flexvolume dynamic plugin discovery: Prober unit tests and basic e2e test.
**What this PR does / why we need it**: Tests for changes introduced in PR #50031 .
As part of the prober unit test, I mocked filesystem, filesystem watch, and Flexvolume plugin initialization.
Moved the filesystem event goroutine to watcher implementation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51147
**Special notes for your reviewer**:
First commit contains added functionality of the mock filesystem.
Second commit is the refactor for moving mock filesystem into a common util directory.
Third commit is the unit and e2e tests.
**Release note**:
```release-note
NONE
```
/release-note-none
/sig storage
/assign @saad-ali @liggitt
/cc @mtaufen @chakri-nelluri @wongma7
Automatic merge from submit-queue
bump QEMU version to v2.9.1
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
xref #38067
**Special notes for your reviewer**:
/assign @luxas
**Release note**:
```release-note
update QEMU version to v2.9.1
```
Job failure policy integration in JobController. From the
JobSpec.BackoffLimit the JobController will define the backoff
duration between Job retry.
It use the ```workqueue.RateLimitingInterface``` to store the number of
"retry" as "requeue" and the default Job backoff initial duration is set
during the initialization of the ```workqueue.RateLimiter.
Since the number of retry for each job is store in a local structure
"JobController.queue" if the JobController restarts the number of retries
will be lost and the backoff duration will be reset to 0.
Add e2e test for Job backoff failure policy
Automatic merge from submit-queue (batch tested with PRs 50602, 51561, 51703, 51748, 49142)
Use arm32v7|arm64v8 images instead of the deprecated armhf|aarch64 image organizations
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50601
**Special notes for your reviewer**:
/assign @ixdy @jbeda @zmerlynn
**Release note**:
```release-note
Use arm32v7|arm64v8 images instead of the deprecated armhf|aarch64 image organizations
```
Automatic merge from submit-queue (batch tested with PRs 51583, 51283, 51374, 51690, 51716)
Unify initializer name validation
Unify the validation rules on initializer names. Fix https://github.com/kubernetes/kubernetes/issues/51843.
```release-note
Action required: validation rule on metadata.initializers.pending[x].name is tightened. The initializer name needs to contain at least three segments separated by dots. If you create objects with pending initializers, (i.e., not relying on apiserver adding pending initializers according to initializerconfiguration), you need to update the initializer name in existing objects and in configuration files to comply to the new validation rule.
```
Automatic merge from submit-queue (batch tested with PRs 50832, 51119, 51636, 48921, 51712)
add reconcile command to kubectl auth
This pull exposes the RBAC reconcile commands through `kubectl auth reconcile -f FILE`. When passed a file which contains RBAC roles, rolebindings, clusterroles, or clusterrolebindings, it will compute covers and add the missing rules.
The logic required to properly "apply" rbac permissions is more complicated that a json merge since you have to compute logical covers operations between rule sets. This means that we cannot use `kubectl apply` to update rbac roles without risking breaking old clients (like controllers).
To solve this problem, RBAC created reconcile functions to use during startup for "stock" roles. We want to offer this power to users who are running their own controllers and extension servers.
This is an intersection between @kubernetes/sig-auth-misc and @kubernetes/sig-cli-misc
Automatic merge from submit-queue (batch tested with PRs 51335, 51364, 51130, 48075, 50920)
[API] Feature/job failure policy
**What this PR does / why we need it**: Implements the Backoff policy and failed pod limit defined in https://github.com/kubernetes/community/pull/583
**Which issue this PR fixes**:
fixes#27997, fixes#30243
**Special notes for your reviewer**:
This is a WIP PR, I updated the api batchv1.JobSpec in order to prepare the backoff policy implementation in the JobController.
**Release note**:
```release-note
Add backoff policy and failed pod limit for a job
```
Automatic merge from submit-queue
Use the right image for the right platform in the e2e tests
**What this PR does / why we need it**:
This PR is for enabling kubernetes tests for multi architecture platform
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#38067
**Special notes for your reviewer**:
This will enable conformance tests for all the supported architectures.
**Release note**:
```release-note
Make all e2e tests lookup image to use from a centralized place. In that centralized place, add support for multiple platforms.
```
x-ref #38067
Automatic merge from submit-queue (batch tested with PRs 45724, 48051, 46444, 51056, 51605)
Add selfsubjectrulesreview in authorization
**What this PR does / why we need it**:
**Which issue this PR fixes**: fixes#47834#31292
**Special notes for your reviewer**:
**Release note**:
```release-note
Add selfsubjectrulesreview API for allowing users to query which permissions they have in a given namespace.
```
/cc @deads2k @liggitt
Automatic merge from submit-queue (batch tested with PRs 50381, 51307, 49645, 50995, 51523)
Address TestEtcdStoragePath flakes
- Wait for the master to be healthy
- Wait longer for the master to start
- Fail gracefully if starting the master panics
Signed-off-by: Monis Khan <mkhan@redhat.com>
```release-note
NONE
```
Fixes#49423
@kubernetes/sig-api-machinery-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 50381, 51307, 49645, 50995, 51523)
Remove deprecated and experimental fields from KubeletConfiguration
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP (maybe not v1 in 1.8, given how close we are, but definitely in 1.9).
It makes far more sense to focus on a clean API in kubeletconfig v2,
than to try and further clean up the existing "API" that everyone
already depends on.
fixes: #51657
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51632, 51055, 51676, 51560, 50007)
test/e2e/auth: fix audit log test format parsing
Fixes https://github.com/kubernetes/kubernetes/issues/51556
```release-note
NONE
```
cc @CaoShuFeng
Still need to figure out how to run this test locally.
Automatic merge from submit-queue (batch tested with PRs 51628, 51637, 51490, 51279, 51302)
Fix pod local ephemeral storage usage calculation
We use podDiskUsage to calculate pod local ephemeral storage which is not correct, because podDiskUsage also contains HostPath volume which is considered as persistent storage
This pr fixes it
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51489
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @jingxu97 @vishh
cc @ddysher
Automatic merge from submit-queue (batch tested with PRs 51513, 51515, 50570, 51482, 51448)
Add PVCRef to VolumeStats
**What this PR does / why we need it**:
For pod volumes that reference a PVC, add a PVCRef to the corresponding
volume stat. This allows metrics to be indexed/queried by PVC name
which is more user-friendly than Pod reference
**Which issue this PR fixes** : [#363](https://github.com/kubernetes/features/issues/363)
**Special notes for your reviewer**:
**Release note**:
```
`VolumeStats` reported by the kubelet stats summary API
(http://<node>:10255/stats/summary) now include a PVCRef
field describing the PVC referenced by the volume (if any).
```
Automatic merge from submit-queue (batch tested with PRs 51513, 51515, 50570, 51482, 51448)
Removes redundant prefix in cluster-lifecycle e2e test names
**What this PR does / why we need it**:
Removes redundant prefix in cluster-lifecycle e2e test names
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Umbrella issue #49161
xref: #50054
**Special notes for your reviewer**:
/cc @jbeda
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50719, 51216, 50212, 51408, 51381)
Make selector immutable for v1beta2 deployment, replicaset and daemonset prior update
**What this PR does / why we need it**:
This PR ensures controller selector is immutable for deployment and replicaset prior update by ignoring any change to `Spec`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50808
**Special notes for your reviewer**:
This will be a breaking change.
**Release note**:
```release-note
For Deployment, ReplicaSet, and DaemonSet, selectors are now immutable when updating via the new `apps/v1beta2` API. For backward compatibility, selectors can still be changed when updating via `apps/v1beta1` or `extensions/v1beta1`.
```
Automatic merge from submit-queue (batch tested with PRs 51707, 51662, 51723, 50163, 51633)
update GC controller to wait until controllers have been initialized …
fixes#51013
Alternative to https://github.com/kubernetes/kubernetes/pull/51492 which keeps those few controllers (only one) from starting the informers early.
Automatic merge from submit-queue (batch tested with PRs 51707, 51662, 51723, 50163, 51633)
Change SizeLimit to a pointer
This PR fixes issue #50121
```release-note
The `emptyDir.sizeLimit` field is now correctly omitted from API requests and responses when unset.
```
Automatic merge from submit-queue (batch tested with PRs 51707, 51662, 51723, 50163, 51633)
Adding vishh to test/ reviewers and approvers
Rationale: Reviewing/Shepherding lots of features/PRs around node and resource management.
Automatic merge from submit-queue (batch tested with PRs 50775, 51397, 51168, 51465, 51536)
Enable batch/v1beta1.CronJobs by default
This PR moves to CronJobs beta entirely, enabling `batch/v1beta1` by default.
Related issue: #41039
@erictune @janetkuo ptal
```release-note
Promote CronJobs to batch/v1beta1.
```
Automatic merge from submit-queue (batch tested with PRs 50775, 51397, 51168, 51465, 51536)
Allow bearer requests to be proxied by kubectl proxy
Use a fake transport to capture changes to the request and then surface
them back to the end user.
Fixes#50466
@liggitt no tests yet, but works locally
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP.
It makes far more sense to focus on a clean API in kubeletconfig v2,
than to try and further clean up the existing "API" that everyone
already depends on.
Automatic merge from submit-queue
e2e: Add tests for network tiers in GCE
This test depends on #51301, which adds the new feature. Only the `e2e: Add tests for network tiers in GCE` commit is new.
#51301 should pass this new test.
Automatic merge from submit-queue (batch tested with PRs 51439, 51361, 51140, 51539, 51585)
Enable alpha GCE disk API
This PR builds on top of #50467 to allow the GCE disk API to use either the alpha or stable APIs.
CC @freehan
Automatic merge from submit-queue (batch tested with PRs 47054, 50398, 51541, 51535, 51545)
Switch away from gcloud deprecated flags in compute resource listings
**What is fixed**
Remove deprecated `gcloud compute` flags, see linked issue.
**Which issue this PR fixes**:
fixes#49673
**Special notes for your reviewer**:
The change in `gcloudComputeResourceList` in `test/e2e/framework/ingress_utils.go` isn't strictly needed as currently no affected resources are called on within that file, however the function has the _potential_ to access affected resources so I covered it as well. Happy to change if deemed unnecessary.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51228, 50185, 50940, 51544, 51543)
Add upgrades tests for kube-proxy daemonset migration path
**What this PR does / why we need it**:
From #23225, this is a part of setting up CIs to validate the kube-proxy migration path (static pods -> daemonset and reverse).
The other part of the works (adding real CIs that run these tests) will be in a separate PR against [kubernetes/test-infra](https://github.com/kubernetes/test-infra).
Though this is currently blocked by #50705.
**Special notes for your reviewer**:
cc @roberthbailey @pwittrock
**Release note**:
```release-note
NONE
```
For pod volumes that reference a PVC, add a PVCRef to the corresponding
volume stat. This allows metrics to be indexed/queried by PVC name
which is more user-friendly than Pod reference
Automatic merge from submit-queue (batch tested with PRs 51377, 46580, 50998, 51466, 49749)
Adding e2e SELinux test for local storage
Adding e2e test for SELinux enabled local storage
/sig storage
Closes#45054
Automatic merge from submit-queue (batch tested with PRs 51377, 46580, 50998, 51466, 49749)
Use the pre-built docker binaries on Ubuntu for benchmark tests
- Tested manually.
- The `ubuntu-init-docker.yaml` is copied from `cos-init-docker.yaml` with the following changes needed by Ubuntu. This change is temporary -- we will remove the script and the tests once we know the performance of using the pre-built Docker 1.12 on Ubuntu.
```
71,72c71,72
< mount --bind "${install_location}"/docker-containerd /usr/bin/docker-containerd
< mount --bind "${install_location}"/docker-containerd-shim /usr/bin/docker-containerd-shim
---
> mount --bind "${install_location}"/docker-containerd /usr/bin/containerd
> mount --bind "${install_location}"/docker-containerd-shim /usr/bin/containerd-shim
75c75
< mount --bind "${install_location}"/docker-runc /usr/bin/docker-runc
---
> mount --bind "${install_location}"/docker-runc /usr/sbin/runc
88c88
< local requested_version="$(get_metadata "gci-docker-version")"
---
> local requested_version="$(get_metadata "ubuntu-docker-version")"
93,98d92
< # Check if we have the requested version installed.
< if check_installed /usr/bin/docker "${requested_version}"; then
< echo "Requested version already installed. Exiting."
< exit 0
< fi
<
100c94
< /usr/bin/systemctl stop docker
---
> systemctl stop docker
106c100
< /usr/bin/systemctl start docker && exit $rc
---
> systemctl start docker && exit $rc
```
- Updated all tests to use the latest Ubuntu image.
**Release note**:
```
None
```
/assign @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 49961, 50005, 50738, 51045, 49927)
Add cluster e2es to verify scheduler local storage support
Add cluster e2es to verify scheduler local storage support and remove some unused private functions
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
part of #50818
**Release note**:
```release-note
Add cluster e2es to verify scheduler local ephemeral storage support
```
/assign @jingxu97
/cc @ddysher
Automatic merge from submit-queue (batch tested with PRs 44719, 48454)
check job ActiveDeadlineSeconds
**What this PR does / why we need it**:
enqueue a sync task after ActiveDeadlineSeconds
**Which issue this PR fixes** *:
fixes#32149
**Special notes for your reviewer**:
**Release note**:
```release-note
enqueue a sync task to wake up jobcontroller to check job ActiveDeadlineSeconds in time
```
Automatic merge from submit-queue
Added an end-to-end test ensuring that Cluster Autoscaler does not scale up when all pending pods are unschedulable
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50919, 51410, 50099, 51300, 50296)
GCE: Read networkProjectID param
Fixes#48515
/assign bowei
The first commit is the original PR cherrypicked. The master's kubelet isn't provided a cloud config path, so the project is retrieved via instance metadata. In the GKE case, this project cannot be retrieved by the master and caused an error.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51471, 50561, 50435, 51473, 51436)
Feature gate initializers field
The metadata.initializers field should be feature gated and disabled by default while in alpha, especially since enforcement of initializer permission that keeps users from submitting objects with their own initializers specified is done via an admission plugin most clusters do not enable yet.
Not gating the field and tests caused tests added in https://github.com/kubernetes/kubernetes/issues/51429 to fail on clusters that don't enable the admission plugin.
This PR:
* adds an `Initializers` feature gate, auto-enables the feature gate if the admission plugin is enabled
* clears the `metadata.initializers` field of objects on create/update if the feature gate is not set
* marks the e2e tests as feature-dependent (will follow up with PR to test-infra to enable the feature and opt in for GCE e2e tests)
```release-note
Use of the alpha initializers feature now requires enabling the `Initializers` feature gate. This feature gate is auto-enabled if the `Initialzers` admission plugin is enabled.
```
Automatic merge from submit-queue
Moved node condition filter into a predicates.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50360
**Release note**:
```release-note
A new predicates, named 'CheckNodeCondition', was added to replace node condition filter. 'NetworkUnavailable', 'OutOfDisk' and 'NotReady' maybe reported as a reason when failed to schedule pods.
```
Automatic merge from submit-queue (batch tested with PRs 50953, 51082)
Fix mergekey of initializers; Repair invalid update of initializers
Fix https://github.com/kubernetes/kubernetes/issues/51131
The PR did two things to make parallel patching `metadata.initializers.pending` possible:
* Add mergekey to initializers.pending
* Let the initializer admission plugin set the `metadata.intializers` to nil if an update makes the `pending` and the `result` both nil, instead of returning a validation error. Otherwise if multiple initializer controllers sending the patch removing themselves from `pending` at the same time, one of them will get a validation error.
```release-note
The patch to remove the last initializer from metadata.initializer.pending will result in metadata.initializer to be set to nil (assuming metadata.initializer.result is also nil), instead of resulting in an validation error.
```
Automatic merge from submit-queue
Fix forbidden message format
Before this change:
$ kubectl get pods --as=tom
Error from server (Forbidden): pods "" is forbidden: User "tom" cannot list pods in the namespace "default".
After this change:
$ kubectl get pods --as=tom
Error from server (Forbidden): pods is forbidden: User "tom" cannot list pods in the namespace "default".
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
Fix forbidden message format, remove extra ""
```
Automatic merge from submit-queue
Let the quota evaluator handle mutating specs of pod & pvc
### Background
The final goal is to address https://github.com/kubernetes/kubernetes/issues/47837, which aims to allow more mutation for uninitialized objects.
To do that, we [decided](https://github.com/kubernetes/kubernetes/issues/47837#issuecomment-321462433) to let the admission controllers to handle mutation of uninitialized objects.
### Issue
#50399 attempted to fix all admission controllers so that can handle mutating uninitialized objects. It was incomplete. I didn't realize although the resourcequota admission plugin handles the update operation, the underlying evaluator didn't. This PR updated the evaluators to handle updates of uninitialized pods/pvc.
### TODO
We still miss another piece. The [quota replenish controller](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourcequota/replenishment_controller.go) uses the sharedinformer, which doesn't observe the deletion of uninitialized pods at the moment. So there is a quota leak if a pod is deleted before it's initialized. It will be addressed with https://github.com/kubernetes/kubernetes/issues/48893.
Automatic merge from submit-queue
Make coreos test images sshd not allow password login.
This will prevent security scanners from triggering.
Configuration is verbatim from:
https://coreos.com/os/docs/latest/customizing-sshd.html
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51054, 51101, 50031, 51296, 51173)
Dynamic Flexvolume plugin discovery, probing with filesystem watch.
**What this PR does / why we need it**: Enables dynamic Flexvolume plugin discovery. This model uses a filesystem watch (fsnotify library), which notifies the system that a probe is necessary only if something changes in the Flexvolume plugin directory.
This PR uses the dependency injection model in https://github.com/kubernetes/kubernetes/pull/49668.
**Release Note**:
```release-note
Dynamic Flexvolume plugin discovery. Flexvolume plugins can now be discovered on the fly rather than only at system initialization time.
```
/sig-storage
/assign @jsafrane @saad-ali
/cc @bassam @chakri-nelluri @kokhang @liggitt @thockin
Automatic merge from submit-queue (batch tested with PRs 50889, 51347, 50582, 51297, 51264)
Change eviction manager to manage one single local storage resource
**What this PR does / why we need it**:
We decided to manage one single resource name, eviction policy should be modified too.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #50818
**Special notes for your reviewer**:
**Release note**:
```release-note
Change eviction manager to manage one single local ephemeral storage resource
```
/assign @jingxu97
Before this change:
# kubectl get pods --as=tom
Error from server (Forbidden): pods "" is forbidden: User "tom" cannot list pods in the namespace "default".
After this change:
# kubectl get pods --as=tom
Error from server (Forbidden): pods is forbidden: User "tom" cannot list pods in the namespace "default".
Automatic merge from submit-queue
Fixed gke auth update wait condition.
Lookup whoami on gke using gcloud auth list.
Make sure we do not run the test on any cluster older than 1.7.
**What this PR does / why we need it**: Fixes issue with aggregator e2e test on GKE
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50945
**Special notes for your reviewer**: There is a TODO, follow up will be provided when the immediate problem is resolved.
**Release note**: ```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51134, 51122, 50562, 50971, 51327)
Made the tests ensure that Cluster Autoscaler is on before running.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Configuration is based on:
https://coreos.com/os/docs/latest/customizing-sshd.html
The specific SSHD config is:
# Use most defaults for sshd configuration.
UsePrivilegeSeparation sandbox
Subsystem sftp internal-sftp
ClientAliveInterval 180
UseDNS no
UsePAM yes
PrintLastLog no # handled by PAM
PrintMotd no # handled by PAM
AuthenticationMethods publickey
This will prevent security scanners from triggering.
Automatic merge from submit-queue
AllowedNotReadyNodes allowed to be not ready for absolutely *any* reason
It's as good as we allow those many nodes to be not part of the cluster at all, ever.
Btw - currently our 5k-node correctness test fails if "kubelet stopped posting node status" or "route not created", etc (ref: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/3/build-log.txt)
cc @kubernetes/sig-scalability-misc
Automatic merge from submit-queue (batch tested with PRs 51244, 50559, 49770, 51194, 50901)
Distribute pods efficiently in CA scalability tests
**What this PR does / why we need it**:
Instead of using runReplicatedPodOnEachNode method
which is suited to a small number of nodes,
distribute pods on the nodes with desired load
using RCs that eat up all the space we want to be
empty after distribution.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50213, 50707, 49502, 51230, 50848)
StatefulSet: Deflake e2e `kubectl exec` commands.
This may help with another source of flakiness found while investigating #48031.
We seem to get a lot of flakes due to "connection refused" while running `kubectl exec`. I can't find any reason this would be caused by the test flow, so I'm adding retries to see if that helps.
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)
Enable overlay2 on cos-m60 in node e2e tests
Ref: https://github.com/kubernetes/kubernetes/issues/42926
- Restart docker with `-s overlay2` in cloud-init before running all node e2e tests. I have to copy the systemd unit file to `/etc/systemd/system` because the `/usr/lib/systemd/system/` is read only.
- Updated node e2e tests to use the new cos-m60 image.
- The name of the cloud init file (`cos-init-live-restore.yaml`) does not indicate overlay2 will be enabled, but I can't just change the name in this PR, since it's referenced in test-infra.
**Release note**:
```
None
```
/assign @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)
StatefulSet: Deflake e2e "restart" phase.
This addresses another source of flakiness found while investigating #48031.
The test used to scale the StatefulSet down to 0, wait for ListPods to return 0 matching Pods, and then scale the StatefulSet back up.
This was prone to a race in which StatefulSet was told to scale back up before it had observed its own deletion of the last Pod, as evidenced by logs showing the creation of Pod ss-1 prior to the creation of the replacement Pod ss-0.
Instead, we now wait for the controller to observe all deletions before scaling it back up. This should fix flakes of the form:
```
Too many pods scheduled, expected 1 got 2
```
We seem to get a lot of flakes due to "connection refused" while running
`kubectl exec`. I can't find any reason this would be caused by the test
flow, so I'm adding retries to see if that helps.
Instead of using runReplicatedPodOnEachNode method
which is suited to a small number of nodes,
distribute pods on the nodes with desired load
using RCs that eat up all the space we want to be
empty after distribution.
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200)
Re-enable OIR e2e tests.
Re-enabling test skeleton for opaque integer resources originally submitted as part of #41870. The e2e was disabled since it was flaky. This is the first step toward re-enabling them. Currently all cases are skipped, so this exercises only the BeforeEach behavior and the deferred removal of OIRs from a node.
cc @timothysc