Automatic merge from submit-queue (batch tested with PRs 51984, 51351, 51873, 51795, 51634)
Revert to using isolated PID namespaces in Docker
**What this PR does / why we need it**: Reverts to the previous docker default of using isolated PID namespaces for containers in a pod. There exist container images that expect always to be PID 1 which we want to support unmodified in 1.8.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48937
**Special notes for your reviewer**:
**Release note**:
```release-note
Sharing a PID namespace between containers in a pod is disabled by default in 1.8. To enable for a node, use the --docker-disable-shared-pid=false kubelet flag. Note that PID namespace sharing requires docker >= 1.13.1.
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Enabling aggregator functionality on kubemark, gce
Enabling full functionality aggregator functionality in kubemark tests.
This includes configuring it to work in gce (we seem to assume gce in our kubemark tests)
It also includes setting up the relevant security and auth config.
**What this PR does / why we need it**: Configure aggregator properly on kubemark tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48428
**Special notes for your reviewer**:
**Release note**:
```release-note NONE
```
Enabling full functionality aggregator functionality in kubemark tests.
This includes configuring it to work in gce (we seem to assume gce in our kubemark tests)
It also includes setting up the relevant security and auth config.
Removing unneeded reference to CA key for MHBauer.
Fixed to pull the "parsed" values for the certs.
Fix from shyamjvs.
Automatic merge from submit-queue (batch tested with PRs 51915, 51294, 51562, 51911)
Remove OutOfDisk from controllers
This is one of the working items for #48843 for 1.8.
This changes the scheduler and daemonset controllers to no longer respect the OutOfDisk condition. The kubelet has not published OutOfDisk=True since 1.5.
This still preserves the Toleration for the OutOfDisk condition, as (I think?) this is required for backwards compatibility. I added TODOs to remove this in 1.10.
Automatic merge from submit-queue (batch tested with PRs 51833, 51936)
Changed volume IO e2e test to verify file hash instead of content.
**What this PR does / why we need it**: The existing way of verifying file content takes too much memory, causing processes to be OOM killed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/51717
**Release note**:
```release-note
NONE
```
/sig storage
/release-note-none
/assign @jeffvance @rootfs
/cc @msau42
Automatic merge from submit-queue (batch tested with PRs 51845, 51868, 51864)
Update sys spec to support docker 1.11-1.13 and overlay2.
Fixes https://github.com/kubernetes/kubernetes/issues/32536.
Update docker spec to:
1) Support overlay2;
2) Support docker version 1.11-1.13.
@dchen1107 @yguo0905 @luxas
/cc @kubernetes/sig-node-pr-reviews
```release-note
Kubernetes 1.8 supports docker version 1.11.x, 1.12.x and 1.13.x. And also supports overlay2.
```
Automatic merge from submit-queue
Enable batch/v1beta1.CronJobs by default
This PR re-applies the cronjobs->beta back (https://github.com/kubernetes/kubernetes/pull/51720) with the fix from @shyamjvs.
Fixes#51692
@apelisse @dchen1107 @smarterclayton ptal
@janetkuo @erictune fyi
Automatic merge from submit-queue
Remove deprecated init-container in annotations
fixes#50655fixes#51816closes#41004fixes#51816
Builds on #50654 and drops the initContainer annotations on conversion to prevent bypassing API server validation/security and targeting version-skewed kubelets that still honor the annotations
```release-note
The deprecated alpha and beta initContainer annotations are no longer supported. Init containers must be specified using the initContainers field in the pod spec.
```
Automatic merge from submit-queue
Job failure policy controller support
**What this PR does / why we need it**:
Start implementing the support of the "Backoff policy and failed pod limit" in the ```JobController``` defined in https://github.com/kubernetes/community/pull/583.
This PR depends on a previous PR #48075 that updates the K8s API types.
TODO:
* [X] Implement ```JobSpec.BackoffLimit``` support
* [x] Rebase when #48075 has been merged.
* [X] Implement end2end tests
implements https://github.com/kubernetes/community/pull/583
**Special notes for your reviewer**:
**Release note**:
```release-note
Add backoff policy and failed pod limit for a job
```
Automatic merge from submit-queue (batch tested with PRs 51805, 51725, 50925, 51474, 51638)
Flexvolume dynamic plugin discovery: Prober unit tests and basic e2e test.
**What this PR does / why we need it**: Tests for changes introduced in PR #50031 .
As part of the prober unit test, I mocked filesystem, filesystem watch, and Flexvolume plugin initialization.
Moved the filesystem event goroutine to watcher implementation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51147
**Special notes for your reviewer**:
First commit contains added functionality of the mock filesystem.
Second commit is the refactor for moving mock filesystem into a common util directory.
Third commit is the unit and e2e tests.
**Release note**:
```release-note
NONE
```
/release-note-none
/sig storage
/assign @saad-ali @liggitt
/cc @mtaufen @chakri-nelluri @wongma7
Automatic merge from submit-queue
bump QEMU version to v2.9.1
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
xref #38067
**Special notes for your reviewer**:
/assign @luxas
**Release note**:
```release-note
update QEMU version to v2.9.1
```
Job failure policy integration in JobController. From the
JobSpec.BackoffLimit the JobController will define the backoff
duration between Job retry.
It use the ```workqueue.RateLimitingInterface``` to store the number of
"retry" as "requeue" and the default Job backoff initial duration is set
during the initialization of the ```workqueue.RateLimiter.
Since the number of retry for each job is store in a local structure
"JobController.queue" if the JobController restarts the number of retries
will be lost and the backoff duration will be reset to 0.
Add e2e test for Job backoff failure policy
Automatic merge from submit-queue (batch tested with PRs 50602, 51561, 51703, 51748, 49142)
Use arm32v7|arm64v8 images instead of the deprecated armhf|aarch64 image organizations
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50601
**Special notes for your reviewer**:
/assign @ixdy @jbeda @zmerlynn
**Release note**:
```release-note
Use arm32v7|arm64v8 images instead of the deprecated armhf|aarch64 image organizations
```
Automatic merge from submit-queue (batch tested with PRs 51583, 51283, 51374, 51690, 51716)
Unify initializer name validation
Unify the validation rules on initializer names. Fix https://github.com/kubernetes/kubernetes/issues/51843.
```release-note
Action required: validation rule on metadata.initializers.pending[x].name is tightened. The initializer name needs to contain at least three segments separated by dots. If you create objects with pending initializers, (i.e., not relying on apiserver adding pending initializers according to initializerconfiguration), you need to update the initializer name in existing objects and in configuration files to comply to the new validation rule.
```
Automatic merge from submit-queue (batch tested with PRs 50832, 51119, 51636, 48921, 51712)
add reconcile command to kubectl auth
This pull exposes the RBAC reconcile commands through `kubectl auth reconcile -f FILE`. When passed a file which contains RBAC roles, rolebindings, clusterroles, or clusterrolebindings, it will compute covers and add the missing rules.
The logic required to properly "apply" rbac permissions is more complicated that a json merge since you have to compute logical covers operations between rule sets. This means that we cannot use `kubectl apply` to update rbac roles without risking breaking old clients (like controllers).
To solve this problem, RBAC created reconcile functions to use during startup for "stock" roles. We want to offer this power to users who are running their own controllers and extension servers.
This is an intersection between @kubernetes/sig-auth-misc and @kubernetes/sig-cli-misc
Automatic merge from submit-queue (batch tested with PRs 51335, 51364, 51130, 48075, 50920)
[API] Feature/job failure policy
**What this PR does / why we need it**: Implements the Backoff policy and failed pod limit defined in https://github.com/kubernetes/community/pull/583
**Which issue this PR fixes**:
fixes#27997, fixes#30243
**Special notes for your reviewer**:
This is a WIP PR, I updated the api batchv1.JobSpec in order to prepare the backoff policy implementation in the JobController.
**Release note**:
```release-note
Add backoff policy and failed pod limit for a job
```
Automatic merge from submit-queue
Use the right image for the right platform in the e2e tests
**What this PR does / why we need it**:
This PR is for enabling kubernetes tests for multi architecture platform
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#38067
**Special notes for your reviewer**:
This will enable conformance tests for all the supported architectures.
**Release note**:
```release-note
Make all e2e tests lookup image to use from a centralized place. In that centralized place, add support for multiple platforms.
```
x-ref #38067
Automatic merge from submit-queue (batch tested with PRs 45724, 48051, 46444, 51056, 51605)
Add selfsubjectrulesreview in authorization
**What this PR does / why we need it**:
**Which issue this PR fixes**: fixes#47834#31292
**Special notes for your reviewer**:
**Release note**:
```release-note
Add selfsubjectrulesreview API for allowing users to query which permissions they have in a given namespace.
```
/cc @deads2k @liggitt
Automatic merge from submit-queue (batch tested with PRs 50381, 51307, 49645, 50995, 51523)
Address TestEtcdStoragePath flakes
- Wait for the master to be healthy
- Wait longer for the master to start
- Fail gracefully if starting the master panics
Signed-off-by: Monis Khan <mkhan@redhat.com>
```release-note
NONE
```
Fixes#49423
@kubernetes/sig-api-machinery-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 50381, 51307, 49645, 50995, 51523)
Remove deprecated and experimental fields from KubeletConfiguration
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP (maybe not v1 in 1.8, given how close we are, but definitely in 1.9).
It makes far more sense to focus on a clean API in kubeletconfig v2,
than to try and further clean up the existing "API" that everyone
already depends on.
fixes: #51657
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51632, 51055, 51676, 51560, 50007)
test/e2e/auth: fix audit log test format parsing
Fixes https://github.com/kubernetes/kubernetes/issues/51556
```release-note
NONE
```
cc @CaoShuFeng
Still need to figure out how to run this test locally.
Automatic merge from submit-queue (batch tested with PRs 51628, 51637, 51490, 51279, 51302)
Fix pod local ephemeral storage usage calculation
We use podDiskUsage to calculate pod local ephemeral storage which is not correct, because podDiskUsage also contains HostPath volume which is considered as persistent storage
This pr fixes it
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51489
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @jingxu97 @vishh
cc @ddysher
Automatic merge from submit-queue (batch tested with PRs 51513, 51515, 50570, 51482, 51448)
Add PVCRef to VolumeStats
**What this PR does / why we need it**:
For pod volumes that reference a PVC, add a PVCRef to the corresponding
volume stat. This allows metrics to be indexed/queried by PVC name
which is more user-friendly than Pod reference
**Which issue this PR fixes** : [#363](https://github.com/kubernetes/features/issues/363)
**Special notes for your reviewer**:
**Release note**:
```
`VolumeStats` reported by the kubelet stats summary API
(http://<node>:10255/stats/summary) now include a PVCRef
field describing the PVC referenced by the volume (if any).
```
Automatic merge from submit-queue (batch tested with PRs 51513, 51515, 50570, 51482, 51448)
Removes redundant prefix in cluster-lifecycle e2e test names
**What this PR does / why we need it**:
Removes redundant prefix in cluster-lifecycle e2e test names
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Umbrella issue #49161
xref: #50054
**Special notes for your reviewer**:
/cc @jbeda
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50719, 51216, 50212, 51408, 51381)
Make selector immutable for v1beta2 deployment, replicaset and daemonset prior update
**What this PR does / why we need it**:
This PR ensures controller selector is immutable for deployment and replicaset prior update by ignoring any change to `Spec`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50808
**Special notes for your reviewer**:
This will be a breaking change.
**Release note**:
```release-note
For Deployment, ReplicaSet, and DaemonSet, selectors are now immutable when updating via the new `apps/v1beta2` API. For backward compatibility, selectors can still be changed when updating via `apps/v1beta1` or `extensions/v1beta1`.
```
Automatic merge from submit-queue (batch tested with PRs 51707, 51662, 51723, 50163, 51633)
update GC controller to wait until controllers have been initialized …
fixes#51013
Alternative to https://github.com/kubernetes/kubernetes/pull/51492 which keeps those few controllers (only one) from starting the informers early.
Automatic merge from submit-queue (batch tested with PRs 51707, 51662, 51723, 50163, 51633)
Change SizeLimit to a pointer
This PR fixes issue #50121
```release-note
The `emptyDir.sizeLimit` field is now correctly omitted from API requests and responses when unset.
```
Automatic merge from submit-queue (batch tested with PRs 51707, 51662, 51723, 50163, 51633)
Adding vishh to test/ reviewers and approvers
Rationale: Reviewing/Shepherding lots of features/PRs around node and resource management.
Automatic merge from submit-queue (batch tested with PRs 50775, 51397, 51168, 51465, 51536)
Enable batch/v1beta1.CronJobs by default
This PR moves to CronJobs beta entirely, enabling `batch/v1beta1` by default.
Related issue: #41039
@erictune @janetkuo ptal
```release-note
Promote CronJobs to batch/v1beta1.
```
Automatic merge from submit-queue (batch tested with PRs 50775, 51397, 51168, 51465, 51536)
Allow bearer requests to be proxied by kubectl proxy
Use a fake transport to capture changes to the request and then surface
them back to the end user.
Fixes#50466
@liggitt no tests yet, but works locally
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP.
It makes far more sense to focus on a clean API in kubeletconfig v2,
than to try and further clean up the existing "API" that everyone
already depends on.
Automatic merge from submit-queue
e2e: Add tests for network tiers in GCE
This test depends on #51301, which adds the new feature. Only the `e2e: Add tests for network tiers in GCE` commit is new.
#51301 should pass this new test.
Automatic merge from submit-queue (batch tested with PRs 51439, 51361, 51140, 51539, 51585)
Enable alpha GCE disk API
This PR builds on top of #50467 to allow the GCE disk API to use either the alpha or stable APIs.
CC @freehan
Automatic merge from submit-queue (batch tested with PRs 47054, 50398, 51541, 51535, 51545)
Switch away from gcloud deprecated flags in compute resource listings
**What is fixed**
Remove deprecated `gcloud compute` flags, see linked issue.
**Which issue this PR fixes**:
fixes#49673
**Special notes for your reviewer**:
The change in `gcloudComputeResourceList` in `test/e2e/framework/ingress_utils.go` isn't strictly needed as currently no affected resources are called on within that file, however the function has the _potential_ to access affected resources so I covered it as well. Happy to change if deemed unnecessary.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51228, 50185, 50940, 51544, 51543)
Add upgrades tests for kube-proxy daemonset migration path
**What this PR does / why we need it**:
From #23225, this is a part of setting up CIs to validate the kube-proxy migration path (static pods -> daemonset and reverse).
The other part of the works (adding real CIs that run these tests) will be in a separate PR against [kubernetes/test-infra](https://github.com/kubernetes/test-infra).
Though this is currently blocked by #50705.
**Special notes for your reviewer**:
cc @roberthbailey @pwittrock
**Release note**:
```release-note
NONE
```
For pod volumes that reference a PVC, add a PVCRef to the corresponding
volume stat. This allows metrics to be indexed/queried by PVC name
which is more user-friendly than Pod reference
Automatic merge from submit-queue (batch tested with PRs 51377, 46580, 50998, 51466, 49749)
Adding e2e SELinux test for local storage
Adding e2e test for SELinux enabled local storage
/sig storage
Closes#45054
Automatic merge from submit-queue (batch tested with PRs 51377, 46580, 50998, 51466, 49749)
Use the pre-built docker binaries on Ubuntu for benchmark tests
- Tested manually.
- The `ubuntu-init-docker.yaml` is copied from `cos-init-docker.yaml` with the following changes needed by Ubuntu. This change is temporary -- we will remove the script and the tests once we know the performance of using the pre-built Docker 1.12 on Ubuntu.
```
71,72c71,72
< mount --bind "${install_location}"/docker-containerd /usr/bin/docker-containerd
< mount --bind "${install_location}"/docker-containerd-shim /usr/bin/docker-containerd-shim
---
> mount --bind "${install_location}"/docker-containerd /usr/bin/containerd
> mount --bind "${install_location}"/docker-containerd-shim /usr/bin/containerd-shim
75c75
< mount --bind "${install_location}"/docker-runc /usr/bin/docker-runc
---
> mount --bind "${install_location}"/docker-runc /usr/sbin/runc
88c88
< local requested_version="$(get_metadata "gci-docker-version")"
---
> local requested_version="$(get_metadata "ubuntu-docker-version")"
93,98d92
< # Check if we have the requested version installed.
< if check_installed /usr/bin/docker "${requested_version}"; then
< echo "Requested version already installed. Exiting."
< exit 0
< fi
<
100c94
< /usr/bin/systemctl stop docker
---
> systemctl stop docker
106c100
< /usr/bin/systemctl start docker && exit $rc
---
> systemctl start docker && exit $rc
```
- Updated all tests to use the latest Ubuntu image.
**Release note**:
```
None
```
/assign @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 49961, 50005, 50738, 51045, 49927)
Add cluster e2es to verify scheduler local storage support
Add cluster e2es to verify scheduler local storage support and remove some unused private functions
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
part of #50818
**Release note**:
```release-note
Add cluster e2es to verify scheduler local ephemeral storage support
```
/assign @jingxu97
/cc @ddysher
Automatic merge from submit-queue (batch tested with PRs 44719, 48454)
check job ActiveDeadlineSeconds
**What this PR does / why we need it**:
enqueue a sync task after ActiveDeadlineSeconds
**Which issue this PR fixes** *:
fixes#32149
**Special notes for your reviewer**:
**Release note**:
```release-note
enqueue a sync task to wake up jobcontroller to check job ActiveDeadlineSeconds in time
```
Automatic merge from submit-queue
Added an end-to-end test ensuring that Cluster Autoscaler does not scale up when all pending pods are unschedulable
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50919, 51410, 50099, 51300, 50296)
GCE: Read networkProjectID param
Fixes#48515
/assign bowei
The first commit is the original PR cherrypicked. The master's kubelet isn't provided a cloud config path, so the project is retrieved via instance metadata. In the GKE case, this project cannot be retrieved by the master and caused an error.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51471, 50561, 50435, 51473, 51436)
Feature gate initializers field
The metadata.initializers field should be feature gated and disabled by default while in alpha, especially since enforcement of initializer permission that keeps users from submitting objects with their own initializers specified is done via an admission plugin most clusters do not enable yet.
Not gating the field and tests caused tests added in https://github.com/kubernetes/kubernetes/issues/51429 to fail on clusters that don't enable the admission plugin.
This PR:
* adds an `Initializers` feature gate, auto-enables the feature gate if the admission plugin is enabled
* clears the `metadata.initializers` field of objects on create/update if the feature gate is not set
* marks the e2e tests as feature-dependent (will follow up with PR to test-infra to enable the feature and opt in for GCE e2e tests)
```release-note
Use of the alpha initializers feature now requires enabling the `Initializers` feature gate. This feature gate is auto-enabled if the `Initialzers` admission plugin is enabled.
```
Automatic merge from submit-queue
Moved node condition filter into a predicates.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50360
**Release note**:
```release-note
A new predicates, named 'CheckNodeCondition', was added to replace node condition filter. 'NetworkUnavailable', 'OutOfDisk' and 'NotReady' maybe reported as a reason when failed to schedule pods.
```
Automatic merge from submit-queue (batch tested with PRs 50953, 51082)
Fix mergekey of initializers; Repair invalid update of initializers
Fix https://github.com/kubernetes/kubernetes/issues/51131
The PR did two things to make parallel patching `metadata.initializers.pending` possible:
* Add mergekey to initializers.pending
* Let the initializer admission plugin set the `metadata.intializers` to nil if an update makes the `pending` and the `result` both nil, instead of returning a validation error. Otherwise if multiple initializer controllers sending the patch removing themselves from `pending` at the same time, one of them will get a validation error.
```release-note
The patch to remove the last initializer from metadata.initializer.pending will result in metadata.initializer to be set to nil (assuming metadata.initializer.result is also nil), instead of resulting in an validation error.
```
Automatic merge from submit-queue
Fix forbidden message format
Before this change:
$ kubectl get pods --as=tom
Error from server (Forbidden): pods "" is forbidden: User "tom" cannot list pods in the namespace "default".
After this change:
$ kubectl get pods --as=tom
Error from server (Forbidden): pods is forbidden: User "tom" cannot list pods in the namespace "default".
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
Fix forbidden message format, remove extra ""
```
Automatic merge from submit-queue
Let the quota evaluator handle mutating specs of pod & pvc
### Background
The final goal is to address https://github.com/kubernetes/kubernetes/issues/47837, which aims to allow more mutation for uninitialized objects.
To do that, we [decided](https://github.com/kubernetes/kubernetes/issues/47837#issuecomment-321462433) to let the admission controllers to handle mutation of uninitialized objects.
### Issue
#50399 attempted to fix all admission controllers so that can handle mutating uninitialized objects. It was incomplete. I didn't realize although the resourcequota admission plugin handles the update operation, the underlying evaluator didn't. This PR updated the evaluators to handle updates of uninitialized pods/pvc.
### TODO
We still miss another piece. The [quota replenish controller](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourcequota/replenishment_controller.go) uses the sharedinformer, which doesn't observe the deletion of uninitialized pods at the moment. So there is a quota leak if a pod is deleted before it's initialized. It will be addressed with https://github.com/kubernetes/kubernetes/issues/48893.
Automatic merge from submit-queue
Make coreos test images sshd not allow password login.
This will prevent security scanners from triggering.
Configuration is verbatim from:
https://coreos.com/os/docs/latest/customizing-sshd.html
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51054, 51101, 50031, 51296, 51173)
Dynamic Flexvolume plugin discovery, probing with filesystem watch.
**What this PR does / why we need it**: Enables dynamic Flexvolume plugin discovery. This model uses a filesystem watch (fsnotify library), which notifies the system that a probe is necessary only if something changes in the Flexvolume plugin directory.
This PR uses the dependency injection model in https://github.com/kubernetes/kubernetes/pull/49668.
**Release Note**:
```release-note
Dynamic Flexvolume plugin discovery. Flexvolume plugins can now be discovered on the fly rather than only at system initialization time.
```
/sig-storage
/assign @jsafrane @saad-ali
/cc @bassam @chakri-nelluri @kokhang @liggitt @thockin
Automatic merge from submit-queue (batch tested with PRs 50889, 51347, 50582, 51297, 51264)
Change eviction manager to manage one single local storage resource
**What this PR does / why we need it**:
We decided to manage one single resource name, eviction policy should be modified too.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #50818
**Special notes for your reviewer**:
**Release note**:
```release-note
Change eviction manager to manage one single local ephemeral storage resource
```
/assign @jingxu97
Before this change:
# kubectl get pods --as=tom
Error from server (Forbidden): pods "" is forbidden: User "tom" cannot list pods in the namespace "default".
After this change:
# kubectl get pods --as=tom
Error from server (Forbidden): pods is forbidden: User "tom" cannot list pods in the namespace "default".
Automatic merge from submit-queue
Fixed gke auth update wait condition.
Lookup whoami on gke using gcloud auth list.
Make sure we do not run the test on any cluster older than 1.7.
**What this PR does / why we need it**: Fixes issue with aggregator e2e test on GKE
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50945
**Special notes for your reviewer**: There is a TODO, follow up will be provided when the immediate problem is resolved.
**Release note**: ```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51134, 51122, 50562, 50971, 51327)
Made the tests ensure that Cluster Autoscaler is on before running.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Configuration is based on:
https://coreos.com/os/docs/latest/customizing-sshd.html
The specific SSHD config is:
# Use most defaults for sshd configuration.
UsePrivilegeSeparation sandbox
Subsystem sftp internal-sftp
ClientAliveInterval 180
UseDNS no
UsePAM yes
PrintLastLog no # handled by PAM
PrintMotd no # handled by PAM
AuthenticationMethods publickey
This will prevent security scanners from triggering.
Automatic merge from submit-queue
AllowedNotReadyNodes allowed to be not ready for absolutely *any* reason
It's as good as we allow those many nodes to be not part of the cluster at all, ever.
Btw - currently our 5k-node correctness test fails if "kubelet stopped posting node status" or "route not created", etc (ref: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/3/build-log.txt)
cc @kubernetes/sig-scalability-misc
Automatic merge from submit-queue (batch tested with PRs 51244, 50559, 49770, 51194, 50901)
Distribute pods efficiently in CA scalability tests
**What this PR does / why we need it**:
Instead of using runReplicatedPodOnEachNode method
which is suited to a small number of nodes,
distribute pods on the nodes with desired load
using RCs that eat up all the space we want to be
empty after distribution.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50213, 50707, 49502, 51230, 50848)
StatefulSet: Deflake e2e `kubectl exec` commands.
This may help with another source of flakiness found while investigating #48031.
We seem to get a lot of flakes due to "connection refused" while running `kubectl exec`. I can't find any reason this would be caused by the test flow, so I'm adding retries to see if that helps.
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)
Enable overlay2 on cos-m60 in node e2e tests
Ref: https://github.com/kubernetes/kubernetes/issues/42926
- Restart docker with `-s overlay2` in cloud-init before running all node e2e tests. I have to copy the systemd unit file to `/etc/systemd/system` because the `/usr/lib/systemd/system/` is read only.
- Updated node e2e tests to use the new cos-m60 image.
- The name of the cloud init file (`cos-init-live-restore.yaml`) does not indicate overlay2 will be enabled, but I can't just change the name in this PR, since it's referenced in test-infra.
**Release note**:
```
None
```
/assign @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)
StatefulSet: Deflake e2e "restart" phase.
This addresses another source of flakiness found while investigating #48031.
The test used to scale the StatefulSet down to 0, wait for ListPods to return 0 matching Pods, and then scale the StatefulSet back up.
This was prone to a race in which StatefulSet was told to scale back up before it had observed its own deletion of the last Pod, as evidenced by logs showing the creation of Pod ss-1 prior to the creation of the replacement Pod ss-0.
Instead, we now wait for the controller to observe all deletions before scaling it back up. This should fix flakes of the form:
```
Too many pods scheduled, expected 1 got 2
```
We seem to get a lot of flakes due to "connection refused" while running
`kubectl exec`. I can't find any reason this would be caused by the test
flow, so I'm adding retries to see if that helps.
Instead of using runReplicatedPodOnEachNode method
which is suited to a small number of nodes,
distribute pods on the nodes with desired load
using RCs that eat up all the space we want to be
empty after distribution.
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200)
Re-enable OIR e2e tests.
Re-enabling test skeleton for opaque integer resources originally submitted as part of #41870. The e2e was disabled since it was flaky. This is the first step toward re-enabling them. Currently all cases are skipped, so this exercises only the BeforeEach behavior and the deferred removal of OIRs from a node.
cc @timothysc
Automatic merge from submit-queue (batch tested with PRs 51108, 51035, 50539, 51160, 50947)
Auto-calculate CLUSTER_IP_RANGE based on cluster size
In preparation for eliminating CLUSTER_IP_RANGE env var from job configs, making it less error prone while folks try to start their own large cluster tests (https://github.com/kubernetes/kubernetes/issues/50907).
/cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)
Add statefulset upgrade tests to cluster_upgrade
**What this PR does / why we need it**:
Adds already created statefulset upgrade tests to cluster_upgrade.go. With further test infra changes, this will allow them to be continuously run, giving better signals.
Detect and prevent issues like https://github.com/kubernetes/kubernetes/issues/48327
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)
implement proposal 34058: hostPath volume type
**What this PR does / why we need it**:
implement proposal #34058
**Which issue this PR fixes** : fixes#46549
**Special notes for your reviewer**:
cc @thockin @luxas @euank PTAL
Automatic merge from submit-queue (batch tested with PRs 50489, 51070, 51011, 51022, 51141)
update to rbac v1 in yaml file
**What this PR does / why we need it**:
ref to https://github.com/kubernetes/kubernetes/pull/49642
ref https://github.com/kubernetes/features/issues/2
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
cc @liggitt
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Skip "Simple pod should support exec through kubectl proxy" test
As reported in https://github.com/kubernetes/kubernetes/issues/50466,
this test doesn't work in GKE because it uses a bearer token and the feature only works with client certs.
As the feature that is broken in GKE is new and didn't work before, it
is safe to juste ignore the test and consider the feature as "still not
working" in GKE.
**What this PR does / why we need it**: Fixes the broken test in https://k8s-testgrid.appspot.com/release-master-blocking#gke
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: works-around #50466
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
The test used to scale the StatefulSet down to 0, wait for ListPods to
return 0 matching Pods, and then scale the StatefulSet back up.
This was prone to a race in which StatefulSet was told to scale back up
before it had observed its own deletion of the last Pod, as evidenced by
logs showing the creation of Pod ss-1 prior to the creation of the
replacement Pod ss-0.
We now wait for the controller to observe all deletions before
scaling it back up. This should fix flakes of the form:
```
Too many pods scheduled, expected 1 got 2
```
Automatic merge from submit-queue (batch tested with PRs 50257, 50247, 50665, 50554, 51077)
Replace hard-code "cpu" and "memory" to consts
**What this PR does / why we need it**:
There are many places using hard coded "cpu" and "memory" as resource name. This PR replace them to consts.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
/kind cleanup
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50980, 46902, 51051, 51062, 51020)
Remove seemingly obsolete binaries
It's hard to tell if these are safe to remove. Let CI tell me.
Automatic merge from submit-queue (batch tested with PRs 51039, 50512, 50546, 50965, 50467)
Kubectl: Plumb openapi validation (disabled by default)
**What this PR does / why we need it**: Creates a new flag '--openapi' and plumb in the validation code so that it can be used by default to validate objects against the openapi schema.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: partially https://github.com/kubernetes/kubectl/issues/49
**Special notes for your reviewer**:
This is not complete, the name of the variable must change for example.
**Release note**:
```release-note
Kubectl uses openapi for validation. If OpenAPI is not available on the server, it defaults back to the old Swagger.
```
Automatic merge from submit-queue
StatefulSet: Deflake e2e "Saturate" phase.
This should reduce one source of flakiness found while investigating #48031.
The "Saturate" phase of StatefulSet e2e tests verifies orderly startup by controlling when each Pod is allowed to report Ready. If a Pod unexepectedly goes down during the test, the replacement Pod
created by the controller will forget if it was already allowed to report Ready.
After this change, the signal that allows each Pod to report Ready is persisted in the Pod's PVC. Thus, the replacement Pod will remember that it was already told to proceed to a Ready state.
Automatic merge from submit-queue (batch tested with PRs 51102, 50712, 51037, 51044, 51059)
[sig-network-e2e] Remove redundant sig prefix from tests
**What this PR does / why we need it**:
Remove redundant sig prefix from:
```
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for endpoint-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for endpoint-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for node-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for node-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for pod-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for pod-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update endpoints: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update endpoints: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update nodePort: http [Slow]
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update nodePort: udp [Slow]
[sig-network] Loadbalancing: L7 [sig-network] GCE [Slow] [Feature:Ingress] should conform to Ingress spec
[sig-network] Loadbalancing: L7 [sig-network] GCE [Slow] [Feature:Ingress] should create ingress with given static-ip
```
Umbrella issue #49161
**Special notes for your reviewer**:
cc @xiangpengzhao
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50967, 50505, 50706, 51033, 51028)
Fix GC integration test race
During TestCreateWithNonExistentOwner, when creating a pod with a
non-existent owner, assume it's possible the pod will be deleted before
we start checking for the pod's existence. Assuming that the pod still
exists immediately after Create returns is flaky if the GC reacts very
quickly.
```release-note
NONE
```
Might fix https://github.com/kubernetes/kubernetes/issues/50943; without the additional test context provided by this PR, it's not entirely possible to assess the root cause of the reported failure (as we don't know whether the original assertion failure was due to there being 0 or >1 pods).
/cc @caesarxuchao
Automatic merge from submit-queue (batch tested with PRs 50967, 50505, 50706, 51033, 51028)
Revert "Merge pull request #51008 from kubernetes/revert-50789-fix-scheme"
I'm spinning up a cluster right now to test this fix, but I'm pretty sure this was the problem.
There doesn't seem to be a way to confirm from logs, because AFAICT the logs from the hollow kubelet containers are not collected as part of the kubemark test.
**What this PR does / why we need it**:
This reverts commit f4afdecef8, reversing
changes made to e633a1604f.
This also fixes a bug where Kubemark was still using the core api scheme
to manipulate the Kubelet's types, which was the cause of the initial
revert.
**Which issue this PR fixes**: fixes#51007
**Release note**:
```release-note
NONE
```
/cc @shyamjvs @wojtek-t
As reported in https://github.com/kubernetes/kubernetes/issues/50466,
this test doesn't work in GKE because the transport layer doesn't work
with dialing.
As the feature that is broken in GKE is new and didn't work before, it
is safe to juste ignore the test and consider the feature as "still not
working" in GKE.
Automatic merge from submit-queue
Should generate files before scheduler perf
**What this PR does / why we need it**:
For a newly cloned project, generated files are not included. Then scheduler_perf will fail:
```
undefined: openapi.GetOpenAPIDefinitions
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes: #51090
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 50893, 50913, 50963, 50629, 50640)
Increase latency threshold for list api calls
This is only a short-term solution to make our density test green. In the long-term, we should measure as per our new SLIs.
From @wojtek-t's [doc](https://docs.google.com/document/d/1Q5qxdeBPgTTIXZxdsFILg7kgqWhvOwY8uROEf0j5YBw) on the new SLIs/SLOs, we have the following SLO for list calls:
```
SLO1: In default Kubernetes installation, 99th percentile of SLI2 per cluster-day:
<= 1s if total number of objects of the same type as resource in the system <= X
<= 5s if total number of objects of the same type as resource in the system <= Y
<= 30s if total number of objects of the same types as resource in the system <= Z
```
I would guess that 170,000 pods would fall into the 2nd bracket (at least) and hence the new value of 5s. WDYT?
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Revert #50362.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #50884
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 50693, 50831, 47506, 49119, 50871)
Add instance metadata from flag even when using image config.
Also add instance metadata from flag even when we are using image config.
* Sometimes we need to dynamically generate instance metadata, it's troublesome to put them into image config.
* Sometimes we want to apply instance metadata to all images, it's duplicated to add them to each image in the image config.
/assign @yguo0905 Could you help me review this?
The "Saturate" phase of StatefulSet e2e tests verifies orderly startup
by controlling when each Pod is allowed to report Ready.
If a Pod unexepectedly goes down during the test, the replacement Pod
created by the controller will forget if it was already allowed to
report Ready.
After this change, the signal that allows each Pod to report Ready is
persisted in the Pod's PVC. Thus, the replacement Pod will remember that
it was already told to proceed to a Ready state.
This reverts commit f4afdecef8, reversing
changes made to e633a1604f.
This also fixes a bug where Kubemark was still using the core api scheme
to manipulate the Kubelet's types, which was the cause of the initial
revert.
Automatic merge from submit-queue (batch tested with PRs 47896, 50678, 50620, 50631, 51005)
Made the difference between scale-up timeout and cluster set-up timeout explicit.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
During TestCreateWithNonExistentOwner, when creating a pod with a
non-existent owner, assume it's possible the pod will be deleted before
we start checking for the pod's existence. Assuming that the pod still
exists immediately after Create returns is flaky if the GC reacts very
quickly.
- Wait for the master to be healthy
- Wait longer for the master to start
- Fail gracefully if starting the master panics
Signed-off-by: Monis Khan <mkhan@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 46512, 50146)
Make metav1.(Micro)?Time functions take pointers
Is there any reason for those functions not to be on pointers?
Automatic merge from submit-queue (batch tested with PRs 50904, 50691)
Stackdriver Logging e2e: Explicitly check for docker and kubelet logs presence
Check for kubelet and docker logs explicitly in the Stackdriver Logging e2e tests
Automatic merge from submit-queue
Add enj to OWNERS for test/integration/etcd/etcd_storage_path_test.go
@deads2k is the bot smart enough to not spam me with every test change? Perhaps I should create an `OWNERS` file in `test/integration/etcd`?
**Release note**:
```release-note
NONE
```
@kubernetes/sig-api-machinery-pr-reviews
Automatic merge from submit-queue
CollisionCount should have type int32 across controllers that use it for collision avoidance
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50530
**Special notes for your reviewer**:
/cc @liyinan926
/assign @kow3ns @thockin @janetkuo
**Release note**:
```release-note
Change CollisionCount from int64 to int32 across controllers
```
Automatic merge from submit-queue (batch tested with PRs 50277, 50823, 50376, 50867)
Move e2e taints test file to sig-scheduling
**What this PR does / why we need it**:
Move taint test file to e2e scheduling and add sig-scheduling prefix.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Ref Umbrella issue #49161
**Special notes for your reviewer**:
**Release note**:
none
Automatic merge from submit-queue
Change API version of statefulset scale subresource e2e test to v1beta2
**What this PR does / why we need it**:
This PR changes API version of statefulset scale subresource e2e test from `v1beta1` to `v1beta2`.
`apps/v1beta2` has been enabled.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #50109
**Release note**:
```release-note
NONE
```