Automatic merge from submit-queue
More owners from SIG-CLI
Adds SIG-CLI as reviewers and approvers of `cmd/clicheck/` and adds me + @pwittrock as approvers in `hack/` (mostly for `test-cmd` and some `hack/verify*.sh` and `hack/update*.sh` scripts).
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Make real proxier in hollow-proxy optional (default=true)
Ref https://github.com/kubernetes/kubernetes/pull/45622
This allows using real proxier for hollow proxy, but we use the fake one by default.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Remove the deprecated --babysit-daemons kubelet flag
```release-note
Removes the deprecated kubelet flag --babysit-daemons
```
This flag has been deprecated for over a year (git blame says marked deprecated on March 1, 2016).
Relatively easy removal - nothing in the Kubelet relies on it anymore.
There was still some stuff in the provisioning scripts. It was easy to rip out, but in general we probably need to be more disciplined about updating the provisioning scripts at the same time that we initially mark things deprecated.
Automatic merge from submit-queue
Move all API related annotations into annotation_key_constants.go
Separate from #45869. See https://github.com/kubernetes/kubernetes/pull/45869#discussion_r116839411 for details.
This PR does nothing but move constants around :)
/assign @caesarxuchao
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Disabling service-dns controller for federation kubectl tests
**What this PR does / why we need it**:
DNS was unnecessary to do kubectl tests against federation, but it was required earlier as service-controller would not start without initializing dns-provider. Now since we have the capability to disable service-dns controller, we no longer need to initialize federation-controller-manger with DNS specific stuff. So removing it.
Ref: https://github.com/kubernetes/kubernetes/pull/43136#issuecomment-287242198
**Release note**:
```
NONE
```
/cc @nikhiljindal @marun
@kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 45709, 41939)
Add an AEAD encrypting transformer for storing secrets encrypted at rest
Tweak the ValueTransformer interface slightly to support additional
context information (to allow authenticated data to be generated by the
store and passed to the transformer). Add a prefix transformer that
looks for known matching prefixes and uses them. Add an AES GCM
transformer that performs AEAD on the values coming in and out of the
store.
Implementation of https://docs.google.com/document/d/1lFhPLlvkCo3XFC2xFDPSn0jAGpqKcCCZaNsBAv8zFdE/edit# and https://github.com/kubernetes/kubernetes/issues/12742
Automatic merge from submit-queue (batch tested with PRs 45884, 45879, 45912, 45444, 45874)
Use patched version of Go 1.8.1 to eliminate performance regression
Ref https://github.com/kubernetes/kubernetes/issues/45216
Until this is solved in Go (it's still unclear whether there will be patch release with the fix or not), this is solving the problem on our side.
Automatic merge from submit-queue (batch tested with PRs 45247, 45810, 45034, 45898, 45899)
Apiregistration v1alpha1→v1beta1
Promoting apiregistration api from v1alpha1 to v1beta1.
API Registration is responsible for registering an API `Group`/`Version` with
another kubernetes like API server. The `APIService` holds information
about the other API server in `APIServiceSpec` type as well as general
`TypeMeta` and `ObjectMeta`. The `APIServiceSpec` type have the main
configuration needed to do the aggregation. Any request coming for
specified `Group`/`Version` will be directed to the service defined by
`ServiceReference` (on port 443) after validating the target using provided
`CABundle` or skipping validation if development flag `InsecureSkipTLSVerify`
is set. `Priority` is controlling the order of this API group in the overall
discovery document.
The return status is a set of conditions for this aggregation. Currently
there is only one condition named "Available", if true, it means the
api/server requests will be redirected to specified API server.
```release-note
API Registration is now in beta.
```
Automatic merge from submit-queue (batch tested with PRs 45374, 44537, 45739, 44474, 45888)
Allow kcm and scheduler to lock on ConfigMaps.
**What this PR does / why we need it**:
Plumbs through the ability to lock on ConfigMaps through the kcm and scheduler.
**Which issue this PR fixes**
Fixes: #44857
Addresses issues with: #45415
**Special notes for your reviewer**:
**Release note**:
```
Add leader-election-resource-lock support to kcm and scheduler to allow for locking on ConfigMaps as well as Endpoints(default)
```
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @jamiehannaford @bsalamat @mikedanese
Automatic merge from submit-queue (batch tested with PRs 45860, 45119, 44525, 45625, 44403)
[Federation] Move annotations and related parsing code as common code
This PR moves some code, which was duplicate, around as common code.
Changes the names of structures used for annotations to common names.
s/FederatedReplicaSetPreferences/ReplicaAllocationPreferences/
s/ClusterReplicaSetPreferences/PerClusterPreferences/
This can be reused in job controller and hpa controller code.
**Special notes for your reviewer**:
@kubernetes/sig-federation-misc
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 45860, 45119, 44525, 45625, 44403)
Support running StatefulSetBasic e2e tests with local-up-cluster
**What this PR does / why we need it**:
Currently StatefulSet(s) fail when you use local-up-cluster without
setting a cloud provider. In this PR, we use set the
kubernetes.io/host-path provisioner as the default provisioner when
there CLOUD_PROVIDER is not specified. This enables e2e test(s)
(specifically StatefulSetBasic) to work.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
This plugin acquires a fresh access token for apiserver from Azure Active
Directory using the device code flow. The access token is saved in the
configuration in order to be reused for upcomming accesses to appiserver.
In additon the access token is automatically refreshed when expired.
Automatic merge from submit-queue
add --as-group option to cli
The usecase of this change:
When a super user grant some RBAC permissions to a group, he can use
--as--group to test whether the group get the permissions.
Note that now we support as-groups, as-user-extra in kubeconfig file after this change.
**Release note**:
```NONE
```
@liggitt
Automatic merge from submit-queue
Internal audit API
Forked from https://github.com/kubernetes/kubernetes/pull/45315
I pulled out only the internal audit API types and would like to merge this to unblock the audit implementation work while we figure out the versioned types and code generation.
I will continue to iterate on https://github.com/kubernetes/kubernetes/pull/45315, but lets get this internal type submitted.
/cc @ericchiang @ihmccreery
Automatic merge from submit-queue (batch tested with PRs 41331, 45591, 45600, 45176, 45658)
Detect and prevent new vendor cycles
I see that we have added a dependency with a cyclic reference to
kubernetes. This makes life much harder, we should not do it. This
script should prevent any more offenders while we fix the existing one.
Automatic merge from submit-queue (batch tested with PRs 41331, 45591, 45600, 45176, 45658)
Move client/unversioned/remotecommand to client-go
Module remotecommand originally part of kubernetes/pkg/client/unversioned was moved
to client-go/tools, and will be used as authoritative in kubectl, e2e and other places.
Module remotecommand relies on util/exec module which was copied to client-go/pkg/util
Module remotecommand originally part of kubernetes/pkg/client/unversioned was moved
to client-go/tools, and will be used as authoritative in kubectl, e2e and other places.
Module remotecommand relies on util/exec module which will be copied to client-go/pkg/util
The usecase of this change:
When a super user grant some RBAC permissions to a group, he can use
--as-group to test whether the group get the permissions.
Note that now we support as-groups, as-user-extra in kubeconfig file
after this change.
I see that we have added a dependency with a cyclic reference to
kubernetes. This makes life much harder, we should not do it. This
script should prevent any more offenders while we fix the existing one.
Change-Id: Iabc22ee7a2c7c54697991d2f8c24e80acb0281e4
Automatic merge from submit-queue (batch tested with PRs 45691, 45667, 45698, 45715)
Make update_owners.py also emit a JSON sig-owners list.
This should experience less churn in general!
I'm going to make the triage page use this list to let sigs have individualized dashboards.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove the deprecated `--enable-cri` flag
Except for rkt, CRI is the default and only integration point for
container runtimes.
```release-note
Remove the deprecated `--enable-cri` flag. CRI is now the default,
and the only way to integrate with kubelet for the container runtimes.
```
Automatic merge from submit-queue (batch tested with PRs 45382, 45384, 44781, 45333, 45543)
Copy internal types to metrics
Supersedes #45306.
#45306 removed the internal types and suggested whoever needs the internal types should define their own copy, and use the code-gen tools to generated the conversion functions. Per offline discussion with @DirectXMan12, we wanted to go that direction but it's not clear where to put the internal types yet. Hence, as a temporary solution, we decided copy the referred client-go/pkg/api types into metrics api to avoid the dependency.
The commit "remove need of registry from custom_metrics/client.go" is similar to what I did to the fake client in an earlier PR. Let me know if you want to put the commit in another PR.
Automatic merge from submit-queue
Enable shared PID namespace by default for docker pods
**What this PR does / why we need it**: This PR enables PID namespace sharing for docker pods by default, bringing the behavior of docker in line with the other CRI runtimes when used with docker >= 1.13.1.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: ref #1615
**Special notes for your reviewer**: cc @dchen1107 @yujuhong
**Release note**:
```release-note
Kubernetes now shares a single PID namespace among all containers in a pod when running with docker >= 1.13.1. This means processes can now signal processes in other containers in a pod, but it also means that the `kubectl exec {pod} kill 1` pattern will cause the pod to be restarted rather than a single container.
```
This commit turns on client-gen for k8s.io/metrics. Clients are
generated for `k8s.io/metrics/pkg/apis/metrics` (both internal and
v1alpha1). `k8s.io/metrics/pkg/apis/custom_metrics` uses a bespoke
client due to the unique nature of its paths.
Automatic merge from submit-queue (batch tested with PRs 45481, 45463)
ThirdPartyResource example: added watcher example, code cleanup
**NOTE**: This is a cleaned and updated version of PR https://github.com/kubernetes/kubernetes/pull/43027
**What this PR does / why we need it**:
An example of using go-client for watching on ThirdPartyResource events (create/update/delete).
Automatic merge from submit-queue (batch tested with PRs 45508, 44258, 44126, 45441, 45320)
Removed 'default' row from test_owners.csv and Updated update_owners.py
**What this PR does / why we need it**:
Removes the 'default' row from test_owners.csv and the validation/update logic associated with it in update_owners.py.
The 'default' row is being removed because it results in too many issues being assigned to the default test owners when issues are automatically generated.
**Release note**:
```release-note
NONE
```
/assign
Automatic merge from submit-queue (batch tested with PRs 45508, 44258, 44126, 45441, 45320)
cloud initialize node in external cloud controller
@thockin This PR adds support in the `cloud-controller-manager` to initialize nodes (instead of kubelet, which did it previously)
This also adds support in the kubelet to skip node cloud initialization when `--cloud-provider=external`
Specifically,
Kubelet
1. The kubelet has a new flag called `--provider-id` which uniquely identifies a node in an external DB
2. The kubelet sets a node taint - called "ExternalCloudProvider=true:NoSchedule" if cloudprovider == "external"
Cloud-Controller-Manager
1. The cloud-controller-manager listens on "AddNode" events, and then processes nodes that starts with that above taint. It performs the cloud node initialization steps that were previously being done by the kubelet.
2. On addition of node, it figures out the zone, region, instance-type, removes the above taint and updates the node.
3. Then periodically queries the cloudprovider for node addresses (which was previously done by the kubelet) and updates the node if there are new addresses
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44727, 45409, 44968, 45122, 45493)
Separate healthz server from metrics server in kube-proxy
From #14661, proposal is on kubernetes/community#552.
Couple bullet points as in commit:
- /healthz will be served on 0.0.0.0:10256 by default.
- /metrics and /proxyMode will be served on port 10249 as before.
- Healthz handler will verify timestamp in iptables mode.
/assign @nicksardo @bowei @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
util/iptables: grab iptables locks if iptables-restore doesn't support --wait
When iptables-restore doesn't support --wait (which < 1.6.2 don't), it may
conflict with other iptables users on the system, like docker, because it
doesn't acquire the iptables lock before changing iptables rules. This causes
sporadic docker failures when starting containers.
To ensure those don't happen, essentially duplicate the iptables locking
logic inside util/iptables when we know iptables-restore doesn't support
the --wait option.
Unfortunately iptables uses two different locking mechanisms, one until
1.4.x (abstract socket based) and another from 1.6.x (/run/xtables.lock
flock() based). We have to grab both locks, because we don't know what
version of iptables-restore exists since iptables-restore doesn't have
a --version option before 1.6.2. Plus, distros (like RHEL) backport the
/run/xtables.lock patch to 1.4.x versions.
Related: https://github.com/kubernetes/kubernetes/pull/43575
See also: https://github.com/openshift/origin/pull/13845
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1417234
@kubernetes/rh-networking @kubernetes/sig-network-misc @eparis @knobunc @danwinship @thockin @freehan
Automatic merge from submit-queue (batch tested with PRs 45182, 45429)
Coverage: shasum command not supported on CentOS
Centos has sha1sum, instead of "shasum -a1". Modified script to
check for existence fo shasum, and if not present, use sha1sum
for coverage test processing.
**What this PR does / why we need it**: Allows coverage test to run under CentOS. Needed for development using that OS.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45425
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 45182, 45429)
CustomResources in separate API server
Builds on https://github.com/kubernetes/kubernetes/pull/45115.
This adds a basic handler for custom resources. No status handling, no finalizers, no controllers, but basic CRUD runs to allow @enisoc and others to start considering migration.
@kubernetes/sig-api-machinery-misc
Centos has sha1sum, instead of "shasum -a1". Modified script to
check for existence fo shasum, and if not present, use sha1sum
for coverage test processing. If neither are available, an error
will be reported and processing stopped.
Automatic merge from submit-queue
add set rolebinding/clusterrolebinding command
add command to set user/group/serviceaccount in rolebinding/clusterrolebinding /cc @liggitt @deads2k
Automatic merge from submit-queue (batch tested with PRs 45272, 45115)
initial types for TPRs
This pull starts creating the types described by https://github.com/kubernetes/community/blob/master/contributors/design-proposals/thirdpartyresources.md . In the initial pull different names were suggested. I've started this pull with `CustomResource.apiextensions.k8s.io`.
The structure begins as a separate API server to facilitate rapid prototyping and experimentation, but the end result will be added to the end of the `kube-apiserver` chain as described in https://github.com/kubernetes/community/blob/master/sig-api-machinery/api-extensions-position-statement.md .
Because it is separate to start (not included in any default server), I don't think we need a perfect name, but I'd like to be close.
@kubernetes/sig-api-machinery-misc @enisoc @smarterclayton @erictune
Automatic merge from submit-queue (batch tested with PRs 45239, 45230)
Create a single CA for both client and server certs
**What this PR does / why we need it**:
The following test:
"Certificates API [It] should support building a client with a CSR"
fails with local-up-cluster, but works in the existing CI jobs. This
is because the other CI jobs use a single CA cert while local-up-cluster
can use 2 different sets of CA(s). We need a way to mimic the other
CI jobs (or alternatively change everything to have separate CA's). Just
updating local-up-cluster with a flag seems to be the easy route.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
The following test:
"Certificates API [It] should support building a client with a CSR"
fails with local-up-cluster, but works in the existing CI jobs. This
is because the other CI jobs use a single CA cert while local-up-cluster
can use 2 different sets of CA(s). We need a way to mimic the other
CI jobs (or alternatively change everything to have separate CA's). Just
updating local-up-cluster with a flag seems to be the easy route.
Automatic merge from submit-queue
Use munged semantic version for side-loaded docker tag
**What this PR does / why we need it**: rather than using the md5sum of the dockerized binary for each side-loaded docker image, use the semantic version (with `+`s replaced with `_`s) for the side-loaded docker images.
The use of the md5sum for the docker tag dates to #6326 2 years ago. I'm not sure why that was chosen, short of it being fairly unique.
My main motivation for changing this is that it makes building the docker images using Bazel's docker rules easier, since the semantic version doesn't depend on the build output.
An added benefit is that the list of images on a running kubernetes cluster is also more straightfoward; rather than a list of opaque, meaningless hexadecimal strings, you get something that indicates the provenance of the image. It'd also be clearer that all of the images came from the same build.
I was able to start a cluster with this change on GCE using both `make quick-release` and `make bazel-release`.
Note that this change has no effect on the tag that's pushed to gcr.io during releases; that's still controlled via `KUBE_IMAGE_DOCKER_TAG`, though we may want to merge this functionality at some point.
@kubernetes/sig-node-pr-reviews is there any reason to stick with using the md5sum strategy? @dchen1107 do you remember why we went with md5sums originally?
cc @spxtr @mikedanese
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 45077, 45180, 34727, 45079, 45177)
Allow specifying cluster signing ca/key
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 45077, 45180, 34727, 45079, 45177)
Refactor kube-proxy configuration
This is a proof of concept refactoring of the configuration and startup of kube-proxy. Most flags have been removed and replaced by a single config file, specified by `--config`. This is in regards to the component configuration improvement suggestions listed in #32215.
Also during this effort, I discovered that Hyperkube is roughly reimplementing portions of cobra, and that the current cobra command definitions are solely used to generated docs and man pages. I would like to move the individual commands as well as Hyperkube to using cobra, but that is a separate issue and discussion.
cc @mikedanese @liggitt @deads2k @eparis @sttts @smarterclayton @dgoodwin @timothysc
Automatic merge from submit-queue (batch tested with PRs 45077, 45180, 34727, 45079, 45177)
Move k8s.io/metrics to staging/
This is to break the cyclic dependency in our code base: kubernetes depends on k8s.io/metrics, which depends on kubernetes/staging/client-go.
@DirectXMan12 i actually moved it to staging because we will need the flexibility to update metrics code directly to do many planned refactors, so the copy of metrics in kubernetes has to be the source of truth.
client-gen is not enabled for the code yet, we can enable it after you port your changes to client-gen.
`make generated_files` is enabled for metrics.
Automatic merge from submit-queue
don't use build tags to mark integration tests
This is a bad pattern that leads to checked in code that isn't check for compilation. We should avoid this if it doesn't provide value, which it seems like it doesn't.
Automatic merge from submit-queue (batch tested with PRs 41583, 45117, 45123)
Implement shared PID namespace in the dockershim
**What this PR does / why we need it**: Defaults the Docker CRI to using a shared PID namespace for pods. Implements proposal in https://github.com/kubernetes/community/pull/207 tracked by #1615.
//cc @dchen1107 @vishh @timstclair
**Special notes for your reviewer**: none
**Release note**:
```release-note
Some container runtimes share a process (PID) namespace for all containers in a pod. This will become the default for Docker in a future release of Kubernetes. You can preview this functionality if running with the CRI and Docker 1.13.1 by enabling the --experimental-docker-enable-shared-pid kubelet flag.
```
Automatic merge from submit-queue (batch tested with PRs 44044, 44766, 44930, 45109, 43301)
e2e test: test azure disk volume
**What this PR does / why we need it**:
E2E test Azure disk volume
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
@kubernetes/sig-testing-pr-reviews
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44044, 44766, 44930, 45109, 43301)
Fixes get -oname for unstructured objects
Fixes https://github.com/kubernetes/kubernetes/issues/44832
Make sure we display kind in `kubectl get -o name` for unknown resource types.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Prune examples and e2es per discussion on sig-testing
**What this PR does / why we need it**:
Prune k8petstore from examples and e2es per discussion on sig-testing
**Special notes for your reviewer**:
This can live elsewhere outside the main repository.
**Release note**:
```
NONE
```
/cc @jayunit100 @fejta @kubernetes/sig-testing-pr-reviews
Currently StatefulSet(s) fail when you use local-up-cluster without
setting a cloud provider. In this PR, we use set the
kubernetes.io/host-path provisioner as the default provisioner when
there CLOUD_PROVIDER is not specified. This enables e2e test(s)
(specifically StatefulSetBasic) to work.
Automatic merge from submit-queue
Don't check in zz_generated.openapi.go.
`zz_generated.openapi.go` is the file that causes the most merge conflicts of all. In #33440, @thockin updated the makefile to support generating these files on demand, but that didn't play well with bazel/gazel.
In this PR, I add a new build macro that will generate this file with a `go_genrule`. I added support for keeping the BUILD file up to date in mikedanese/gazel#34.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44591, 44549)
Update repo-infra bazel dependency and use new gcs_upload rule
This PR provides similar functionality to push-build.sh entirely within Bazel rules (though it relies on gsutil).
It's an alternative to #44306.
Depends on https://github.com/kubernetes/repo-infra/pull/13.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
OpenAPI support for kubectl
Support for openapi spec in kubectl.
Includes:
- downloading and caching openapi spec to a local file
- parsing openapi spec into binary serializable datastructures (10x faster load times 600ms -> 40ms)
- caching parsed openapi spec in memory for each command
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41287, 41636, 44881, 44826)
Enable default signer implementation
**What this PR does / why we need it**:
The Kubernetes controller manager provides a default implementation
of a signer. We should enable it by passing the --cluster-signing-cert-file
and --cluster-signing-key-file parameters to the controller manager
with paths to your Certificate Authority’s keypair. Hoping this will
help pass the "Certificates API [It] should support building a client with a CSR"
e2e test when run against k8s started using local-up-cluster.sh
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
bump bazel build to go1.8.1 and remove invalid unit tests
part of https://github.com/kubernetes/kubernetes/issues/38228
I firmly believe that unit tests that check error strings are incorrect unit tests. If we care about what type of error is returned, we need to use public error types. Anywhere we are using generic errors, we don't care other then that we saw an error.
Automatic merge from submit-queue (batch tested with PRs 44601, 44842, 44893, 44491, 44588)
kubeadm: add flag to skip token print out
**What this PR does / why we need it**: When kubeadm init is used in an automated context, it still prints the token to standard out. When standard output ends up in a log file, it can be considered that the token is leaked there and can be compromised. This PR adds a flag you can select to not have it print out and explicitly disable this behavior.
This is a continuation from https://github.com/kubernetes/kubernetes/pull/42823 since it had to be closed.
**Which issue this PR fixes** : fixes #https://github.com/kubernetes/kubeadm/issues/160
**Special notes for your reviewer**: /cc @luxas @errordeveloper
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40060, 44860, 44865, 44825, 44162)
Validate etcd only when expecting to run etcd
If running kubelet only, there is no need to validate etcd as the script will not attempt to start etcd. In fact, validating etcd here may cause the script to fail when one wants to run "nokubelet" right before "kubeletonly" because etcd will definitely be running
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44862, 42241, 42101, 43181, 44147)
Feature/hpa upscale downscale delay configurable
**What this PR does / why we need it**:
Makes "upscale forbidden window" and "downscale forbidden window" duration configurable in arguments of kube-controller-manager. Those are options of horizontal pod autoscaler.
**Special notes for your reviewer**:
Please have a look @DirectXMan12 , the PR as discussed in Slack.
**Release note**:
```
Make "upscale forbidden window" and "downscale forbidden window" duration configurable in arguments of kube-controller-manager. Those are options of horizontal pod autoscaler. Right now are hardcoded 3 minutes for upscale, and 5 minutes to downscale. But sometimes cluster administrator might want to change this for his own needs.
```
The Kubernetes controller manager provides a default implementation
of a signer. We should enable it by passing the --cluster-signing-cert-file
and --cluster-signing-key-file parameters to the controller manager
with paths to your Certificate Authority’s keypair. Hoping this will
help pass the "Certificates API [It] should support building a client with a CSR"
e2e test when run against k8s started using local-up-cluster.sh
The docker image is nowhere to be found, so lets remove it.
There have been a request for the Dockerfile here [1], but nobody
seems to care.
redis-proxy is replaced with redis-master in test-cmd-util.sh, to
ensure that the tests still works.
The redis-proxy pod in test/fixtures/doc-yaml/user-guide/multi-pod.yaml
is replaced with valid-pod from test/fixtures/doc-yaml/admin/limitrange/valid-pod.yaml,
so redis-proxy is removed every where.
[1] https://github.com/kubernetes/kubernetes/issues/4914#issuecomment-77209779
Automatic merge from submit-queue (batch tested with PRs 43500, 44073)
[Federation] Add option to retrieve e2e cluster config from secrets
Previously the federation e2e setup was reading member cluster configuration from the test run's kubeconfig. This change removes that dependency in favor of reading member cluster configuration from secrets in the hosting cluster, and caches the configuration to avoid having to read it separately for each test.
cc: @kubernetes/sig-federation-pr-reviews @perotinus
Automatic merge from submit-queue
[Federation] Add simple upgrade test
This PR adds a simple upgrade test that targets all registered federated types.
cc: @kubernetes/sig-federation-pr-reviews @perotinus
Automatic merge from submit-queue (batch tested with PRs 44222, 44614, 44292, 44638)
Optionally deploy kubernetes dashboard in local-up cluster
**What this PR does / why we need it**:
Enable users of local up cluster to optionally deploy the kubernetes dashboard.
**Special notes for your reviewer**:
The dashboard is especially useful when working on k8s + service catalog at the same time.
Automatic merge from submit-queue (batch tested with PRs 43000, 44500, 44457, 44553, 44267)
Add Kubernetes 1.6 support to Juju charms
**What this PR does / why we need it**:
This adds Kubernetes 1.6 support to Juju charms.
This includes some large architectural changes in order to support multiple versions of Kubernetes with a single release of the charms. There are a few bug fixes in here as well, for issues that we discovered during testing.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
Thanks to @marcoceppi, @ktsakalozos, @jacekn, @mbruzek, @tvansteenburgh for their work in this feature branch as well!
**Release note**:
```release-note
Add Kubernetes 1.6 support to Juju charms
Add metric collection to charms for autoscaling
Update kubernetes-e2e charm to fail when test suite fails
Update Juju charms to use snaps
Add registry action to the kubernetes-worker charm
Add support for kube-proxy cluster-cidr option to kubernetes-worker charm
Fix kubernetes-master charm starting services before TLS certs are saved
Fix kubernetes-worker charm failures in LXD
Fix stop hook failure on kubernetes-worker charm
Fix handling of juju kubernetes-worker.restart-needed state
Fix nagios checks in charms
```
Add the option to configure e2e access to member clusters from the
same secrets in the host cluster used by the federation control plane.
The default behavior will continue to be sourcing this configuration
from the e2e kubeconfig. The optional behavior can be enabled by
passing --federation-config-from-cluster=true as an argument to
ginkgo.
Automatic merge from submit-queue (batch tested with PRs 44424, 44026, 43939, 44386, 42914)
remove defaulting from conversion path
follow up for #42764
* remove call to defaulting from conversion path (defaulting is a separate step from conversion)
* remove non-top-level-object defaulting registration (unused after conversion call is removed)
* generate missing top-level defaults for some api groups:
* autoscaling/v2alpha1
* policy/v1alpha1
* policy/v1beta1
* register top-level defaults for some api groups that were missing them:
* autoscaling/v2alpha1
* settings/v1alpha1
Automatic merge from submit-queue (batch tested with PRs 44406, 41543, 44071, 44374, 44299)
Enable service account token lookup by default
Fixes#24167
```release-note
kube-apiserver: --service-account-lookup now defaults to true, requiring the Secret API object containing the token to exist in order for a service account token to be valid. This enables service account tokens to be revoked by deleting the Secret object containing the token.
```
Automatic merge from submit-queue
Do etcd semver validation using posix only
this is a follow-up to https://github.com/kubernetes/kubernetes/pull/44352, can't use sort -V because not everybody has that
```release-note
NONE
```
Automatic merge from submit-queue
local up dns defaults/Privileged defaults so that [Conformance] sets mostly pass on local clusters.
Fixes#43651 So that only 4 tests fail out of the box.
Automatic merge from submit-queue
Adding load balancer src cidrs to GCE cloudprovider
**What this PR does / why we need it**:
As of January 31st, 2018, GCP will be sending health checks and l7 traffic from two CIDRs and legacy health checks from three CIDS. This PR moves them into the cloudprovider package and provides a flag for override.
Another PR will need to be address firewall rule creation for external L4 network loadbalancing #40778
**Which issue this PR fixes**
Step one of #40778
Step one of https://github.com/kubernetes/ingress/issues/197
**Release note**:
```release-note
Add flags to GCE cloud provider to override known L4/L7 proxy & health check source cidrs
```
Automatic merge from submit-queue
Make the dockershim root directory configurable
Make the dockershim root directory configurable so things like
integration tests (e.g. in OpenShift) can run as non-root.
cc @sttts @derekwaynecarr @yujuhong @Random-Liu @kubernetes/sig-node-pr-reviews @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
In 'kubectl describe', find controllers with ControllerRef, instead of showing the original creator
@enisoc @kargakis @kubernetes/sig-apps-pr-reviews @kubernetes/sig-cli-pr-reviews
```release-note
In 'kubectl describe', find controllers with ControllerRef, instead of showing the original creator.
```
Automatic merge from submit-queue
Add support for IP aliases for pod IPs (GCP alpha feature)
```release-note
Adds support for allocation of pod IPs via IP aliases.
# Adds KUBE_GCE_ENABLE_IP_ALIASES flag to the cluster up scripts (`kube-{up,down}.sh`).
KUBE_GCE_ENABLE_IP_ALIASES=true will enable allocation of PodCIDR ips
using the ip alias mechanism rather than using routes. This feature is currently
only available on GCE.
## Usage
$ CLUSTER_IP_RANGE=10.100.0.0/16 KUBE_GCE_ENABLE_IP_ALIASES=true bash -x cluster/kube-up.sh
# Adds CloudAllocator to the node CIDR allocator (kubernetes-controller manager).
If CIDRAllocatorType is set to `CloudCIDRAllocator`, then allocation
of CIDR allocation instead is done by the external cloud provider and
the node controller is only responsible for reflecting the allocation
into the node spec.
- Splits off the rangeAllocator from the cidr_allocator.go file.
- Adds cloudCIDRAllocator, which is used when the cloud provider allocates
the CIDR ranges externally. (GCE support only)
- Updates RBAC permission for node controller to include PATCH
```
Automatic merge from submit-queue
Add tests for semantically equal DaemonSet updates
Tests for #43337, depends on #43337. The last commit is already reviewed in #43337.
@liggitt @kargakis @lukaszo @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 43866, 42748)
hack/cluster: download cfssl if not present
hack/local-up-cluster.sh uses cfssl to generate certificates and
will exit it cfssl is not already installed. But other cluster-up
mechanisms (GCE) that generate certs just download cfssl if not
present. Make local-up-cluster.sh do that too so users don't have
to bother installing it from somewhere.
Automatic merge from submit-queue (batch tested with PRs 43870, 30302, 42722, 43736)
Admission plugin to merge pod and namespace tolerations for restricting pod placement on nodes
```release-note
This admission plugin checks for tolerations on the pod being admitted and its namespace, and verifies if there is any conflict. If there is no conflict, then it merges the pod's namespace tolerations with the the pod's tolerations and it verifies them against its namespace' whitelist of tolerations and returns. If a namespace does not have its default or whitelist tolerations specified, then cluster level default and whitelist is used. An example of its versioned config:
apiVersion: apiserver.k8s.io/v1alpha1
kind: AdmissionConfiguration
plugins:
- name: "PodTolerationRestriction"
configuration:
apiVersion: podtolerationrestriction.admission.k8s.io/v1alpha1
kind: Configuration
default:
- Key: key1
Value: value1
- Key: key2
Value: value2
whitelist:
- Key: key1
Value: value1
- Key: key2
Value: value2
```
Automatic merge from submit-queue (batch tested with PRs 41758, 44137)
Removed hostname/subdomain annotation.
fixes#44135
```release-note
Remove `pod.beta.kubernetes.io/hostname` and `pod.beta.kubernetes.io/subdomain` annotations.
Users should use `pod.spec.hostname` and `pod.spec.subdomain` instead.
```
Automatic merge from submit-queue
Add dockershim only mode
This PR added a `experimental-dockershim` hidden flag in kubelet to run dockershim only.
We introduce this flag mainly for cri validation test. In the future we should compile dockershim into another binary.
@yujuhong @feiskyer @xlgao-zju
/cc @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue
[Federation] Fix Running service controller in federation kubectl tests
Fixes: #42607
cc @nikhiljindal @kubernetes/sig-federation-bugs
Automatic merge from submit-queue
Scheduler can recieve its policy configuration from a ConfigMap
**What this PR does / why we need it**: This PR adds the ability to scheduler to receive its policy configuration from a ConfigMap. Before this, scheduler could receive its policy config only from a file. The logic to watch the ConfigMap object will be added in a subsequent PR.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```Add the ability to the default scheduler to receive its policy configuration from a ConfigMap object.
```
Automatic merge from submit-queue
Add shashidharatd and madhusudancs as hack/ approvers.
Federation team has a bunch of scripts in `hack/` directory and for iterating quickly we need approvers permission in this directory. Both @shashidharatd and I have made number of changes to the scripts in this directory and generally understand the scripts here well.
**Release note**:
```release-note
NONE
```
/assign @fejta
Automatic merge from submit-queue
Add test for provisioning with storage class
This PR re-introduces e2e test for dynamic provisioning with storage classes.
It adds the same test as it was merged in PR #32485 with an extra patch adding region to AWS calls. It works well on my AWS setup, however I'm using shared company account and I can't run kube-up.sh and run the tests in the "official" way.
@zmerlynn, can you please try to run tests that led to #34961?
@justinsb, you're my AWS guru, would there be a way how to introduce fully initialized AWS cloud provider into e2e test framework? It would simplify everything. GCE has it there, but it's easier to initialize, I guess. See https://github.com/kubernetes/kubernetes/blob/master/test/e2e/pd.go#L486 for example - IMO tests should not talk to AWS directly.
Automatic merge from submit-queue (batch tested with PRs 44191, 44117, 44072)
Default FEDERATION_KUBE_CONTEXT to FEDERATION_NAME in federation e2e up/down scripts.
This is consistent with how kubefed creates kubeconfig contexts.
**Release note**:
-->
```release-note
NONE
```
Automatic merge from submit-queue
leader election lock based on scheduler name
**What this PR does / why we need it**:
This pr changed the leader election lock based on scheduler name.
**Which issue this PR fixes** :
fixes#39032
**Special notes for your reviewer**:
**Release note**:
```
[scheduling]Fix a bug for multiple-schedulers that you cannot start a second scheduler without disabling leader-elect if the default scheduler has leader-elect enabled(default). We changed the leader election lock based on scheduler name.
```
Automatic merge from submit-queue
[Federation] Remove FEDERATIONS_DOMAIN_MAP references
Remove all references to FEDERATIONS_DOMAIN_MAP as this method is no longer is used and is replaced by adding federation domain map to kube-dns configmap.
cc @madhusudancs @kubernetes/sig-federation-pr-reviews
**Release note**:
```
[Federation] Mechanism of adding `federation domain maps` to kube-dns deployment via `--federations` flag is superseded by adding/updating `federations` key in `kube-system/kube-dns` configmap. If user is using kubefed tool to join cluster federation, adding federation domain maps to kube-dns is already taken care by `kubefed join` and does not need further action.
```
```release-note
kube-apiserver: --service-account-lookup now defaults to true. This enables service account tokens to be revoked by deleting the Secret object containing the token.
```
hack/local-up-cluster.sh uses cfssl to generate certificates and
will exit it cfssl is not already installed. But other cluster-up
mechanisms (GCE) that generate certs just download cfssl if not
present. Make local-up-cluster.sh do that too.
Automatic merge from submit-queue
test/e2e_node: prepull images with CRI
Part of https://github.com/kubernetes/kubernetes/issues/40739
- This PR builds on top of #40525 (and contains one commit from #40525)
- The second commit contains a tiny change in the `Makefile`.
- Third commit is a patch to be able to prepull images using the CRI (as opposed to run `docker` to pull images which doesn't make sense if you're using CRI most of the times)
Marked WIP till #40525 makes its way into master
@Random-Liu @lucab @yujuhong @mrunalp @rhatdan
Switch from hard coded "/tmp" to a user specified location for
the logs. Default to "/tmp" to keep the current behavior and
not break any current users or jobs.
Automatic merge from submit-queue
Make RBAC post-start hook conditional on RBAC authorizer being used
Makes the RBAC post-start hook (and reconciliation) conditional on the RBAC authorizer being used
Ensures we don't set up unnecessary objects.
```release-note
RBAC role and rolebinding auto-reconciliation is now performed only when the RBAC authorization mode is enabled.
```
Per Clayton's suggestion, move stuff from cluster/lib/util.sh to
hack/lib/util.sh. Also consolidate ensure-temp-dir and use the
hack/lib/util.sh implementation rather than cluster/common.sh.
Automatic merge from submit-queue (batch tested with PRs 38741, 41301, 43645, 43779, 42337)
If we already have the right version of gazel installed, use it
**What this PR does / why we need it**: makes the bazel scripts marginally faster by only downloading/building gazel when necessary. Also removes the `followRedirects` config setting for gopkg.in which is no longer needed.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Fix the ginkgo nocolor parameter
When running the e2e tests in parallel, the ginkgo nocolor is not
honored and produces a colored output. This change fixes this issue
(#42793).
**What this PR does / why we need it**:
It fixes the ginkgo color parameter when running e2e tests in parallel.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42793
**Special notes for your reviewer**:
It seems there is a missing bit in ginkgo-e2e.sh so I added it.
**Release note**:
```release-note
```
Automatic merge from submit-queue
addressing issue #39427 adding a flag --output to 'kubectl version'
**What this PR does / why we need it**:
Addressing Issue https://github.com/kubernetes/kubernetes/issues/39427 we all
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#39427
**Release note**:
```
kubectl version has new flag --output (=json or yaml) allowing result of the command to be parsed in either json format or yaml.
```
Automatic merge from submit-queue (batch tested with PRs 40885, 43623, 43735)
Use "hack/godep-restore.sh" instead of "godep restore"
Now we get errors when run "godep restore".
So we need to update the help message.
@derekwaynecarr
**Special notes for your reviewer**:
**Release note**:
```NONE
```
updating with PR changes requested.
latest changes to having short for human readable only, and error cases moved a bit to the end.
rebase fixes
latest pr. changes.
small change moving return nil out of switch.
updated the nil check for the error in the humanreadable case.
more optimization in humanreadable code.
pushed up current test changes, this is purely temporary
finished writing tests
updated test and function names.
changed output extensions from .sh to output.
updated version, version struct now just called Version and not VersionObj.
made a few changes to testing.
fixed testing issues, created better test and cleanup
go format change.
When running the e2e tests in parallel, the ginkgo nocolor is not
honored and produces a colored output. This change adds the
GINKGO_NO_COLOR environment variable.
This PR implements a standard admission plugin initializer for the generic API server.
The initializer accepts external clientset, external informers and the authorizer.
Automatic merge from submit-queue
Add plugin/pkg/scheduler to linted packages
**What this PR does / why we need it**:
Adds plugin/pkg/scheduler to linted packages to improve style correctness.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#41868
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
local-up-cluster.sh should create a default storage class
To make dynamic provisioning working out of the box in local cluster a default
storage class needs to be instantiated.
```release-note
NONE
```
Automatic merge from submit-queue
Ignore GoVersion and GodepVersion in staging-godep diff
**What this PR does / why we need it**:
This modifies the hack/update-staging-godeps.sh script to ignore GoVersion and GodepVersion in the diff it makes, similar to what hack/verify-godeps.sh does
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43149, 41399, 43154, 43569, 42507)
allow etcd2/3 choice when bringing up a local cluster, default to etcd3
**What this PR does / why we need it**: local-up-cluster currently doesn't allow you to select which etcd version to use, here we allow you to select one, since k8s is moving towards etcd3 we suggest it to be the default.
**Special notes for your reviewer**: Note, i didnt realize this until i had used https://github.com/kubernetes/kubernetes/pull/42656 which made it immediately clear.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43429, 43416, 43312, 43141, 43421)
add singular resource names to discovery
Adds the singular resource name to our resource for discovery. This is something we've discussed to remove our pseudo-pluralization library which is unreliable even for english and really has no hope of properly handling other languages or variations we can expect from TPRs and aggregated API servers.
This pull simply adds the information to discovery, it doesn't not re-wire any RESTMappers.
@kubernetes/sig-cli-misc @kubernetes/sig-apimachinery-misc @kubernetes/api-review
```release-note
API resource discovery now includes the `singularName` used to refer to the resource.
```
Automatic merge from submit-queue (batch tested with PRs 42998, 42902, 42959, 43020, 42948)
Export godep patch files to artifacts
**What this PR does / why we need it**:
If a godep patch file is created, and a `${WORKSPACE}/_artifacts` directory exists, copy the patch file out to it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
cc: @fejta
Automatic merge from submit-queue
add local option to APIService
APIServices need an option to avoid proxying in cases where the groupversion is handled later in the chain. This will allow a coherent and complete set of APIServices, but won't require extra connections.
@kubernetes/sig-api-machinery-misc @ncdc @cheftako
Automatic merge from submit-queue (batch tested with PRs 43642, 43170, 41813, 42170, 41581)
Cleanup make test-integration
``make test-integration`` was using the first positional arg passed to ``WHAT`` to filter the list of integration test packages. This PR switches to passing ``WHAT`` verbatim to be consistent with how ``make test`` works. That means the new way to scope execution to a single integration package will be:
```bash
make test-integration WHAT="./test/integration/auth" KUBE_TEST_ARGS="-run=^TestKindAuthorization$"
```
Instead of:
```bash
make test-integration WHAT="auth -test.run=^TestKindAuthorization$"
```
This PR also ensures that the script exits after running a single test case and that etcd cleanup is not done twice at the end of a successful test run. Both were issues encountered while diagnosing the scoping issue.
cc: @thockin @deads2k @stevekuznetsov @ncdc @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 43642, 43170, 41813, 42170, 41581)
Be able to specify the timeout to wait for pod for kubectl logs/attach
Fixes https://github.com/kubernetes/kubernetes/issues/41786
current flag is `get-pod-timeout`, we can have a discussion if you have better one, default unit is seconds, above 0
@soltysh @kargakis ptal, thanks
@kubernetes/sig-cli-feature-requests
Automatic merge from submit-queue (batch tested with PRs 41139, 41186, 38882, 37698, 42034)
create configmap from-env-file
Allow ConfigMaps to be created from Docker based env files.
See proposal https://github.com/kubernetes/community/issues/165
**Release-note:**
```release-note
1. create configmap has a new option --from-env-file that populates a configmap from file which follows a key=val format for each line.
2. create secret has a new option --from-env-file that populates a configmap from file which follows a key=val format for each line.
```
Automatic merge from submit-queue
Fix local cluster up script for kube-public namespace error.
Fix local cluster up script for kube-public namespace error:
```
Error from server (AlreadyExists): namespaces "kube-public" already exists
```
Automatic merge from submit-queue
Retry kubectl test replace on conflict
Since kubectl is doing a resource-version-constrained replace, it is subject to conflicts on a contentious resource (like a node managed by the node controller)
Fixes#41892 (the specific flake, not the watch cache issue)
Automatic merge from submit-queue (batch tested with PRs 43180, 42928)
Verify generated protobuf script should fail on staging/ changes too
fixes#35486
Automatic merge from submit-queue (batch tested with PRs 40964, 42967, 43091, 43115)
Update hack scripts to use godep v79 and ensure_godep_version
**What this PR does / why we need it**:
Based on #42965 and https://github.com/kubernetes/kubernetes/pull/42958#discussion_r105568318, this pins the godep version at v79, which should fix some issues when running godep in go1.8 local environments.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42817
**Special notes for your reviewer**:
This should likely get the v1.6 milestone so that it can be merged into master. While I'm setting a default godep version, I'm continuing to use the local pins per this comment: https://github.com/kubernetes/kubernetes/pull/42965#issuecomment-285962723 .
**Release note**:
```release-note
NONE
```
cc: @sttts
Automatic merge from submit-queue (batch tested with PRs 40404, 43134, 43117)
Improve code coverage for scheduler/api/validation
**What this PR does / why we need it**:
Improve code coverage for scheduler/api/validation from #39559
Thanks
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 43018, 42713, 42819)
Update startup scripts for kube-dns ConfigMap and ServiceAccount
Follow up PR of #42757. This PR changes all existing startup scripts to support default kube-dns ConfigMap and ServiceAccount.
@bowei
cc @liggitt
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)
[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs
Fixes#42412
**Background**
Support for multiple GPUs is an experimental feature in v1.6.
Container restarts were handled incorrectly which resulted in stranding of GPUs
Kubelet is incorrectly using runtime cache to track running pods which can result in race conditions (as it did in other parts of kubelet). This can result in same GPU being assigned to multiple pods.
**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.
**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.
**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever` in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.
cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews
Replaces #42560