Automatic merge from submit-queue (batch tested with PRs 57700, 59954). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Index PVs by StorageClass in assume cache
**What this PR does / why we need it**:
Performance optimization for delayed binding in the scheduler: only search for PVs with a matching StorageClass name. This means that if you prebind a PV to a PVC, the PV must have a matching StorageClass name. This differs from prebinding with immediate binding, which does not check StorageClass.
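A minimal sketch of the indexing idea, assuming a pared-down `pv` type and a hypothetical `pvIndex` helper (neither is the scheduler's real assume-cache type). It only illustrates how grouping by StorageClass name narrows a lookup to candidates of the requested class:

```go
package main

import "fmt"

// pv is a hypothetical, pared-down stand-in for a PersistentVolume.
type pv struct {
	Name         string
	StorageClass string
}

// pvIndex groups PVs by StorageClass name so that a lookup only touches
// candidates of the requested class instead of scanning every PV.
type pvIndex struct {
	byClass map[string]map[string]pv // class name -> PV name -> PV
}

func newPVIndex() *pvIndex {
	return &pvIndex{byClass: map[string]map[string]pv{}}
}

// add inserts or overwrites a PV under its StorageClass bucket.
func (i *pvIndex) add(v pv) {
	if i.byClass[v.StorageClass] == nil {
		i.byClass[v.StorageClass] = map[string]pv{}
	}
	i.byClass[v.StorageClass][v.Name] = v
}

// list returns only the PVs whose StorageClass matches class.
func (i *pvIndex) list(class string) []pv {
	out := make([]pv, 0, len(i.byClass[class]))
	for _, v := range i.byClass[class] {
		out = append(out, v)
	}
	return out
}

func main() {
	idx := newPVIndex()
	idx.add(pv{Name: "pv-a", StorageClass: "fast"})
	idx.add(pv{Name: "pv-b", StorageClass: "slow"})
	idx.add(pv{Name: "pv-c", StorageClass: "fast"})
	fmt.Println(len(idx.list("fast")), len(idx.list("slow")), len(idx.list("missing")))
}
```

With an index like this, a PV prebound to a PVC but carrying a different StorageClass name would never be returned for that claim's class, which matches the behavior change described above.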
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56102
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve the error message.
**What this PR does / why we need it**:
Makes the error message more descriptive and less scary. Previously it
was far from obvious whether the connection kill was a symptom or the cause of the
problem; see for example https://github.com/kubernetes/kubernetes/issues/55779#issuecomment-353582852
In particular, the crucial missing piece of information was that this is a
way of handling a timeout.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
csi: Remove stale volume path
**What this PR does / why we need it**:
The CSI mounter creates the following paths during SetUp():
* .../pods/\<podID\>/volumes/kubernetes.io~csi/\<specVolId\>/mount/
* .../pods/\<podID\>/volumes/kubernetes.io~csi/\<specVolId\>/volume_data.json
During TearDown(), it does not remove the `.../kubernetes.io~csi/<specVolId>/`
directory, leaving behind orphan volumes: method cleanupOrphanedPodDirs()
complains with 'Orphaned pod found, but volume paths are still present
on disk'.
Fix that by removing the above directory in removeMountDir().
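As a rough illustration of the cleanup, the sketch below removes the per-volume directory that holds both `mount/` and `volume_data.json`. The helper name `removeVolumeDir` and the temp-dir layout are assumptions made for this example; the real `removeMountDir()` may differ, and the sketch assumes the volume has already been unmounted:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeVolumeDir is a hypothetical helper. Given ".../<specVolId>/mount",
// it removes the parent ".../<specVolId>/" directory, which also holds
// volume_data.json. It assumes the mount point is already unmounted.
func removeVolumeDir(mountPath string) error {
	volDir := filepath.Dir(filepath.Clean(mountPath))
	return os.RemoveAll(volDir)
}

// demo recreates the SetUp() layout in a temp dir, runs the cleanup, and
// reports whether the per-volume directory is gone afterwards.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "csi-sketch")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	volDir := filepath.Join(root, "kubernetes.io~csi", "vol1")
	mountPath := filepath.Join(volDir, "mount")
	if err := os.MkdirAll(mountPath, 0o750); err != nil {
		return false, err
	}
	dataFile := filepath.Join(volDir, "volume_data.json")
	if err := os.WriteFile(dataFile, []byte("{}"), 0o600); err != nil {
		return false, err
	}

	if err := removeVolumeDir(mountPath); err != nil {
		return false, err
	}
	_, err = os.Stat(volDir)
	return os.IsNotExist(err), nil
}

func main() {
	gone, err := demo()
	fmt.Println(gone, err)
}
```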
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add an admission decorator chain
Admission decorators are good wrappers for general functions, but we logically need a chain of them. This builds a chain similar to the admission chain.
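The idea can be sketched with simplified types. `Handler`, `Decorator`, `Chain`, and `tag` below are hypothetical names chosen for this example and are not the apiserver's admission interfaces; the point is only that a slice of decorators can itself act as one decorator:

```go
package main

import "fmt"

// Handler is a hypothetical, minimal stand-in for an admission plugin.
type Handler interface {
	Admit(obj string) string
}

// HandlerFunc adapts a plain function to the Handler interface.
type HandlerFunc func(obj string) string

func (f HandlerFunc) Admit(obj string) string { return f(obj) }

// Decorator wraps a Handler with extra behavior.
type Decorator func(Handler) Handler

// Chain is an ordered list of decorators. Decorate applies them in order,
// so the last decorator in the slice ends up outermost.
type Chain []Decorator

func (c Chain) Decorate(h Handler) Handler {
	for _, d := range c {
		h = d(h)
	}
	return h
}

// tag returns a decorator that marks the result with a label, which makes
// the wrapping order visible in the output.
func tag(label string) Decorator {
	return func(next Handler) Handler {
		return HandlerFunc(func(obj string) string {
			return label + "(" + next.Admit(obj) + ")"
		})
	}
}

func main() {
	base := HandlerFunc(func(obj string) string { return obj })
	h := Chain{tag("metrics"), tag("audit")}.Decorate(base)
	fmt.Println(h.Admit("pod")) // prints "audit(metrics(pod))"
}
```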
/assign @sttts
@kubernetes/sig-api-machinery-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add jsafrane as AWS approver.
**What this PR does / why we need it**:
I contributed several PRs to AWS storage and I'm willing to share review/approval duty.
**Release note**:
```release-note
NONE
```
/assign @justinsb
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add AWS cloud provider option for IAM role
**What this PR does / why we need it**:
Adds the option to provide an IAM role ARN in the AWS cloud provider config file that should be assumed when communicating with the AWS APIs.
For example, this allows running the Controller Manager in an account separate from the worker nodes, while still allowing all created resources to interact with the workers. ELBs, for instance, would be created in the same account as the worker nodes.
**Which issue(s) this PR fixes** *(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged)*:
Fixes #59526
**Special notes for your reviewer**:
None
**Release note**:
```release-note
Add AWS cloud provider option to use an assumed IAM role
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix cluster autoscaler test to support regional clusters.
**What this PR does / why we need it**:
Fixes cluster autoscaler e2e tests to work with regional clusters.
**Release note**:
```NONE```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use prometheus-to-sd 0.2.4 and fluentd-gcp-image 2.0.16
**Release note**:
```release-note
NONE
```
cc @tallclair
Automatic merge from submit-queue (batch tested with PRs 59809, 59955). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Updating kubemci e2e test to not add kubeconfig flag for get-status
Follow up to https://github.com/kubernetes/kubernetes/pull/59234
Updates RunKubemciCmd to no longer add the --kubeconfig flag, and adds a RunKubemciWithKubeconfig method that adds the kubeconfig param before calling RunKubemciCmd.
Also updates get-status to use RunKubemciCmd instead of RunKubemciWithKubeconfig.
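The split between the two helpers could look roughly like this. The `framework` struct, its injectable `run` field, the `--kubeconfig=<path>` flag form, and the argument lists are assumptions made so the sketch runs without the real `kubemci` binary; only the two method names come from the description above:

```go
package main

import (
	"fmt"
	"strings"
)

// framework is a hypothetical test helper. run is injected so the sketch
// does not have to execute a real binary.
type framework struct {
	kubeconfig string
	run        func(args ...string) (string, error)
}

// RunKubemciCmd passes the arguments through unchanged.
func (f *framework) RunKubemciCmd(args ...string) (string, error) {
	return f.run(args...)
}

// RunKubemciWithKubeconfig adds the kubeconfig flag and then delegates to
// RunKubemciCmd. The arguments are copied so the caller's slice is not
// modified.
func (f *framework) RunKubemciWithKubeconfig(args ...string) (string, error) {
	full := append(append([]string{}, args...), "--kubeconfig="+f.kubeconfig)
	return f.RunKubemciCmd(full...)
}

// newEchoFramework returns a framework whose run function simply joins the
// arguments it receives, which makes the final command line observable.
func newEchoFramework(kubeconfig string) *framework {
	return &framework{
		kubeconfig: kubeconfig,
		run: func(args ...string) (string, error) {
			return strings.Join(args, " "), nil
		},
	}
}

func main() {
	f := newEchoFramework("/tmp/kubeconfig")
	out, _ := f.RunKubemciCmd("get-status", "my-ingress")
	fmt.Println(out)
	out, _ = f.RunKubemciWithKubeconfig("create", "my-ingress")
	fmt.Println(out)
}
```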
```release-note
NONE
```
cc @MrHohn @G-Harmon @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 59809, 59955). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubectl port-forward should resolve service port to target port
**What this PR does / why we need it**:
Continuing the work in #59705, this PR adds support for looking up the targetPort for a service and enables using svc/name to select a pod.
**Which issue(s) this PR fixes**:
Fixes #15180, fixes #59733
**Special notes for your reviewer**:
I decided to create pkg/kubectl/util/service_port.go to contain two functions that might be re-usable.
**Release note**:
```release-note
`kubectl port-forward` now supports specifying a service to port forward to: `kubectl port-forward svc/myservice 8443:443`
```
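A simplified sketch of resolving a service port to the port on the pod. The `servicePort` and `containerPort` types and the `resolveTargetPort` function are hypothetical stand-ins, not the helpers in `pkg/kubectl/util/service_port.go`; the sketch assumes a target port is either numeric or the name of a container port, and falls back to the service port when no target port is set:

```go
package main

import (
	"fmt"
	"strconv"
)

// servicePort is a hypothetical, simplified service port entry.
type servicePort struct {
	Port       int    // port exposed by the service
	TargetPort string // "" (unset), numeric ("8443"), or a named port ("https")
}

// containerPort is a hypothetical, simplified container port entry.
type containerPort struct {
	Name          string
	ContainerPort int
}

// resolveTargetPort maps a service port to the container port to forward to.
func resolveTargetPort(svcPorts []servicePort, podPorts []containerPort, port int) (int, error) {
	for _, sp := range svcPorts {
		if sp.Port != port {
			continue
		}
		if sp.TargetPort == "" {
			return sp.Port, nil // an unset targetPort defaults to the service port
		}
		if n, err := strconv.Atoi(sp.TargetPort); err == nil {
			return n, nil
		}
		for _, cp := range podPorts {
			if cp.Name == sp.TargetPort {
				return cp.ContainerPort, nil
			}
		}
		return 0, fmt.Errorf("pod has no container port named %q", sp.TargetPort)
	}
	return 0, fmt.Errorf("service does not expose port %d", port)
}

func main() {
	svc := []servicePort{{Port: 443, TargetPort: "https"}}
	pod := []containerPort{{Name: "https", ContainerPort: 8443}}
	fmt.Println(resolveTargetPort(svc, pod, 443))
}
```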
Currently the AWS cloud provider uses the EC2 instance role when
interacting with AWS APIs. This change gives the option to provide an IAM
role that the cloud provider will assume before calling the APIs. All
resources created by the role will be owned by that account instead of
the account where the EC2 instance is running.
Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix pod scheduled.
Fix `PodScheduled` condition.
The test `[k8s.io] EquivalenceCache [Serial] validates pod affinity works properly when new replica pod is scheduled` for cri-containerd is flaky.
The reason is that it assumes all existing pods have a `PodScheduled` condition, which is not the case:
```
Feb 15 15:31:01.359: INFO: with-label-390d246e-1265-11e8-beb8-0a580a3c7b55 bootstrap-e2e-minion-group-l6qw Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:30:59 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:31:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:30:59 +0000 UTC }]
Feb 15 15:31:01.359: INFO: calico-node-7mzxc bootstrap-e2e-minion-group-hztx Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 14:17:05 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 14:17:59 +0000 UTC }]
Feb 15 15:31:01.359: INFO: calico-node-kvrsx bootstrap-e2e-minion-group-l6qw Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:24:54 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:25:20 +0000 UTC }]
Feb 15 15:31:01.359: INFO: calico-node-llwjh
```
I'm not sure why this doesn't happen with docker. One theory is that we don't prepull images in cri-containerd and we start pods a bit faster for cri-containerd, which exposes the race condition.
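The assumption the test made can be illustrated with a small lookup helper. The `podCondition` type and both functions are hypothetical simplifications for this example and do not show the fix in this PR; they only show that code reading conditions has to handle a missing `PodScheduled` entry, as seen for the `calico-node` pods in the log above:

```go
package main

import "fmt"

// podCondition is a hypothetical, pared-down pod condition.
type podCondition struct {
	Type   string
	Status string
}

// getCondition returns the condition of the given type and whether it was
// present at all.
func getCondition(conds []podCondition, condType string) (podCondition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return podCondition{}, false
}

// isPodScheduled treats a missing PodScheduled condition as "not scheduled"
// instead of assuming the condition always exists.
func isPodScheduled(conds []podCondition) bool {
	c, ok := getCondition(conds, "PodScheduled")
	return ok && c.Status == "True"
}

func main() {
	withLabel := []podCondition{
		{Type: "Initialized", Status: "True"},
		{Type: "Ready", Status: "True"},
		{Type: "PodScheduled", Status: "True"},
	}
	calicoNode := []podCondition{
		{Type: "Initialized", Status: "True"},
		{Type: "Ready", Status: "True"},
	}
	fmt.Println(isPodScheduled(withLabel), isPodScheduled(calicoNode))
}
```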
/cc @kubernetes/sig-node-bugs
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update reviewers for sig-scheduling.
@bsalamat @timothysc @kubernetes/sig-scheduling-misc
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rework volume manager log levels
- all normal logs to go to level 4
- too frequent / duplicate logs go to level 5 (e.g. when something else logged similar message not too far away).
I checked that there is no excessive spam in the log - reconciler runs every 100ms, but it does not log anything if there is nothing to do.
**What this PR does / why we need it**:
This will help us debug flakes. E2e tests do not log levels 10-12, which the volume manager used.
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
cc: @jingxu97 @sjenning
Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix DownwardAPI refresh race.
WaitForAttachAndMount should only mark the pod in DesiredStateOfWorldPopulator (DSWP), and DSWP should mark the volume for remount only once the new pod has been processed.
Otherwise DSWP and the reconciler race over who gets the new pod first. If the reconciler wins, the DownwardAPI and Projected volumes of the pod are not refreshed with new content until the next periodic sync (60-90 seconds).
Fixes #59813
/assign @jingxu97 @saad-ali
/sig storage
/sig node
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix kubelet PVC stale metrics
**What this PR does / why we need it**:
The set of volumes on each node changes over time, so we should not only ever add PVC metrics to a
gauge vector. It is better to use a collector to collect metrics from internal
stats.
Currently, if a PV (bound to a PVC `testpvc`) is attached and used by node A, then later migrated to node B or simply deleted from node A, the `testpvc` metrics will not disappear from the kubelet on node A. After running for a long time, the `kubelet` process will keep a lot of stale volume metrics in memory.
For these dynamic metrics, it's better to use a collector to collect metrics from a data source (`StatsProvider` here), like [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) scraping metrics from kube-apiserver.
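The difference between the two approaches can be modelled without any metrics library. Everything below (`volumeStat`, `statsProvider`, `gaugeVec`, `collector`, `fakeProvider`) is a hypothetical stand-in written for this sketch, not the kubelet's or Prometheus client's types:

```go
package main

import "fmt"

// volumeStat is a hypothetical per-PVC sample.
type volumeStat struct {
	PVC       string
	UsedBytes uint64
}

// statsProvider is a hypothetical stand-in for the kubelet's StatsProvider.
type statsProvider interface {
	ListVolumeStats() []volumeStat
}

// gaugeVec models the "set a labelled gauge" pattern: entries are added or
// overwritten but never removed, so old PVCs linger.
type gaugeVec struct{ values map[string]uint64 }

func (g *gaugeVec) set(pvc string, v uint64) { g.values[pvc] = v }

// collector keeps no metric state of its own. Each scrape rebuilds the
// result from whatever the provider reports right now.
type collector struct{ provider statsProvider }

func (c collector) collect() map[string]uint64 {
	out := map[string]uint64{}
	for _, s := range c.provider.ListVolumeStats() {
		out[s.PVC] = s.UsedBytes
	}
	return out
}

// fakeProvider lets the example change the set of volumes on the node.
type fakeProvider struct{ stats []volumeStat }

func (f *fakeProvider) ListVolumeStats() []volumeStat { return f.stats }

func main() {
	p := &fakeProvider{stats: []volumeStat{{PVC: "testpvc", UsedBytes: 42}}}
	g := &gaugeVec{values: map[string]uint64{}}
	c := collector{provider: p}

	for _, s := range p.ListVolumeStats() {
		g.set(s.PVC, s.UsedBytes)
	}
	p.stats = nil // the volume leaves the node

	_, staleInGauge := g.values["testpvc"]
	_, staleInCollector := c.collect()["testpvc"]
	fmt.Println(staleInGauge, staleInCollector) // prints "true false"
}
```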
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/57686
**Release note**:
```release-note
Fix kubelet PVC stale metrics
```
Automatic merge from submit-queue (batch tested with PRs 59939, 59830). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Azure - ARM Read/Write rate limiting
**What this PR does / why we need it**:
Azure cloud provider currently runs with:
1. A single ARM rate limiter for both `read` and `write [put/post/delete]` operations, while ARM provides [different rates for read/write](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits). This causes write operations to stop even when write request quota is still available.
2. The cloud provider uses the rate limiter's `Accept()` instead of `TryAccept()`. When no request quota is available, this makes the control loop wait for a prolonged time for **all** requests, even those that do not require ARM interaction. For example, the `Service` control loop will wait for a prolonged time trying to create a `LoadBalancer` service, even though it could fail and move on to the next service, which is `ClusterIP`. This PR moves the cloud provider to `TryAccept()`.
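Both points can be sketched with a small token bucket written against the standard library only. The `bucket` and `armLimiter` types, the `errThrottled` error, and the QPS/burst numbers are assumptions for this example; the real provider uses an existing rate limiter implementation whose `TryAccept()` has the same non-blocking intent:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errThrottled is a hypothetical error returned when no quota is available.
var errThrottled = errors.New("rate limited, try again later")

// bucket is a minimal token bucket. TryAccept never blocks.
type bucket struct {
	mu     sync.Mutex
	qps    float64
	burst  float64
	tokens float64
	last   time.Time
}

func newBucket(qps float64, burst int) *bucket {
	return &bucket{qps: qps, burst: float64(burst), tokens: float64(burst), last: time.Now()}
}

// TryAccept refills tokens for the elapsed time, then takes one token if
// available and reports whether the request may proceed.
func (b *bucket) TryAccept() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.qps
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// armLimiter keeps separate budgets for read and write requests.
type armLimiter struct{ read, write *bucket }

// do fails fast instead of waiting when the matching budget is exhausted.
func (l *armLimiter) do(isWrite bool, call func() error) error {
	b := l.read
	if isWrite {
		b = l.write
	}
	if !b.TryAccept() {
		return errThrottled
	}
	return call()
}

func main() {
	l := &armLimiter{read: newBucket(0.001, 2), write: newBucket(0.001, 1)}
	ok := func() error { return nil }
	fmt.Println(l.do(true, ok))  // first write fits the burst
	fmt.Println(l.do(true, ok))  // write budget exhausted, fails immediately
	fmt.Println(l.do(false, ok)) // reads are unaffected by the write budget
}
```

Failing fast lets a caller such as a control loop record the error and move on to work that needs no ARM call, rather than stalling every item behind one throttled request.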
**Which issue(s) this PR fixes**:
Fixes https://github.com/kubernetes/kubernetes/issues/58770
**Special notes for your reviewer**:
`n/a`
**Release note**:
```release-note
- Separate current ARM rate limiter into read/write
- Improve control over how ARM rate limiter is used within Azure cloud provider
```
cc @jackfrancis (need your help carefully reviewing this one) @brendanburns @jdumars
Automatic merge from submit-queue (batch tested with PRs 59939, 59830). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Avoid call to get cloud instances
**What this PR does / why we need it**:
If a node does not have the taint, we don't need to make calls
to get the list of instances from the cloud provider.
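The early return can be sketched as follows. The `node` and `taint` types, the `cloudTaintKey` value, and the `initializeNode` and `listInstances` names are placeholders for this example and are not taken from the controller's code:

```go
package main

import "fmt"

// cloudTaintKey is a placeholder key used only in this sketch.
const cloudTaintKey = "example.com/cloud-uninitialized"

// taint and node are hypothetical, pared-down types.
type taint struct{ Key string }

type node struct {
	Name   string
	Taints []taint
}

func hasTaint(n node, key string) bool {
	for _, t := range n.Taints {
		if t.Key == key {
			return true
		}
	}
	return false
}

// initializeNode only calls the (potentially expensive) cloud API when the
// node actually carries the taint. It reports whether the cloud was called.
func initializeNode(n node, listInstances func() ([]string, error)) (bool, error) {
	if !hasTaint(n, cloudTaintKey) {
		return false, nil // nothing to do, skip the cloud provider call
	}
	if _, err := listInstances(); err != nil {
		return true, err
	}
	return true, nil
}

func main() {
	calls := 0
	list := func() ([]string, error) {
		calls++
		return []string{"i-1"}, nil
	}
	initializeNode(node{Name: "clean"}, list)
	initializeNode(node{Name: "new", Taints: []taint{{Key: cloudTaintKey}}}, list)
	fmt.Println(calls) // prints "1"
}
```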
**Special notes for your reviewer**:
Found when reviewing code for #59887
**Release note**:
```release-note
NONE
```
With d7ddcca231, we lost the logging
of the flags. We should at least log which command line flags
were used to start processes, as those are incredibly useful for troubleshooting.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add a reviewer to addon-manager
**What this PR does / why we need it**:
Would like to keep an eye on this until it goes away.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #NONE
**Special notes for your reviewer**:
/assign @mikedanese
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Process existing cloud nodes in CCM
**What this PR does / why we need it**:
This is a timing issue. If kubelet(s) are started before the CCM is
started, the shared informer event handler does not process their nodes at
all, so we should loop through the existing nodes first. We run this in a
go wait.Until loop to tolerate errors when listing the nodes and to
give an opportunity to any scripts that may need to set up RBAC
roles etc.
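A standard-library sketch of the retry pattern follows. The `until` helper imitates a periodic loop with a stop channel, and `processExisting`, `errListFailed`, and the choice to stop after the first successful pass are assumptions for this example rather than the CCM's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errListFailed is a hypothetical listing error used by the example.
var errListFailed = errors.New("listing nodes failed")

// until runs f every period until stop is closed.
func until(f func(), period time.Duration, stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
		}
		f()
		select {
		case <-stop:
			return
		case <-time.After(period):
		}
	}
}

// processExisting lists the nodes that already exist and hands each one to
// handle. Listing errors are tolerated and retried on the next period.
// Stopping after the first successful pass is an assumption of this sketch.
func processExisting(list func() ([]string, error), handle func(string), period time.Duration) {
	stop := make(chan struct{})
	until(func() {
		nodes, err := list()
		if err != nil {
			return // e.g. RBAC roles not set up yet, try again later
		}
		for _, n := range nodes {
			handle(n)
		}
		close(stop)
	}, period, stop)
}

func main() {
	attempts := 0
	list := func() ([]string, error) {
		attempts++
		if attempts < 3 {
			return nil, errListFailed
		}
		return []string{"node-a", "node-b"}, nil
	}
	processExisting(list, func(n string) { fmt.Println("processing", n) }, 10*time.Millisecond)
	fmt.Println("attempts:", attempts)
}
```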
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58613
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59353, 59905, 53833). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Graduate kubeletconfig API group to beta
Regarding https://github.com/kubernetes/features/issues/281, this PR moves the kubeletconfig API group to beta.
After #53088, the KubeletConfiguration type should not contain any deprecated or experimental fields, and we should not have to remove any more fields from the type before graduating it to beta.
We need the community to double check for two things, however:
1. Are there any fields currently in the KubeletConfiguration type that you were going to mark deprecated this quarter, but haven't yet?
2. Are there any fields currently in the KubeletConfiguration type that are experimental or alpha, but were not explicitly denoted as such?
Please comment on this PR if you can answer "yes" to either of those two questions. Please cc anyone with a stake in the kubeletconfig API, so we get as much coverage as possible.
/cc @thockin @dchen1107 @Random-Liu @yujuhong @dashpole @tallclair @vishh @abw @freehan @dnardo @bowei @MrHohn @luxas @liggitt @ncdc @derekwaynecarr @mikedanese
@kubernetes/sig-network-pr-reviews, @kubernetes/sig-node-pr-reviews
```release-note
action required: The `kubeletconfig` API group has graduated from alpha to beta, and the name has changed to `kubelet.config.k8s.io`. Please use `kubelet.config.k8s.io/v1beta1`, as `kubeletconfig/v1alpha1` is no longer available.
```
**TODO:**
- [x] Move experimental/non-gated-alpha/soon-to-be-deprecated fields to `KubeletFlags`
- [x] #53088
- [x] #54154
- [x] #54160
- [x] #55562
- [x] #55983
- [x] #57851
- [x] Lift embedded structure out of strings
- [x] #53025
- [x] #54643
- [x] #54823
- [x] #55254
- [x] Resolve relative paths against the location config files are loaded from
- [x] #55648
- [x] Rename to `kubelet.config.k8s.io`
- [x] Comments
- [x] Make sure existing comments at least read sensibly.
- [x] Note default values in comments on the versioned struct.
- [x] Remove any reference to default values in comments on the internal struct.
- [x] Most fields should be `+optional` and `omitempty`. Add where necessary. ~Where omitted, explicitly comment.~ Edit: We should not distinguish between nil and empty, see below items.
- [x] Ensure defaults are specified via `pkg/kubelet/apis/kubelet.config.k8s.io/v1beta1/defaults.go`, not `cmd/kubelet/app/options/options.go`.
- [x] #57770
- [x] Ensure kubeadm does not persist v1alpha1 KubeletConfiguration objects (or feature-gates this functionality)
- [x] Don't make a distinction between empty and nil, because of #43203.
- [x] #59515
- [x] #59681
- [x] Take the opportunity to fix insecure Kubelet defaults @tallclair
- [x] #59666
- [x] Remove CAdvisorPort from KubeletConfiguration wrt #56523.
- [x] #59580
- [x] Hide `ConfigTrialDuration` until we're more sure what to do with it.
- [x] #59628
- [x] Fix `// default: x` comments after rebasing on recent changes.
Automatic merge from submit-queue (batch tested with PRs 59353, 59905, 53833). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rename ConfigOK to KubeletConfigOk
This is a more accurate name for the condition, as it describes the
status of the Kubelet's configuration.
Also cleans up capitalization of internal names.
```release-note
The ConfigOK node condition has been renamed to KubeletConfigOk.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add node shutdown taint
**What this PR does / why we need it**: We need a node stopped taint in order to detach volumes immediately, without waiting for the timeout. More info in issue #58635.
**Which issue(s) this PR fixes**
Fixes #58635
**Release note**:
```release-note
NONE
```