Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rework volume manager log levels
- all normal logs to go to level 4
- too frequent / duplicate logs go to level 5 (e.g. when something else logged similar message not too far away).
I checked that there is no excessive spam in the log - reconciler runs every 100ms, but it does not log anything if there is nothing to do.
**What this PR does / why we need it**:
This will help us debug flakes. E2e tests do not log levels 10-12 used in volume manager
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
cc: @jingxu97 @sjenning
Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix DownwardAPI refresh race.
WaitForAttachAndMount should mark only pod in DesiredStateOfWorldPopulator (DSWP) and DSWP should mark the volume to be remounted only when the new pod has been processed.
Otherwise DSWP and reconciler race who gets the new pod first. If it's reconciler, then DownwardAPI and Projected volumes of the pod are not refreshed with new content and they are updated after the next periodic sync (60-90 seconds).
Fixes#59813
/assign @jingxu97 @saad-ali
/sig storage
/sig node
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix kubelet PVC stale metrics
**What this PR does / why we need it**:
Volumes on each node changes, we should not only add PVC metrics into
gauge vector. It's better use a collector to collector metrics from internal
stats.
Currently, if a PV (bound to a PVC `testpv`) is attached and used by node A, then migrated to node B or just deleted from node A later. `testpvc` metrics will not disappear from kubelet on node A. After a long running time, `kubelet` process will keep a lot of stale volume metrics in memory.
For these dynamic metrics, it's better to use a collector to collect metrics from a data source (`StatsProvider` here), like [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) scraping metrics from kube-apiserver.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/57686
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix kubelet PVC stale metrics
```
Automatic merge from submit-queue (batch tested with PRs 59939, 59830). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Azure - ARM Read/Write rate limiting
**What this PR does / why we need it**:
Azure cloud provider currently runs with:
1. Single ARM rate limiter for both `read [put/post/delete]` and `write` operations, while ARM provide [different rates for read/write] (https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits). This causes write operation to stop even if there is available write request quotas.
2. Cloud provider uses rate limiter's `Accept()` instead of `TryAccept()` This causes control loop to wait for prolonged tike `in case of no request quota available` for **all** requests even for those does not require ARM interaction. A case for that the `Service` control loop will wait for a prolonged time trying to create `LoadBalancer` service even though it can fail and work on the next service which is `ClusterIP`. This PR moves cloud provider tp `TryAccept()`
**Which issue(s) this PR fixes**:
Fixes # https://github.com/kubernetes/kubernetes/issues/58770
**Special notes for your reviewer**:
`n/a`
**Release note**:
```release-note
- Separate current ARM rate limiter into read/write
- Improve control over how ARM rate limiter is used within Azure cloud provider
```
cc @jackfrancis (need your help carefully reviewing this one) @brendanburns @jdumars
Automatic merge from submit-queue (batch tested with PRs 59939, 59830). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Avoid call to get cloud instances
**What this PR does / why we need it**:
if a node does not have the taint, we really don't need to make calls
to get the list of instances from the cloud provider
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
Found when reviewing code for #59887
**Release note**:
```release-note
NONE
```
With d7ddcca231, we lost the logging
of the flags. We should at least log what the command line flags
were used to start processes as those incredibly useful for trouble shooting.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add a reviewer to addon-manager
**What this PR does / why we need it**:
Would like to keep an eye on this until it goes away.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #NONE
**Special notes for your reviewer**:
/assign @mikedanese
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Process existing cloud nodes in CCM
**What this PR does / why we need it**:
This is a timing issue. If kubelet(s) get started before the CCM is
started, the shared informer event handler does not process them at
all. So we should loop through these before. We run this in a
go wait.Until loop to tolerate errors when listing the nodes and
giving an opportunity for any scripts that may need to setup RBAC
roles etc.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58613
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59353, 59905, 53833). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Graduate kubeletconfig API group to beta
Regarding https://github.com/kubernetes/features/issues/281, this PR moves the kubeletconfig API group to beta.
After #53088, the KubeletConfiguration type should not contain any deprecated or experimental fields, and we should not have to remove any more fields from the type before graduating it to beta.
We need the community to double check for two things, however:
1. Are there any fields currently in the KubeletConfiguration type that you were going to mark deprecated this quarter, but haven't yet?
2. Are there any fields currently in the KubeletConfiguration type that are experimental or alpha, but were not explicitly denoted as such?
Please comment on this PR if you can answer "yes" to either of those two questions. Please cc anyone with a stake in the kubeletconfig API, so we get as much coverage as possible.
/cc @thockin @dchen1107 @Random-Liu @yujuhong @dashpole @tallclair @vishh @abw @freehan @dnardo @bowei @MrHohn @luxas @liggitt @ncdc @derekwaynecarr @mikedanese
@kubernetes/sig-network-pr-reviews, @kubernetes/sig-node-pr-reviews
```release-note
action required: The `kubeletconfig` API group has graduated from alpha to beta, and the name has changed to `kubelet.config.k8s.io`. Please use `kubelet.config.k8s.io/v1beta1`, as `kubeletconfig/v1alpha1` is no longer available.
```
**TODO:**
- [x] Move experimental/non-gated-alpha/soon-to-be-deprecated fields to `KubeletFlags`
- [x] #53088
- [x] #54154
- [x] #54160
- [x] #55562
- [x] #55983
- [x] #57851
- [x] Lift embedded structure out of strings
- [x] #53025
- [x] #54643
- [x] #54823
- [x] #55254
- [x] Resolve relative paths against the location config files are loaded from
- [x] #55648
- [x] Rename to `kubelet.config.k8s.io`
- [x] Comments
- [x] Make sure existing comments at least read sensibly.
- [x] Note default values in comments on the versioned struct.
- [x] Remove any reference to default values in comments on the internal struct.
- [x] Most fields should be `+optional` and `omitempty`. Add where necessary. ~Where omitted, explicitly comment.~ Edit: We should not distinguish between nil and empty, see below items.
- [x] Ensure defaults are specified via `pkg/kubelet/apis/kubelet.config.k8s.io/v1beta1/defaults.go`, not `cmd/kubelet/app/options/options.go`.
- [x] #57770
- [x] Ensure kubeadm does not persist v1alpha1 KubeletConfiguration objects (or feature-gates this functionality)
- [x] Don't make a distinction between empty and nil, because of #43203.
- [x] #59515
- [x] #59681
- [x] Take the opportunity to fix insecure Kubelet defaults @tallclair
- [x] #59666
- [x] Remove CAdvisorPort from KubeletConfiguration wrt #56523.
- [x] #59580
- [x] Hide `ConfigTrialDuration` until we're more sure what to do with it.
- [x] #59628
- [x] Fix `// default: x` comments after rebasing on recent changes.
Automatic merge from submit-queue (batch tested with PRs 59353, 59905, 53833). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rename ConfigOK to KubeletConfigOk
This is a more accurate name for the condition, as it describes the
status of the Kubelet's configuration.
Also cleans up capitalization of internal names.
```release-note
The ConfigOK node condition has been renamed to KubeletConfigOk.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add node shutdown taint
**What this PR does / why we need it**: we need node stopped taint in order to detach volumes immediately without waiting timeout. More info in issue ticket #58635
**Which issue(s) this PR fixes**
Fixes#58635
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Task 0: Added Alpha flag for NoDaemonSetScheduler feature.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Part of #59194
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 56394, 59921). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix fluentd-gcp-scaler to look at correct fluentd-gcp version
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @crassirostris
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Try longer to fetch initial token.
**What this PR does / why we need it**:
Step towards fixing #56293
**Special notes for your reviewer**:
/kind bug
/priority critial-urgent
@kubernetes/sig-scalability-bugs
/cc @shyamjvs please add to v1.9
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Store labels and fields with object
We are already computing labels and fields before putting objects in watchcache.
And my tests show this is `PodToSelectableFields` is responsible for ~10% of memory allocations.
This PR is supposed to fix that - let's double check by running kubemark-big on it.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable scaling fluentd-gcp resources using ScalingPolicy.
See https://github.com/justinsb/scaler for more details about ScalingPolicy resource.
**What this PR does / why we need it**:
This is adding a way to override fluentd-gcp resources in a running cluster. The resources syncing for fluentd-gcp is decoupled from addon manager.
**Special notes for your reviewer**:
**Release note**:
```release-note
fluentd-gcp resources can be modified via a ScalingPolicy
```
cc @kawych @justinsb
The CSI mounter creates the following paths during SetUp():
* .../pods/<podID>/volumes/kubernetes.io~csi/<specVolId>/mount/
* .../pods/<podID>/volumes/kubernetes.io~csi/<specVolId>/volume_data.json
During TearDown(), it does not remove the `.../kubernetes.io~csi/<specVolId>/`
directory, leaving behind orphan volumes: method cleanupOrphanedPodDirs()
complains with 'Orphaned pod found, but volume paths are still present
on disk'.
Fix that by removing the above directory in removeMountDir().
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Existing nodes are sent via update and not via the add function,
so let's add an UpdateCloudNode and just forward it to the
AddCloudNode. This works fine as all we do is look for the cloud
taint and bail out if it is not present.
Automatic merge from submit-queue (batch tested with PRs 59877, 59886, 59892). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: revert the status HostIP behavior
**What this PR does / why we need it**:
This PR partially revert #57106 to fix#59889.
The PR #57106 changed the behavior of `generateAPIPodStatus` when a **kubeClient** is nil.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59877, 59886, 59892). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Partial revert of #59797
**What this PR does / why we need it**:
Need to do a partial revert of #59797. This PR introduced two problems:
- `INTERNAL_DIRS` has an unbound variable error
- The script introduces the use of mapfile which doesn't exist in bash versions prior to bash 4.x. Darwin (OS X) uses bash 3.x by default.
**Release note**:
```release-note
NONE
```