Automatic merge from submit-queue (batch tested with PRs 65052, 65594). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Derive kubelet serving certificate CSR template from node status addresses
xref https://github.com/kubernetes/features/issues/267fixes#55633
Builds on https://github.com/kubernetes/kubernetes/pull/65587
* Makes the cloud provider authoritative when recording node status addresses
* Makes the node status addresses authoritative for the kube-apiserver determining how to speak to a kubelet (stops paying attention to the hostname label when determining how to reach a kubelet, which was only done to support kubelets < 1.5)
* Updates kubelet certificate rotation to be driven from node status
* Avoids needing to compute node addresses a second time, and differently, in order to request serving certificates.
* Allows the kubelet to react to changes in its status addresses by updating its serving certificate
* Allows the kubelet to be driven by external cloud providers recording node addresses on the node status
test procedure:
```sh
# setup
export FEATURE_GATES=RotateKubeletServerCertificate=true
export KUBELET_FLAGS="--rotate-server-certificates=true --cloud-provider=external"
# cleanup from previous runs
sudo rm -fr /var/lib/kubelet/pki/
# startup
hack/local-up-cluster.sh
# wait for a node to register, verify it didn't set addresses
kubectl get nodes
kubectl get node/127.0.0.1 -o jsonpath={.status.addresses}
# verify the kubelet server isn't available, and that it didn't populate a serving certificate
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
ls -la /var/lib/kubelet/pki
# set an address on the node
curl -X PATCH http://localhost:8080/api/v1/nodes/127.0.0.1/status \
-H "Content-Type: application/merge-patch+json" \
--data '{"status":{"addresses":[{"type":"Hostname","address":"localhost"}]}}'
# verify a csr was submitted with the right SAN, and approve it
kubectl describe csr
kubectl certificate approve csr-...
# verify the kubelet connection uses a cert that is properly signed and valid for the specified hostname, but NOT the IP
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
curl --cacert _output/certs/server-ca.crt -v https://127.0.0.1:10250/pods
ls -la /var/lib/kubelet/pki
# set an hostname and IP address on the node
curl -X PATCH http://localhost:8080/api/v1/nodes/127.0.0.1/status \
-H "Content-Type: application/merge-patch+json" \
--data '{"status":{"addresses":[{"type":"Hostname","address":"localhost"},{"type":"InternalIP","address":"127.0.0.1"}]}}'
# verify a csr was submitted with the right SAN, and approve it
kubectl describe csr
kubectl certificate approve csr-...
# verify the kubelet connection uses a cert that is properly signed and valid for the specified hostname AND IP
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
curl --cacert _output/certs/server-ca.crt -v https://127.0.0.1:10250/pods
ls -la /var/lib/kubelet/pki
```
```release-note
* kubelets that specify `--cloud-provider` now only report addresses in Node status as determined by the cloud provider
* kubelet serving certificate rotation now reacts to changes in reported node addresses, and will request certificates for addresses set by an external cloud provider
```
Automatic merge from submit-queue (batch tested with PRs 66076, 65792, 65649). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubernetes: fix printf format errors
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
```release-note
NONE
```
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
Lubomir I. Ivanov <neolit123@gmail.com>
applied ammend for:
pkg/cloudprovider/provivers/vsphere/nodemanager.go
Automatic merge from submit-queue (batch tested with PRs 65987, 65962). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix pod worker deadlock.
Preemption will stuck forever if `killPodNow` timeout once. The sequence is:
* `killPodNow` create the response channel (size 0) and send it to pod worker.
* `killPodNow` timeout and return.
* Pod worker finishes killing the pod, and tries to send back response via the channel.
However, because the channel size is 0, and the receiver has exited, the pod worker will stuck forever.
In @jingxu97's case, this causes a critical system pod (apiserver) unable to come up, because the csi pod can't be preempted.
I checked the history, and the bug was introduced 2 years ago 6fefb428c1.
I think we should at least cherrypick this to `1.11` since preemption is beta and enabled by default in 1.11.
@kubernetes/sig-node-bugs @derekwaynecarr @dashpole @yujuhong
Signed-off-by: Lantao Liu <lantaol@google.com>
```release-note
none
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Store the latest cloud provider node addresses
**What this PR does / why we need it**:
Buffer the recently retrieved node address so they can be used as soon as the next node status update is run.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65814
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding traffic shaping support for CNI network driver
**What this PR does / why we need it**:
Adding traffic shaping support for CNI network driver - it's also a sub-task of kubenet deprecation work.
Design document is available here: https://github.com/kubernetes/community/pull/1893
**Which issue(s) this PR fixes**:
Fixes #
**Special notes for your reviewer**:
/cc @freehan @jingax10 @caseydavenport @dcbw
/sig network
/sig node
**Release note**:
```release-note
Support traffic shaping for CNI network driver
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove --cadvisor-port - has been deprecated since v1.10
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56523
**Special notes for your reviewer**:
- Deprecated in https://github.com/kubernetes/kubernetes/pull/59827 (v1.10)
- Disabled in https://github.com/kubernetes/kubernetes/pull/63881 (v1.11)
**Release note**:
```release-note
[action required] The formerly publicly-available cAdvisor web UI that the kubelet started using `--cadvisor-port` is now entirely removed in 1.12. The recommended way to run cAdvisor if you still need it, is via a DaemonSet.
```
Automatic merge from submit-queue (batch tested with PRs 65582, 65480, 65310, 65644, 65645). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix test failure of truncated time
**What this PR does / why we need it**:
The test of `TestFsStoreAssignedModified` in `pkg/kubelet/kubeletconfig/checkpoint/store` fails in my environment like below.
```
$ make test WHAT=./pkg/kubelet/kubeletconfig/checkpoint/store/
Running tests for APIVersion: v1,admissionregistration.k8s.io/v1alpha1,admissionregistration.k8s.io/v1beta1,admission.k8s.io/v1beta1,apps/v1beta1,apps/v1beta2,apps/v1,authentication.k8s.io/v1,authentication.k8s.io/v1beta1,authorization.k8s.io/v1,authorization.k8s.io/v1beta1,autoscaling/v1,autoscaling/v2beta1,batch/v1,batch/v1beta1,batch/v2alpha1,certificates.k8s.io/v1beta1,coordination.k8s.io/v1beta1,extensions/v1beta1,events.k8s.io/v1beta1,imagepolicy.k8s.io/v1alpha1,networking.k8s.io/v1,policy/v1beta1,rbac.authorization.k8s.io/v1,rbac.authorization.k8s.io/v1beta1,rbac.authorization.k8s.io/v1alpha1,scheduling.k8s.io/v1alpha1,scheduling.k8s.io/v1beta1,settings.k8s.io/v1alpha1,storage.k8s.io/v1beta1,storage.k8s.io/v1,storage.k8s.io/v1alpha1,
+++ [0628 22:53:39] Running tests without code coverage
--- FAIL: TestFsStoreAssignedModified (0.00s)
fsstore_test.go:316: expect "2018-06-28T22:53:43+09:00" but got "2018-06-28T22:53:43+09:00"
FAIL
FAIL k8s.io/kubernetes/pkg/kubelet/kubeletconfig/checkpoint/store 0.236s
make: *** [test] Error 1
```
My environment is
OS: macOS Sierra Version 10.12.6
File System: Journaled HFS+
The error message confused me because the comparing times looked the same in the error log. If we know certain systems truncate times, I think we can just compare less precise times to avoid confusions in tests.
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60150, 65467, 65487, 65595, 65374). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: feature gate LSI capacity calculation
Currently if `cm.cadvisorInterface.RootFsInfo()` fails, the whole kubelet bails. If `/var/lib/kubelet` is on a tmpfs or bindmount, this can happen (this is the case for some of our CI envs https://github.com/openshift/origin/issues/19948).
We would be able to workaround this, in the short term, by disabling the LSI feature gate if the capacity calculate was protected by the gate, but currently it isn't.
This PR adds the gate check around setting the ephemeral storage capacity.
@liggitt @derekwaynecarr @dashpole
It might be a different discussion about whether or not this should be fatal. If it isn't fatal, seems that it would just prevent pods that had a ephemeral storage request from being scheduled.
/sig node
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert "certs: only append locally discovered addresses when we got none from the cloudprovider"
This reverts commit 7354bbe5ac.
https://github.com/kubernetes/kubernetes/pull/61869 caused a mismatch between the requested CSR and the addresses in node status.
Instead of computing addresses in two places, the cert manager should derive its CSR request from the addresses in node status. This would enable the kubelet to react to address changes, as well as be driven by an external cloud provider.
/cc @mikedanese
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add support for plugin directory hierarchy
**What this PR does / why we need it**:
Add hierarchy support for plugin directory, it traverses and
watch plugin directory and its sub directory recursively.
plugin socket file only need be unique within one directory,
```
plugin socket directory
|
---->sub directory 1
| |
| -----> socket1, socket2 ...
----->sub directory 2
|
------> socket1, socket2 ...
```
the design itself allow sub directory be anything,
but in practical, each plugin type could just use one sub directory.
**Which issue(s) this PR fixes**:
Fixes#64003
**Special notes for your reviewer**:
twos bonus changes added as below
1) propose to let pluginWatcher bookkeeping registered plugins,
to make sure plugin name is unique within one plugin type.
arguably, we could let each handler do the same work, but it requires
every handler repeat the same thing.
2) extract example handler out from test, it is easier to read the code with the
seperation.
**Release note**:
```release-note
N/A
```
/sig node
/cc @vikaschoudhary16 @jiayingz @RenaudWasTaken @vishh @derekwaynecarr @saad-ali @vladimirvivien @dchen1107 @yujuhong @tallclair @Random-Liu @anfernee @akutz
Automatic merge from submit-queue (batch tested with PRs 65453, 65523, 65513, 65560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Cleanup verbose cAdvisor mocking in Kubelet unit tests
These tests had a lot of duplicate code to set up the cAdvisor mock, but weren't really depending on the mock functionality. By moving the tests to use the fake cAdvisor, most of the setup can be cleaned up.
/kind cleanup
/sig node
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59214, 65330). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Migrate cpumanager to use checkpointing manager
**What this PR does / why we need it**:
This PR migrates `cpumanager` to use new kubelet level node checkpointing feature (#56040) to decrease code redundancy and improve consistency.
**Which issue(s) this PR fixes**:
Fixes#58339
**Notes**:
At point of submitting PR the most straightforward approach was used - `state_checkpoint` implementation of `State` interface was added. However, with checkpointing implementation there might be no point to keep `State` interface and just use single implementation with checkpoint backend and in case of different backend than filestore needed just supply `cpumanager` with custom `CheckpointManager` implementation.
/kind feature
/sig node
cc @flyingcougar @ConnorDoyle
it traverses and watch plugin directory and its sub directory recursively,
plugin socket file only need be unique within one directory,
- plugin socket directory
- |
- ---->sub directory 1
- | |
- | -----> socket1, socket2 ...
- ----->sub directory 2
- |
- ------> socket1, socket2 ...
the design itself allow sub directory be anything,
but in practical, each plugin type could just use one sub directory.
four bonus changes added as below
1. extract example handler out from test, it is easier to read the code
with the seperation.
2. there are two variables here: "Watcher" and "watcher".
"Watcher" is the plugin watcher, and "watcher" is the fsnotify watcher.
so rename the "watcher" to "fsWatcher" to make code easier to
understand.
3. change RegisterCallbackFn() return value order, it is
conventional to return error last, after this change,
the pkg/volume/csi is compliance with golint, so remove it
from hack/.golint_failures
4. refactor errors handling at invokeRegistrationCallbackAtHandler()
to make error message more clear.
Automatic merge from submit-queue (batch tested with PRs 61330, 64793, 64675, 65059, 65368). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes data races for pkg/kubelet/config/file_linux_test.go
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64655
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65290, 65326, 65289, 65334, 64860). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
checkLimitsForResolvConf for the pod create and update events instead of checking period
**What this PR does / why we need it**:
- Check for the same at pod create and update events instead of checking continuously for every 30 seconds.
- Increase the logging level to 4 or higher since the event is not catastrophic to cluster health .
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64849
**Special notes for your reviewer**:
@ravisantoshgudimetla
**Release note**:
```release-note
checkLimitsForResolvConf for the pod create and update events instead of checking period
```
Automatic merge from submit-queue (batch tested with PRs 65187, 65206, 65223, 64752, 65238). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Kubelet watches necessary secrets/configmaps instead of periodic polling