Automatic merge from submit-queue
Log that debug handlers have been turned on.
**What this PR does / why we need it**: This PR logs a message when the debug handlers are turned on. It allows the operator to know, and to automate a check for, the case where debug has been left on.
**Release note**:
```
NONE
```
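A minimal sketch of the kind of log line this adds, assuming hypothetical names for the flag and the handler registration (the exact message and call site in the kubelet are not shown here):
```go
package main

import "log"

// enableDebuggingHandlers stands in for the kubelet flag of the same name.
var enableDebuggingHandlers = true

func installDebuggingHandlers() {
	// ... register the exec/attach/logs debug handlers ...
}

func main() {
	if enableDebuggingHandlers {
		// An explicit, greppable log line lets an operator (or an automated
		// check) detect that debug handlers were left enabled.
		log.Print("Adding debug handlers to kubelet server.")
		installDebuggingHandlers()
	}
}
```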
Automatic merge from submit-queue (batch tested with PRs 39373, 41585, 41617, 41707, 39958)
Fix ConfigMaps for Windows
**What this PR does / why we need it**: ConfigMaps were broken for Windows because the existing code used Linux-specific file paths. Updated the code in `kubelet_getters.go` to use `path/filepath` to build the directories. Also reverted the code in `secret.go`, since updating `kubelet_getters.go` to use `path/filepath` fixes `secrets` as well.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/39372
```release-note
Fix ConfigMap for Windows Containers.
```
cc: @pires
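As a rough illustration of the fix described above (the getter names and directory layout here are simplified assumptions, not the exact `kubelet_getters.go` code), building paths with `path/filepath` keeps them correct on both Linux and Windows:
```go
package main

import (
	"fmt"
	"path/filepath"
)

// getPodDir joins path elements with filepath.Join so the separator is
// correct for the host OS, instead of concatenating with a hard-coded "/".
func getPodDir(rootDir, podUID string) string {
	return filepath.Join(rootDir, "pods", podUID)
}

// getPodVolumeDir sketches the per-volume directory used by configmap and
// secret volumes (the plugin name and layout are illustrative).
func getPodVolumeDir(rootDir, podUID, pluginName, volumeName string) string {
	return filepath.Join(getPodDir(rootDir, podUID), "volumes", pluginName, volumeName)
}

func main() {
	// Prints a backslash-separated path on Windows and a slash-separated one on Linux.
	fmt.Println(getPodVolumeDir(`C:\var\lib\kubelet`, "1234", "kubernetes.io~configmap", "cfg"))
}
```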
Changes the kubelet so it doesn't use the cert/key files directly for
starting the TLS server. Instead the TLS server reads the cert/key from
the new CertificateManager component, which is responsible for
requesting new certificates from the Certificate Signing Request API on
the API Server.
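A condensed sketch of this pattern using crypto/tls's GetCertificate hook; the manager type below is a stand-in for the CertificateManager, not its actual API:
```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"sync"
)

// certManager stands in for a component that requests certificates from the
// Certificate Signing Request API and rotates them over time.
type certManager struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

// Current returns the most recently issued certificate.
func (m *certManager) Current() *tls.Certificate {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.cert
}

func main() {
	m := &certManager{} // in practice, populated and refreshed by a CSR loop
	server := &http.Server{
		Addr: ":10250",
		TLSConfig: &tls.Config{
			// Instead of loading cert/key files once at startup, ask the
			// manager for the current certificate on every TLS handshake.
			GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
				return m.Current(), nil
			},
		},
	}
	// Empty file paths: the certificate comes from TLSConfig.GetCertificate.
	log.Fatal(server.ListenAndServeTLS("", ""))
}
```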
Automatic merge from submit-queue (batch tested with PRs 41649, 41658, 41266, 41371, 41626)
Understand why kubelet cannot cleanup orphaned pod dirs
**What this PR does / why we need it**:
Distinguish whether we are unable to clean up orphaned pod directories because the directory could not be read or because paths still exist underneath it, to make these error situations easier to debug.
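A small sketch of the distinction being logged, with hypothetical helper and message names:
```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// cleanupOrphanedPodDir distinguishes "could not read the pod directory"
// from "paths still exist under it", so the log tells us which case we hit.
func cleanupOrphanedPodDir(podDir string) {
	entries, err := os.ReadDir(podDir)
	if err != nil {
		// Case 1: reading the directory itself failed.
		log.Printf("Orphaned pod dir %q: error reading directory: %v", podDir, err)
		return
	}
	if len(entries) > 0 {
		// Case 2: paths still exist, so removing the directory would be unsafe.
		log.Printf("Orphaned pod dir %q: %d paths still exist, not removing", podDir, len(entries))
		return
	}
	if err := os.Remove(podDir); err != nil {
		log.Printf("Orphaned pod dir %q: failed to remove empty directory: %v", podDir, err)
	}
}

func main() {
	cleanupOrphanedPodDir(filepath.Join(os.TempDir(), "orphaned-pod"))
}
```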
Automatic merge from submit-queue (batch tested with PRs 40505, 34664, 37036, 40726, 41595)
dockertools: call TearDownPod when GC-ing infra pods
The docker runtime doesn't tear down networking when GC-ing pods.
rkt already does, so make docker do it too. To ensure this happens,
infra pods are now always GC-ed rather than being gated by
containersToKeep.
This prevents IPAM from leaking when the pod gets killed for
some reason outside kubelet (like docker restart) or when pods
are killed while kubelet isn't running.
Fixes: https://github.com/kubernetes/kubernetes/issues/14940
Related: https://github.com/kubernetes/kubernetes/pull/35572
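A reduced sketch of the GC-time teardown; the interface and method names below are simplified stand-ins for the kubelet network plugin and the docker container GC code:
```go
package main

import "log"

// networkPlugin is a simplified stand-in for the kubelet network plugin
// interface (pods are torn down by namespace, name, and infra container ID).
type networkPlugin interface {
	TearDownPod(namespace, name, containerID string) error
}

type infraContainer struct {
	namespace, name, id string
}

// gcInfraContainer always tears down the pod's networking before removing a
// dead infra container, instead of skipping infra containers entirely when
// they fall within containersToKeep.
func gcInfraContainer(plugin networkPlugin, c infraContainer, remove func(id string) error) error {
	if err := plugin.TearDownPod(c.namespace, c.name, c.id); err != nil {
		// Log and continue: the container should still be removed so that
		// GC makes progress even if teardown fails.
		log.Printf("Failed to tear down network for pod %s/%s: %v", c.namespace, c.name, err)
	}
	return remove(c.id)
}

func main() {}
```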
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)
Report node not ready on failed PLEG health check
Report node not ready if PLEG health check fails.
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)
optimize killPod() and syncPod() functions
Make sure that at least one of the two arguments, runningPod and status, is non-nil, just as the function comment says,
and check killPod()'s return value in syncPod() before setting podKilled.
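A minimal sketch of both checks, with names simplified from killPod()/syncPod():
```go
package main

import (
	"errors"
	"fmt"
)

type runningPod struct{ name string }
type podStatus struct{ name string }

// killPod requires that at least one of runningPod and status is non-nil,
// as its doc comment says, instead of silently doing nothing.
func killPod(running *runningPod, status *podStatus) error {
	if running == nil && status == nil {
		return errors.New("one of the two arguments must be non-nil: runningPod, status")
	}
	// ... stop the pod's containers ...
	return nil
}

// syncPod checks killPod's return value before recording that the pod was killed.
func syncPod(running *runningPod, status *podStatus) {
	podKilled := false
	if err := killPod(running, status); err == nil {
		podKilled = true
	}
	fmt.Println("podKilled:", podKilled)
}

func main() {
	syncPod(nil, &podStatus{name: "example"})
}
```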
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)
[hairpin] fix argument of nsenter
**Release note**:
```release-note
None
```
We should use:
    nsenter --net=netnsPath -- -F some_command
instead of:
    nsenter -n netnsPath -- -F some_command
because "nsenter -n netnsPath" gives an error:
    # nsenter -n /proc/67197/ns/net ip addr
    nsenter: neither filename nor target pid supplied for ns/net
If we really want to use -n, we have to attach the path to the flag:
    # sudo nsenter -n/proc/67197/ns/net ip addr
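A small sketch of assembling the corrected invocation from Go; only the namespace flag is shown, and any other nsenter flags used by the hairpin code are elided:
```go
package main

import (
	"fmt"
	"os/exec"
)

// nsenterCommand passes the network namespace path as "--net=<path>".
// A bare "-n <path>" is parsed as "-n" with no argument, followed by a
// command named <path>, which produces the error shown above.
func nsenterCommand(netnsPath string, cmdArgs ...string) *exec.Cmd {
	args := append([]string{"--net=" + netnsPath, "--"}, cmdArgs...)
	return exec.Command("nsenter", args...)
}

func main() {
	cmd := nsenterCommand("/proc/67197/ns/net", "ip", "addr")
	fmt.Println(cmd.Args) // [nsenter --net=/proc/67197/ns/net -- ip addr]
}
```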
Dead infra containers may still have network resources allocated to
them and may not be GC-ed for a long time. Tearing down the old infra
container's networking before SyncPod() starts a new one prevents
network plugins from carrying the old network details (e.g. IPAM)
over to the new infra container.
The docker runtime doesn't tear down networking when GC-ing pods.
rkt already does, so make docker do it too. To ensure this happens,
networking is always torn down for the container even if the
container itself is not deleted.
This prevents IPAM from leaking when the pod gets killed for
some reason outside kubelet (like docker restart) or when pods
are killed while kubelet isn't running.
Fixes: https://github.com/kubernetes/kubernetes/issues/14940
Related: https://github.com/kubernetes/kubernetes/pull/35572
We need to tear down networking when garbage collecting containers too,
and GC is run from a different goroutine in kubelet. We don't want
container network operations running for the same pod concurrently.
The PluginManager almost duplicates the network plugin interface, but
not quite since the Init() function should be called by whatever
actually finds and creates the network plugin instance. Only then
does it get passed off to the PluginManager.
The Manager synchronizes pod-specific network operations like setup,
teardown, and pod network status. It passes through all other
operations so that runtimes don't have to cache the network plugin
directly, but can use the PluginManager as a wrapper.
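A condensed sketch of that wrapper: it serializes pod-level operations and passes everything else through. The interface and field names are simplified stand-ins, not the actual PluginManager API:
```go
package main

import "sync"

// netPlugin is a simplified stand-in for the kubelet network plugin interface.
type netPlugin interface {
	Name() string
	SetUpPod(namespace, name, containerID string) error
	TearDownPod(namespace, name, containerID string) error
}

// pluginManager wraps an already-initialized plugin (calling Init() is still
// the creator's job) and serializes per-pod operations so that, for example,
// SyncPod and the container GC goroutine never run setup/teardown for the
// same pod concurrently.
type pluginManager struct {
	plugin netPlugin

	mu       sync.Mutex
	podLocks map[string]*sync.Mutex // keyed by "namespace/name"
}

func (pm *pluginManager) podLock(key string) *sync.Mutex {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	if pm.podLocks == nil {
		pm.podLocks = map[string]*sync.Mutex{}
	}
	if _, ok := pm.podLocks[key]; !ok {
		pm.podLocks[key] = &sync.Mutex{}
	}
	return pm.podLocks[key]
}

func (pm *pluginManager) SetUpPod(namespace, name, containerID string) error {
	l := pm.podLock(namespace + "/" + name)
	l.Lock()
	defer l.Unlock()
	return pm.plugin.SetUpPod(namespace, name, containerID)
}

func (pm *pluginManager) TearDownPod(namespace, name, containerID string) error {
	l := pm.podLock(namespace + "/" + name)
	l.Lock()
	defer l.Unlock()
	return pm.plugin.TearDownPod(namespace, name, containerID)
}

// Name is passed through, so runtimes can hold only the manager rather than
// caching the plugin directly.
func (pm *pluginManager) Name() string { return pm.plugin.Name() }

func main() {}
```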
Automatic merge from submit-queue (batch tested with PRs 41466, 41456, 41550, 41238, 41416)
Delay Deletion of a Pod until volumes are cleaned up
#41436 fixed the bug that caused #41095 and #40239 to have to be reverted. Now that the bug is fixed, this shouldn't cause problems.
@vishh @derekwaynecarr @sjenning @jingxu97 @kubernetes/sig-storage-misc
Automatic merge from submit-queue (batch tested with PRs 41531, 40417, 41434)
Always detach volumes in operator executor
**What this PR does / why we need it**:
Instead of marking a volume as detached immediately in Kubelet's
reconciler, delegate the marking asynchronously to the operator
executor. This is necessary to prevent race conditions with other
operations mutating the same volume state.
An example of one such problem:
1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler reaches detach volume section, detects volume is no longer in desired state of world,
removes it from volumes in use
7. MountVolume tries to mark the volume as mounted, but throws an error because
the volume is no longer in the actual state of the world list. After this, kubelet isn't aware of the mount,
so it doesn't try to unmount again.
8. controller-manager tries to detach the volume, this fails because it
is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #32881, fixes #37854 (maybe)
**Special notes for your reviewer**:
**Release note**:
```release-note
```
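A toy illustration of why delegating the "mark detached" step to the operation executor helps: operations touching the same volume are serialized, so the mount bookkeeping in step 4 and the detach bookkeeping in step 6 cannot interleave. The real operation_executor.go uses nested pending operations keyed by volume and pod; this sketch only shows the serialization idea:
```go
package main

import (
	"fmt"
	"sync"
)

// volumeExecutor serializes all state-mutating operations for a given volume.
type volumeExecutor struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (e *volumeExecutor) run(volumeName string, op func()) {
	e.mu.Lock()
	if e.locks == nil {
		e.locks = map[string]*sync.Mutex{}
	}
	if _, ok := e.locks[volumeName]; !ok {
		e.locks[volumeName] = &sync.Mutex{}
	}
	l := e.locks[volumeName]
	e.mu.Unlock()

	l.Lock()
	defer l.Unlock()
	op()
}

func main() {
	e := &volumeExecutor{}
	var wg sync.WaitGroup
	wg.Add(2)
	// With both the "mark mounted" and "mark detached" updates routed through
	// the executor, one finishes before the other starts, so the actual state
	// of the world cannot be mutated by both at once.
	go func() { defer wg.Done(); e.run("vol-1", func() { fmt.Println("mark mounted") }) }()
	go func() { defer wg.Done(); e.run("vol-1", func() { fmt.Println("mark detached") }) }()
	wg.Wait()
}
```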
Make sure that at least one of the two arguments, runningPod and status, is non-nil, just as the function comment says,
and check killPod()'s return value in syncPod() before setting podKilled.
Automatic merge from submit-queue
kubeadm: Migrate to client-go
**What this PR does / why we need it**: Finish the migration for kubeadm to use client-go wherever possible
**Which issue this PR fixes**: fixes https://github.com/kubernetes/kubeadm/issues/52
**Special notes for your reviewer**: /cc @luxas @pires
**Release note**:
```release-note
NONE
```
Some imports don't exist yet (or so it seems) in client-go, for example:
- "k8s.io/kubernetes/pkg/api/validation"
- "k8s.io/kubernetes/pkg/util/initsystem"
- "k8s.io/kubernetes/pkg/util/node"
There is also one change in kubelet to import client-go.
Automatic merge from submit-queue
Allow multiple DNS servers as a comma-separated argument for kubelet --dns
This PR explores how the kubelet's "--dns" flag could be extended to specify multiple DNS servers for in-cluster pods. Testing on the local libvirt-coreos cluster shows that multiple DNS servers are injected without issues.
Specifying multiple DNS servers increases resilience against
- Packet drops
- Single server failure
I am debugging services that make 50+ DNS requests for a single incoming interactive request, which greatly increases the chance of a slowdown (+5s) due to a single packet drop. Switching to two DNS servers reduces the impact of the issue (roughly +1s on glibc, 0s on musl; the error rate goes down to error-rate^2).
Note that there is no need to change any runtime-related code as far as I know. In the "default" DNS case, /etc/resolv.conf is parsed and multiple DNS servers are sent to the backend anyway. This only adds the same capability for the clusterFirst case.
I've heard from @thockin that multiple DNS entries have been considered in some form; I have no idea what exactly was considered, though. This is what I would like to see for our production use.
```release-note
NONE
```
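A minimal sketch of what a comma-separated --dns value turns into for a clusterFirst pod: several nameserver lines. Flag parsing and resolv.conf writing are simplified assumptions here:
```go
package main

import (
	"fmt"
	"strings"
)

// resolvConfFor renders resolv.conf contents from a comma-separated --dns
// value such as "10.0.0.10,10.0.0.11".
func resolvConfFor(dnsFlag string, searchDomains []string) string {
	var b strings.Builder
	for _, server := range strings.Split(dnsFlag, ",") {
		if server = strings.TrimSpace(server); server != "" {
			fmt.Fprintf(&b, "nameserver %s\n", server)
		}
	}
	if len(searchDomains) > 0 {
		fmt.Fprintf(&b, "search %s\n", strings.Join(searchDomains, " "))
	}
	return b.String()
}

func main() {
	fmt.Print(resolvConfFor("10.0.0.10, 10.0.0.11",
		[]string{"default.svc.cluster.local", "svc.cluster.local"}))
}
```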
Automatic merge from submit-queue (batch tested with PRs 41360, 41423, 41430, 40647, 41352)
kubelet: reduce extraneous logging for pods using host network
For pods using the host network, kubelet/shim should not log
error/warning messages when determining the pod IP address.
Automatic merge from submit-queue (batch tested with PRs 41196, 41252, 41300, 39179, 41449)
record ReduceCPULimits error info if an error is returned
record ReduceCPULimits error info if an error is returned, to aid debugging
Automatic merge from submit-queue
Fix bug in status manager TerminatePod
In TerminatePod, we previously passed pod.Status to updateStatusInternal. This is a bug, since that is the original status we were given. Not only does it skip updates made to container statuses, but in some cases it reverted the pod's status to an earlier version, since the status passed in was stale to begin with.
This was the case in #40239 and #41095. As shown in #40239, the pod's status is set to running after it is set to failed, occasionally causing very long delays in pod deletion since we have to wait for this to be corrected.
This PR fixes the bug, adds some helpful debugging statements, and adds a unit test for TerminatePod (which, for some reason, didn't exist before?).
@kubernetes/sig-node-bugs @vish @Random-Liu
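A stripped-down illustration of the bug, with hypothetical type and field names: the terminate path must build on the status manager's latest cached status rather than on the possibly stale pod.Status it was handed:
```go
package main

import "fmt"

type podStatus struct {
	Phase             string
	ContainerStatuses []string
}

type statusManager struct {
	cached podStatus // latest status the manager has recorded for the pod
}

// terminatePod marks all containers terminated. Starting from m.cached keeps
// updates made since the caller took its copy; starting from the caller's
// stale copy (the old behavior) could revert the pod, e.g. from Failed back
// to Running.
func (m *statusManager) terminatePod(stale podStatus) {
	_ = stale          // old, buggy behavior built the update on this
	status := m.cached // fixed: build the update on the cached (newest) status
	for i := range status.ContainerStatuses {
		status.ContainerStatuses[i] = "terminated"
	}
	m.cached = status
}

func main() {
	m := &statusManager{cached: podStatus{Phase: "Failed", ContainerStatuses: []string{"running"}}}
	staleCopy := podStatus{Phase: "Running", ContainerStatuses: []string{"running"}}
	m.terminatePod(staleCopy)
	fmt.Printf("%+v\n", m.cached) // Phase stays Failed; containers are terminated
}
```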
Automatic merge from submit-queue
Make EnableCRI default to true
This change makes the kubelet use the CRI implementation by default,
unless users explicitly opt out with --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.
Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release. If both flags are specified,
the --enable-cri flag overrides the --experimental-cri flag.
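A hedged sketch of the flag wiring this describes, using spf13/pflag (which Kubernetes components build their flags on); the exact help text and override handling are assumptions:
```go
package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

func main() {
	fs := pflag.NewFlagSet("kubelet", pflag.ExitOnError)

	// CRI is now on by default; users opt out with --enable-cri=false.
	enableCRI := fs.Bool("enable-cri", true, "Enable the Container Runtime Interface (CRI) integration.")

	// Keep the old flag for one more release, but hide it and mark it deprecated.
	experimentalCRI := fs.Bool("experimental-cri", false, "[DEPRECATED] use --enable-cri instead.")
	fs.MarkDeprecated("experimental-cri", "use --enable-cri instead; this flag will be removed in a future release")
	fs.MarkHidden("experimental-cri")

	fs.Parse([]string{"--experimental-cri=true"})

	// If both flags are set, --enable-cri wins; the old flag only matters
	// when the new one was left at its default.
	useCRI := *enableCRI
	if !fs.Changed("enable-cri") && fs.Changed("experimental-cri") {
		useCRI = *experimentalCRI
	}
	fmt.Println("use CRI:", useCRI)
}
```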
Automatic merge from submit-queue
fix comment
**What this PR does / why we need it**:
fix comment
Thanks.
**Special notes for your reviewer**:
**Release note**:
```release-note
```
This change makes the kubelet use the CRI implementation by default,
unless users explicitly opt out with --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.
Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release.
To safely mark a volume detached when the volume controller manager is used.
An example of one such problem:
1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler detects volume is no longer in desired state of world,
removes it from volumes in use
7. MountVolume tries to mark the volume as in use, but throws an error because
the volume is no longer in the actual state of the world list.
8. controller-manager tries to detach the volume, this fails because it
is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.