Automatic merge from submit-queue (batch tested with PRs 63314, 63884, 63799, 63521, 62242). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add memcg notifications for allocatable cgroup
**What this PR does / why we need it**:
Use memory cgroup notifications to trigger the eviction manager when the allocatable eviction threshold is crossed. This allows the eviction manager to respond more quickly when the allocatable cgroup's available memory becomes low. Evictions are preferable to OOMs in the cgroup since the kubelet can enforce its priorities on which pod is killed.
**Which issue(s) this PR fixes**:
Fixes https://github.com/kubernetes/kubernetes/issues/57901
**Special notes for your reviewer**:
This adds the alloctable cgroup from the container manager to the eviction config.
**Release note**:
```release-note
NONE
```
/sig node
/priority important-soon
/kind feature
I would like this to be included in the 1.11 release.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
track/close kubelet->API connections on heartbeat failure
xref #48638
xref https://github.com/kubernetes-incubator/kube-aws/issues/598
we're already typically tracking kubelet -> API connections and have the ability to force close them as part of client cert rotation. if we do that tracking unconditionally, we gain the ability to also force close connections on heartbeat failure as well. it's a big hammer (means reestablishing pod watches, etc), but so is having all your pods evicted because you didn't heartbeat.
this intentionally does minimal refactoring/extraction of the cert connection tracking transport in case we want to backport this
* first commit unconditionally sets up the connection-tracking dialer, and moves all the cert management logic inside an if-block that gets skipped if no certificate manager is provided (view with whitespace ignored to see what actually changed)
* second commit plumbs the connection-closing function to the heartbeat loop and calls it on repeated failures
follow-ups:
* consider backporting this to 1.10, 1.9, 1.8
* refactor the connection managing dialer to not be so tightly bound to the client certificate management
/sig node
/sig api-machinery
```release-note
kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Report node DNS info with --node-ip
**What this PR does / why we need it**:
This PR adds `ExternalDNS`, `InternalDNS`, and `ExternalIP` info for kubelets with the `--nodeip` flag enabled.
**Which issue(s) this PR fixes**
Fixes#63158
**Special notes for your reviewer**:
I added a field to the Kubelet to make IP validation more testable (`validateNodeIP` relies on the `net` package and the IP address of the host that is executing the test.) I also converted the test to use a table so new cases could be added more easily.
**Release Notes**
```release-note
Report node DNS info with --node-ip flag
```
@andrewsykim
@nckturner
/sig node
/sig network
Automatic merge from submit-queue (batch tested with PRs 62448, 59317, 59947, 62418, 62352). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: add configuration to optionally enable server tls bootstrap
right now if the RotateKubeletServerCertificate feature is enabled,
kubelet will bootstrap server tls. this is undesirable if the deployment
is not or cannot run an approver to handle these certificate signing
requests.
Fixes https://github.com/kubernetes/kubernetes/issues/62077
```release-note
NONE
```
right now if the RotateKubeletServerCertificate feature is enabled,
kubelet will bootstrap server tls. this is undesirable if the deployment
is not or cannot run an approver to handle these certificate signing
requests.
The code was added to support rktnetes and non-CRI docker integrations.
These legacy integrations have already been removed from the codebase.
This change removes the compatibility code existing soley for the
legacy integrations.
Automatic merge from submit-queue (batch tested with PRs 57658, 61304, 61560, 61859, 61870). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
certs: exclude more nonsensical addresses from SANs
I noticed this when I saw 169.254.* SANs using server TLS bootstrap.
This change excludes more nonsensical addresses from being requested as
SANs in that flow.
I noticed this when I saw 169.254.* SANs using server TLS bootstrap.
This change excludes more nonsensical addresses from being requested as
SANs in that flow.
rktnetes is scheduled to be deprecated in 1.10 (#53601). According to
the deprecation policy for beta CLI and flags, we can remove the feature
in 1.11.
Fixes#58721
Automatic merge from submit-queue (batch tested with PRs 61487, 58353, 61078, 61219, 60792). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove dead code in kubelet
clean up dead code
/kind cleanup
/sig node
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: make --cni-bin-dir accept a comma-separated list of CNI plugin directories
Allow CNI-related network plugin drivers (kubenet, cni) to search a list of
directories for plugin binaries instead of just one. This allows using an
administrator-provided path and fallbacks to others (like the previous default
of /opt/cni/bin) for backwards compatibility.
```release-note
kubelet's --cni-bin-dir option now accepts multiple comma-separated CNI binary directory paths, which are search for CNI plugins in the given order.
```
@kubernetes/rh-networking @kubernetes/sig-network-misc @freehan @pecameron @rajatchopra
Automatic merge from submit-queue (batch tested with PRs 51423, 53880). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable ImageGC when high threshold is set to 100
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#51268
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
The LocalStorageCapacityIsolation feature added a new resource type
ResourceEphemeralStorage "ephemeral-storage" so that this resource can
be allocated, limited, and consumed as the same way as CPU/memory. All
the features related to resource management (resource request/limit, quota, limitrange) are avaiable for local ephemeral storage.
This local ephemeral storage represents the storage for root file system, which will be consumed by containers' writtable layer and logs. Some volumes such as emptyDir might also consume this storage.
Allow CNI-related network plugin drivers (kubenet, cni) to search a list of
directories for plugin binaries instead of just one. This allows using an
administrator-provided path and fallbacks to others (like the previous default
of /opt/cni/bin) for backwards compatibility.
Automatic merge from submit-queue (batch tested with PRs 60157, 60337, 60246, 59714, 60467). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
backoff runtime errors in kubelet sync loop
The runtime health check can race with PLEG's first relist, and this
often results in an unnecessary 5 second wait during Kubelet bootstrap.
This change aims to improve the performance.
```release-note
NONE
```
The word 'manifest' technically refers to a container-group specification
that predated the Pod abstraction. We should avoid using this legacy
terminology where possible. Fortunately, the Kubelet's config API will
be beta in 1.10 for the first time, so we still had the chance to make
this change.
I left the flags alone, since they're deprecated anyway.
I changed a few var names in files I touched too, but this PR is the
just the first shot, not the whole campaign
(`git grep -i manifest | wc -l -> 1248`).
The runtime health check can race with PLEG's first relist, and this
often results in an unnecessary 5 second wait during Kubelet bootstrap.
This change aims to improve the performance.
Automatic merge from submit-queue (batch tested with PRs 54191, 59374, 59824, 55032, 59906). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding per container stats for CRI runtimes
**What this PR does / why we need it**
This commit aims to collect per container log stats. The change was proposed as a part of #55905. The change includes change the log path from /var/pod/<pod uid>/containername_attempt.log to /var/pod/<pod uid>/containername/containername_attempt.log. The logs are collected by reusing volume package to collect metrics from the log path.
Fixes#55905
**Special notes for your reviewer:**
cc @Random-Liu
**Release note:**
```
Adding container log stats for CRI runtimes.
```
This commit aims to collect per container log stats. The
change was proposed as a part of #55905. The change includes
change of the log path from /var/pod/<pod uid>/containername_attempt.log
to /var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.
Signed-off-by: abhi <abhi@docker.com>
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Collect ephemeral storage capacity on initialization
**What this PR does / why we need it**:
We have had some node e2e flakes where a pod can be rejected if it requests ephemeral storage. This is because we don't set capacity and allocatable for ephemeral storage on initialization.
This PR causes cAdvisor to do one round of stats collection during initialization, which will allow it to get the disk capacity when it first sets the node status.
It also sets the node to NotReady if capacities have not been initialized yet.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @jingxu97 @Random-Liu
/sig node
/kind bug
/priority important-soon
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix kubelet PVC stale metrics
**What this PR does / why we need it**:
Volumes on each node changes, we should not only add PVC metrics into
gauge vector. It's better use a collector to collector metrics from internal
stats.
Currently, if a PV (bound to a PVC `testpv`) is attached and used by node A, then migrated to node B or just deleted from node A later. `testpvc` metrics will not disappear from kubelet on node A. After a long running time, `kubelet` process will keep a lot of stale volume metrics in memory.
For these dynamic metrics, it's better to use a collector to collect metrics from a data source (`StatsProvider` here), like [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) scraping metrics from kube-apiserver.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/57686
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix kubelet PVC stale metrics
```
Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: only register api source when connecting
**What this PR does / why we need it**:
before this change, an api source was always registered, even when there
was no kubeclient. this lead to some operations blocking waiting for
podConfig.SeenAllSources to pass, which it never would.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59275
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This adds context to all the relevant cloud provider interface signatures.
Callers of those APIs are currently satisfied using context.TODO().
There will be follow on PRs to push the context through the stack.
For an idea of the full scope of this change please look at PR #58532.
before this change, an api source was always registered, even when there
was no kubeclient. this lead to some operations blocking waiting for
podConfig.SeenAllSources to pass, which it never would.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Only rotate certificates in the background
Change the Kubelet to not block until the first certs have rotated (we didn't act on it anyway) and fall back to the bootstrap cert if the most recent rotated cert is expired on startup.
The certificate manager originally had a "block on startup" rotation behavior to ensure at least one rotation happened on startup. However, since rotation may not succeed within the first time window the code was changed to simply print the error rather than return it. This meant that the blocking rotation has no purpose - it cannot cause the kubelet to fail, and it *does* block the kubelet from starting static pods before the api server becomes available.
The current block behavior causes a bootstrapped kubelet that is also set to run static pods to wait several minutes before actually launching the static pods, which means self-hosted masters using static pods have a pointless delay on startup.
Since blocking rotation has no benefit and can't actually fail startup, this commit removes the blocking behavior and simplifies the code at the same time. The goroutine for rotation now completely owns the deadline, the shouldRotate() method is removed, and the method that sets rotationDeadline now returns it. We also explicitly guard against a negative sleep interval and omit the message.
Should have no impact on bootstrapping except the removal of a long delay on startup before static pods start.
The other change is that an expired certificate from the cert manager is *not* considered a valid cert, which triggers an immediate rotation. This causes the cert manager to fall back to the original bootstrap certificate until a new certificate is issued. This allows the bootstrap certificate on masters to be "higher powered" and allow the node to function prior to initial approval, which means someone configuring the masters with a pre-generated client cert can be guaranteed that the kubelet will be able to communicate to report self-hosted static pod status, even if the first client rotation hasn't happened. This makes master self-hosting more predictable for static configuration environments.
```release-note
When using client or server certificate rotation, the Kubelet will no longer wait until the initial rotation succeeds or fails before starting static pods. This makes running self-hosted masters with rotation more predictable.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove setInitError.
**What this PR does / why we need it**:
Removes setInitError, it's not sure it was ever really used, and it causes the kubelet to hang and get wedged.
**Which issue(s) this PR fixes**
Fixes#46086
**Special notes for your reviewer**:
If `initializeModules()` in `kubelet.go` encounters an error, it calls `runtimeState.setInitError(...)`
47d61ef472/pkg/kubelet/kubelet.go (L1339)
The trouble with this is that `initError` is never cleared, which means that `runtimeState.runtimeErrors()` always returns this `initError`, and thus pods never start sync-ing.
In normal operation, this is expected and desired because eventually the runtime is expected to become healthy, but in this case, `initError` is never updated, and so the system just gets wedged.
47d61ef472/pkg/kubelet/kubelet.go (L1751)
We could add some retry to `initializeModules()` but that seems unnecessary, as eventually we'd want to just die anyway. Instead, just log fatal and die, a supervisor will restart us.
Note, I'm happy to add some retry here too, if that makes reviewers happier.
**Release note**:
```release-note
Prevent kubelet from getting wedged if initialization of modules returns an error.
```
@feiskyer @dchen1107 @janetkuo
@kubernetes/sig-node-bugs
The certificate manager originally had a "block on startup" rotation
behavior to ensure at least one rotation happened on startup. However,
since rotation may not succeed within the first time window the code was
changed to simply print the error rather than return it. This meant that
the blocking rotation has no purpose - it cannot cause the kubelet to
fail, and it *does* block the kubelet from starting static pods before
the api server becomes available.
The current block behavior causes a bootstrapped kubelet that is also
set to run static pods to wait several minutes before actually launching
the static pods, which means self-hosted masters using static pods have
a pointless delay on startup.
Since blocking rotation has no benefit and can't actually fail startup,
this commit removes the blocking behavior and simplifies the code at the
same time. The goroutine for rotation now completely owns the deadline,
the shouldRotate() method is removed, and the method that sets
rotationDeadline now returns it. We also explicitly guard against a
negative sleep interval and omit the message.
Should have no impact on bootstrapping except the removal of a long
delay on startup before static pods start.
Also add a guard condition where if the current cert in the store is
expired, we fall back to the bootstrap cert initially (we use the
bootstrap cert to communicate with the server). This is consistent with
when we don't have a cert yet.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add deprecation warnings for rktnetes flags
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#53601
**Special notes for your reviewer**:
**Release note**:
```release-note
rktnetes has been deprecated in favor of rktlet. Please see https://github.com/kubernetes-incubator/rktlet for more information.
```