Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
kubelet: remove the --docker-exec-handler flag
Stop supporting the "nsenter" exec handler. Only the Docker native exec
handler is supported.
The flag was deprecated in Kubernetes 1.6 and is safe to remove
in Kubernetes 1.9 according to the deprecation policy.
**What this PR does / why we need it**:
**Which issue this PR fixes** : fixes#40229
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
Remove the --docker-exec-handler flag. Only native exec handler is supported.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
dockershim: fine-tune network-ready handling on sandbox teardown and removal
If sandbox teardown results in an error, GC will periodically attempt
to again remove the sandbox. Until the sandbox is removed, pod
sandbox status calls will attempt to enter the pod's namespace and
retrieve the pod IP, but the first teardown attempt may have already
removed the network namespace, resulting in a pointless log error
message that the network namespace doesn't exist, or that nsenter
can't find eth0.
The network-ready mechanism originally attempted to suppress those
messages by ensuring that pod sandbox status skipped network checks
when networking was already torn down, but unfortunately the ready
value was cleared too early.
Also, don't tear down the pod network multiple times if the first
time we tore it down, it succeeded.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
bazel: build/test almost everything
**What this PR does / why we need it**: Miscellaneous cleanups and bug fixes. The main motivating idea here was to make `bazel build //...` and `bazel test //...` mostly work. (There's a few reasons these still don't work, but we're a lot closer.)
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @BenTheElder @mikedanese @spxtr
Automatic merge from submit-queue (batch tested with PRs 50890, 52484, 52542, 52567, 50672). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Do not set message when terminationMessagePath not found
If terminationMessagePath is set to a file that does not exist, we should not log an error message and instead try falling back to logs (based on the user's request).
This also slightly simplifies the terminationMessagePath processing.
Seen in #50499
```release-note
If a container does not create a file at the `terminationMessagePath`, no message should be output about being unable to find the file.
```
Automatic merge from submit-queue (batch tested with PRs 52168, 48939, 51889, 52051, 50396). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add Windows Server Containers Stats and Metrics to Kubelet
**What this PR does / why we need it**:
This PR implements stats for Windows Server Containers. This adds the ability to monitor Windows Server containers via the existing stats/summary endpoint inside the kubelet. Windows metrics can now be ingested into heapster and monitored using existing tools (like Grafana).
Previously, the /stats/summary api would consistently crash the kubelet on Windows server containers. This PR implements a new package "winstats" which reads windows server metrics from a combination of windows specific perf counters as well as docker stats. The "winstats" package exports functions that return CAdvisor data structures, which the existing summary api can read.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#49398
This PR addresses my plan to implement windows server container stats https://github.com/kubernetes/kubernetes/issues/49398 .
**Release note**:
```release-note
Add monitoring of Windows Server containers metrics in the kubelet via the stats/summary endpoint.
```
Automatic merge from submit-queue (batch tested with PRs 52168, 48939, 51889, 52051, 50396). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix typo in kubelet kuberuntime container test
Changes "Expetected" to "Expected"
**What this PR does / why we need it**: Fixes a typo in a test
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50294, 50422, 51757, 52379, 52014). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix AnnotationProvidedIPAddr annotation for externalCloudProvider
**What this PR does / why we need it**:
In #44258, it introduced `AnnotationProvidedIPAddr`. When kubelet has 'node-ip' parameter set, and cloud provider not set, this annotation would be populated, and then will be validated by cloud-controller-manager:
https://github.com/kubernetes/kubernetes/pull/44258/files#diff-6b0808bd1afb15f9f77986f4459601c2R465
Later with #47152, externalCloudProvider is checked and func returns before that annotation got set. In this case, that annotation will not get populated.
This fix is to bring that annotation assignment to a proper location.
Please correct me if I have any misunderstanding.
@wlan0 @ublubu
**Which issue this PR fixes**
**Special notes for your reviewer**:
**Release note**:
Automatic merge from submit-queue (batch tested with PRs 52109, 52235, 51809, 52161, 50080). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Remove backward compatibility of hostportChainName
**What this PR does / why we need it**:
fix TODO.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
/assign @freehan
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52240, 48145, 52220, 51698, 51777). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
dockershim: remove support for legacy containers
The code was first introduced in 1.6 to help pre-CRI-kubelet upgrade to
using the CRI implementation. They can safely be removed now.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
kubelet: fix inconsistent display of terminated pod IPs
PLEG and kubelet race when reading and sending pod status to the apiserver. PLEG
inserts status into a cache, and then signals kubelet. Kubelet then eventually
reads the status out of that cache, but in the mean time the status could have
been changed by PLEG.
When a pod exits, pod status will no longer include the pod's IP address because
the network plugin/runtime will report "" for terminated pod IPs. If this status
gets inserted into the PLEG cache before kubelet gets the status out of the cache,
kubelet will see a blank pod IP address. This happens in about 1/5 of cases when
pods are short-lived, and somewhat less frequently for longer running pods.
To ensure consistency for properties of dead pods, copy an old status update's
IP address over to the new status update if (a) the new status update's IP is
missing and (b) all sandboxes of the pod are dead/not-ready (eg, no possibility
for a valid IP from the sandbox).
Fixes: https://github.com/kubernetes/kubernetes/issues/47265
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1449373
@eparis @freehan @kubernetes/rh-networking @kubernetes/sig-network-misc
Stop supporting the "nsenter" exec handler. Only the Docker native exec
handler is supported.
The flag was deprecated in Kubernetes 1.6 and is safe to remove
in Kubernetes 1.9 according to the deprecation policy.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix volume remount on reboot
**What this PR does / why we need it**:
Check the mount is actually attached & mounted before marking actual state of world of Kubelet reconciler.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51982
**Special notes for your reviewer**:
Added explicit check to make sure volumes are attached and are mounted before marking the state in actual state of world.
**Release note**:
NONE
If sandbox teardown results in an error, GC will periodically attempt
to again remove the sandbox. Until the sandbox is removed, pod
sandbox status calls will attempt to enter the pod's namespace and
retrieve the pod IP, but the first teardown attempt may have already
removed the network namespace, resulting in a pointless log error
message that the network namespace doesn't exist, or that nsenter
can't find eth0.
The network-ready mechanism originally attempted to suppress those
messages by ensuring that pod sandbox status skipped network checks
when networking was already torn down, but unfortunately the ready
value was cleared too early.
Also, don't tear down the pod network multiple times if the first
time we tore it down, it succeeded.
Automatic merge from submit-queue (batch tested with PRs 51337, 47080, 52646, 52635, 52666). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix CRI container/imagefs stats.
`ContainerStats`, `ListContainerStats` and `ImageFsInfo` are returning `not implemented` error now.
This PR fixes it.
@yujuhong @feiskyer @yguo0905
Automatic merge from submit-queue (batch tested with PRs 51337, 47080, 52646, 52635, 52666). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Map a resource to multiple signals in eviction manager
It is possible to have multiple signals that point to the same type of
resource, e.g., both SignalNodeFsAvailable and
SignalAllocatableNodeFsAvailable refer to the same resource NodeFs.
Change the map from map[v1.ResourceName]evictionapi.Signal to
map[v1.ResourceName][]evictionapi.Signal
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52661
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fixes a race in deviceplugin/manager_test.go and a race in deviceplug…
…in/manager.go.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/52560
**Special notes for your reviewer**:
Tested with go test -count 50 -race k8s.io/kubernetes/pkg/kubelet/deviceplugin and all runs passed.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 48970, 52497, 51367, 52549, 52541). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Recreate pod sandbox when the sandbox does not have an IP address.
**What this PR does / why we need it**:
Attempts to fix a bug where Pods do not receive networking when the kubelet restarts during pod creation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes # https://github.com/kubernetes/kubernetes/issues/48510
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Eliminate hangs/throttling of node heartbeat
Fixes https://github.com/kubernetes/kubernetes/issues/48638Fixes#50304
Stops kubelet from wedging when updating node status if unable to establish tcp connection.
Notes that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out, so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them
```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
If terminationMessagePath is set to a file that does not exist, we
should not log an error message and instead try falling back to logs
(based on the user's request).
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)
Fixes device plugin re-registration handling logic to make sure:
- If a device plugin exits, its exported resource will be removed.
- No capacity change if a new device plugin instance comes up to replace the old instance.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/52510
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)
Make CPU manager release CPUs when Pod enters completed phase.
**What this PR does / why we need it**: When CPU manager is enabled, this PR releases allocated CPUs when container is not running and is non-restartable.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52351
**Special notes for your reviewer**:
This bug is only reproduced for pods with `restartPolicy` = `Never` or `OnFailure`. The following output is from a 4 CPU node. This bug can be reproduced as long >= half the cores are requested.
pod1.yaml:
```
apiVersion: v1
kind: Pod
metadata:
name: test-pod1
spec:
containers:
- image: ubuntu
command: ["/bin/bash"]
args: ["-c", "sleep 5"]
name: test-container1
resources:
requests:
cpu: 2
memory: 100Mi
limits:
cpu: 2
memory: 100Mi
restartPolicy: "Never"
```
pod2.yaml:
```
apiVersion: v1
kind: Pod
metadata:
name: test-pod2
spec:
containers:
- image: ubuntu
command: ["/bin/bash"]
args: ["-c", "sleep 5"]
name: test-container1
resources:
requests:
cpu: 2
memory: 100Mi
limits:
cpu: 2
memory: 100Mi
restartPolicy: "Never"
```
Run a local Kubernetes cluster with CPU manager enabled.
```sh
KUBELET_FLAGS='--feature-gates=CPUManager=true --cpu-manager-policy=static --cpu-manager-reconcile-period=1s --kube-reserved=cpu=500m' ./hack/local-up-cluster.sh
```
_Before:_
Create `test-pod1` using pod1.yaml.
```
./cluster/kubectl.sh create -f pod1.yaml
```
Wait for the pod to complete and wait another 90 seconds (give enough time for GC to kick-in).
Create `test-pod2` using pod2.yaml.
```
./cluster/kubectl.sh create -f pod2.yaml
```
Get all pods in the cluster.
```
./cluster/kubectl.sh get pods -a
NAME READY STATUS RESTARTS AGE
test-pod1 0/1 Completed 0 1m
test-pod2 0/1 not enough cpus available to satisfy request 0 9s
```
_After:_
Create `test-pod1` using pod1.yaml.
```
./cluster/kubectl.sh create -f pod1.yaml
```
Wait for the pod to complete and wait another 90 seconds (give enough time for GC to kick-in).
Create `test-pod2` using pod2.yaml.
```
./cluster/kubectl.sh create -f pod2.yaml
```
Get all pods in the cluster.
```
./cluster/kubectl.sh get pods -a
NAME READY STATUS RESTARTS AGE
test-pod1 0/1 Completed 0 1m
test-pod2 0/1 Completed 0 9s
```
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)
Ignore pods for quota marked for deletion whose node is unreachable
**What this PR does / why we need it**:
Traditionally, we charge to quota all pods that are in a non-terminal phase. We have a user report that noted the behavior change in kube 1.5 for the node controller to no longer force delete pods whose nodes have been lost. Instead, the pod is marked for deletion, and the reason is updated to state that the node is unreachable. The user expected the quota to be released. If the user was at their quota limit, their application may not be able to create a new replica given the current behavior. As a result, this PR ignores pods marked for deletion that have exceeded their grace period.
**Which issue this PR fixes**
xref https://bugzilla.redhat.com/show_bug.cgi?id=1455743
fixes https://github.com/kubernetes/kubernetes/issues/52436
**Release note**:
```release-note
Ignore pods marked for deletion that exceed their grace period in ResourceQuota
```
- If a device plugin exits, its exported resource will be removed.
- No capacity change if a new device plugin instance comes up to replace the old instance.
This implements stats for windows nodes in a new package, winstats.
WinStats exports methods to get cadvisor like datastructures, however
with windows specific metrics. WinStats only gets node level metrics and
information, container stats will go via the CRI. This enables the
use of the summary api to get metrics for windows nodes.
Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301)
dockershim: check if f.Sync() returns an error and surface it
```release-note
dockershim: check the error when syncing the checkpoint.
```
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)
Log at higher verbosity levels some common SyncPod errors
This log message was 90% of all glog.Errorf level statements reported on a production cluster, hiding other more impactful errors. We already log it in start container, but for extra caution we continue to log it at v(3) here (the downside of not logging a start container error is worse than some log spam at higher levels).
HandleError() is intended only for unknown and unexpected errors.
```release-note
NONE
```
@derekwaynecarr @sjenning
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)
[BugFix] Soft Eviction timer works correctly
fixes#51516
thresholdsMet should not exclude previously met thresholds when we do not have new stats for a threshold.
/assign @vishh @derekwaynecarr
cc @kubernetes/sig-node-bugs
Automatic merge from submit-queue (batch tested with PRs 51041, 52297, 52296, 52335, 52338)
Use cAdvisor constant for crio imagefs
**What this PR does / why we need it**:
code hygiene to use a constant from cAdvisor
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
fsync config checkpoint files after writing
@yujuhong brought up that it's possible for a hard reboot to result in empty checkpoint files, if they haven't been synced to disk yet. This PR ensures that Kubelet configuration checkpoints are synced after writing to avoid this issue.
fixes#52222
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52264, 51870)
Use credentials from providers for docker sandbox image
**What this PR does / why we need it**:
Sandbox image lookup uses creds from docker config only; other credential providers are ignored. This is a regression introduced in dockershim.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51293
**Special notes for your reviewer**:
Should also cherry-pick this to release-1.6 and release-1.7.
**Release note**:
```release-note
Fix credentials providers for docker sandbox image.
```