Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: fix inconsistent display of terminated pod IPs
PLEG and kubelet race when reading and sending pod status to the apiserver. PLEG
inserts status into a cache, and then signals kubelet. Kubelet then eventually
reads the status out of that cache, but in the meantime the status could have
been changed by PLEG.
When a pod exits, pod status will no longer include the pod's IP address because
the network plugin/runtime will report "" for terminated pod IPs. If this status
gets inserted into the PLEG cache before kubelet gets the status out of the cache,
kubelet will see a blank pod IP address. This happens in about 1/5 of cases when
pods are short-lived, and somewhat less frequently for longer running pods.
To ensure consistency for properties of dead pods, copy an old status update's
IP address over to the new status update if (a) the new status update's IP is
missing and (b) all sandboxes of the pod are dead/not-ready (i.e., no possibility
for a valid IP from the sandbox).
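
The rule above can be sketched roughly as follows. This is a minimal illustration, not the kubelet code: `PodStatus`, `SandboxState`, and `preserveIP` are hypothetical names, and the real status types carry far more fields.

```go
package main

import "fmt"

// SandboxState and PodStatus are simplified, hypothetical stand-ins for the
// kubelet's runtime status types; they are not the real kubecontainer API.
type SandboxState int

const (
	SandboxReady SandboxState = iota
	SandboxNotReady
)

type PodStatus struct {
	IP        string
	Sandboxes []SandboxState
}

// preserveIP returns the IP to record for the new status update. The old IP
// is reused only when (a) the new status has no IP and (b) no sandbox is
// ready, so the sandbox cannot still supply a valid IP.
func preserveIP(oldStatus, newStatus *PodStatus) string {
	if newStatus.IP != "" || oldStatus == nil {
		return newStatus.IP
	}
	for _, s := range newStatus.Sandboxes {
		if s == SandboxReady {
			// A ready sandbox could still report an IP later, so the
			// empty value is kept as-is.
			return newStatus.IP
		}
	}
	return oldStatus.IP
}

func main() {
	old := &PodStatus{IP: "10.0.0.5", Sandboxes: []SandboxState{SandboxReady}}
	dead := &PodStatus{Sandboxes: []SandboxState{SandboxNotReady}}
	fmt.Println(preserveIP(old, dead))
}
```

Checking both conditions keeps a live pod that has genuinely lost its IP from being reported with a stale address.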
Fixes: https://github.com/kubernetes/kubernetes/issues/47265
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1449373
@eparis @freehan @kubernetes/rh-networking @kubernetes/sig-network-misc
Automatic merge from submit-queue
fix pleg relist time
This PR fixes the PLEG relist time. In the current implementation, a `Healthy` method periodically checks the relist time. If the time elapsed since the latest relist exceeds `relistThreshold` (default: 3 minutes), it returns an error to indicate a problem with the runtime.
The `relist` method is also called periodically. If the runtime (docker) hangs, `relist` should return immediately without updating the latest relist time. If the latest relist time is updated even when the runtime (docker) hangs (the default timeout is 2 minutes), the `Healthy` method will never return an error.
```release-note
Kubelet PLEG updates the relist timestamp only after successfully relisting.
```
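
A rough sketch of the intended behaviour, under simplifying assumptions: the `pleg` struct, the `listPods` hook, and `simulate` are hypothetical, and synchronization between the relist and health-check goroutines is omitted. Only `relistThreshold` and its 3 minute default come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const relistThreshold = 3 * time.Minute

// pleg is a hypothetical, stripped-down stand-in for the kubelet's PLEG.
type pleg struct {
	lastRelist time.Time
	listPods   func() error // hypothetical hook standing in for the runtime call
}

// relist records the relist time only after the runtime call succeeds.
func (p *pleg) relist(now time.Time) {
	if err := p.listPods(); err != nil {
		// The runtime failed or timed out: leave lastRelist untouched so
		// that healthy() can eventually notice the stall.
		return
	}
	p.lastRelist = now
}

// healthy reports an error once the last successful relist is too old.
func (p *pleg) healthy(now time.Time) error {
	if elapsed := now.Sub(p.lastRelist); elapsed > relistThreshold {
		return fmt.Errorf("pleg was last seen active %v ago; threshold is %v", elapsed, relistThreshold)
	}
	return nil
}

// simulate relists at a start time and again four minutes later, with the
// second runtime call failing when hang is true, then checks health.
func simulate(hang bool) error {
	start := time.Unix(0, 0)
	calls := 0
	p := &pleg{listPods: func() error {
		calls++
		if hang && calls > 1 {
			return errors.New("runtime hung")
		}
		return nil
	}}
	p.relist(start)
	later := start.Add(4 * time.Minute)
	p.relist(later)
	return p.healthy(later)
}

func main() {
	fmt.Println("hung runtime:", simulate(true))
	fmt.Println("healthy runtime:", simulate(false))
}
```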
/cc @yujuhong @Random-Liu @dchen1107
Automatic merge from submit-queue
kubelet: remove the pleg health check from healthz
This prevents kubelet from being killed when docker hangs.
Also, kubelet will report node not ready if PLEG hangs (`docker ps` + `docker inspect`).
Automatic merge from submit-queue
Instruct PLEG to detect pod sandbox state changes
This PR adds a Sandboxes list in `kubecontainer.Pod`, so that PLEG can check
sandbox changes using `GetPods()`. The sandboxes are treated as regular
containers (type `kubecontainer.Container`) for now to avoid additional
changes in PLEG.
/cc @feiskyer @yifan-gu @euank
PLEG will treat them as if they are regular containers and detect changes in the
same manner. Note that this assumes that container IDs will not
collide with the podsandbox IDs.
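
To illustrate why the ID assumption matters, here is a small sketch with hypothetical, trimmed-down `Container` and `Pod` types (not the real `kubecontainer` definitions). It flattens containers and sandboxes into one ID-keyed map, which is only sound if the two ID spaces never overlap; the explicit collision check is an addition for illustration.

```go
package main

import "fmt"

// Container and Pod are hypothetical, trimmed-down stand-ins for
// kubecontainer.Container and kubecontainer.Pod.
type Container struct {
	ID    string
	State string
}

type Pod struct {
	Containers []*Container
	Sandboxes  []*Container // sandboxes reuse the container type
}

// statesByID flattens containers and sandboxes into a single map keyed by
// ID. It returns an error if an ID appears more than once, which is the
// case the no-collision assumption rules out.
func statesByID(p *Pod) (map[string]string, error) {
	states := make(map[string]string, len(p.Containers)+len(p.Sandboxes))
	for _, group := range [][]*Container{p.Containers, p.Sandboxes} {
		for _, c := range group {
			if _, dup := states[c.ID]; dup {
				return nil, fmt.Errorf("ID %q used by more than one container or sandbox", c.ID)
			}
			states[c.ID] = c.State
		}
	}
	return states, nil
}

func main() {
	pod := &Pod{
		Containers: []*Container{{ID: "c1", State: "running"}},
		Sandboxes:  []*Container{{ID: "s1", State: "running"}},
	}
	fmt.Println(statesByID(pod))
}
```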
Fix the following sequence of events:
1. relist call 1 successfully inspects a pod (just has infra container)
1. relist call 2 gets an error inspecting the same pod (has infra container and a transient
container that failed to create) and doesn't update the old/new pod records
1. relist calls 3+ don't inspect the pod any more (just has infra container so it doesn't look like
anything changed)
This change adds a new list that keeps track of pods that failed inspection and retries them the
next time relist is called. Without this change, a pod in this state would never be inspected again,
its entry in the status cache would never be updated, and the pod worker would never call syncPod
again because the most recent entry in the status cache has an error associated with it. Without
this change, pods in this state would be stuck Terminating forever, unless the user issued a
deletion with a grace period value of 0.
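
The retry bookkeeping described above might look roughly like this. It is a sketch under assumptions: `relister`, `needsReinspect`, and `newFlakyInspector` are hypothetical names, and pods are identified by plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// relister is a hypothetical, simplified stand-in for the PLEG bookkeeping.
type relister struct {
	inspect        func(podID string) error // hypothetical runtime inspection hook
	needsReinspect map[string]bool          // pods whose last inspection failed
}

func newRelister(inspect func(string) error) *relister {
	return &relister{inspect: inspect, needsReinspect: make(map[string]bool)}
}

// relist inspects every pod that changed since the previous relist, plus
// every pod whose previous inspection failed.
func (r *relister) relist(changed []string) {
	pending := make(map[string]bool)
	for _, id := range changed {
		pending[id] = true
	}
	for id := range r.needsReinspect {
		pending[id] = true
	}
	// Start a fresh retry list; failures below repopulate it.
	r.needsReinspect = make(map[string]bool)
	for id := range pending {
		if err := r.inspect(id); err != nil {
			r.needsReinspect[id] = true
		}
	}
}

// newFlakyInspector returns an inspect hook that fails its first `failures`
// calls, along with a pointer to its call counter.
func newFlakyInspector(failures int) (func(string) error, *int) {
	calls := 0
	return func(podID string) error {
		calls++
		if calls <= failures {
			return errors.New("transient inspection error for " + podID)
		}
		return nil
	}, &calls
}

func main() {
	inspect, calls := newFlakyInspector(1)
	r := newRelister(inspect)
	r.relist([]string{"pod-a"}) // inspection fails, pod-a is queued for retry
	r.relist(nil)               // nothing changed, but pod-a is inspected again
	fmt.Println(*calls, len(r.needsReinspect))
}
```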
PLEG is responsible for listing the pods running on the node. If it's hung
due to a non-responsive container runtime or internal bugs, we should restart
kubelet.
Currently, PLEG reports an event if a container transitions from running to
exited between relists. However, it would not report any event if a container
gets stopped and removed between relists. This event will eventually be
handled when the pod syncs periodically, but this is undesirable. This change
ensures that we detect all such events.
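
One way to picture the fix is a diff over two relist snapshots that also walks the old snapshot, so containers that vanished entirely still produce a transition. The `state` type, its values, and `diff` are hypothetical simplifications, and events are plain strings rather than the kubelet's event type.

```go
package main

import (
	"fmt"
	"sort"
)

type state string

const (
	running     state = "running"
	exited      state = "exited"
	nonExistent state = "non-existent"
)

// diff compares two relist snapshots (container ID -> state) and returns one
// description per state transition, sorted for stable output.
func diff(old, cur map[string]state) []string {
	var events []string
	// Containers present in the current snapshot: new or changed state.
	for id, newState := range cur {
		oldState, ok := old[id]
		if !ok {
			oldState = nonExistent
		}
		if oldState != newState {
			events = append(events, fmt.Sprintf("%s: %s -> %s", id, oldState, newState))
		}
	}
	// Containers that disappeared between relists. Skipping this loop is
	// what would leave a stopped-and-removed container unreported.
	for id, oldState := range old {
		if _, ok := cur[id]; !ok {
			events = append(events, fmt.Sprintf("%s: %s -> %s", id, oldState, nonExistent))
		}
	}
	sort.Strings(events)
	return events
}

func main() {
	old := map[string]state{"c1": running, "c2": running}
	cur := map[string]state{"c2": exited}
	for _, e := range diff(old, cur) {
		fmt.Println(e)
	}
}
```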
This change introduces the pod lifecycle event generator (PLEG), and adds a generic
PLEG. The generic PLEG relies on relisting to discover container events, and is
container-runtime-agnostic. Both docker and rkt are changed to use generic
PLEG.