Set resyncInterval to one minute now that we rely on the generic pleg to trigger
pod syncs on container events. When there is an error during syncing, pod
workers need to wake up sooner to retry. Set the sync error backoff period to
10 second in this case.
This change introduces pod lifecycle event generator (PLEG), and adds a generic
PLEG. The generic PLEG relies on relisting to discover container events, and is
container-runtime-agnostic. Both docker and rkt are changed to use generic
PLEG.
+ Fix#14992
+ "When deploying a pod using an on-disk kubelet manifest (a la /etc/kubernetes/manifests), it appears that the network plugin setUpPod is notified of the new pod before the apiserver."
Push status updates as soon as readiness state changes for containers,
rather than waiting for the sync loop to update the status. In
particular, this should help new containers to come online faster.
Additionally, consolidates prober test helpers into a single file.
This commit fixes getting the logs from complete/failed pods after
a kubelet restart by falling back to the api server in case we fail
to resolve the pod status using the status cache.
- PeriodSeconds - How often to probe
- SuccessThreshold - Number of successful probes to go from failure to success state
- FailureThreshold - Number of failing probes to go from success to failure state
This commit includes to changes in behavior:
1. InitialDelaySeconds now defaults to 10 seconds, rather than the
kubelet sync interval (although that also defaults to 10 seconds).
2. Prober only retries on probe error, not failure. To compensate, the
default FailureThreshold is set to the maxRetries, 3.
This lays the groundwork for simple multizone capabilities.
In a cloud environment, nodes are typically created by the kubelet
registering with the API server. When creating a new node, we now query
the cloudprovider to see if it can provide Zone information, and if so
we add some well-known labels to the Node we are creating.
- status.Manager always deals with the local (static) pod, but gets the
mirror pod when syncing
- This lets components like the probe workers ignore mirror pods
Now that kubelet checks sources seen correctly, there is no need to enforce the
initial order of pod updates and housekeeping. Use a ticker for housekeeping to
simplify the code.
Currently kubelet syncs all pods every 10s. This is not preferred because
* Some pods may have been sync'd recently.
* This may cause all the pods to be sync'd at once, causing undesirable
CPU spikes.
This PR replaces the global syncs with independent, periodic pod syncs. At the
end of syncing, each pod worker will enqueue itslef with a future timestamp (
current time + sync interval), when it will be due for another sync.
* If the pod worker encoutners an sync error, it may requeue with a different
timestamp to retry sooner.
* If a sync is triggered by the update channel (events or spec changes), the
pod worker would enqueue a new sync time.
This change is necessary for moving to long or no periodic sync period once pod
lifecycle event generator is completed. We will still rely on the mechanism to
requeue the pod on sync error.
This change also makes sure that if a sync does not succeed (either due to
real error or the per-container backoff mechanism), an error would be propagated
back to the pod worker, which is responsible for requeuing.
Define a new out of disk node condition and use it to report when node
goes out of disk.
Make a copy of loop range clause variable in node listers so that it
is available outside the for loop.
Also update/implement unit tests.
This commit builds on previous work and creates an independent
worker for every liveness probe. Liveness probes behave largely the same
as readiness probes, so much of the code is shared by introducing a
probeType paramater to distinguish the type when it matters. The
circular dependency between the runtime and the prober is broken by
exposing a shared liveness ResultsManager, owned by the
kubelet. Finally, an Updates channel is introduced to the ResultsManager
so the kubelet can react to unhealthy containers immediately.
Change all references to the container ID in pkg/kubelet/... to the
strong type defined in pkg/kubelet/container: ContainerID
The motivation for this change is to make the format of the ID
unambiguous, specifically whether or not it includes the runtime
prefix (e.g. "docker://").
The current implementation considers a source seen when it receives a SET at
kubelet/config/config.go. However, the main kubelet sync loop may not have
received the pod update from the source via the channel. This change ensures
that kubelet would consider all sources are ready only after the sync loop has
seen all the sources.
Each container with a readiness has an individual go-routine which
handles periodic probing for that container. The results are cached, and
written to the status.Manager in the pod sync path.
Network configuration error message while setting Kubelet status was
being written to "reasons" slice. Write this message to "messages" slice
instead.
Also remove "reasons" slice entirely since it is not used anywhere.
Now that kubelet has switched to incremental updates, it has complete
information of the pod update type (create, update, sync). This change pipes
this information to pod workers so that they don't have to derive the type
again.
Increase the supported controls on pod logging. Add validaiton to pod
log options. Ensure the Kubelet is using a consistent, structured way to
process pod log arguments.
Add ?sinceSeconds=<durationInSeconds>, &sinceTime=<RFC3339>, ?timestamps=<bool>,
?tailLines=<number>, and ?limitBytes=<number>
In many cases clients may wish to view not ready addresses for endpoints
in order to do set membership prior to a pod being ready. For instance,
a pod that uses the service endpoints to connect to other pods under
the same service, but does not want to signal ready before it has
contacted at least a minimal number of other pods.
This is backwards compatible with old servers and clients. There is
an additional cost in size of endpoints before services ramp up, which
will add minor CPU and memory use for services that have a significant
number of pods which have not become ready.
The methods for registering a node and syncing node status to the apiserver
have grown large enough that it makes sense for them to live in a separate
place. This change adds a nodeManager to handle such interaction with the
apiserver.
This refactor is in preparation for moving more state handling to the
status manager. It will become the canonical cache for the latest
information on running containers and probe status, as part of the
prober refactoring.
1. Make reason field of StatusReport objects in kubelet in CamelCase format.
2. Add Message field for ContainerStateWaiting to describe detail about Reason.
3. Make reason field of Events in kubelet in CamelCase format.
4. Update swagger,deep-copy and so on.
A lot of packages use StringSet, but they don't use anything else from
the util package. Moving StringSet into another package will shrink
their dependency trees significantly.
Both GetNode and the cache.ListWatch listfunc in the
kubelet package call List unnecessary.
GetNodeInfo is sufficient for GetNode and makes looping
through a list of nodes to check for a matching name
unnecessary.
resolves#13476
1. Add EvnetRecordQps and EventBurst parameter in kubelet.
2. If EvnetRecordQps and EventBurst was set, rate limit events in kubelet
with a independent ratelimiter as setted.
PR #13293 added a safety check to not remove a pod directory if the child
volumes directory is not empty. This logic is faulty because kubelet may have
directory structures podUID/volumes/volumeKind/volumeName. E.g.,
`056db95d-50ee-11e5-a2e4-42010af0ba1d/volumes/kubernetes.io~empty-dir/default-token-al3r2`
This change fixes that by properly listing all volumes under a pod.
Before, kubelet performs global cleanup tasks every iteration. After the
PR #13003, kubelet performs the tasks on every sync internval (10 seconds).
This PR decouples the housekeeping period with the sync internval to ensure
that kubelet cleans up promptly, while not too often (no more than once every
minimum housekeeping period).
Allow the user to specify the resolver configuration file that is used
to determine the default DNS parameters. This defaults to the system's
/etc/resolv.conf.
Currently, whenever there is any update, kubelet would force all pod workers to
sync again, causing resource contention and hence performance degradation.
This commit flips kubelet to use incremental updates (as opposed to snapshots).
This allows us to know what pods have changed and send updates to those pod
workers only. The `SyncPods` function has been replaced with individual
handlers, each handling an operation (ADD, REMOVE, UPDATE). Pod workers are
still triggered periodically, and kubelet performs periodic cleanup as well.
This commit also spawns a new goroutine solely responsible for killing pods.
This is necessary because pod killing could hold up the sync loop for
indefinitely long amount of time now user can define the graceful termination
period in the container spec.
We chose to use podFullName (name_namespace) as key in the status manager
because mirror pod and static pod share the same status. This is no longer
needed because we do not store statuses for static pods anymore (we only
store statuses for their mirror pods). Also, reviously, a few fixes were
merged to ensure statuses are cleaned up so that a new pod with the same
name would not resuse an old status.
This change cleans up the code by using UID as key so that the code would
become less brittle.
The sync loop should check for terminated pods that are no longer
running and clear them. The status loop should never write status
if the pod UID changes. Mirror pods should be deleted immediately
rather than gracefully.
Avoid TTL by deleting pods immediately when they aren't
scheduled, and letting the Kubelet delete them otherwise.
Ensure the Kubelet uses pod.Spec.TerminationGracePeriodSeconds
when no pod.DeletionGracePeriodSeconds is available.
Getting the public IP a container is supposed to use is O(hard),
and usually involves ugly gyrations in python or with interfaces.
Using the downward API means that the IP Kube is announcing to
other endpoints is also visible inside the container for pods to
identify themselves.
Eventually we would like to replace the all-encompassing SyncPods function with
more well-defined, smaller functions. This would not only help with the
readability and profiling of the code, it'd also set in motion for the plans to
trigger pod worker individually based on the content of the pod updates.
This commit serves as the first step of that, while avoiding breaking all unit
tests by preserving the SyncPods function for the time being.