Change all references to the container ID in pkg/kubelet/... to the
strong type defined in pkg/kubelet/container: ContainerID
The motivation for this change is to make the format of the ID
unambiguous, specifically whether or not it includes the runtime
prefix (e.g. "docker://").
The current implementation considers a source seen when it receives a SET at
kubelet/config/config.go. However, the main kubelet sync loop may not have
received the pod update from the source via the channel. This change ensures
that kubelet would consider all sources are ready only after the sync loop has
seen all the sources.
Each container with a readiness has an individual go-routine which
handles periodic probing for that container. The results are cached, and
written to the status.Manager in the pod sync path.
Network configuration error message while setting Kubelet status was
being written to "reasons" slice. Write this message to "messages" slice
instead.
Also remove "reasons" slice entirely since it is not used anywhere.
Now that kubelet has switched to incremental updates, it has complete
information of the pod update type (create, update, sync). This change pipes
this information to pod workers so that they don't have to derive the type
again.
Increase the supported controls on pod logging. Add validaiton to pod
log options. Ensure the Kubelet is using a consistent, structured way to
process pod log arguments.
Add ?sinceSeconds=<durationInSeconds>, &sinceTime=<RFC3339>, ?timestamps=<bool>,
?tailLines=<number>, and ?limitBytes=<number>
In many cases clients may wish to view not ready addresses for endpoints
in order to do set membership prior to a pod being ready. For instance,
a pod that uses the service endpoints to connect to other pods under
the same service, but does not want to signal ready before it has
contacted at least a minimal number of other pods.
This is backwards compatible with old servers and clients. There is
an additional cost in size of endpoints before services ramp up, which
will add minor CPU and memory use for services that have a significant
number of pods which have not become ready.
The methods for registering a node and syncing node status to the apiserver
have grown large enough that it makes sense for them to live in a separate
place. This change adds a nodeManager to handle such interaction with the
apiserver.
This refactor is in preparation for moving more state handling to the
status manager. It will become the canonical cache for the latest
information on running containers and probe status, as part of the
prober refactoring.
1. Make reason field of StatusReport objects in kubelet in CamelCase format.
2. Add Message field for ContainerStateWaiting to describe detail about Reason.
3. Make reason field of Events in kubelet in CamelCase format.
4. Update swagger,deep-copy and so on.
A lot of packages use StringSet, but they don't use anything else from
the util package. Moving StringSet into another package will shrink
their dependency trees significantly.
Both GetNode and the cache.ListWatch listfunc in the
kubelet package call List unnecessary.
GetNodeInfo is sufficient for GetNode and makes looping
through a list of nodes to check for a matching name
unnecessary.
resolves#13476
1. Add EvnetRecordQps and EventBurst parameter in kubelet.
2. If EvnetRecordQps and EventBurst was set, rate limit events in kubelet
with a independent ratelimiter as setted.
PR #13293 added a safety check to not remove a pod directory if the child
volumes directory is not empty. This logic is faulty because kubelet may have
directory structures podUID/volumes/volumeKind/volumeName. E.g.,
`056db95d-50ee-11e5-a2e4-42010af0ba1d/volumes/kubernetes.io~empty-dir/default-token-al3r2`
This change fixes that by properly listing all volumes under a pod.
Before, kubelet performs global cleanup tasks every iteration. After the
PR #13003, kubelet performs the tasks on every sync internval (10 seconds).
This PR decouples the housekeeping period with the sync internval to ensure
that kubelet cleans up promptly, while not too often (no more than once every
minimum housekeeping period).
Allow the user to specify the resolver configuration file that is used
to determine the default DNS parameters. This defaults to the system's
/etc/resolv.conf.
Currently, whenever there is any update, kubelet would force all pod workers to
sync again, causing resource contention and hence performance degradation.
This commit flips kubelet to use incremental updates (as opposed to snapshots).
This allows us to know what pods have changed and send updates to those pod
workers only. The `SyncPods` function has been replaced with individual
handlers, each handling an operation (ADD, REMOVE, UPDATE). Pod workers are
still triggered periodically, and kubelet performs periodic cleanup as well.
This commit also spawns a new goroutine solely responsible for killing pods.
This is necessary because pod killing could hold up the sync loop for
indefinitely long amount of time now user can define the graceful termination
period in the container spec.
We chose to use podFullName (name_namespace) as key in the status manager
because mirror pod and static pod share the same status. This is no longer
needed because we do not store statuses for static pods anymore (we only
store statuses for their mirror pods). Also, reviously, a few fixes were
merged to ensure statuses are cleaned up so that a new pod with the same
name would not resuse an old status.
This change cleans up the code by using UID as key so that the code would
become less brittle.
The sync loop should check for terminated pods that are no longer
running and clear them. The status loop should never write status
if the pod UID changes. Mirror pods should be deleted immediately
rather than gracefully.
Avoid TTL by deleting pods immediately when they aren't
scheduled, and letting the Kubelet delete them otherwise.
Ensure the Kubelet uses pod.Spec.TerminationGracePeriodSeconds
when no pod.DeletionGracePeriodSeconds is available.
Getting the public IP a container is supposed to use is O(hard),
and usually involves ugly gyrations in python or with interfaces.
Using the downward API means that the IP Kube is announcing to
other endpoints is also visible inside the container for pods to
identify themselves.
Eventually we would like to replace the all-encompassing SyncPods function with
more well-defined, smaller functions. This would not only help with the
readability and profiling of the code, it'd also set in motion for the plans to
trigger pod worker individually based on the content of the pod updates.
This commit serves as the first step of that, while avoiding breaking all unit
tests by preserving the SyncPods function for the time being.
/runningpods returns a list of pods currently running on the kubelet. The list
is composed by examining the container runtime, and may be different from the
desired pods to run known by kubelet.
This is useful for tests to verify that pods are indeed deleted on nodes.
Add a new latency metric for the time from seeing the pod for the first time
to starting a pod worker for it.
Also, change PodStartLatency to include this initial processing latency.
Refactor GetNodeHostIP into pkg/util/node (instead of pkg/util to break import cycle).
Include internalIP in gce NodeAddresses. Remove NodeLegacyHostIP
Fixes#8569.
This requires the DNS server to be running kube2sky v1.6 or higher (part of
release 0.18). Users with older kube2sky MUST NOT update to this kubelet until
they upgrade DNS. Versions of kube2sky >= 1.6 support both old and new style
names. Old style names are deprectaed and will be removed around the time of
kubernetes v1.0 release.
This commit wires together the graceful delete option for pods
on the Kubelet. When a pod is deleted on the API server, a
grace period is calculated that is based on the
Pod.Spec.TerminationGracePeriodInSeconds, the user's provided grace
period, or a default. The grace period can only shrink once set.
The value provided by the user (or the default) is set onto metadata
as DeletionGracePeriod.
When the Kubelet sees a pod with DeletionTimestamp set, it uses the
value of ObjectMeta.GracePeriodSeconds as the grace period
sent to Docker. When updating status, if the pod has DeletionTimestamp
set and all containers are terminated, the Kubelet will update the
status one last time and then invoke Delete(pod, grace: 0) to
clean up the pod immediately.
Add support for pluggable Docker exec handlers. The default handler is
now Docker's native exec API call. The previous default, nsenter, can be
selected by passing --docker-exec-handler=nsenter when starting the
kubelet.
This generalizes the handling of containers in the
ContainerManager.
Also introduces the ability to determine how much
resources are reserved for those system containers.
The system container is a resource-only container which contains all
non-kernel processes that are not already part of a container. This will
allow monitoring of their resource usage and limiting it (eventually).
This patch substitutes the misleading reason "unknown" for the event
recording. For symmetry with kubelet's message "online" the conditions
Unknown and False are reported as "offline".
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
- Delete nodes when they are no longer ready and don't exist in the
cloud provider.
- Label each node with it's hostname.
- Add flag to skip node registration.
- Add a test for registering an existing node.
We recently changed `SyncPods` to filter out terminated pods at the beginning
for two reasons:
* performance: kubelet no longer keeps goroutines to checks containers for
terminated pods.
* correctness: kubelet relies on inspecting dead containers to generate
pod status. Because dead containers may get garbage collected and
kubelet does not have checkpoints yet, syncing terminated pod could
lead to modifying the status of a terminated pod.
However, even though kubelet should not *sync* the terminated pods, it
should not attempt to remove the directories and volumes for such
pods as long as they have not been deleted. This change fixes aggresive
directory removal by passing all pods (including terminated pods) to the
cleanup functions.
Per-pod workers have sufficient knowledge to determine whether a pod has
exceeded the active deadline, and they set the status at the end of each sync.
Move the active deadline check to generatePodStatus so that per pod workers
can update the pod status directly. This eliminates the possibility of a race
condition where both SyncPods and the pod worker are updating the status, which
could lead to temporary erratic pod status behavior (pod phase: failed ->
running -> failed).
Pod statuses are periodically writtien to the status manager, and status
manager sets the start time of the pod. All non-status-modifying code should
perform cache lookup and should not attempt to generate pod status on its own.
Currently, kubelet doesn't filter out terminated pods before determining whether
a pod fits. This could lead to duplicated events for rejecting the pods. This
change fixes that.
This change also groups all related pod fitness checking functions into one
function to improve readability.
Kubelet will stop accepting new pods if it detects low disk space on root fs or fs holding docker images.
Running pods are not affected. low-diskspace-threshold-mb is used to configure the low diskspace threshold.