Automatic merge from submit-queue
rkt: Add post-start hook support.
This adds a poll-and-timeout procedure after the pod is
started, to make sure the post-start hooks execute when the
container is actually running.
This is a temporal workaround for implementing post-hooks,
a long term solution is to use lifecycle event to trigger
those hooks, see https://github.com/kubernetes/kubernetes/issues/23084.
Also this fixes a bug of getting container ID for a non-running
container when running pre-stop hook.
cc @sjpotter @euank @kubernetes/sig-node
Automatic merge from submit-queue
Collect and expose runtime's image storage usage via Kubelet's /stats/summary endpoint
This information is useful to users since docker images are typically not stored on the root filesystem.
Kubelet will also consume this feature in the future to decide is evicting images will help with disk usage on the nodes.
cc @kubernetes/sig-node
This has been supplanted by a) the DockerJSON.CreatedAt field and b) the
ContainerStatus.CreatedAt, where the first is used for creating the
second.
The `.Created` field was only written to as far as I can see.
This adds a poll-and-timeout procedure after the pod is
started, to make sure the post-start hooks execute when the
container is actually running.
This is a temporal workaround for implementing post-hooks,
a long term solution is to use lifecycle event to trigger
those hooks, see https://github.com/kubernetes/kubernetes/issues/23084.
Also this fixes a bug of getting container ID for a non-running
container when running pre-stop hook.
This is implemented via touching a file on stop as a hook in the systemd
unit. The ctime of this file is then used to get the `finishedAt` time
in the future.
In addition, this changes the `startedAt` and `createdAt` to use the api
server's results rather than the annotations it previously used.
It's possible we might want to move this into the api in the future.
Fixes#23887
Automatic merge from submit-queue
rkt: Fix hostnetwork.
Mount hosts' /etc/hosts, /etc/resolv.conf, set host's hostname
when running the pod in the host's network.
Fix#24235
cc @kubernetes/sig-node
Mount hosts' /etc/hosts, /etc/resolv.conf, set host's hostname
when running the pod in the host's network.
Besides, do not set the DNS flags when running in host's network.
Previously, the service file's name is 'k8s_${POD_UID}.service',
which means we need to `systemctl daemon-reload` if the we replace
the content of the service file (e.g. pod is restarted).
However this makes the journal in the previous pod get disconnected.
This PR solves the issue by using the unique rkt uuid as the service
file's name. After the change, the service file's name will be:
'k8s_${rkt_uuid}.service'.
Add GeneratePodHostNameAndDomain() to RuntimeHelper to
get the hostname of the pod from kubelet.
Also update the logging flag to change the journal match from
_HOSTNAME to _MACHINE_ID.
Using json makes this robust to ENTRYPOINT/CMD that contains space.
Also removed 'RemainAfterExit' option, originally this option is
useful when we implement GetPods() by 'systemctl list-units'.
However since we are using rkt API service now, it's no longer needed.
This enables rkt runtime to setup versions during creation,
this fixes a kubelet nil pointer panic when kubelet tries to get the
rkt versions but it's not set.
Use an array to store the pod IDs and use that to build the pod array with consistent ordering,
instead of map ordering, which is random and causes test flakes.
If 'TerminationMessagePath' in container spec is set, then
We will mount the termination message log into the container.
Also in GetPodStatus, if the container exits and the 'TerminationMessagePath'
is set, then the 'message' field in container state will be populated.
This address a TODO when collecting the node version information so it
will properly report the configured runtime and its version. Previously,
this was hardcoded to "docker://" and the docker version, and would show
"docker://1.9.1" even when the kubelet was configured to use rkt.
With this change, it will use the runtime's Type() and Version() data.
This also changes the container.Runtime interface to add an APIVersion()
method. This can be used when the runtime has separate versions for the
engine and the API, such as with Docker. The Docker minimum version
validation has been updated to use APIVersion(), and
DockerManager.Version() now returns the engine version.
The formatting function is used often in logging. This improves the readability
by shortening the length of the call. Also change the fomartted string to
include the pod UID.
This commit builds on previous work and creates an independent
worker for every liveness probe. Liveness probes behave largely the same
as readiness probes, so much of the code is shared by introducing a
probeType paramater to distinguish the type when it matters. The
circular dependency between the runtime and the prober is broken by
exposing a shared liveness ResultsManager, owned by the
kubelet. Finally, an Updates channel is introduced to the ResultsManager
so the kubelet can react to unhealthy containers immediately.
This enables more fine-grained control over the things we want to
output. Also by closing the stdout/stderr of the journalctl process
when user hits `Ctrl-C` after `kubectl logs $POD -f`, this enables
the journalctl process to exit.
Change all references to the container ID in pkg/kubelet/... to the
strong type defined in pkg/kubelet/container: ContainerID
The motivation for this change is to make the format of the ID
unambiguous, specifically whether or not it includes the runtime
prefix (e.g. "docker://").
Each container with a readiness has an individual go-routine which
handles periodic probing for that container. The results are cached, and
written to the status.Manager in the pod sync path.
Correct port-forward data copying logic so that the server closes its
half of the data stream when socat exits, and the client closes its half
of the data stream when it finishes writing.
Modify the client to wait for both copies (client->server,
server->client) to finish before it unblocks.
Fix race condition in the Kubelet's handling of incoming port forward
streams. Have the client generate a connectionID header to be used to
associate the error and data streams for a single connection, instead of
assuming that streams n and n+1 go together. Attempt to generate a
pseudo connectionID in the server in the event the connectionID header
isn't present (older clients); this is a best-effort approach that only
really works with 1 connection at a time, whereas multiple concurrent
connections will only work reliably with a newer client that is
generating connectionID.
Previously, GetPodStatus() will return error if the pod is never
created. However we've never seen the sync loop fail because in the
beginning of the loop, if the pod is not found, it will be created.
This works fine except the pod that keeps crashing. Because the above
logic will keep restarting the pod as if it's never created.
This PR fixes the bug.
Increase the supported controls on pod logging. Add validaiton to pod
log options. Ensure the Kubelet is using a consistent, structured way to
process pod log arguments.
Add ?sinceSeconds=<durationInSeconds>, &sinceTime=<RFC3339>, ?timestamps=<bool>,
?tailLines=<number>, and ?limitBytes=<number>
Add a HostIPC field to the Pod Spec to create containers sharing
the same ipc of the host.
This feature must be explicitly enabled in apiserver using the
option host-ipc-sources.
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
The appc spec isn't currently clear about if both fields are required, and before rkt v0.8.1 if either field
was nil it would result in a crash. Currently rkt will ignore isolators that don't have both fields set, so
I think this is a reasonable approach to making sure isolators are actually used.
This commit wires together the graceful delete option for pods
on the Kubelet. When a pod is deleted on the API server, a
grace period is calculated that is based on the
Pod.Spec.TerminationGracePeriodInSeconds, the user's provided grace
period, or a default. The grace period can only shrink once set.
The value provided by the user (or the default) is set onto metadata
as DeletionGracePeriod.
When the Kubelet sees a pod with DeletionTimestamp set, it uses the
value of ObjectMeta.GracePeriodSeconds as the grace period
sent to Docker. When updating status, if the pod has DeletionTimestamp
set and all containers are terminated, the Kubelet will update the
status one last time and then invoke Delete(pod, grace: 0) to
clean up the pod immediately.