Automatic merge from submit-queue
Allow PVs to specify supplemental GIDs
Retry of https://github.com/kubernetes/kubernetes/pull/28691. Adds a Kubelet helper function for getting extra supplemental groups.
Automatic merge from submit-queue
rkt: Don't return if the service file doesn't exist when killing the pod
Remove unused logic. This also prevents KillPod() from failing
when the service file doesn't exist, e.g., when it has been removed by garbage
collection in a rare case:
1. There are already more than `gcPolicy.MaxContainers` containers running
on the host.
2. The new pod (A) starts to run but hasn't entered the 'RUNNING' state yet.
3. GC is triggered, sees that pod (A) is in an inactive state (not running),
and decides it needs to remove the pod to enforce `gcPolicy.MaxContainers`.
4. GC fails to remove the pod because `rkt rm` fails while the pod is running,
but it removes the service file anyway.
5. The follow-up KillPod() call then fails because it cannot find the service file
on disk.
Also, this is possible only when the pod has been in the prepared state for longer
than 1 minute, which sounds like another issue.
cc @kubernetes/sig-rktnetes
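For illustration, a minimal sketch of the resulting defensive check; the `stop` callback and path handling are hypothetical stand-ins, not the runtime's actual helpers:
```go
package rkt

import (
	"fmt"
	"os"
)

// killPodService stops the pod's systemd service, tolerating a unit file
// that garbage collection has already removed.
func killPodService(serviceFile string, stop func(unit string) error) error {
	if _, err := os.Stat(serviceFile); os.IsNotExist(err) {
		// The unit file was already cleaned up (e.g. by GC); nothing to kill.
		return nil
	} else if err != nil {
		return fmt.Errorf("cannot stat service file %q: %v", serviceFile, err)
	}
	return stop(serviceFile)
}
```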
Automatic merge from submit-queue
rkt: Copy /etc/hosts and /etc/resolv.conf into the pod dir before mounting.
This enables the container to modify /etc/hosts and /etc/resolv.conf without changing the host's copies.
With this PR, we now match Docker's behavior.
Fix https://github.com/kubernetes/kubernetes/issues/29022
cc @kubernetes/sig-rktnetes @quentin-m
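A minimal sketch of the copy step, assuming a hypothetical pod directory layout; the real runtime decides the destination paths itself:
```go
package rkt

import (
	"io"
	"os"
	"path/filepath"
)

// copyHostFile copies a host file (e.g. /etc/hosts or /etc/resolv.conf) into
// the pod directory so the bind mount points at a pod-private copy.
func copyHostFile(hostPath, podDir string) (string, error) {
	src, err := os.Open(hostPath)
	if err != nil {
		return "", err
	}
	defer src.Close()

	podPath := filepath.Join(podDir, filepath.Base(hostPath))
	dst, err := os.OpenFile(podPath, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
	if err != nil {
		return "", err
	}
	defer dst.Close()

	if _, err := io.Copy(dst, src); err != nil {
		return "", err
	}
	return podPath, nil
}
```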
Automatic merge from submit-queue
Support terminal resizing for exec/attach/run
```release-note
Add support for terminal resizing for exec, attach, and run. Note that for Docker, exec sessions
inherit the environment from the primary process, so if the container was created with tty=false,
that means the exec session's TERM variable will default to "dumb". Users can override this by
setting TERM=xterm (or whatever is appropriate) to get the correct "smart" terminal behavior.
```
Fixes #13585
Previously, when the stage1 annotation was provided, we only checked whether
the kubelet allows privileged containers, which is not useful as that is a global
setting.
Instead, we should check whether the pod has explicitly set the privileged
security context to 'true'.
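A sketch of the intended check, using minimal stand-in types rather than the real api.Pod structures:
```go
package rkt

// Minimal stand-ins for the API types; the real check inspects api.Pod.
type securityContext struct {
	Privileged *bool
}

type container struct {
	SecurityContext *securityContext
}

// podRequestsPrivileged reports whether any container in the pod has
// explicitly set privileged to true in its security context.
func podRequestsPrivileged(containers []container) bool {
	for _, c := range containers {
		if c.SecurityContext != nil &&
			c.SecurityContext.Privileged != nil &&
			*c.SecurityContext.Privileged {
			return true
		}
	}
	return false
}
```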
Automatic merge from submit-queue
rkt: Refactor grace termination period.
Add the `TimeoutStopSec` service option to support graceful termination.
Found that we can improve grace-period termination by adding a systemd service option.
cc @kubernetes/sig-rktnetes
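A sketch of how the grace period can be wired into the unit file, assuming the unit is generated with github.com/coreos/go-systemd/unit; the option values other than `TimeoutStopSec` are illustrative:
```go
package rkt

import (
	"fmt"
	"io"

	"github.com/coreos/go-systemd/unit"
)

// serviceUnitOptions builds the [Service] options for a pod's unit file,
// wiring the pod's termination grace period into TimeoutStopSec so that
// `systemctl stop` waits that long before escalating to SIGKILL.
func serviceUnitOptions(execStart string, gracePeriodSeconds int64) io.Reader {
	opts := []*unit.UnitOption{
		unit.NewUnitOption("Service", "ExecStart", execStart),
		unit.NewUnitOption("Service", "KillMode", "mixed"),
		unit.NewUnitOption("Service", "TimeoutStopSec", fmt.Sprintf("%ds", gracePeriodSeconds)),
	}
	return unit.Serialize(opts)
}
```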
This enables rkt to use a cached stage1 image instead of unpacking the
stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order for
rkt to find the stage1 image with the name specified by this flag.
Since appc requires gid to be non-empty today (https://github.com/appc/spec/issues/623),
we have to error out when gid is empty instead of using the root gid.
Automatic merge from submit-queue
rkt: Replace 'journalctl' with rkt's GetLogs() API.
This replaces the `journalctl` shell-out with rkt's GetLogs() API.
Fixes #26997
To make this fully work, we need rkt to have this patch: https://github.com/coreos/rkt/pull/2763
cc @kubernetes/sig-node @euank @alban @iaguis @jonboulle
This fixes a panic where the returned pod network status is nil.
Also, this makes the lkvm stage1 able to run inside a user-defined
network, where the network name needs to be 'rkt.kubernetes.io'.
Also fixes minor issues such as passing the wrong pod UID and ignoring
logging errors.
Automatic merge from submit-queue
rkt: Wrap exec errors as utilexec.ExitError
This is needed by the exec prober to distinguish error types and exit
codes correctly. Without this, the exec prober used for liveness probes
doesn't identify errors correctly and restarts aren't triggered. Fixes #26456
An alternative, and preferable solution would be to use utilexec
everywhere, but that change is much more involved and should come at a
later date. Unfortunately, until that change is made, writing tests for
this is quite difficult.
cc @yifan-gu @sjpotter
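A sketch of the wrapping, with a local interface that mirrors what the exec prober looks for; the real code returns a type satisfying utilexec.ExitError:
```go
package rkt

import "fmt"

// exitError mirrors the shape the exec prober checks for (utilexec.ExitError
// in the real code): an error that exposes the process exit status.
type exitError interface {
	error
	Exited() bool
	ExitStatus() int
}

// rktExitError wraps a non-zero exit from a command run inside the pod so
// the prober can read the exit code instead of seeing an opaque error.
type rktExitError struct {
	code int
	msg  string
}

func (e *rktExitError) Error() string   { return fmt.Sprintf("rkt: %s (exit status %d)", e.msg, e.code) }
func (e *rktExitError) Exited() bool    { return true }
func (e *rktExitError) ExitStatus() int { return e.code }

// wrapExitError converts a raw exit code into an exitError; a zero code means
// success and is reported as nil.
func wrapExitError(code int, msg string) error {
	if code == 0 {
		return nil
	}
	return &rktExitError{code: code, msg: msg}
}
```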
Automatic merge from submit-queue
rkt: Get logs via syslog identifier
This change works around https://github.com/coreos/rkt/issues/2630
Without this change, logs cannot reliably be collected for containers
with short lifetimes.
With this change, logs cannot be collected on rkt versions v1.6.0 and
before.
I'd like to also bump the required rkt version, but I don't want to do that until there's a released version that can be pointed to (so the next rkt release).
I haven't added tests (which were missing) because this code will be removed if/when logs are retrieved via the API. I have run E2E tests with this merged in and verified the tests which previously failed no longer fail.
cc @yifan-gu
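For illustration, a hedged sketch of selecting logs by syslog identifier with journalctl; the exact flags the runtime passes may differ:
```go
package rkt

import "os/exec"

// journalLogsCommand builds a journalctl invocation that selects a container's
// logs by its syslog identifier rather than by unit, which is more reliable
// for short-lived containers.
func journalLogsCommand(machineID, syslogIdentifier string, follow bool) *exec.Cmd {
	args := []string{
		"-m",                       // include journals from other machines (the rkt pod)
		"_MACHINE_ID=" + machineID, // match the pod's machine ID
		"-t", syslogIdentifier,     // SYSLOG_IDENTIFIER of the app
		"-o", "json",
		"--no-pager",
	}
	if follow {
		args = append(args, "-f")
	}
	return exec.Command("journalctl", args...)
}
```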
Automatic merge from submit-queue
rkt: Add pod selinux support.
Currently only pod-level SELinux context is supported. Besides, when
running with SELinux, we will not be able to use the overlay fs; see
https://github.com/coreos/rkt/issues/1727#issuecomment-173203129.
cc @kubernetes/sig-node @alban @mjg59 @pmorie
With quotes, the service doesn't start on systemd 219, with an error
saying the path of the netns cannot be found.
This PR fixes the bug by removing the quotes surrounding the netns path.
Automatic merge from submit-queue
rkt: Fix panic in setting ReadOnlyRootFS
What the title says. I wish this method were broken out in a reasonably unit-testable way. Fixing this panic is more important for now though; testing will come in a later commit.
I observed the panic in a `./hack/local-up-cluster.sh` run with rkt as the container runtime.
This is also the panic that's failing our Jenkins against master (see the kubelet.log output of a [recent run](https://console.cloud.google.com/m/cloudstorage/b/rktnetes-jenkins/o/logs/kubernetes-e2e-gce/1946/artifacts/jenkins-e2e-minion-group-qjh3/kubelet.log)).
cc @tmrts @yifan-gu
Automatic merge from submit-queue
rkt: Pass through podIP
This is needed for the /etc/hosts mount and the downward API to work.
Furthermore, this is required for the reported `PodStatus` to be
correct.
The `Status` bit mostly worked prior to #25062, and this restores that
functionality in addition to the new functionality.
In retrospect, the regression in status is large enough that the prior PR should have included at least some of this; my bad for not realizing the full implications there.
#25902 is needed for downward API stuff, but either merge order is fine as neither will break badly by itself.
cc @yifan-gu @dcbw
This replaces the previous creation of mounts from the `volumeGetter`
with mounts provided via RunContainerOptions.
This is motivated by the fact that the latter has a more complete set of
mounts (e.g. the `/etc/hosts` one created in kubelet.go).
Automatic merge from submit-queue
Use read-only root filesystem capabilities of rkt
Propagates `api.Container.SecurityContext.ReadOnlyRootFileSystem` flag to rkt container runtime.
cc @yifan-gu
Fixes #23837
This provides a basic implementation for setting a stage1 on a per-pod
basis via an annotation.
It's possible this feature should be gated behind additional knobs, such
as a kubelet flag to filter allowed stage1s, or a check akin to what
privileged gets in the apiserver.
Currently, it checks `AllowPrivileged` as a means to let people disable
this feature, though overloading it for both stage1 and privileged isn't
ideal.
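A sketch of the per-pod stage1 selection; the annotation key below is illustrative, not necessarily the exact key used:
```go
package rkt

// stage1AnnotationKey is a hypothetical key name used here for illustration;
// the real key lives in the rkt runtime package.
const stage1AnnotationKey = "rkt.alpha.kubernetes.io/stage1-name-override"

// stage1FromAnnotations returns the requested stage1 image, if any, but only
// when the kubelet allows privileged containers, which currently gates the feature.
func stage1FromAnnotations(annotations map[string]string, allowPrivileged bool) (string, bool) {
	name, ok := annotations[stage1AnnotationKey]
	if !ok || !allowPrivileged {
		return "", false
	}
	return name, true
}
```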
If a pod has not printed anything to stdout/stderr, it is expected
behaviour to get `-- No entries --`, even when requesting JSON output.
Prior to this change, a warning would be printed on such an occasion.
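A sketch of how the marker can be tolerated when decoding journalctl's JSON output; the helper name and return shape are illustrative:
```go
package rkt

import (
	"bufio"
	"bytes"
	"encoding/json"
)

// parseJournalJSON decodes journalctl's JSON output, treating the
// "-- No entries --" marker (printed when nothing was logged) as an empty
// result instead of a decode error.
func parseJournalJSON(output []byte) ([]map[string]interface{}, error) {
	var entries []map[string]interface{}
	scanner := bufio.NewScanner(bytes.NewReader(output))
	for scanner.Scan() {
		line := bytes.TrimSpace(scanner.Bytes())
		if len(line) == 0 || bytes.Equal(line, []byte("-- No entries --")) {
			continue // nothing was logged; not an error
		}
		var entry map[string]interface{}
		if err := json.Unmarshal(line, &entry); err != nil {
			return nil, err
		}
		entries = append(entries, entry)
	}
	return entries, scanner.Err()
}
```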
Automatic merge from submit-queue
Remove RunInContainer interface in Kubelet Runtime interface
According to #24689, we should merge RunInContainer and ExecInContainer in the container runtime interface.
@yujuhong @kubernetes/sig-node
Automatic merge from submit-queue
kubelet: Remove redundant `Container.Created`
As far as I can tell, this has been supplanted by a) the `DockerJSON.CreatedAt` field and b) the
`ContainerStatus.CreatedAt`, where the first is used for creating the
second.
The `.Created` field was only written to as far as I can see.
cc @yifan-gu & @Random-Liu
Is there any reason we might want to keep this around?
Automatic merge from submit-queue
rkt: Add post-start hook support.
This adds a poll-and-timeout procedure after the pod is
started, to make sure the post-start hooks execute when the
container is actually running.
This is a temporary workaround for implementing post-start hooks;
a long-term solution is to use lifecycle events to trigger
those hooks, see https://github.com/kubernetes/kubernetes/issues/23084.
This also fixes a bug where the container ID of a non-running
container was used when running the pre-stop hook.
cc @sjpotter @euank @kubernetes/sig-node
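A minimal sketch of such a poll-and-timeout loop; the `isRunning` callback stands in for the runtime's actual state check:
```go
package rkt

import (
	"fmt"
	"time"
)

// waitUntilRunning polls the container state until it is running, so the
// post-start hook is executed against a live container, or gives up after
// the timeout.
func waitUntilRunning(isRunning func() (bool, error), interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		running, err := isRunning()
		if err != nil {
			return err
		}
		if running {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for container to run")
		}
		time.Sleep(interval)
	}
}
```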
Automatic merge from submit-queue
Fix use of the ParseRepositoryTag() function removed from Docker
Docker has removed the ParseRepositoryTag() function, leading to failures
using the Kubernetes Go client API. Let's use github.com/docker/distribution's
reference.ParseNamed() instead.
Failure:
```
../k8s.io/kubernetes/pkg/util/parsers/parsers.go:30: undefined: parsers.ParseRepositoryTag
```
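A sketch of the replacement described above, using the reference package; the function name here is illustrative:
```go
package parsers

import (
	"fmt"

	"github.com/docker/distribution/reference"
)

// parseImageName splits an image string into repository, tag, and digest
// using the reference package, replacing the removed ParseRepositoryTag().
func parseImageName(image string) (repo, tag, digest string, err error) {
	named, err := reference.ParseNamed(image)
	if err != nil {
		return "", "", "", fmt.Errorf("couldn't parse image name %q: %v", image, err)
	}
	repo = named.Name()
	if tagged, ok := named.(reference.NamedTagged); ok {
		tag = tagged.Tag()
	}
	if canonical, ok := named.(reference.Canonical); ok {
		digest = canonical.Digest().String()
	}
	return repo, tag, digest, nil
}
```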
Automatic merge from submit-queue
Collect and expose runtime's image storage usage via Kubelet's /stats/summary endpoint
This information is useful to users since docker images are typically not stored on the root filesystem.
Kubelet will also consume this feature in the future to decide if evicting images will help with disk usage on the nodes.
cc @kubernetes/sig-node
Automatic merge from submit-queue
Refactor image related functions to use docker engine-api
ref #23563
Hope this can be of some help, cc @Random-Liu
If it's OK, I will add more work here.
This is implemented by touching a file on stop, via a hook in the systemd
unit. The ctime of this file is then used to get the `finishedAt` time later.
In addition, this changes the `startedAt` and `createdAt` to use the API
server's results rather than the annotations previously used.
It's possible we might want to move this into the API in the future.
Fixes #23887
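A Linux-specific sketch of reading the marker file's ctime; the marker path and helper name are illustrative:
```go
//go:build linux

package rkt

import (
	"os"
	"syscall"
	"time"
)

// finishedAt returns the time the pod's unit stopped, derived from the ctime
// of the marker file touched by the unit's stop hook.
func finishedAt(markerPath string) (time.Time, error) {
	fi, err := os.Stat(markerPath)
	if err != nil {
		return time.Time{}, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		// Fall back to the modification time if Stat_t is unavailable.
		return fi.ModTime(), nil
	}
	return time.Unix(int64(st.Ctim.Sec), int64(st.Ctim.Nsec)), nil
}
```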
Automatic merge from submit-queue
rkt: Fix hostnetwork.
Mount the host's /etc/hosts and /etc/resolv.conf, and set the host's hostname
when running the pod in the host's network.
Besides, do not set the DNS flags when running in the host's network.
Fix #24235
cc @kubernetes/sig-node
Automatic merge from submit-queue
rkt: Use rkt pod's uuid as the systemd service file's name.
Previously, the service file's name was 'k8s_${POD_UID}.service',
which meant we needed to `systemctl daemon-reload` if we replaced
the content of the service file (e.g. when the pod is restarted).
However, this disconnects the journal of the previous pod.
This PR solves the issue by using the unique rkt UUID as the service
file's name. After the change, the service file's name will be
'k8s_${rkt_uuid}.service'.
Fix #23691
Automatic merge from submit-queue
rkt: Update the directory path for saving auth config.
Since #23308 is merged, we now have a more stable way to determine where to store the auth configs.
cc @yujuhong @sjpotter
Automatic merge from submit-queue
Allow lazy binding in credential providers; don't use it in AWS yet
This is step one for cross-region ECR support and has no visible effects yet.
I'm not crazy about the name LazyProvide. Perhaps the interface method could
remain like that and the package method of the same name could become
LateBind(). I still don't understand why the credential provider has a
DockerConfigEntry that has the same fields but is distinct from
docker.AuthConfiguration. I had to write a converter now that we do that in
more than one place.
In step two, I'll add another intermediate, lazy provider for each AWS region,
whose empty LazyAuthConfiguration will have a refresh time of months or years.
Behind the scenes, it'll use an actual ecrProvider with the usual ~12 hour
credentials, that will get created (and later refreshed) only when kubelet is
attempting to pull an image. If we simply turned ecrProvider directly into a
lazy provider, we would bypass all the caching and get new credentials for
each image pulled.
Add GeneratePodHostNameAndDomain() to RuntimeHelper to
get the hostname of the pod from kubelet.
Also update the logging flag to change the journal match from
_HOSTNAME to _MACHINE_ID.
Using JSON makes this robust to an ENTRYPOINT/CMD that contains spaces.
Also removed the 'RemainAfterExit' option; originally this option was
useful when we implemented GetPods() via 'systemctl list-units'.
However, since we are using the rkt API service now, it's no longer needed.
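A sketch of the JSON encoding idea: the command and args are marshalled as a JSON array instead of being joined with spaces, so arguments containing spaces survive intact:
```go
package rkt

import "encoding/json"

// execLine encodes the app's command and args as a JSON array, so arguments
// containing spaces are preserved instead of being split by a naive join.
func execLine(command, args []string) (string, error) {
	full := append(append([]string{}, command...), args...)
	b, err := json.Marshal(full)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```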
This enables the rkt runtime to set up versions during creation, which
fixes a kubelet nil pointer panic when the kubelet tries to get the
rkt versions but they are not set.
Use an array to store the pod IDs, and use that to build the pod array with
consistent ordering, instead of relying on map ordering, which is random and
causes test flakes.
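A sketch of the idea, using stand-in types: keep the IDs in a slice in insertion order and build the pod array from that slice rather than from the map:
```go
package rkt

// pod is a minimal stand-in for the runtime's pod type.
type pod struct{ id string }

// podSet keeps a slice of IDs alongside the map so iteration order is the
// insertion order rather than Go's randomized map order.
type podSet struct {
	ids  []string
	pods map[string]*pod
}

func newPodSet() *podSet {
	return &podSet{pods: make(map[string]*pod)}
}

func (s *podSet) add(id string, p *pod) {
	if _, ok := s.pods[id]; !ok {
		s.ids = append(s.ids, id)
	}
	s.pods[id] = p
}

// ordered returns the pods in the order their IDs were first added.
func (s *podSet) ordered() []*pod {
	out := make([]*pod, 0, len(s.ids))
	for _, id := range s.ids {
		out = append(out, s.pods[id])
	}
	return out
}
```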
If 'TerminationMessagePath' in the container spec is set, then
we will mount the termination message log into the container.
Also, in GetPodStatus, if the container has exited and 'TerminationMessagePath'
is set, the 'message' field in the container state will be populated.
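A sketch of the two halves, with illustrative paths and helper names:
```go
package rkt

import (
	"os"
	"path/filepath"
)

// ensureTerminationLog creates the host-side file that is bind-mounted at the
// container's TerminationMessagePath, and returns the host path to mount.
func ensureTerminationLog(podDir, containerName string) (string, error) {
	hostPath := filepath.Join(podDir, "termination-logs", containerName)
	if err := os.MkdirAll(filepath.Dir(hostPath), 0755); err != nil {
		return "", err
	}
	f, err := os.OpenFile(hostPath, os.O_CREATE|os.O_RDWR, 0644)
	if err != nil {
		return "", err
	}
	return hostPath, f.Close()
}

// terminationMessage reads back what the container wrote, for the 'message'
// field of a terminated container's status.
func terminationMessage(hostPath string) string {
	data, err := os.ReadFile(hostPath)
	if err != nil {
		return ""
	}
	return string(data)
}
```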