Automatic merge from submit-queue
WIP v0 NVIDIA GPU support
```release-note
* Alpha support for scheduling pods on machines with NVIDIA GPUs: kubelets must be started with the `--experimental-nvidia-gpus` flag, and pods request GPUs via the `alpha.kubernetes.io/nvidia-gpu` resource
```
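For illustration, here is a minimal sketch of a pod requesting a single GPU through the alpha resource. It uses the internal `pkg/api` types of that era; the pod and container names and the image are hypothetical.
```go
package main

import (
	"fmt"

	"k8s.io/kubernetes/pkg/api"
	"k8s.io/kubernetes/pkg/api/resource"
)

func main() {
	// Hypothetical pod requesting one NVIDIA GPU; the node's kubelet must have
	// been started with --experimental-nvidia-gpus for this to be schedulable.
	pod := api.Pod{
		ObjectMeta: api.ObjectMeta{Name: "cuda-example"},
		Spec: api.PodSpec{
			Containers: []api.Container{{
				Name:  "cuda",
				Image: "nvidia/cuda",
				Resources: api.ResourceRequirements{
					Limits: api.ResourceList{
						// The alpha GPU resource is requested like any other resource.
						api.ResourceName("alpha.kubernetes.io/nvidia-gpu"): resource.MustParse("1"),
					},
				},
			}},
		},
	}
	fmt.Println(pod.Spec.Containers[0].Resources.Limits)
}
```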
Implements part of #24071 for #23587
I am not familiar enough with the scheduler to know what to do with the scores. Mostly punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and docs
cc @erictune @davidopp @dchen1107 @vishh @Hui-Zhi @gopinatht
Automatic merge from submit-queue
kubelet: Remove redundant `Container.Created`
As far as I can tell, this has been supplanted by a) the `DockerJSON.CreatedAt` field and b) the
`ContainerStatus.CreatedAt`, where the first is used for creating the
second.
The `.Created` field was only written to as far as I can see.
cc @yifan-gu & @Random-Liu
Is there any reason we might want to keep this around?
This is implemented by touching a file on stop, as a hook in the systemd
unit. The ctime of this file is then used later to obtain the `finishedAt`
time.
In addition, this changes the `startedAt` and `createdAt` to use the api
server's results rather than the annotations it previously used.
It's possible we might want to move this into the api in the future.
Fixes #23887
Mount the host's /etc/hosts and /etc/resolv.conf, and set the host's hostname
when running the pod in the host's network.
Additionally, do not set the DNS flags when running in the host's network.
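The intended conditional can be pictured roughly as below. This is a simplified, hypothetical sketch: the `runOptions` and `mount` types and the `applyHostNetwork` helper are stand-ins, not the real kubelet plumbing.
```go
package kubenet // hypothetical package for illustration

import "os"

// mount and runOptions are simplified stand-ins for the kubelet's real
// container run options; they exist only to illustrate the conditional.
type mount struct {
	HostPath, ContainerPath string
	ReadOnly                bool
}

type runOptions struct {
	Mounts     []mount
	Hostname   string
	DNSServers []string
}

// applyHostNetwork mirrors the described behaviour: in the host's network
// namespace, reuse the host's /etc/hosts, /etc/resolv.conf and hostname,
// and leave DNS flags unset.
func applyHostNetwork(hostNetwork bool, opts *runOptions) error {
	if !hostNetwork {
		// Non-host-network pods keep the generated hosts/resolv.conf and DNS flags.
		return nil
	}
	opts.Mounts = append(opts.Mounts,
		mount{HostPath: "/etc/hosts", ContainerPath: "/etc/hosts", ReadOnly: true},
		mount{HostPath: "/etc/resolv.conf", ContainerPath: "/etc/resolv.conf", ReadOnly: true},
	)
	hostname, err := os.Hostname()
	if err != nil {
		return err
	}
	opts.Hostname = hostname
	opts.DNSServers = nil // do not set DNS flags in host network mode
	return nil
}
```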
Automatic merge from submit-queue
Kubelet: Remove nsinit related code and bump up minimum docker apiversion
Docker has had native exec support since 1.3.x, so we no longer need this code.
As for the API version, because Kubernetes now supports Docker 1.8.x - 1.10.x, we should bump up the minimum Docker API version.
@yujuhong I checked the [changes](https://github.com/docker/engine-api/blob/master/types/versions/v1p20/types.go), we are not relying on any of those changes. So #23506 should work with docker 1.8.x+
Add GeneratePodHostNameAndDomain() to RuntimeHelper to
get the hostname of the pod from kubelet.
Also update the logging flag to change the journal match from
_HOSTNAME to _MACHINE_ID.
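A sketch of what the added hook might look like on the RuntimeHelper interface. The exact signature and the stand-in Pod type are assumptions for illustration only.
```go
package container // sketch, mirroring pkg/kubelet/container

// Pod stands in for the pod type the real code passes (e.g. *api.Pod).
type Pod struct {
	Name, Namespace string
}

// RuntimeHelper (sketch) is the callback interface the container runtimes use
// to reach the kubelet. The new method lets a runtime ask the kubelet for the
// pod's hostname and domain instead of computing them itself.
type RuntimeHelper interface {
	GeneratePodHostNameAndDomain(pod *Pod) (hostname string, domain string, err error)
}
```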
This change removes RuntimeCache in the pod workers and the syncPod() function.
Note that it doesn't deprecate RuntimeCache completely as other components
still rely on the cache.
This addresses a TODO when collecting the node version information so it
will properly report the configured runtime and its version. Previously,
this was hardcoded to "docker://" and the docker version, and would show
"docker://1.9.1" even when the kubelet was configured to use rkt.
With this change, it will use the runtime's Type() and Version() data.
This also changes the container.Runtime interface to add an APIVersion()
method. This can be used when the runtime has separate versions for the
engine and the API, such as with Docker. The Docker minimum version
validation has been updated to use APIVersion(), and
DockerManager.Version() now returns the engine version.
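A sketch of the resulting interface shape. The names follow the description above, but the details (method sets, error handling) are assumptions, not the exact code.
```go
package container // sketch, mirroring pkg/kubelet/container

// Version abstracts a runtime version string so that engine and API versions
// can be compared uniformly.
type Version interface {
	// Compare returns -1, 0 or 1 if this version is smaller than, equal to,
	// or larger than the other, and an error if either cannot be parsed.
	Compare(other string) (int, error)
	String() string
}

// Runtime (excerpt) now reports both the engine version and, where the
// runtime distinguishes the two (as Docker does), the API version.
type Runtime interface {
	Type() string
	Version() (Version, error)
	APIVersion() (Version, error)
	// ... the rest of the runtime interface is omitted.
}
```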
This cache will be used to store the PodStatus of all pods/containers
visible on the node. This will eliminate the need for pod workers to query the
container runtime directly.
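A minimal sketch of such a cache. The types and locking are simplified assumptions; the real cache also supports blocking reads against a timestamp.
```go
package cache // illustrative sketch only

import "sync"

// PodStatus stands in for kubecontainer.PodStatus.
type PodStatus struct {
	ID   string
	Name string
	// ... container statuses elided.
}

// Cache stores the latest observed PodStatus per pod so that pod workers can
// read it instead of querying the container runtime directly.
type Cache struct {
	mu    sync.RWMutex
	items map[string]*PodStatus // keyed by pod UID
}

func NewCache() *Cache {
	return &Cache{items: map[string]*PodStatus{}}
}

func (c *Cache) Set(uid string, status *PodStatus) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[uid] = status
}

func (c *Cache) Get(uid string) (*PodStatus, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.items[uid]
	return s, ok
}
```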
Currently, the PLEG would report an event if a container transitions from running to
exited between relists. However, it would not report any event if a container
gets stopped and removed between relists. Such an event will eventually be
handled when the pod syncs periodically, but this is undesirable. This change
ensures that we detect all such events.
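Conceptually, the fix amounts to comparing the previous and current relist snapshots and emitting an event for any container that was running before but is no longer listed at all, not only for those still listed as exited. A simplified sketch (types and names are illustrative, not the real PLEG code):
```go
package pleg // illustrative sketch only

// containerState is the state of a container as seen during a relist.
type containerState string

const (
	stateRunning containerState = "running"
	stateExited  containerState = "exited"
)

type event struct {
	ContainerID string
	Type        string // e.g. "ContainerDied"
}

// diff emits ContainerDied for containers that were running in the previous
// snapshot and are now exited *or missing entirely* (stopped and removed
// between relists) in the current one.
func diff(prev, cur map[string]containerState) []event {
	var events []event
	for id, prevState := range prev {
		if prevState != stateRunning {
			continue
		}
		curState, stillListed := cur[id]
		if !stillListed || curState == stateExited {
			events = append(events, event{ContainerID: id, Type: "ContainerDied"})
		}
	}
	return events
}
```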
Refactor Kubelet's server functionality into a server package. Most
notably, move pkg/kubelet/server.go into
pkg/kubelet/server/server.go. This will lead to better separation of
concerns and a more readable code hierarchy.
This feature is no longer useful now that pods don't sync as often. For batch
creations/deletions/syncs, the cache will be up-to-date for most pods since it
will be updated frequently. For other cases, continuing to update for two more
seconds doesn't usually help, as temporal locality doesn't hold across pod syncs.
This change introduces the pod lifecycle event generator (PLEG), and adds a generic
PLEG. The generic PLEG relies on relisting to discover container events and is
container-runtime-agnostic. Both Docker and rkt are changed to use the generic
PLEG.
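A sketch of the event type and the generic PLEG's shape. The names approximate pkg/kubelet/pleg, but the fields and signatures shown here are assumptions.
```go
package pleg // illustrative sketch only

import "time"

// PodLifecycleEventType names the kinds of events a PLEG can emit.
type PodLifecycleEventType string

const (
	ContainerStarted PodLifecycleEventType = "ContainerStarted"
	ContainerDied    PodLifecycleEventType = "ContainerDied"
)

// PodLifecycleEvent is what the generic PLEG sends to the kubelet sync loop.
type PodLifecycleEvent struct {
	PodID string
	Type  PodLifecycleEventType
	Data  interface{} // e.g. the container ID involved
}

// GenericPLEG periodically relists all containers via the runtime and turns
// state differences into events; it needs nothing runtime-specific.
type GenericPLEG struct {
	relistPeriod time.Duration
	eventChannel chan *PodLifecycleEvent
}

// Watch returns the channel the kubelet sync loop consumes.
func (g *GenericPLEG) Watch() chan *PodLifecycleEvent {
	return g.eventChannel
}
```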
Change all references to the container ID in pkg/kubelet/... to the
strong type defined in pkg/kubelet/container: ContainerID
The motivation for this change is to make the format of the ID
unambiguous, specifically whether or not it includes the runtime
prefix (e.g. "docker://").
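The strong type carries the runtime prefix explicitly, roughly as in this sketch of pkg/kubelet/container's ContainerID (the exact method set is an approximation):
```go
package container // sketch only

import (
	"fmt"
	"strings"
)

// ContainerID separates the runtime type from the bare ID so callers never
// have to guess whether a string includes the "docker://" style prefix.
type ContainerID struct {
	Type string // e.g. "docker", "rkt"
	ID   string
}

// ParseContainerID splits a prefixed ID such as "docker://abc123".
func ParseContainerID(s string) (ContainerID, error) {
	parts := strings.SplitN(s, "://", 2)
	if len(parts) != 2 {
		return ContainerID{}, fmt.Errorf("invalid container ID %q", s)
	}
	return ContainerID{Type: parts[0], ID: parts[1]}, nil
}

// String reassembles the unambiguous, prefixed form.
func (c ContainerID) String() string {
	return fmt.Sprintf("%s://%s", c.Type, c.ID)
}
```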
Each container with a readiness probe has an individual goroutine which
handles periodic probing for that container. The results are cached and
written to the status.Manager in the pod sync path.
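A stripped-down sketch of the per-container worker loop; the result cache, the probe function, and the manager wiring are simplified assumptions.
```go
package prober // illustrative sketch only

import (
	"sync"
	"time"
)

type result bool

// resultCache holds the latest readiness result per container ID; the pod
// sync path reads it and copies the value into the API status.
type resultCache struct {
	mu      sync.RWMutex
	results map[string]result
}

func (c *resultCache) set(containerID string, r result) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.results[containerID] = r
}

// worker probes one container on a fixed period until stop is closed.
func worker(containerID string, period time.Duration, probe func() result,
	cache *resultCache, stop <-chan struct{}) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			cache.set(containerID, probe())
		}
	}
}
```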
The container runtime interface currently doesn't support GetContainers, and no
test should be using fakeRuntime.ContainerList. Remove it to prevent accidental
use.
Increase the supported controls on pod logging. Add validation to pod
log options. Ensure the Kubelet is using a consistent, structured way to
process pod log arguments.
Add the ?sinceSeconds=<durationInSeconds>, ?sinceTime=<RFC3339>, ?timestamps=<bool>,
?tailLines=<number>, and ?limitBytes=<number> query parameters.
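The query parameters above map onto a structured option set that can be validated consistently; a sketch of what that looks like (the struct and validation rules here are simplified assumptions, not the real API type):
```go
package logs // illustrative sketch only

import (
	"fmt"
	"time"
)

// PodLogOptions mirrors the query parameters in structured form; pointer
// fields mean "not set".
type PodLogOptions struct {
	SinceSeconds *int64     // ?sinceSeconds=<durationInSeconds>
	SinceTime    *time.Time // ?sinceTime=<RFC3339>
	Timestamps   bool       // ?timestamps=<bool>
	TailLines    *int64     // ?tailLines=<number>
	LimitBytes   *int64     // ?limitBytes=<number>
}

// validate rejects obviously inconsistent combinations; the real validation
// checks more cases.
func (o *PodLogOptions) validate() error {
	if o.SinceSeconds != nil && o.SinceTime != nil {
		return fmt.Errorf("at most one of sinceSeconds and sinceTime may be set")
	}
	if o.SinceSeconds != nil && *o.SinceSeconds < 1 {
		return fmt.Errorf("sinceSeconds must be at least 1")
	}
	if o.TailLines != nil && *o.TailLines < 0 {
		return fmt.Errorf("tailLines must be non-negative")
	}
	if o.LimitBytes != nil && *o.LimitBytes < 1 {
		return fmt.Errorf("limitBytes must be at least 1")
	}
	return nil
}
```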
In many cases clients may wish to view not-ready addresses for endpoints
in order to establish set membership before a pod becomes ready. For instance,
a pod may use the service endpoints to connect to other pods under
the same service, but not want to signal ready before it has
contacted at least a minimal number of other pods.
This is backwards compatible with old servers and clients. There is
an additional cost in the size of endpoints before services ramp up, which
will add minor CPU and memory use for services that have a significant
number of pods which have not yet become ready.
1. Make the reason field of StatusReport objects in the kubelet CamelCase.
2. Add a Message field to ContainerStateWaiting to describe details about the Reason.
3. Make the reason field of Events in the kubelet CamelCase.
4. Update swagger, deep-copy, and so on.
Since it takes a while (1-2 minutes) for the kubelet to pull a big image
(>500MB), just showing "Pending" for the pod status is not very helpful.
This commit introduces a "pulling" event and emits it before the
kubelet starts to pull an image.
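In effect the change emits the event just before the pull starts, along these lines. The recorder interface and the event wording here are simplified assumptions for illustration.
```go
package images // illustrative sketch only

// recorder stands in for the kubelet's event recorder; only the method used
// here is included.
type recorder interface {
	Eventf(reason, messageFmt string, args ...interface{})
}

// pullImage emits a "Pulling" event before the (possibly slow) pull begins,
// so users see more than a bare "Pending" status while a large image downloads.
func pullImage(r recorder, image string, pull func(string) error) error {
	r.Eventf("Pulling", "Pulling image %q", image)
	return pull(image)
}
```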
/runningpods returns a list of pods currently running on the kubelet. The list
is composed by examining the container runtime, and may differ from the
desired pods the kubelet knows it should run.
This is useful for tests to verify that pods are indeed deleted on nodes.
This commit wires together the graceful delete option for pods
on the Kubelet. When a pod is deleted on the API server, a
grace period is calculated based on
Pod.Spec.TerminationGracePeriodSeconds, the user-provided grace
period, or a default. The grace period can only shrink once set.
The value provided by the user (or the default) is set onto metadata
as DeletionGracePeriod.
When the Kubelet sees a pod with DeletionTimestamp set, it uses the
value of ObjectMeta.GracePeriodSeconds as the grace period
sent to Docker. When updating status, if the pod has DeletionTimestamp
set and all containers are terminated, the Kubelet will update the
status one last time and then invoke Delete(pod, grace: 0) to
clean up the pod immediately.
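The grace-period selection described above can be pictured as a small helper. This is a sketch of the logic only, not the actual Kubelet code; the function name and parameters are hypothetical.
```go
package graceful // illustrative sketch only

// effectiveGracePeriod picks the grace period (in seconds) the kubelet sends
// to the runtime: the user-provided value if given, otherwise the pod spec's
// termination grace period, otherwise a default. Once a grace period has been
// set on the object (current), it may only shrink, never grow.
func effectiveGracePeriod(specSeconds, userSeconds, current *int64, defaultSeconds int64) int64 {
	grace := defaultSeconds
	if specSeconds != nil {
		grace = *specSeconds
	}
	if userSeconds != nil {
		grace = *userSeconds
	}
	if current != nil && *current < grace {
		// The grace period can only shrink once set.
		grace = *current
	}
	return grace
}
```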
* Start using FakeRuntime to replace FakeDockerClient in unit tests.
* Move and adapt docker-specific tests (e.g. creating/deleting infra
containers) to manager_test.go in dockertools.
If a pod worker sees stale pods from the runtime cache which were retrieved
before their last sync finished, it may think that the pods were not started
correctly, and attempt to fix that by killing/restarting containers.
There are two issues that may cause the runtime cache to store stale pods:
1. The timestamp is recorded *after* getting the pods from the container
runtime. This may lead the consumer to think the pods are newer than they
actually are.
2. The cache updates are triggered by many goroutines (pod workers, and the
updating thread). There is no mechanism to enforce that the cache would
only be updated to newer pods.
This change fixes the above two issues by making sure the timestamp is always
recorded before getting pods from the container runtime, and by updating the
cached pods only if the timestamp is newer.
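The two fixes together look roughly like this simplified sketch of the cache update path (the types and the list callback are stand-ins, not the real runtime cache):
```go
package runtimecache // illustrative sketch only

import (
	"sync"
	"time"
)

type pods []string // stand-in for the list of runtime pods

type runtimeCache struct {
	mu        sync.Mutex
	pods      pods
	cacheTime time.Time // when the cached pods were retrieved
}

// update lists pods from the runtime and stores them, addressing both issues:
// (1) the timestamp is taken *before* listing, so consumers never think the
// pods are newer than they really are, and (2) the cache is only replaced if
// this listing is newer than what is already stored, even with many writers.
func (c *runtimeCache) update(list func() (pods, error)) error {
	start := time.Now() // record the timestamp before querying the runtime
	p, err := list()
	if err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if start.After(c.cacheTime) {
		c.pods = p
		c.cacheTime = start
	}
	return nil
}
```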
This change is part of the efforts to make DockerManager implement the Runtime
interface.
The change also modifies the interface slightly to work with existing
code, and aggregates the type converting functions to convert.go.
Remove GetDockerServerVersion() from the DockerContainerCommandRunner interface,
replacing it with runtime.Version(). Also add a Version type to the runtime for
version comparison.
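For comparison purposes, such a version can be as simple as dotted integers compared component by component. A sketch of that idea, not the real pkg/kubelet/container type:
```go
package version // illustrative sketch only

import (
	"fmt"
	"strconv"
	"strings"
)

// compare returns -1, 0 or 1 if a is smaller than, equal to, or larger than b,
// treating both as dotted numeric versions such as "1.18" or "1.9.1".
// Missing components are treated as zero, so "1.9" < "1.9.1".
func compare(a, b string) (int, error) {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		ai, bi := 0, 0
		var err error
		if i < len(as) {
			if ai, err = strconv.Atoi(as[i]); err != nil {
				return 0, fmt.Errorf("invalid version %q: %v", a, err)
			}
		}
		if i < len(bs) {
			if bi, err = strconv.Atoi(bs[i]); err != nil {
				return 0, fmt.Errorf("invalid version %q: %v", b, err)
			}
		}
		if ai != bi {
			if ai < bi {
				return -1, nil
			}
			return 1, nil
		}
	}
	return 0, nil
}
```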