This change cleans up the pod manager extensively so that
* Mirror pods are actually stored in the pod manager.
* Both (non-mirror) pods and mirror pods are indexed by UID and full name for
easy lookup and mapping. This is required for the next change to send
full pod along with the pod status update.
This change also renames mirrorManager as mirrorClient since it is merely a
client to contact the API server and create/delete mirror pods.
This change moves pod array and mirrorPods into podManager, along with all
methods accessing these internal pod storages. This is the first step of the
refactoring, and no function change is involved.
Kubelet supports retrieving stats for pods/containers with and without UID.
This does not always work for the static pods because users may get the UIDs of
the mirror pods from the API server, and use them to query Kubelet. In this
case, Kubelet would fail to locate the containers due to mismatched UIDs.
This change adds a intenral mirror to static pod UID mapping and teaches all
public-facing functions to perform UID lookup before proceeding. This allows
users to use either mirror or static pod's UID to retrieve stats.
Currently, API server is not aware of the static pods (manifests from
sources other than the API server, e.g. file and http) at all. This is
inconvenient since users cannot check the static pods through kubectl.
It is also sub-optimal because scheduler is unaware of the resource
consumption by these static pods on the node.
This change syncs the information back to the API server by creating a
mirror pod via API server for each static pod.
- Kubelet creates containers for the static pod, as it would do
normally.
- If a mirror pod gets deleted, Kubelet will re-create one. The
containers are sync'd to the static pods, so they will not be
affected.
- If a static pod gets removed from the source (e.g. manifest file
removed from the directory), the orphaned mirror pod will be deleted.
Note that because events are associated with UID, and the mirror pod has
a different UID than the original static pod, the events will not be
shown for the mirror pod when running `kubectl describe pod
<mirror_pod>`.
These containers may be caused by a change in the Kubernetes naming
convention. The old containers are killed, the new ones started, but the
old ones are never GC'd. This change makes Kubelet GC all Kubernetes
containers, old and new.
Fixes#5372.
This will make it easier to start running the real cAdvisor alongside
Kubelet. This change is primarily no-op refactoring. The main behavioral
change is that we always create a cAdvisor interface and expect it to
always be available. When we make a request, if cAdvisor is not
connected the request fails with a connection error. This failure is
handled today as well.
There are two main goals for this change.
1. Fix the naming scheme in kubelet so that it accepts DNS subdomain
name/namespaces correctly (#4920). The design is discussed in #3453.
2. Prepare for syncing the static pods back to the apiserver(#4090). This
includes
- Eliminate the source component in the internal full pod name (#4922). Pods
no longer need sources as they will all be sync'd via apiserver.
- Changing the naming scheme for the static (file-, http-, and etcd-based)
pods such that they are distinguishable when syncing back to the apiserver.
The changes includes:
* name = <pod.Name>-<hostname>
* namespace = <cluster_namespace> (i.e. "default" for now).
* container_name = k8s_<contianer_name>.<hash_of_container>_<pod_name>_<namespace>_<uid>_<random>
Note that this is not backward-compatible, meaning the kubelet won't recognize
existing running containers using the old naming scheme.
When a host port conflict is detected, kubelet should set the pod status to
fail. The failed status will then be polled by other components at a later time,
which allows replication controller to create a new pod if necessary.
To achieve this, this change stores the pod status information in a status map
upon the detecton of port conflict. GetPodStatus() consults this status map
before attempting to query docker. The entries in the status map will be removed
when the pod is no longer associated with the node.
Currently, kubelet silently ignores pods that caused host port conflict. This
commit surfaces the error by recording an event.
It also makes sure that kubelet iterates through the pods in the order of the
creation timestamp, which ensures that pods created later are ignored on
conflict.
Setting the oom_score_adj of the PID of the POD container to -100 which is less
than the default of 0. This ensures that this PID is the last OOM victim
chosen by the kernel.
Fixes#3067.
Since the parsing function doesn't return an error all the components
returned empty strings. This caused us to enforce the MaxContainerLimit
as a global limit instead of a per-container limit.
Fixes#4413.
Break up the monolithic volumes code in kubelet into very small individual
modules with a well-defined interface. Move them all into their own packages
and beef up testing along the way.
Added a kubelet config source for watching pods on apiserver.
The pods are converted to boundpods for merging with other
config sources.
The preferred way to create a kubelet is now to pass an apiserver
client but not an etcd client. Changed cmd/integration to use
apiserver to talk to kubelets. And cmd/kubernetes.
Unit, integration, and e2e tests pass, except for a failure of the pd
e2e test which was unrelated.
This should only have been triggered by tests, and those should now be fixed.
I tested by calling panic() if UID was blank in BuildDockerName() or if number
of fields was < 5 in ParseDockerName(). All errors were fixed.
Make all kubelet config sources ensure that UID and Namespace are defaulted, if
need be.
We can *almost* disable the "if blank" logic for UID, except for tests that
call APIs that do not run through SyncPods. We really ought to be enforcing
invariants better.
This is makes it possible to read back "known" pods from disk without
getting other (non-pod) kubelet dirs in the mix. Ditto for containers
within a pod. This is just saner overall. Pods now nest in a pods/
dir. Likewise containers.
This adds --cluster_dns and --cluster_domain flags to kubelet. If
non-empty, kubelet will set docker --dns and --dns-search flags based on
these. It uses the cluster DNS and appends the hosts's DNS servers.
Likewise for DNS search domains.
This also adds API support to bypass cluster DNS entirely, needed to
bootstrap DNS.
There are three values that uniquely identify a pod on a host -
the configuration source (etcd, file, http), the pod name, and the
pod namespace. This change ensures that configuration properly
makes those names unique by changing podFullName to contain both
name (currently ID in v1beta1, Name in v1beta3) and namespace.
The Kubelet does not properly handle information requests for
pods not in the default namespace at this time.
Public access to the DockerHub is not guaranteed in all environments,
add a flag to the kubelet that allows it to use a different image (like
one on a private registry) as well as only pull the first time the
image is needed.
Fixes#1545
Also rename some to other names that make better reading. There are still a
bunch of "make" functions but they do things like assemble a string from parts
or build an array of things. It seemed that "make" there seemed fine. "New"
is for "constructors".
This makes two main changes:
- Runs syncPod in a separate Go routine (and enforces only one of those
runs at a time).
- Uses the pod list to determine if a container should be running or
should be killed (used to use the output of syncPod).
Since Docker pulls are synchronized by the Docker daemon we still block
on that, but pods can now be removed and prepared for starting without
blocking on long pulls.
Also transfer the Kubelet from using ContainerManifest.ID to source specific
identifiers with namespacing. Move goroutine behavior out of kubelet/ and
into integration.go and cmd/kubelet/kubelet.go for better isolation.
The tests for Docker image name parsing are repetitive and do not
cover enough test cases.
Refactor the tests to use table testing and add additional test cases.
Adds the framework for external volume mounts.
Currently supports bare host directory mounts.
Modifies the API to support host directory mounts from Volumes
instead of VolumeMounts.