kubelet relies on source annotation to distinguish pods from different sources.
It should always annotate the source when receiving the pod to avoid any
confusion. This commit fixes that and also make sure we don't overwrite the
first seen time.
A lot of packages use StringSet, but they don't use anything else from
the util package. Moving StringSet into another package will shrink
their dependency trees significantly.
When the pod annotations are updated in the apiserver, update the pod.
Annotations may be used to convey attributes that are required to the
pod execution, such as networking parameters.
Currently, whenever there is any update, kubelet would force all pod workers to
sync again, causing resource contention and hence performance degradation.
This commit flips kubelet to use incremental updates (as opposed to snapshots).
This allows us to know what pods have changed and send updates to those pod
workers only. The `SyncPods` function has been replaced with individual
handlers, each handling an operation (ADD, REMOVE, UPDATE). Pod workers are
still triggered periodically, and kubelet performs periodic cleanup as well.
This commit also spawns a new goroutine solely responsible for killing pods.
This is necessary because pod killing could hold up the sync loop for
indefinitely long amount of time now user can define the graceful termination
period in the container spec.
Add a new latency metric for the time from seeing the pod for the first time
to starting a pod worker for it.
Also, change PodStartLatency to include this initial processing latency.
This commit wires together the graceful delete option for pods
on the Kubelet. When a pod is deleted on the API server, a
grace period is calculated that is based on the
Pod.Spec.TerminationGracePeriodInSeconds, the user's provided grace
period, or a default. The grace period can only shrink once set.
The value provided by the user (or the default) is set onto metadata
as DeletionGracePeriod.
When the Kubelet sees a pod with DeletionTimestamp set, it uses the
value of ObjectMeta.GracePeriodSeconds as the grace period
sent to Docker. When updating status, if the pod has DeletionTimestamp
set and all containers are terminated, the Kubelet will update the
status one last time and then invoke Delete(pod, grace: 0) to
clean up the pod immediately.
pkg/service:
There were a couple of references here just as a reminder to change the
behavior of findPort. As of v1beta3, TargetPort was always defaulted, so
we could remove findDefaultPort and related tests.
pkg/apiserver:
The tests were using versioned API codecs for some of their encoding
tests. Necessary API types had to be written and registered with the
fake versioned codecs.
pkg/kubectl:
Some tests were converted to current versions where it made sense.
This change appends the full hostname to the mirror pod name (instead of taking
the first token) so that if the hostname is overriden, we'd not be creating
unncessary name conflicts. An example would be that a user overrides the
hostnames to be "127.0.0.1" and "127.0.0.2", and both of them were resolved to
"127" for the mirror pod name suffix.
Also, because `uname -n` could return a FQDN or not, this change takes only
the first token of it as the hostname for consistency.
If a static pod changes, delete the corresponding mirror pod. When kubelet
could not see mirror pod from the API server update, it'd attemp to create a
new mirror pod with up-to-date specs.
Currently, API server is not aware of the static pods (manifests from
sources other than the API server, e.g. file and http) at all. This is
inconvenient since users cannot check the static pods through kubectl.
It is also sub-optimal because scheduler is unaware of the resource
consumption by these static pods on the node.
This change syncs the information back to the API server by creating a
mirror pod via API server for each static pod.
- Kubelet creates containers for the static pod, as it would do
normally.
- If a mirror pod gets deleted, Kubelet will re-create one. The
containers are sync'd to the static pods, so they will not be
affected.
- If a static pod gets removed from the source (e.g. manifest file
removed from the directory), the orphaned mirror pod will be deleted.
Note that because events are associated with UID, and the mirror pod has
a different UID than the original static pod, the events will not be
shown for the mirror pod when running `kubectl describe pod
<mirror_pod>`.
Hostname behavior across operating systems is inconsistent (Macs can
have uppercase host names, so can some other systems). In general,
always strings.ToLower(os.Hostname()).
There are two main goals for this change.
1. Fix the naming scheme in kubelet so that it accepts DNS subdomain
name/namespaces correctly (#4920). The design is discussed in #3453.
2. Prepare for syncing the static pods back to the apiserver(#4090). This
includes
- Eliminate the source component in the internal full pod name (#4922). Pods
no longer need sources as they will all be sync'd via apiserver.
- Changing the naming scheme for the static (file-, http-, and etcd-based)
pods such that they are distinguishable when syncing back to the apiserver.
The changes includes:
* name = <pod.Name>-<hostname>
* namespace = <cluster_namespace> (i.e. "default" for now).
* container_name = k8s_<contianer_name>.<hash_of_container>_<pod_name>_<namespace>_<uid>_<random>
Note that this is not backward-compatible, meaning the kubelet won't recognize
existing running containers using the old naming scheme.
Currently, the validation logic validates fields in an object and supply default
values wherever applies. This change factors out defaulting to a set of
defaulting callback functions for decoding (see #1502 for more discussion).
* This change is based on pull request 2587.
* Most defaulting has been migrated to defaults.go where the defaulting
functions are added.
* validation_test.go and converter_test.go have been adapted to not testing the
default values.
* Fixed all tests with that create invalid objects with the absence of
defaulting logic.
Support namespacing in cache.Store by framing the interface functions
around interface{} and providing a key function to each Store implementation.
Implementation of a fix for #2294.
# *** ERROR: *** Some files have not been gofmt'd. To fix these
# errors, run gofmt -s -w <file>, or cut and paste the following:
# gofmt -s -w pkg/kubecfg/resource_printer.go pkg/proxy/config/config.go pkg/runtime/types.go
#
# Your commit will be aborted unless you override this warning. To
# commit in spite of these format errors, delete the following line:
# COMMIT_BLOCKED_ON_GOFMT
Make all kubelet config sources ensure that UID and Namespace are defaulted, if
need be.
We can *almost* disable the "if blank" logic for UID, except for tests that
call APIs that do not run through SyncPods. We really ought to be enforcing
invariants better.
make etcd registry pass test
fix kubelet config for quantity
fix openstack for quantity
fix controller for quantity
fix last tests for quantity
wire into binaries
fix controller manager
fix build for 32 bit systems
There are three values that uniquely identify a pod on a host -
the configuration source (etcd, file, http), the pod name, and the
pod namespace. This change ensures that configuration properly
makes those names unique by changing podFullName to contain both
name (currently ID in v1beta1, Name in v1beta3) and namespace.
The Kubelet does not properly handle information requests for
pods not in the default namespace at this time.
Allows us to define different watch versioning regimes in the future
as well as to encode information with the resource version.
This changes /watch/resources?resourceVersion=3 to start the watch at
4 instead of 3, which means clients can read a resource version and
then send it back to the server. Clients should no longer do math on
resource versions.
Move a lot of common error logging into better buckets:
glog.Errorf() - Always an error
glog.Warningf() - Something unexpected, but probably not an error
glog.V(0) - Generally useful for this to ALWAYS be visible
to an operator
* Programmer errors
* Logging extra info about a panic
* CLI argument handling
glog.V(1) - A reasonable default log level if you don't want
verbosity
* Information about config (listening on X, watching Y)
* Errors that repeat frequently that relate to conditions
that can be corrected (pod detected as unhealthy)
glog.V(2) - Useful steady state information about the service
* Logging HTTP requests and their exit code
* System state changing (killing pod)
* Controller state change events (starting pods)
* Scheduler log messages
glog.V(3) - Extended information about changes
* More info about system state changes
glog.V(4) - Debug level verbosity (for now)
* Logging in particularly thorny parts of code where
you may want to come back later and check it