- fix races in unit tests
- fix flaky singleActionEndsProcess test; further simplify Process impl
- fix flaky error handling in processAdapter
- eliminate time-based test assertions
Allow the user to specify the resolver configuration file that is used
to determine the default DNS parameters. This defaults to the system's
/etc/resolv.conf.
Currently, whenever there is any update, the kubelet forces all pod workers to
sync again, causing resource contention and hence performance degradation.
This commit flips kubelet to use incremental updates (as opposed to snapshots).
This allows us to know what pods have changed and send updates to those pod
workers only. The `SyncPods` function has been replaced with individual
handlers, each handling an operation (ADD, REMOVE, UPDATE). Pod workers are
still triggered periodically, and kubelet performs periodic cleanup as well.
This commit also spawns a new goroutine solely responsible for killing pods.
This is necessary because pod killing could hold up the sync loop for an
indefinitely long time, now that users can define the graceful termination
period in the container spec.
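A minimal sketch of that dispatch, with hypothetical operation and handler names (the real kubelet types and handlers differ):
```
package main

import "fmt"

// PodOperation stands in for the incremental update types described above.
type PodOperation int

const (
	ADD PodOperation = iota
	UPDATE
	REMOVE
)

type PodUpdate struct {
	Op   PodOperation
	Pods []string // pod names, for illustration only
}

// dispatch triggers only the pod workers affected by the update instead of
// forcing every pod worker to sync.
func dispatch(u PodUpdate) {
	switch u.Op {
	case ADD:
		fmt.Println("HandlePodAdditions:", u.Pods)
	case UPDATE:
		fmt.Println("HandlePodUpdates:", u.Pods)
	case REMOVE:
		fmt.Println("HandlePodRemovals:", u.Pods)
	}
}

func main() {
	updates := make(chan PodUpdate, 2)
	updates <- PodUpdate{Op: ADD, Pods: []string{"nginx"}}
	updates <- PodUpdate{Op: REMOVE, Pods: []string{"redis"}}
	close(updates)
	for u := range updates {
		dispatch(u)
	}
}
```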
- The TASK_ERROR task status was introduced with Mesos 0.21 and has actually been used
since 0.22. It was not handled at all before this patch, leaving errored tasks in the
registry in phase "Pending". This leads to task status updates from the Mesos Master on
reconciliation with empty slaveId fields, eventually crashing the scheduler.
- Handle terminal task with empty slaveId.
The slave id can be empty for TASK_ERROR.
The modified code path does not use the slaveId.
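A hedged sketch of treating TASK_ERROR as terminal while tolerating an empty slave id; the types and names below are stand-ins, not the real mesos proto or scheduler code:
```
package main

import "fmt"

// TaskState mirrors the Mesos task state names for illustration only.
type TaskState int

const (
	TASK_FINISHED TaskState = iota
	TASK_FAILED
	TASK_KILLED
	TASK_LOST
	TASK_ERROR // introduced with Mesos 0.21, sent since 0.22
)

type TaskStatus struct {
	State   TaskState
	SlaveID string // may be empty for TASK_ERROR
}

// isTerminal reports whether the task will never run again; such tasks must be
// removed from the registry instead of lingering in phase "Pending". It never
// touches SlaveID, so an empty slave id is harmless on this path.
func isTerminal(s TaskStatus) bool {
	switch s.State {
	case TASK_FINISHED, TASK_FAILED, TASK_KILLED, TASK_LOST, TASK_ERROR:
		return true
	}
	return false
}

func main() {
	fmt.Println(isTerminal(TaskStatus{State: TASK_ERROR})) // true, even without a slave id
}
```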
In contrast to the docker and system containers (= cgroup path) this has no
undesired consequence; the only effect is that the kubelet itself will be in its
own cgroup below the executor cgroup.
Before this patch if you type `kubectl delete [tab][tab]` you would get
nothing. `kubectl delete pod [tab][tab]` would show the list of pods.
This patch causes `kubectl delete [tab][tab]` to show the list of
resources like pod, service, podtemplate, serviceaccount, etc.
Check to make sure there is not an alphanumeric character immediately
before or after the 'flag'. If there is an alphanumeric character then
this is obviously not actually the flag we care about. For example, if
the project declares a flag "valid-name" but the regex finds something
like "invalid_name", we should not match. Clearly this "invalid_name" is
not actually a wrong usage of the "valid-name" flag.
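A rough Go illustration of that adjacency rule; this is not the actual verification code, only a sketch of the pattern it describes:
```
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// flagPattern matches the flag written with either '-' or '_' as separator,
// but rejects matches that have an alphanumeric character immediately before
// or after the name.
func flagPattern(name string) *regexp.Regexp {
	body := strings.ReplaceAll(regexp.QuoteMeta(name), "-", "[-_]")
	return regexp.MustCompile(`(^|[^[:alnum:]])` + body + `([^[:alnum:]]|$)`)
}

func main() {
	re := flagPattern("valid-name")
	fmt.Println(re.MatchString("--valid-name=foo")) // true: a real use of the flag
	fmt.Println(re.MatchString("invalid_name"))     // false: 'i' precedes the match
}
```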
- move assigned slave to T.Spec.AssignedSlave
- only create the BindingHost annotation in prepareTaskForLaunch
- recover the assigned slave from annotation and write it back to the T.Spec field
Before this patch the annotation was used to store the assigned slave. But due
to the cloning of tasks in the registry, this value was never persisted in the
registry.
This patch adds it to the Spec of a task and only creates the annotation
last-minute before launching.
Without this patch, pods which fail before binding will stay in the registry,
but they are never rescheduled again. The reason: the BindingHost annotation exists
neither in the registry nor on the apiserver (compare the reconcilePod function).
pflag can handle IP addresses, so use the pflag code instead of doing it
ourselves. This means our code just uses net.IP and we don't have all of
the useless casting back and forth!
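For illustration, a small pflag usage along those lines (the flag name here is made up):
```
package main

import (
	"fmt"
	"net"

	"github.com/spf13/pflag"
)

func main() {
	// pflag parses and validates the value, so the rest of the code deals
	// with net.IP directly instead of converting strings back and forth.
	var bindAddress net.IP
	fs := pflag.NewFlagSet("example", pflag.ContinueOnError)
	fs.IPVar(&bindAddress, "address", net.ParseIP("127.0.0.1"), "IP address to bind to")
	if err := fs.Parse([]string{"--address=10.0.0.1"}); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(bindAddress) // 10.0.0.1
}
```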
The EndpointPort struct only stores one port: the port which is used
to connect to the container from outside. In the case of the Mesos
endpoint controller this is the host port. The container port is not part
of the endpoint structure at all.
A number of e2e tests need the container port information to validate correct
endpoint creation. Therefore this patch annotates the Endpoint struct with a
number of annotations mapping "<HostIP>:<HostPort>" to "<ContainerPort>". In a
follow-up commit these annotations are used to validate endpoints in a Mesos
setup.
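A sketch of building such an annotation entry; the key format follows the description above, everything else is illustrative rather than the real controller code:
```
package main

import (
	"fmt"
	"strconv"
)

// portMappingAnnotation builds one "<HostIP>:<HostPort>" -> "<ContainerPort>"
// entry of the kind attached to the Endpoints object.
func portMappingAnnotation(hostIP string, hostPort, containerPort int) (key, value string) {
	return fmt.Sprintf("%s:%d", hostIP, hostPort), strconv.Itoa(containerPort)
}

func main() {
	annotations := map[string]string{}
	k, v := portMappingAnnotation("10.0.0.5", 31000, 8080)
	annotations[k] = v
	// An e2e test can read this back to verify that the endpoint for host
	// port 31000 really corresponds to container port 8080.
	fmt.Println(annotations)
}
```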
All binaries in kubernetes show `-` for help and seem to expect `-`, although
`_` also works. The inconsistencies across the codebase between using - and _
make it difficult to use things like grep to find things that need to
be changed.
The test assumes that all nodes have Ceph client utilities installed.
The Ceph RBD container is hand-crafted to be really minimal. It creates a new RBD
on startup, which can take up to several minutes on busy machines.
iSCSI and RBD volumes don't work as Kubernetes services - these protocols
are broken by the S-NAT created by kube-proxy - at least iSCSI exchanges the real
IP address of the iSCSI target as part of the protocol.
This reverts commit 118004c166.
Proxies on a TCP port are accessible outside the current security
context (e.g. uid). Add support for having the proxy listen on a
unix socket, which has permissions applied to it.
We make sure the socket starts its life accessible only to the
current user by using Umask.
This is useful for applications like Cockpit and other tools which
want the help of kubectl to handle authentication, configuration and
transport security, but do not want to make that accessible to
all users on a multi-user system.
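A minimal sketch of the umask-protected unix listener idea (Unix-only; the path and handler are placeholders, not the actual kubectl proxy code):
```
package main

import (
	"log"
	"net"
	"net/http"
	"syscall"
)

// listenUnixOwnerOnly creates the socket while a restrictive umask is in
// effect, so it is never accessible to other users, then restores the
// previous umask.
func listenUnixOwnerOnly(path string) (net.Listener, error) {
	oldMask := syscall.Umask(0077)
	defer syscall.Umask(oldMask)
	return net.Listen("unix", path)
}

func main() {
	l, err := listenUnixOwnerOnly("/tmp/kubectl-proxy.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()
	// Serve the proxy handler over the socket instead of a TCP port.
	log.Fatal(http.Serve(l, http.NotFoundHandler()))
}
```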
The diurnal controller changes the number of replicas of a replication controller based on a list of times and replica counts. It is meant to be run under a replication controller.
Until Docker learns parent mount namespace customization the container will
always have the root ns as a parent, not the one of the km minion. Hence, the
kubelet (which lives in the km minion mount ns) will create mounts that cannot
be seen by the Docker containers.
This feature can be enabled again when Docker learns to explicitly set the
parent mount ns, in analogy to the parent cgroup.
The minion server will
- launch the proxy and executor
- relaunch them when they terminate uncleanly
- logrotate their logs.
It is a replacement for a full-blown init process like s6, which is not necessary
in this case.
Before, NodeName in the pod spec was used. Hence, pods with a fixed, pre-set
NodeName were never scheduled by the k8sm-scheduler, leading e.g. to a failing
e2e intra-pod test.
Fixes mesosphere/kubernetes-mesos#388
This patch
- sets limits (0.25 cpu, 64 MB) on containers which are not limited in the pod spec
(these are also passed to the kubelet such that it uses them for the docker
run limits)
- sums up the container resource limits for cpu and memory inside a pod,
- compares the sums to the offered resources
- puts the sums into the Mesos TaskInfo such that Mesos does the accounting
for the pod.
- parses the static pod spec and adds up the resources
- sets the executor resources to 0.25 cpu, 64 MB plus the static pod resources
- sets the cgroups in the kubelet for system containers, resource containers
and docker to the one of the executor that Mesos assigned
- adds scheduler parameters --default-container-cpu-limit and
--default-container-mem-limit.
The containers themselves are resource-limited by the Docker resource limits which
the kubelet applies when launching them.
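A sketch of the limit summing with the 0.25 cpu / 64 MB defaults described above; the types and function names are hypothetical, not the scheduler's actual code:
```
package main

import "fmt"

// Hypothetical defaults matching the flags described above.
const (
	defaultContainerCPULimit = 0.25 // CPUs
	defaultContainerMemLimit = 64.0 // MB
)

type container struct {
	cpuLimit float64 // CPUs, 0 means unlimited in the pod spec
	memLimit float64 // MB, 0 means unlimited in the pod spec
}

// podResources applies the defaults to unlimited containers and sums the
// limits; the sums are compared against the Mesos offer and written into the
// TaskInfo so that Mesos accounts for the whole pod.
func podResources(containers []container) (cpu, mem float64) {
	for _, c := range containers {
		if c.cpuLimit == 0 {
			c.cpuLimit = defaultContainerCPULimit
		}
		if c.memLimit == 0 {
			c.memLimit = defaultContainerMemLimit
		}
		cpu += c.cpuLimit
		mem += c.memLimit
	}
	return cpu, mem
}

func main() {
	cpu, mem := podResources([]container{{cpuLimit: 0.5, memLimit: 128}, {}})
	fmt.Println(cpu, mem) // 0.75 192
}
```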
Fixes mesosphere/kubernetes-mesos#68 and mesosphere/kubernetes-mesos#304
This should ensure all load balancers get deleted even if a reordering of
watch events causes us to strand one after its service has been deleted,
because the sync will notice that the service controller's cache has a
service in it that no longer exists in the apiserver.
It could still leak in the case that the controller manager is killed
between when it leaks something and the sync runs, but this should
improve things.
Creating a cluster from scratch takes about 7 minutes. But if you just
rebuild the binaries and want to update those you don't want to have to
rerun the entire thing. There is an ansible tag 'binary-update' which
will do that. Now one can do
```
ANSIBLE_TAGS=binary-update vagrant provision
```
And it will push the new binaries.
If you are using locally built binaries as a developer you likely will
want to just push those binaries to an existing cluster, not rerun the
entire playbook. Add a tag to do just that.
Do the /etc/hosts creation with vagrant, so it uses internal instead of
external IPs (hostmanager only knew about the public IP).
Ignore errors on docker failure when 'restarting' docker in the flannel
handler. If this is a clean install, we haven't run 'node' yet, so docker
isn't installed and doesn't need to be started. It would be better to
be more specific in ignoring errors though...
This was originally submitted to pick up v0.3.1 of the cloud logging
plugin which had a fix for the name 'metadata' failing to resolve.
Since new releases of google-fluentd have this fix, it is no longer
required.
I've done some additional testing of 'gem update' behavior in the interim
and I think it is ok to use in targeted situations, but we should not be
doing an unconstrained update in general. The issue is that updating a
gem may bring in new dependencies; some of those dependencies may include
native code, so the update may try to launch a compiler, which isn't desirable
and is prone to failure.
If we do need to grab an updated gem in the future we should specify an
explicit version and the --minimal-deps flag.
This helps with routing of TCP traffic between clients and servers in case
flannel or a similar service is not installed and pods don't see each other.
- It needs 'insecure' in /etc/exports to allow NFS clients on ports > 1024;
the Kubernetes service will change the client port to a random number.
- glusterfs no longer needs an explicit endpoint definition; it uses the service
instead.
- n.node used n.lock as the underlying locker. The service loop initially
locked it, the Notify function tried to lock it before calling n.node.Signal,
leading to a deadlock.
- the goroutine calling ChangeMaster was not synchronized with the Notify
method. The former was triggering change events that the latter never saw
when the former's startup was faster than that of Notify. Hence, not even a single
event was noticed and not even a single start/stop call of the slow service
was triggered.
This patch replaces the n.node condition object with a simple channel n.changed.
The service loop watches it.
Updating the notified private variables is still protected with n.lock against
races, but independently of the n.changed channel. Hence, the deadlock is gone.
Moreover, the startup of the Notify loop is synchronized with the goroutine which
changes the master. Hence, the Notify loop will see the master changes.
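A sketch of the channel-based pattern described above, with illustrative names rather than the actual election code:
```
// Package election is a stand-in name for this sketch.
package election

import "fmt"

type notifier struct {
	desired string
	changed chan struct{} // capacity 1: coalesces pending notifications
}

func newNotifier() *notifier {
	return &notifier{changed: make(chan struct{}, 1)}
}

// ChangeMaster records the new master and signals the loop without blocking,
// so the caller can never deadlock against the service loop. (The real code
// additionally protects the shared state with a lock, independent of changed.)
func (n *notifier) ChangeMaster(m string) {
	n.desired = m
	select {
	case n.changed <- struct{}{}:
	default: // a notification is already pending
	}
}

// Notify runs the service loop; it sees changes made even before it starts,
// because the channel buffers one pending event.
func (n *notifier) Notify(done <-chan struct{}) {
	for {
		select {
		case <-n.changed:
			fmt.Println("start/stop slow service for master", n.desired)
		case <-done:
			return
		}
	}
}
```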
Fixes #10776
- Offers were reused and led to unexpected declining by the scheduler because
the reused offer did not get a new expiration time.
- Pod scheduling and offer creation were not synchronized. When scheduling
happened after aging of offers, the first issue was triggered. Because
the mesos driver DeclineOffer was not mocked, this led to a test error.
Depending on timing, the mesos scheduler might call DeclineOffer:
the default ttl of an offer in the mesos scheduler is 5 seconds. If the tests run longer,
the old, unused offers are declined, leading to a mock error.
Probably fixes GoogleCloudPlatform/kubernetes#10795