Merge pull request #23343 from derekparker/self-hosted-kubelet-proposal

Automatic merge from submit-queue

docs: Self-hosted Kubelet proposal

Provides a proposal for changes needed with Kubernetes to allow for a
self-hosted Kubelet bootstrap.
k8s-merge-robot 2016-05-09 21:09:35 -07:00
commit 088694fa72
1 changed files with 164 additions and 0 deletions


@@ -0,0 +1,164 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.
Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Proposal: Self-hosted kubelet
## Abstract
In a self-hosted Kubernetes deployment (see [this
comment](https://github.com/kubernetes/kubernetes/issues/246#issuecomment-64533959)
for background on self-hosted Kubernetes), we have the initial bootstrap problem.
When running self-hosted components, there needs to be a mechanism for pivoting
from the initial bootstrap state to the kubernetes-managed (self-hosted) state.
In the case of a self-hosted kubelet, this means pivoting from the initial
kubelet defined and run on the host, to the kubelet pod which has been scheduled
to the node.
This proposal presents a solution to the kubelet bootstrap, and assumes a
functioning control plane (e.g. an apiserver, controller-manager, scheduler, and
etcd cluster), and a kubelet that can securely contact the API server. This
functioning control plane can be temporary, and not necessarily the "production"
control plane that will be used after the initial pivot / bootstrap.
## Background and Motivation
In order to understand the goals of this proposal, one must understand what
"self-hosted" means. This proposal defines "self-hosted" as a kubernetes cluster
that is installed and managed by the kubernetes installation itself. This means
that each kubernetes component is described by a kubernetes manifest (Daemonset,
Deployment, etc) and can be updated via kubernetes.
The overall goal of this proposal is to make kubernetes easier to install and
upgrade. We can then treat kubernetes itself just like any other application
hosted in a kubernetes cluster, and have access to easy upgrades, monitoring,
and durability for core kubernetes components themselves.
We intend to achieve this by using kubernetes to manage itself. However, in
order to do that we must first "bootstrap" the cluster, by using kubernetes to
install kubernetes components. This is where this proposal fits in, by
describing the necessary modifications, and required procedures, needed to run a
self-hosted kubelet.
The approach being proposed for a self-hosted kubelet is a "pivot" style
installation. This procedure assumes a short-lived “bootstrap” kubelet will run
and start a long-running “self-hosted” kubelet. Once the self-hosted kubelet is
running, the bootstrap kubelet will exit. As part of this, we propose introducing
a new `--bootstrap` flag to the kubelet. The behaviour of that flag will be
explained in detail below.
## Proposal
We propose adding a new flag to the kubelet, the `--bootstrap` flag, which is
assumed to be used in conjunction with the `--lock-file` flag. The `--lock-file`
flag is used to ensure only a single kubelet is running at any given time during
this pivot process. When the `--bootstrap` flag is provided, after the kubelet
acquires the file lock, it will begin asynchronously waiting on
[inotify](http://man7.org/linux/man-pages/man7/inotify.7.html) events for the
lock file. Once an "open" event is received, the kubelet will assume another
kubelet is attempting to take control and will exit by calling `exit(0)`.
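The locking half of this mechanism can be sketched in a few lines. This is a minimal illustration, not the kubelet's implementation: it assumes the lock behaves like an advisory `flock(2)` lock, uses a non-blocking attempt so the outcome is easy to observe, and leaves out the inotify watch entirely. The names `try_acquire_lock` and `demo` are hypothetical.

```python
import fcntl
import os
import tempfile


def try_acquire_lock(path):
    """Try to take an exclusive, non-blocking advisory lock on `path`.

    Returns the open file object on success (it must stay open to keep the
    lock), or None if another open file description already holds the lock.
    """
    handle = open(path, "a+")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The lock is held elsewhere; give up instead of blocking.
        handle.close()
        return None
    return handle


def demo():
    """Two contenders for one lock file: only one can hold it at a time."""
    fd, lock_path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Stand-in for the "bootstrap" kubelet taking the lock first.
        bootstrap = try_acquire_lock(lock_path)
        # Stand-in for the "self-hosted" kubelet; its open() of the lock file
        # is the kind of access an inotify "open" watch would report.
        self_hosted = try_acquire_lock(lock_path)
        first_round = (bootstrap is not None, self_hosted is not None)

        # The bootstrap side exits, which closes its handle and frees the lock.
        bootstrap.close()
        self_hosted = try_acquire_lock(lock_path)
        second_round = self_hosted is not None
        self_hosted.close()
        return first_round, second_round
    finally:
        os.remove(lock_path)
```

In the proposal the second kubelet would simply block on the lock until the first one exits; the non-blocking call here only makes the contention visible in a single process.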
Thus, the initial bootstrap becomes:
1. "bootstrap" kubelet is started by $init system.
1. "bootstrap" kubelet pulls down "self-hosted" kubelet as a pod from a
daemonset
1. "self-hosted" kubelet attempts to acquire the file lock, causing "bootstrap"
kubelet to exit
1. "self-hosted" kubelet acquires lock and takes over
1. "bootstrap" kubelet is restarted by $init system and blocks on acquiring the
file lock
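The five steps above can be traced with a toy model of the lock file. This is only a sketch of the sequencing: the `LockFile` class, the FIFO hand-off to waiters, and the log strings are assumptions made for illustration, and no real processes, locks, or inotify watches are involved.

```python
class LockFile:
    """Toy model of the kubelet lock file: one holder plus blocked waiters."""

    def __init__(self):
        self.holder = None                # name of the kubelet holding the lock
        self.holder_is_bootstrap = False  # was the holder started with --bootstrap?
        self.waiting = []                 # (name, bootstrap) pairs blocked on the lock
        self.log = []

    def acquire(self, name, bootstrap=False):
        if self.holder is None:
            self.holder = name
            self.holder_is_bootstrap = bootstrap
            self.log.append(f"{name} acquires lock")
            return
        # The lock is taken: the newcomer opens the file and blocks.
        self.log.append(f"{name} opens lock file")
        self.waiting.append((name, bootstrap))
        if self.holder_is_bootstrap:
            # --bootstrap behaviour: an "open" event makes the holder exit.
            self.log.append(f"{self.holder} exits")
            self.release()

    def release(self):
        self.holder = None
        self.holder_is_bootstrap = False
        if self.waiting:
            # Assumption: the longest-waiting kubelet gets the lock next.
            name, bootstrap = self.waiting.pop(0)
            self.acquire(name, bootstrap)


def initial_pivot():
    lock = LockFile()
    lock.acquire("bootstrap", bootstrap=True)  # step 1: started by the init system
    lock.acquire("self-hosted")                # steps 2-4: pod starts and takes over
    lock.acquire("bootstrap", bootstrap=True)  # step 5: restarted, blocks on the lock
    return lock
```

After `initial_pivot()` the model ends with the "self-hosted" kubelet holding the lock and the restarted "bootstrap" kubelet parked in `waiting`, ready to take over if the self-hosted kubelet ever goes away.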
During an upgrade of the kubelet, for simplicity we will consider 3 kubelets,
namely "bootstrap", "v1", and "v2". We imagine the following scenario for
upgrades:
1. Cluster administrator introduces "v2" kubelet daemonset
1. "v1" kubelet pulls down and starts "v2"
1. Cluster administrator removes "v1" kubelet daemonset
1. "v1" kubelet is killed
1. Both "bootstrap" and "v2" kubelets race for file lock
1. If "v2" kubelet acquires lock, process has completed
1. If "bootstrap" kubelet acquires lock, it is assumed that "v2" kubelet will
fail a health check and be killed. Once restarted, it will try to acquire the
lock, triggering the "bootstrap" kubelet to exit.
Alternatively, it would also be possible via this mechanism to delete the "v1"
daemonset first, allow the "bootstrap" kubelet to take over, and then introduce
the "v2" kubelet daemonset, effectively eliminating the race between "bootstrap"
and "v2" for lock acquisition, and the reliance on the failing health check
procedure.
Eventually this could be handled by a DaemonSet upgrade policy.
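Both outcomes of the upgrade race converge on the "v2" kubelet holding the lock, which a short trace makes explicit. The function below is a hypothetical model of the scenario, assuming (as the proposal does) that a "v2" kubelet that loses the race fails its health check and is restarted; the event strings are illustrative only.

```python
def upgrade(race_winner):
    """Trace the v1 -> v2 upgrade for a given winner of the lock race.

    `race_winner` is "v2" or "bootstrap". Returns the final lock holder and
    the list of events that led there.
    """
    if race_winner not in ("v2", "bootstrap"):
        raise ValueError("race_winner must be 'v2' or 'bootstrap'")

    events = [
        "admin introduces v2 daemonset",
        "v1 starts v2",
        "admin removes v1 daemonset",
        "v1 is killed",
        f"{race_winner} acquires lock",
    ]
    holder = race_winner

    if holder == "bootstrap":
        # v2 lost the race; it is assumed to fail a health check and be
        # restarted, and its next lock attempt makes the bootstrap kubelet exit.
        events += [
            "v2 fails health check and is restarted",
            "v2 opens lock file",
            "bootstrap exits",
            "v2 acquires lock",
        ]
        holder = "v2"

    return holder, events
```

Calling `upgrade("v2")` and `upgrade("bootstrap")` yields the same final holder; only the second path depends on the health-check restart, which is the dependency that deleting the "v1" daemonset first would avoid.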
This approach allows a "self-hosted" kubelet with minimal new concepts introduced
into the core Kubernetes code base, and it remains flexible enough to work well
with future [bootstrapping
services](https://github.com/kubernetes/kubernetes/issues/5754).
## Production readiness considerations / Out of scope issues
* Deterministically pulling and running kubelet pod: we would prefer not to have
to loop until we finally get a kubelet pod.
* It is possible that the bootstrap kubelet version is incompatible with the
newer versions that were run on the node. For example, the cgroup
configurations might be incompatible. In the beginning, we will require
cluster admins to keep the configuration in sync. Since we want the bootstrap
kubelet to come up and run even if the API server is not available, we should
persist the configuration for bootstrap kubelet on the node. Once we have
checkpointing in kubelet, we will checkpoint the updated config and have the
bootstrap kubelet use the updated config, if it were to take over.
* Current best practice when upgrading the kubelet on a node is to drain all
pods first. Automatic draining of the node during kubelet upgrade is out
of scope for this proposal. It is assumed that either the cluster
administrator or the daemonset upgrade policy will handle this.
## Other discussion
Various similar approaches have been discussed
[here](https://github.com/kubernetes/kubernetes/issues/246#issuecomment-64533959)
and
[here](https://github.com/kubernetes/kubernetes/issues/23073#issuecomment-198478997).
Other discussion around running the kubelet inside a container is
[here](https://github.com/kubernetes/kubernetes/issues/4869). Note this isn't a
strict requirement, as the kubelet could be run in a chroot jail via rkt fly or
a similar approach.
Additionally, [Taints and
Tolerations](../../docs/design/taint-toleration-dedicated.md), whose design has
already been accepted, would make the overall kubelet bootstrap more
deterministic. With this, we would also need the ability for a kubelet to
register itself with a given taint when it first contacts the API server. Given
that, a kubelet could register itself with a given taint such as
“component=kubelet”, and a kubelet pod could exist that has a toleration to that
taint, ensuring it is the only pod the “bootstrap” kubelet runs.
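The effect of such a taint can be illustrated with a deliberately simplified matching rule. This sketch assumes a taint and a toleration match when their key and value are equal; the accepted Taints and Tolerations design has more to it (effects and operators, for example), and the dictionary shapes and function names here are hypothetical.

```python
def tolerates(pod_tolerations, node_taints):
    """True if every taint on the node is matched by one of the pod's tolerations.

    Simplification: a match means identical key and value.
    """
    return all(
        any(
            toleration["key"] == taint["key"]
            and toleration["value"] == taint["value"]
            for toleration in pod_tolerations
        )
        for taint in node_taints
    )


def runnable_pods(pods, node_taints):
    """Names of the pods that a node with these taints would accept."""
    return [
        pod["name"]
        for pod in pods
        if tolerates(pod.get("tolerations", []), node_taints)
    ]


# A node registered by the "bootstrap" kubelet with the example taint.
bootstrap_node_taints = [{"key": "component", "value": "kubelet"}]

pods = [
    {"name": "kubelet", "tolerations": [{"key": "component", "value": "kubelet"}]},
    {"name": "web-frontend"},  # no tolerations, so the taint keeps it off the node
]
```

With these inputs, `runnable_pods(pods, bootstrap_node_taints)` keeps only the kubelet pod, which is the determinism the paragraph above is after.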