Merge pull request #23343 from derekparker/self-hosted-kubelet-proposal

Automatic merge from submit-queue

docs: Self-hosted Kubelet proposal

Provides a proposal for changes needed in Kubernetes to allow for a self-hosted Kubelet bootstrap.
2016-05-09 21:09:35 -07:00 · 2016-05-09 21:09:35 -07:00 · 088694fa72
parent ff339c77cf a7f4402e41
commit 088694fa72
1 changed files with 164 additions and 0 deletions
--- a/docs/proposals/self-hosted-kubelet.md
+++ b/docs/proposals/self-hosted-kubelet.md
@@ -0,0 +1,164 @@
+<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
+
+<!-- BEGIN STRIP_FOR_RELEASE -->
+
+<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+
+<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+</strong>
+--
+
+<!-- END STRIP_FOR_RELEASE -->
+
+<!-- END MUNGE: UNVERSIONED_WARNING -->
+
+# Proposal: Self-hosted kubelet
+
+## Abstract
+
+In a self-hosted Kubernetes deployment (see [this
+comment](https://github.com/kubernetes/kubernetes/issues/246#issuecomment-64533959)
+for background on self-hosted Kubernetes), we face an initial bootstrap problem.
+When running self-hosted components, there needs to be a mechanism for pivoting
+from the initial bootstrap state to the kubernetes-managed (self-hosted) state.
+In the case of a self-hosted kubelet, this means pivoting from the initial
+kubelet defined and run on the host, to the kubelet pod which has been scheduled
+to the node.
+
+This proposal presents a solution to the kubelet bootstrap, and assumes a
+functioning control plane (e.g. an apiserver, controller-manager, scheduler, and
+etcd cluster), and a kubelet that can securely contact the API server. This
+functioning control plane can be temporary, and not necessarily the "production"
+control plane that will be used after the initial pivot / bootstrap.
+
+## Background and Motivation
+
+In order to understand the goals of this proposal, one must understand what
+"self-hosted" means. This proposal defines "self-hosted" as a kubernetes cluster
+that is installed and managed by the kubernetes installation itself. This means
+that each kubernetes component is described by a kubernetes manifest (DaemonSet,
+Deployment, etc.) and can be updated via kubernetes.
+
+The overall goal of this proposal is to make kubernetes easier to install and
+upgrade. We can then treat kubernetes itself just like any other application
+hosted in a kubernetes cluster, and have access to easy upgrades, monitoring,
+and durability for core kubernetes components themselves.
+
+We intend to achieve this by using kubernetes to manage itself.  However, in
+order to do that we must first "bootstrap" the cluster, by using kubernetes to
+install kubernetes components. This is where this proposal fits in, by
+describing the necessary modifications, and required procedures, needed to run a
+self-hosted kubelet.
+
+The approach being proposed for a self-hosted kubelet is a "pivot" style
+installation. This procedure assumes a short-lived "bootstrap" kubelet will run
+and start a long-running "self-hosted" kubelet. Once the self-hosted kubelet is
+running, the bootstrap kubelet will exit. As part of this, we propose introducing
+a new `--bootstrap` flag to the kubelet. The behaviour of that flag is
+explained in detail below.
+
+## Proposal
+
+We propose adding a new flag to the kubelet, the `--bootstrap` flag, which is
+assumed to be used in conjunction with the `--lock-file` flag. The `--lock-file`
+flag is used to ensure only a single kubelet is running at any given time during
+this pivot process. When the `--bootstrap` flag is provided, after the kubelet
+acquires the file lock, it will begin asynchronously waiting on
+[inotify](http://man7.org/linux/man-pages/man7/inotify.7.html) events for the
+lock file. Once an "open" event is received, the kubelet will assume another
+kubelet is attempting to take control and will exit by calling `exit(0)`.
+
+Thus, the initial bootstrap becomes:
+
+1. "bootstrap" kubelet is started by $init system.
+1. "bootstrap" kubelet pulls down "self-hosted" kubelet as a pod from a
+   daemonset.
+1. "self-hosted" kubelet attempts to acquire the file lock, causing "bootstrap"
+   kubelet to exit.
+1. "self-hosted" kubelet acquires lock and takes over.
+1. "bootstrap" kubelet is restarted by $init system and blocks on acquiring the
+   file lock.
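The bootstrap sequence above can be traced with a small in-memory model. This is only a sketch under simplifying assumptions: the `kubelet` and `lockFile` types are hypothetical, no real processes or files are involved, and the "open" event, the bootstrap kubelet's exit, and the lock release are collapsed into a single step.

```go
package main

import "fmt"

// kubelet is a toy stand-in for a kubelet process.
type kubelet struct {
	name      string
	bootstrap bool // models --bootstrap: exit when the lock file is opened
}

// lockFile models the file named by --lock-file: at most one holder at a time.
type lockFile struct {
	holder *kubelet
	trace  []string // human-readable record of what happened
}

// tryAcquire models k opening the lock file and requesting the lock.
// It reports whether k holds the lock afterwards.
func (l *lockFile) tryAcquire(k *kubelet) bool {
	if l.holder != nil && l.holder.bootstrap {
		// The open is visible to a --bootstrap holder, which exits and
		// thereby releases the lock.
		l.trace = append(l.trace, l.holder.name+" sees open event and exits")
		l.holder = nil
	}
	if l.holder != nil {
		// A holder without --bootstrap keeps running; k stays blocked.
		l.trace = append(l.trace, k.name+" blocks waiting for the lock")
		return false
	}
	l.holder = k
	l.trace = append(l.trace, k.name+" acquires the lock")
	return true
}

func main() {
	boot := &kubelet{name: "bootstrap", bootstrap: true}
	self := &kubelet{name: "self-hosted"}
	lock := &lockFile{}

	lock.tryAcquire(boot) // step 1: started by the init system
	// Step 2 (pulling the self-hosted kubelet pod) is not modeled.
	lock.tryAcquire(self) // steps 3 and 4: bootstrap exits, self-hosted takes over
	lock.tryAcquire(boot) // step 5: restarted bootstrap blocks on the lock

	for _, event := range lock.trace {
		fmt.Println(event)
	}
}
```

The model makes one property of the design explicit: only a holder started with `--bootstrap` yields on an open, so the restarted bootstrap kubelet in step 5 cannot displace the self-hosted kubelet.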
+
+During an upgrade of the kubelet, for simplicity we will consider 3 kubelets,
+namely "bootstrap", "v1", and "v2". We imagine the following scenario for
+upgrades:
+
+1. Cluster administrator introduces "v2" kubelet daemonset.
+1. "v1" kubelet pulls down and starts "v2".
+1. Cluster administrator removes "v1" kubelet daemonset.
+1. "v1" kubelet is killed.
+1. Both "bootstrap" and "v2" kubelets race for file lock.
+1. If "v2" kubelet acquires lock, process has completed.
+1. If "bootstrap" kubelet acquires lock, it is assumed that "v2" kubelet will
+   fail a health check and be killed. Once restarted, it will try to acquire the
+   lock, triggering the "bootstrap" kubelet to exit.
+
+Alternatively, it would also be possible via this mechanism to delete the "v1"
+daemonset first, allow the "bootstrap" kubelet to take over, and then introduce
+the "v2" kubelet daemonset, effectively eliminating the race between "bootstrap"
+and "v2" for lock acquisition, and the reliance on the failing health check
+procedure.
+
+Eventually this could be handled by a DaemonSet upgrade policy.
+
+This approach allows a "self-hosted" kubelet with minimal new concepts
+introduced into the core Kubernetes code base, and remains flexible enough to
+work well with future [bootstrapping
+services](https://github.com/kubernetes/kubernetes/issues/5754).
+
+## Production readiness considerations / Out of scope issues
+
+* Deterministically pulling and running kubelet pod: we would prefer not to have
+  to loop until we finally get a kubelet pod.
+* It is possible that the bootstrap kubelet version is incompatible with the
+  newer versions that were run on the node. For example, the cgroup
+  configurations might be incompatible. In the beginning, we will require
+  cluster admins to keep the configuration in sync. Since we want the bootstrap
+  kubelet to come up and run even if the API server is not available, we should
+  persist the configuration for bootstrap kubelet on the node. Once we have
+  checkpointing in kubelet, we will checkpoint the updated config and have the
+  bootstrap kubelet use the updated config, if it were to take over.
+* Currently, the best practice when upgrading the kubelet on a node is to drain
+  all pods first. Automatic draining of the node during a kubelet upgrade is out
+  of scope for this proposal. It is assumed that either the cluster
+  administrator or the daemonset upgrade policy will handle this.
+
+## Other discussion
+
+Various similar approaches have been discussed
+[here](https://github.com/kubernetes/kubernetes/issues/246#issuecomment-64533959)
+and
+[here](https://github.com/kubernetes/kubernetes/issues/23073#issuecomment-198478997).
+Running the kubelet inside a container is discussed
+[here](https://github.com/kubernetes/kubernetes/issues/4869). Note this isn't a
+strict requirement, as the kubelet could be run in a chroot jail via rkt fly or
+a similar approach.
+
+Additionally, [Taints and
+Tolerations](../../docs/design/taint-toleration-dedicated.md), whose design has
+already been accepted, would make the overall kubelet bootstrap more
+deterministic. With this, we would also need the ability for a kubelet to
+register itself with a given taint when it first contacts the API server. Given
+that, a kubelet could register itself with a given taint such as
+"component=kubelet", and a kubelet pod could exist that has a toleration to that
+taint, ensuring it is the only pod the "bootstrap" kubelet runs.
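To make the taint and toleration idea concrete, the following Go sketch checks which pods a tainted bootstrap node would accept. The `Taint`, `Toleration`, and `Pod` types and the `tolerates` function are simplified, hypothetical stand-ins and not the Kubernetes API types; in particular, the sketch assumes exact key/value matching and ignores taint effects.

```go
package main

import "fmt"

// Taint and Toleration are simplified, hypothetical stand-ins for the
// concepts in the taints and tolerations design, not the real API types.
type Taint struct{ Key, Value string }
type Toleration struct{ Key, Value string }

// Pod carries only what this sketch needs.
type Pod struct {
	Name        string
	Tolerations []Toleration
}

// tolerates reports whether every taint on the node is matched by at least
// one of the pod's tolerations (exact key and value match).
func tolerates(pod Pod, taints []Taint) bool {
	for _, taint := range taints {
		matched := false
		for _, tol := range pod.Tolerations {
			if tol.Key == taint.Key && tol.Value == taint.Value {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	// The bootstrap kubelet registers its node with this taint.
	nodeTaints := []Taint{{Key: "component", Value: "kubelet"}}

	kubeletPod := Pod{
		Name:        "self-hosted-kubelet",
		Tolerations: []Toleration{{Key: "component", Value: "kubelet"}},
	}
	appPod := Pod{Name: "web-app"} // no tolerations

	for _, p := range []Pod{kubeletPod, appPod} {
		fmt.Printf("%s may run on bootstrap node: %v\n", p.Name, tolerates(p, nodeTaints))
	}
}
```

Under these assumptions only the kubelet pod passes the check, which is the determinism the paragraph above is after: the bootstrap kubelet has nothing else to run.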
+
+
+<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/self-hosted-kubelet.md?pixel)]()
+<!-- END MUNGE: GENERATED_ANALYTICS -->