# Security in Kubernetes

Kubernetes should define a reasonable set of security best practices that
allows processes to be isolated from each other and from the cluster
infrastructure, while preserving important boundaries between those who manage
the cluster and those who use it.

While Kubernetes today is not primarily a multi-tenant system, its long-term
evolution will increasingly rely on proper boundaries between users and
administrators. The code running on the cluster must be appropriately isolated
and secured to prevent malicious parties from affecting the entire cluster.

## High Level Goals

1. Ensure a clear isolation between the container and the underlying host it
runs on
2. Limit the ability of the container to negatively impact the infrastructure
or other containers
3. [Principle of Least Privilege](http://en.wikipedia.org/wiki/Principle_of_least_privilege) -
ensure components are only authorized to perform the actions they need, and
limit the scope of a compromise by limiting the capabilities of individual
components
4. Reduce the number of systems that have to be hardened and secured by
defining clear boundaries between components
5. Allow users of the system to be cleanly separated from administrators
6. Allow administrative functions to be delegated to users where necessary
7. Allow applications that have "secret" data (keys, certs, passwords) to run
on the cluster, with that data properly separated from "public" data

## Use cases

### Roles

We define a "user" as a unique identity accessing the Kubernetes API server,
which may be a human or an automated process. Human users fall into the
following categories:

1. k8s admin - administers a Kubernetes cluster and has access to the
underlying components of the system
2. k8s project administrator - administers the security of a small subset of
the cluster
3. k8s developer - launches pods on a Kubernetes cluster and consumes cluster
resources

Automated process users fall into the following categories:

1. k8s container user - a user that processes running inside a container (on
the cluster) can use to access other cluster resources, independent of the
human users attached to a project
2. k8s infrastructure user - the user that Kubernetes infrastructure components
use to perform cluster functions with clearly defined roles

### Description of roles

* Developers:
  * write pod specs.
  * make some of their own images, and use some "community" docker images
  * know which pods need to talk to which other pods
  * decide which pods should share files with other pods, and which should not.
  * reason about application level security, such as containing the effects of
    a local-file-read exploit in a webserver pod.
  * do not often reason about operating system or organizational security.
  * are not necessarily comfortable reasoning about the security properties of
    a system at the level of detail of Linux Capabilities, SELinux, AppArmor,
    etc.

* Project Admins:
  * allocate identity and roles within a namespace
  * reason about organizational security within a namespace
  * don't give a developer permissions that are not needed for their role.
  * protect files on shared storage from unnecessary cross-team access
  * are less focused on application security

* Administrators:
  * are less focused on application security, and more focused on operating
    system security.
  * protect the node from bad actors in containers, and properly configured
    innocent containers from bad actors in other containers.
  * are comfortable reasoning about the security properties of a system at the
    level of detail of Linux Capabilities, SELinux, AppArmor, etc.
  * decide who can use which Linux Capabilities, run privileged containers, use
    hostPath, etc.
  * e.g. a team that manages Ceph or a MySQL server might be trusted to have
    raw access to storage devices in some organizations, but teams that develop
    the applications at higher layers would not.

## Proposed Design

A pod runs in a *security context* under a *service account* that is defined by
an administrator or project administrator, and the *secrets* a pod has access
to are limited by that *service account*.

1. The API should authenticate and authorize user actions ([authn and authz](access.md))
2. All infrastructure components (kubelets, kube-proxies, controllers,
scheduler) should have an infrastructure user that they can authenticate with
and be authorized to perform only the functions they require against the API.
3. Most infrastructure components should use the API as a way of exchanging
data and changing the system, and only the API should have access to the
underlying data store (etcd)
4. When containers run on the cluster and need to talk to other containers or
the API server, they should be identified and authorized clearly as an
autonomous process via a [service account](service_accounts.md)
   1. If the user who started a long-lived process is removed from access to
      the cluster, the process should be able to continue without interruption
   2. If users who started processes are removed from the cluster,
      administrators may wish to terminate their processes in bulk
   3. When containers run with a service account, the user that created /
      triggered the service account behavior must be associated with the
      container's action
5. When container processes run on the cluster, they should run in a
[security context](security_context.md) that isolates those processes via Linux
user security, user namespaces, and permissions.
   1. Administrators should be able to configure the cluster to automatically
      confine all container processes to a non-root, randomly assigned UID
   2. Administrators should be able to ensure that container processes within
      the same namespace are all assigned the same Unix UID
   3. Administrators should be able to limit which developers and project
      administrators have access to higher privilege actions
   4. Project administrators should be able to run pods within a namespace
      under different security contexts, and developers must be able to specify
      which of the available security contexts they may use
   5. Developers should be able to run their own images or images from the
      community and expect those images to run correctly
   6. Developers may need to ensure their images work within higher security
      requirements specified by administrators
   7. When available, Linux kernel user namespaces can be used to ensure 5.2
      and 5.4 are met.
   8. When application developers want to share filesystem data via distributed
      filesystems, the Unix user IDs on those filesystems must be consistent
      across different container processes
6. Developers should be able to define [secrets](secrets.md) that are
automatically added to the containers when pods are run (see the sketch after
this list)
   1. Secrets are files injected into the container whose values should not be
      displayed within a pod. Examples:
      1. An SSH private key for git cloning remote data
      2. A client certificate for accessing a remote system
      3. A private key and certificate for a web server
      4. A .kubeconfig file with embedded cert / token data for accessing the
         Kubernetes master
      5. A .dockercfg file for pulling images from a protected registry
   2. Developers should be able to define the pod spec so that a secret lands
      in a specific location
   3. Project administrators should be able to limit developers within a
      namespace from viewing or modifying secrets (anyone who can launch an
      arbitrary pod can view secrets)
   4. Secrets are generally not copied from one namespace to another when a
      developer's application definitions are copied

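To make points 4-6 concrete, here is a minimal sketch of a pod definition that
combines a service account (4), a confining security context (5.1, 5.2), and a
secret mounted at a developer-chosen path (6.2). It is written against the
later `k8s.io/api` Go types purely for illustration, and every name in it
(`build-bot`, `ssh-key-secret`, the UID, the image) is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	runAsNonRoot := true
	uid := int64(10001) // hypothetical non-root UID chosen by admin policy (5.1)

	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "git-sync", Namespace: "team-a"},
		Spec: corev1.PodSpec{
			// The service account identifies the pod to the API server as an
			// autonomous process, independent of the human who created it (4).
			ServiceAccountName: "build-bot",
			// The security context confines every container in the pod to a
			// non-root UID (5.1, 5.2).
			SecurityContext: &corev1.PodSecurityContext{
				RunAsNonRoot: &runAsNonRoot,
				RunAsUser:    &uid,
			},
			// The secret is injected as files rather than displayed in the
			// pod spec itself (6.1).
			Volumes: []corev1.Volume{{
				Name: "ssh-key",
				VolumeSource: corev1.VolumeSource{
					Secret: &corev1.SecretVolumeSource{SecretName: "ssh-key-secret"},
				},
			}},
			Containers: []corev1.Container{{
				Name:  "sync",
				Image: "example.com/git-sync:latest",
				// The developer chooses exactly where the secret lands (6.2).
				VolumeMounts: []corev1.VolumeMount{{
					Name:      "ssh-key",
					MountPath: "/etc/secret/ssh",
					ReadOnly:  true,
				}},
			}},
		},
	}

	out, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(out))
}
```
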
### Related design discussion

* [Authorization and authentication](access.md)
* [Secret distribution via files](http://pr.k8s.io/2030)
* [Docker secrets](https://github.com/docker/docker/pull/6697)
* [Docker vault](https://github.com/docker/docker/issues/10310)
* [Service Accounts](service_accounts.md)
* [Secret volumes](http://pr.k8s.io/4126)

## Specific Design Points

### TODO: authorization, authentication

### Isolate the data store from the nodes and supporting infrastructure

Access to the central data store (etcd) in Kubernetes allows an attacker to run
arbitrary containers on hosts, to gain access to any protected information
stored in either volumes or in pods (such as access tokens or shared secrets
provided as environment variables), to intercept and redirect traffic from
running services by inserting middlemen, or to simply delete the entire history
of the cluster.

As a general principle, access to the central data store should be restricted
to the components that need full control over the system and which can apply
appropriate authorization and authentication of change requests. In the future,
etcd may offer granular access control, but that granularity will require an
administrator to understand the schema of the data to properly apply security.
An administrator must be able to properly secure Kubernetes at a policy level,
rather than at an implementation level, and schema changes over time should not
risk unintended security leaks.

Both the Kubelet and Kube Proxy need information related to their specific
roles - for the Kubelet, the set of pods it should be running, and for the
Proxy, the set of services and endpoints to load balance. The Kubelet also
needs to provide information about running pods and historical termination
data. The access pattern for both the Kubelet and the Proxy to load their
configuration is an efficient "wait for changes" request over HTTP. It should
be possible to limit the Kubelet and Proxy to only access the information they
need to perform their roles and no more.

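As an illustration of the access pattern just described, the sketch below uses
the later client-go library (not part of this design) to open a "wait for
changes" watch scoped by a field selector - one server-side way to limit a
Kubelet-like component to the pods bound to its own node. The node name is
hypothetical:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config carries this component's own credentials, so the API
	// server can authenticate and authorize it as an infrastructure user.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A Kubelet-like component asks only for pods bound to its own node; the
	// server streams changes rather than the client polling.
	watcher, err := client.CoreV1().Pods(metav1.NamespaceAll).Watch(context.Background(),
		metav1.ListOptions{FieldSelector: "spec.nodeName=node-1"}) // hypothetical node name
	if err != nil {
		panic(err)
	}
	defer watcher.Stop()

	for event := range watcher.ResultChan() {
		if pod, ok := event.Object.(*corev1.Pod); ok {
			fmt.Printf("%s pod %s/%s\n", event.Type, pod.Namespace, pod.Name)
		}
	}
}
```
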
The controller manager for Replication Controllers and other future controllers
act on behalf of a user via delegation to perform automated maintenance on
Kubernetes resources. Their ability to access or modify resource state should
be strictly limited to their intended duties, and they should be prevented from
accessing information not pertinent to their role. For example, a replication
controller needs only to create a copy of a known pod configuration, to
determine the running state of an existing pod, or to delete an existing pod
that it created - it does not need to know the contents or current state of a
pod, nor have access to any data in the pod's attached volumes.

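One concrete way to encode that restriction is a narrowly scoped role. The
sketch below expresses it using the RBAC API that Kubernetes gained later (it
postdates this document); the role name is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A replication-controller-like component may create, observe, and delete
	// pods, but gets no access to secrets or to pod contents.
	role := rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "replication-manager"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{""}, // the core API group
			Resources: []string{"pods"},
			Verbs:     []string{"create", "get", "list", "watch", "delete"},
		}},
	}

	out, _ := json.MarshalIndent(role, "", "  ")
	fmt.Println(string(out))
}
```
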
The Kubernetes pod scheduler is responsible for reading data from the pod to
fit it onto a node in the cluster. At a minimum, it needs access to view the ID
of a pod (to craft the binding), its current state, any resource information
necessary to identify placement, and other data relevant to concerns like
anti-affinity, zone or region preference, or custom logic. It does not need the
ability to modify pods or see other resources, only to create bindings. It
should not need the ability to delete bindings unless the scheduler takes
control of relocating components on failed hosts (which could be implemented by
a separate component that can delete bindings but not create them). The
scheduler may need read access to user or project-container information to
determine preferential location (underspecified at this time).

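For illustration, the scheduler's single write operation can be expressed with
the later client-go API as the creation of a Binding; all names below are
hypothetical:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// bindPod is the scheduler's only write: it attaches a pod to a chosen node by
// creating a Binding, without any ability to modify or delete the pod itself.
func bindPod(client kubernetes.Interface, namespace, podName, nodeName string) error {
	binding := &corev1.Binding{
		ObjectMeta: metav1.ObjectMeta{Name: podName, Namespace: namespace},
		Target:     corev1.ObjectReference{Kind: "Node", Name: nodeName},
	}
	return client.CoreV1().Pods(namespace).Bind(context.Background(), binding, metav1.CreateOptions{})
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Hypothetical names: bind pod "web-1" in namespace "team-a" to "node-1".
	if err := bindPod(client, "team-a", "web-1", "node-1"); err != nil {
		panic(err)
	}
}
```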