## Abstract A proposal for refactoring `SecurityContext` to have pod-level and container-level attributes in order to correctly model pod- and container-level security concerns. ## Motivation Currently, containers have a `SecurityContext` attribute which contains information about the security settings the container uses. In practice, many of these attributes are uniform across all containers in a pod. Simultaneously, there is also a need to apply the security context pattern at the pod level to correctly model security attributes that apply only at a pod level. Users should be able to: 1. Express security settings that are applicable to the entire pod 2. Express base security settings that apply to all containers 3. Override only the settings that need to be differentiated from the base in individual containers This proposal is a dependency for other changes related to security context: 1. [Volume ownership management in the Kubelet](https://github.com/kubernetes/kubernetes/pull/12944) 2. [Generic SELinux label management in the Kubelet](https://github.com/kubernetes/kubernetes/pull/14192) Goals of this design: 1. Describe the use cases for which a pod-level security context is necessary 2. Thoroughly describe the API backward compatibility issues that arise from the introduction of a pod-level security context 3. Describe all implementation changes necessary for the feature ## Constraints and assumptions 1. We will not design for intra-pod security; we are not currently concerned about isolating containers in the same pod from one another 1. We will design for backward compatibility with the current V1 API ## Use Cases 1. As a developer, I want to correctly model security attributes which belong to an entire pod 2. As a user, I want to be able to specify container attributes that apply to all containers without repeating myself 3. As an existing user, I want to be able to use the existing container-level security API ### Use Case: Pod level security attributes Some security attributes make sense only to model at the pod level. For example, it is a fundamental property of pods that all containers in a pod share the same network namespace. Therefore, using the host namespace makes sense to model at the pod level only, and indeed, today it is part of the `PodSpec`. Other host namespace support is currently being added and these will also be pod-level settings; it makes sense to model them as a pod-level collection of security attributes. ## Use Case: Override pod security context for container Some use cases require the containers in a pod to run with different security settings. As an example, a user may want to have a pod with two containers, one of which runs as root with the privileged setting, and one that runs as a non-root UID. To support use cases like this, it should be possible to override appropriate (i.e., not intrinsically pod-level) security settings for individual containers. ## Proposed Design ### SecurityContext For posterity and ease of reading, note the current state of `SecurityContext`: ```go package api type Container struct { // Other fields omitted // Optional: SecurityContext defines the security options the pod should be run with SecurityContext *SecurityContext `json:"securityContext,omitempty"` } type SecurityContext struct { // Capabilities are the capabilities to add/drop when running the container Capabilities *Capabilities `json:"capabilities,omitempty"` // Run the container in privileged mode Privileged *bool `json:"privileged,omitempty"` // SELinuxOptions are the labels to be applied to the container // and volumes SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"` // RunAsUser is the UID to run the entrypoint of the container process. RunAsUser *int64 `json:"runAsUser,omitempty"` // RunAsNonRoot indicates that the container should be run as a non-root user. If the RunAsUser // field is not explicitly set then the kubelet may check the image for a specified user or // perform defaulting to specify a user. RunAsNonRoot bool `json:"runAsNonRoot,omitempty"` } // SELinuxOptions contains the fields that make up the SELinux context of a container. type SELinuxOptions struct { // SELinux user label User string `json:"user,omitempty"` // SELinux role label Role string `json:"role,omitempty"` // SELinux type label Type string `json:"type,omitempty"` // SELinux level label. Level string `json:"level,omitempty"` } ``` ### PodSecurityContext `PodSecurityContext` specifies two types of security attributes: 1. Attributes that apply to the pod itself 2. Attributes that apply to the containers of the pod In the internal API, fields of the `PodSpec` controlling the use of the host PID, IPC, and network namespaces are relocated to this type: ```go package api type PodSpec struct { // Other fields omitted // Optional: SecurityContext specifies pod-level attributes and container security attributes // that apply to all containers. SecurityContext *PodSecurityContext `json:"securityContext,omitempty"` } // PodSecurityContext specifies security attributes of the pod and container attributes that apply // to all containers of the pod. type PodSecurityContext struct { // Use the host's network namespace. If this option is set, the ports that will be // used must be specified. // Optional: Default to false. HostNetwork bool // Use the host's IPC namespace HostIPC bool // Use the host's PID namespace HostPID bool // Capabilities are the capabilities to add/drop when running containers Capabilities *Capabilities `json:"capabilities,omitempty"` // Run the container in privileged mode Privileged *bool `json:"privileged,omitempty"` // SELinuxOptions are the labels to be applied to the container // and volumes SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"` // RunAsUser is the UID to run the entrypoint of the container process. RunAsUser *int64 `json:"runAsUser,omitempty"` // RunAsNonRoot indicates that the container should be run as a non-root user. If the RunAsUser // field is not explicitly set then the kubelet may check the image for a specified user or // perform defaulting to specify a user. RunAsNonRoot bool } // Comments and generated docs will change for the container.SecurityContext field to indicate // the precedence of these fields over the pod-level ones. type Container struct { // Other fields omitted // Optional: SecurityContext defines the security options the pod should be run with. // Settings specified in this field take precedence over the settings defined in // pod.Spec.SecurityContext. SecurityContext *SecurityContext `json:"securityContext,omitempty"` } ``` In the V1 API, the pod-level security attributes which are currently fields of the `PodSpec` are retained on the `PodSpec` for backward compatibility purposes: ```go package v1 type PodSpec struct { // Other fields omitted // Use the host's network namespace. If this option is set, the ports that will be // used must be specified. // Optional: Default to false. HostNetwork bool `json:"hostNetwork,omitempty"` // Use the host's pid namespace. // Optional: Default to false. HostPID bool `json:"hostPID,omitempty"` // Use the host's ipc namespace. // Optional: Default to false. HostIPC bool `json:"hostIPC,omitempty"` // Optional: SecurityContext specifies pod-level attributes and container security attributes // that apply to all containers. SecurityContext *PodSecurityContext `json:"securityContext,omitempty"` } ``` The `pod.Spec.SecurityContext` specifies the security context of all containers in the pod. The containers' `securityContext` field is overlaid on the base security context to determine the effective security context for the container. The new V1 API should be backward compatible with the existing API. Backward compatibility is defined as: > 1. Any API call (e.g. a structure POSTed to a REST endpoint) that worked before your change must > work the same after your change. > 2. Any API call that uses your change must not cause problems (e.g. crash or degrade behavior) when > issued against servers that do not include your change. > 3. It must be possible to round-trip your change (convert to different API versions and back) with > no loss of information. Previous versions of this proposal attempted to deal with backward compatibility by defining the affect of setting the pod-level fields on the container-level fields. While trying to find consensus on this design, it became apparent that this approach was going to be extremely complex to implement, explain, and support. Instead, we will approach backward compatibility as follows: 1. Pod-level and container-level settings will not affect one another 2. Old clients will be able to use container-level settings in the exact same way 3. Container level settings always override pod-level settings if they are set #### Examples 1. Old client using `pod.Spec.Containers[x].SecurityContext` An old client creates a pod: ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: containers: - name: a securityContext: runAsUser: 1001 - name: b securityContext: runAsUser: 1002 ``` looks to old clients like: ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: containers: - name: a securityContext: runAsUser: 1001 - name: b securityContext: runAsUser: 1002 ``` looks to new clients like: ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: containers: - name: a securityContext: runAsUser: 1001 - name: b securityContext: runAsUser: 1002 ``` 2. New client using `pod.Spec.SecurityContext` A new client creates a pod using a field of `pod.Spec.SecurityContext`: ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: securityContext: runAsUser: 1001 containers: - name: a - name: b ``` appears to new clients as: ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: securityContext: runAsUser: 1001 containers: - name: a - name: b ``` old clients will see: ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: containers: - name: a - name: b ``` 3. Pods created using `pod.Spec.SecurityContext` and `pod.Spec.Containers[x].SecurityContext` If a field is set in both `pod.Spec.SecurityContext` and `pod.Spec.Containers[x].SecurityContext`, the value in `pod.Spec.Containers[x].SecurityContext` wins. In the following pod: ```yaml apiVersion: v1 kind: Pod metadata: name: test-pod spec: securityContext: runAsUser: 1001 containers: - name: a securityContext: runAsUser: 1002 - name: b ``` The effective setting for `runAsUser` for container A is `1002`. #### Testing A backward compatibility test suite will be established for the v1 API. The test suite will verify compatibility by converting objects into the internal API and back to the version API and examining the results. All of the examples here will be used as test-cases. As more test cases are added, the proposal will be updated. An example of a test like this can be found in the [OpenShift API package](https://github.com/openshift/origin/blob/master/pkg/api/compatibility_test.go) E2E test cases will be added to test the correct determination of the security context for containers. ### Kubelet changes 1. The Kubelet will use the new fields on the `PodSecurityContext` for host namespace control 2. The Kubelet will be modified to correctly implement the backward compatibility and effective security context determination defined here [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-security-context.md?pixel)]()