# The life of a pod
Updated: 4/14/2015
This document covers the lifecycle of a pod. It is not an exhaustive document, but an introduction to the topic.
## Pod Phase
Consistent with the overall [API convention](api-conventions.md#typical-status-properties), phase is a simple, high-level summary of where a pod is in its lifecycle. It is not intended to be a comprehensive rollup of observations of container-level or even pod-level conditions or other state, nor is it intended to be a comprehensive state machine.
The number and meanings of `PodPhase` values are tightly guarded. Other than what is documented here, nothing should be assumed about pods with a given `PodPhase`.
* Pending: The pod has been accepted by the system, but one or more of its containers has not yet been created. This includes time spent waiting to be scheduled as well as time spent downloading images over the network, which could take a while.
* Running: The pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.
* Succeeded: All containers in the pod have terminated in success, and will not be restarted.
* Failed: All containers in the pod have terminated, and at least one container has terminated in failure (exited with a non-zero exit status or was terminated by the system).
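
The phase surfaces in the pod's status as reported by the API. A minimal sketch (not from this document; field names assume the v1-style API shape):

```yaml
# Excerpt of a pod object's status; `phase` is the high-level
# summary described above, not a rollup of container state.
status:
  phase: Running
```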
## Pod Conditions
A pod containing containers that specify readiness probes will also report the Ready condition. Condition status values may be `True`, `False`, or `Unknown`.
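
As an illustration, a readiness probe is attached to a container spec; the pod name, image, and probe endpoint below are hypothetical, and field names assume the v1-style API shape:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo        # hypothetical name
spec:
  containers:
  - name: web
    image: nginx
    readinessProbe:           # drives the pod's Ready condition
      httpGet:
        path: /healthz        # hypothetical endpoint
        port: 80
      initialDelaySeconds: 5
```

Until the probe succeeds, the pod's Ready condition reports `False`.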
## Container Statuses
More detailed information about the current (and previous) container statuses can be found in `containerStatuses`. The information reported depends on the current ContainerState, which may be Waiting, Running, or Termination (sic).
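
A sketch of what `containerStatuses` might report for a container that previously failed and is now running (the container name and timestamp are made up; in v1-style API objects the state keys are spelled `waiting`, `running`, and `terminated`):

```yaml
containerStatuses:
- name: web                   # hypothetical container name
  restartCount: 2
  state:                      # the current state
    running:
      startedAt: "2015-04-14T10:00:00Z"
  lastState:                  # the previous state, kept for inspection
    terminated:
      exitCode: 1
```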
## RestartPolicy
The RestartPolicy may be `Always`, `OnFailure`, or `Never`. RestartPolicy applies to all containers in the pod. RestartPolicy only refers to restarts of the containers by the Kubelet on the same node. As discussed in the [pods document](pods.md#durability-of-pods-or-lack-thereof), once bound to a node, a pod may never be rebound to another node. This means that some kind of controller is necessary in order for a pod to survive node failure, even if just a single pod at a time is desired.
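
For illustration, the policy is set once at the pod level and applies to every container; a minimal sketch (name and command are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oneshot               # hypothetical name
spec:
  restartPolicy: OnFailure    # Always | OnFailure | Never; applies to all containers
  containers:
  - name: worker
    image: busybox
    command: ["sh", "-c", "exit 0"]
```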
The only controller we have today is [`ReplicationController`](replication-controller.md). `ReplicationController` is *only* appropriate for pods with `RestartPolicy = Always`. `ReplicationController` should refuse to instantiate any pod that has a different restart policy.
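
A sketch of a `ReplicationController` managing such always-restarting pods (names and labels are hypothetical):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: web-rc                # hypothetical name
spec:
  replicas: 2
  selector:
    app: web
  template:                   # pod template stamped out for each replica
    metadata:
      labels:
        app: web
    spec:
      restartPolicy: Always   # the only policy appropriate for an RC
      containers:
      - name: web
        image: nginx
```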
There is a legitimate need for a controller that keeps pods with the other policies alive. Pods with either of the other policies (`OnFailure` and `Never`) eventually terminate, at which point the controller should stop recreating them. Because of this fundamental distinction, let's hypothesize a new controller, called [`JobController`](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) for the sake of this document, which can implement this policy.
## Pod lifetime
In general, pods which are created do not disappear until someone destroys them. This might be a human or a `ReplicationController`. The only exception to this rule is that pods with a `PodPhase` of `Succeeded` or `Failed` for more than some duration (determined by the master) will expire and be automatically reaped.
If a node dies or is disconnected from the rest of the cluster, some entity within the system (call it the NodeController for now) is responsible for applying policy (e.g. a timeout) and marking any pods on the lost node as `Failed`.
## Examples

* Pod is `Running`, 1 container, container exits success
  * Log completion event
  * If RestartPolicy is:
    * Always: restart container, pod stays `Running`
    * OnFailure: pod becomes `Succeeded`
    * Never: pod becomes `Succeeded`

* Pod is `Running`, 1 container, container exits failure
  * Log failure event
  * If RestartPolicy is:
    * Always: restart container, pod stays `Running`
    * OnFailure: restart container, pod stays `Running`
    * Never: pod becomes `Failed`

* Pod is `Running`, 2 containers, container 1 exits failure
  * Log failure event
  * If RestartPolicy is:
    * Always: restart container, pod stays `Running`
    * OnFailure: restart container, pod stays `Running`
    * Never: pod stays `Running`
  * When container 2 exits...
    * Log failure event
    * If RestartPolicy is:
      * Always: restart container, pod stays `Running`
      * OnFailure: restart container, pod stays `Running`
      * Never: pod becomes `Failed`

* Pod is `Running`, container becomes OOM
  * Container terminates in failure
  * Log OOM event
  * If RestartPolicy is:
    * Always: restart container, pod stays `Running`
    * OnFailure: restart container, pod stays `Running`
    * Never: log failure event, pod becomes `Failed`

* Pod is `Running`, a disk dies
  * All containers are killed
  * Log appropriate event
  * Pod becomes `Failed`
  * If running under a controller, pod will be recreated elsewhere

* Pod is `Running`, its node is segmented out
  * NodeController waits for timeout
  * NodeController marks pod `Failed`
  * If running under a controller, pod will be recreated elsewhere