# Deployment

## Abstract

A proposal for implementing a new resource - Deployment - which will enable
declarative config updates for Pods and ReplicationControllers.

Users will be able to create a Deployment, which will spin up
a ReplicationController to bring up the desired pods.
Users can also target the Deployment at existing ReplicationControllers, in
which case the new RC will replace the existing ones. The exact mechanics of
replacement depend on the DeploymentStrategy chosen by the user.
DeploymentStrategies are explained in detail in a later section.

## Implementation

### API Object

The `Deployment` API object will have the following structure:

```go
type Deployment struct {
  TypeMeta
  ObjectMeta

  // Specification of the desired behavior of the Deployment.
  Spec DeploymentSpec

  // Most recently observed status of the Deployment.
  Status DeploymentStatus
}

type DeploymentSpec struct {
  // Number of desired pods. This is a pointer to distinguish between explicit
  // zero and not specified. Defaults to 1.
  Replicas *int

  // Label selector for pods. Existing ReplicationControllers whose pods are
  // selected by this will be scaled down.
  Selector map[string]string

  // Describes the pods that will be created.
  Template *PodTemplateSpec

  // The deployment strategy to use to replace existing pods with new ones.
  Strategy DeploymentStrategy

  // Key of the selector that is added to existing RCs (and label key that is
  // added to its pods) to prevent the existing RCs from selecting new pods
  // (and the new RC from selecting old pods).
  // Users can set this to an empty string to indicate that the system should
  // not add any selector and label. If unspecified, the system uses
  // "deployment.kubernetes.io/podTemplateHash".
  // The value of this key is the hash of DeploymentSpec.PodTemplateSpec.
  UniqueLabelKey *string
}

type DeploymentStrategy struct {
  // Type of deployment. Can be "Recreate" or "RollingUpdate".
  Type DeploymentType

  // TODO: Update this to follow our convention for oneOf, whatever we decide it
  // to be.
  // Rolling update config params. Present only if DeploymentType =
  // RollingUpdate.
  RollingUpdate *RollingUpdateDeploymentSpec
}

type DeploymentType string

const (
  // Kill all existing pods before creating new ones.
  DeploymentRecreate DeploymentType = "Recreate"

  // Replace the old RCs with a new one using a rolling update, i.e. gradually
  // scale down the old RCs and scale up the new one.
  DeploymentRollingUpdate DeploymentType = "RollingUpdate"
)

// Spec to control the desired behavior of a rolling update.
type RollingUpdateDeploymentSpec struct {
  // The maximum number of pods that can be unavailable during the update.
  // Value can be an absolute number (ex: 5) or a percentage of total pods at the start of the update (ex: 10%).
  // The absolute number is calculated from the percentage by rounding up.
  // This cannot be 0 if MaxSurge is 0.
  // By default, a fixed value of 1 is used.
  // Example: when this is set to 30%, the old RC can be scaled down by 30%
  // immediately when the rolling update starts. Once new pods are ready, the old RC
  // can be scaled down further, followed by scaling up the new RC, ensuring
  // that at least 70% of the original number of pods are available at all times
  // during the update.
  MaxUnavailable IntOrString

  // The maximum number of pods that can be scheduled above the original number of
  // pods.
  // Value can be an absolute number (ex: 5) or a percentage of total pods at
  // the start of the update (ex: 10%). This cannot be 0 if MaxUnavailable is 0.
  // The absolute number is calculated from the percentage by rounding up.
  // By default, a value of 1 is used.
  // Example: when this is set to 30%, the new RC can be scaled up by 30%
  // immediately when the rolling update starts. Once old pods have been killed,
  // the new RC can be scaled up further, ensuring that the total number of pods running
  // at any time during the update is at most 130% of the original pods.
  MaxSurge IntOrString

  // Minimum number of seconds for which a newly created pod should be ready
  // without any of its containers crashing, for it to be considered available.
  // Defaults to 0 (pod will be considered available as soon as it is ready).
  MinReadySeconds int
}

type DeploymentStatus struct {
  // Total number of ready pods targeted by this deployment (this
  // includes both the old and new pods).
  Replicas int

  // Total number of new ready pods with the desired template spec.
  UpdatedReplicas int
}
```
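
MaxUnavailable and MaxSurge accept either an absolute pod count or a percentage,
with percentages rounded up to an absolute number as described in the field
comments above. Below is a minimal sketch of that resolution, assuming a
simplified stand-in for IntOrString; the type and helper are illustrative, not
part of the proposed API.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// intOrString is a simplified stand-in for IntOrString: it holds either an
// absolute count ("5") or a percentage ("10%").
type intOrString string

// resolve turns an intOrString into an absolute pod count for a deployment of
// the given size, rounding percentages up as described in the field comments.
func resolve(v intOrString, total int) (int, error) {
	s := string(v)
	if strings.HasSuffix(s, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(s, "%"))
		if err != nil {
			return 0, err
		}
		return int(math.Ceil(float64(total) * float64(pct) / 100.0)), nil
	}
	return strconv.Atoi(s)
}

func main() {
	// 30% of 10 pods, rounded up -> 3
	n, _ := resolve("30%", 10)
	fmt.Println(n)

	// An absolute value passes through unchanged.
	n, _ = resolve("5", 10)
	fmt.Println(n)
}
```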

### Controller

#### Deployment Controller

The DeploymentController will make Deployments happen.
It will watch Deployment objects in etcd.
For each pending deployment, it will:

1. Find all RCs whose label selector is a superset of DeploymentSpec.Selector.
   - For now, we will do this in the client: list all RCs and then filter the
     ones we want. Eventually, we want to expose this in the API.
2. The new RC can have the same selector as the old RCs, so we add a unique
   selector to all of these RCs (and the corresponding label to their pods) to ensure
   that they do not select the newly created pods (and that old pods are not selected
   by the new RC).
   - The label key will be "deployment.kubernetes.io/podTemplateHash".
   - The label value will be the hash of the podTemplateSpec for that RC without
     this label. This value will be unique for all RCs, since PodTemplateSpec should be unique.
   - If the RCs and pods don't already have this label and selector:
     - We will first add this to RC.PodTemplateSpec.Metadata.Labels for all RCs to
       ensure that all new pods that they create will have this label.
     - Then we will add this label to their existing pods and then add this as a selector
       to that RC.
3. Find if there exists an RC whose "deployment.kubernetes.io/podTemplateHash" label value
   is the same as the hash of DeploymentSpec.PodTemplateSpec. If it exists already, then
   this is the RC that will be ramped up. If there is no such RC, then we create
   a new one using DeploymentSpec and then add a "deployment.kubernetes.io/podTemplateHash" label
   to it. RCSpec.replicas = 0 for a newly created RC. (A sketch of this hashing step follows the list.)
4. Scale up the new RC and scale down the old ones as per the DeploymentStrategy.
   - Raise an event if we detect an error, like new pods failing to come up.
5. Go back to step 1 unless the new RC has been ramped up to the desired replicas
   and the old RCs have been ramped down to 0.
6. Cleanup.
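
As referenced in step 3, the podTemplateHash label value is a hash of the RC's pod
template, computed without the hash label itself. Below is a minimal sketch of that
computation, assuming a simplified stand-in pod template and an FNV-1a hash over its
JSON encoding; the exact hash function and types are implementation details, not
fixed by this proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// podTemplate is a simplified stand-in for PodTemplateSpec, used only to
// illustrate the hashing; the real controller would hash the full template.
type podTemplate struct {
	Labels map[string]string `json:"labels"`
	Image  string            `json:"image"`
}

// podTemplateHash returns a stable hash of the pod template (excluding the
// hash label itself), suitable for use as the value of the
// "deployment.kubernetes.io/podTemplateHash" label.
func podTemplateHash(t podTemplate) (string, error) {
	// Drop the hash label itself so the value stays stable across re-labeling.
	delete(t.Labels, "deployment.kubernetes.io/podTemplateHash")
	data, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	h := fnv.New32a()
	h.Write(data)
	return fmt.Sprintf("%d", h.Sum32()), nil
}

func main() {
	tmpl := podTemplate{
		Labels: map[string]string{"app": "nginx"},
		Image:  "nginx:1.7.9",
	}
	// Two RCs created from the same template get the same label value.
	hash, _ := podTemplateHash(tmpl)
	fmt.Println("deployment.kubernetes.io/podTemplateHash =", hash)
}
```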

DeploymentController is stateless so that it can recover in case it crashes during a deployment.

### MinReadySeconds

We will implement MinReadySeconds using the Ready condition in Pod. We will add
a LastTransitionTime to PodCondition and update kubelet to set Ready to false
each time any container crashes. Kubelet will set the Ready condition back to true once
all containers are ready. For containers without a readiness probe, we will
assume that they are ready as soon as they are up.
https://github.com/kubernetes/kubernetes/issues/11234 tracks updating kubelet
and https://github.com/kubernetes/kubernetes/issues/12615 tracks adding
LastTransitionTime to PodCondition.
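
Below is a minimal sketch of the availability check this implies, assuming the
LastTransitionTime field described above and a simplified stand-in for PodCondition;
the helper itself is illustrative, not part of the proposal.

```go
package main

import (
	"fmt"
	"time"
)

// podCondition is a simplified stand-in for PodCondition with the proposed
// LastTransitionTime field.
type podCondition struct {
	Type               string // e.g. "Ready"
	Status             bool
	LastTransitionTime time.Time
}

// isAvailable reports whether a pod should be counted as available: its Ready
// condition must be true, and must have been true for at least minReadySeconds
// (0 means "available as soon as it is ready").
func isAvailable(conds []podCondition, minReadySeconds int, now time.Time) bool {
	for _, c := range conds {
		if c.Type != "Ready" {
			continue
		}
		if !c.Status {
			return false
		}
		readyFor := now.Sub(c.LastTransitionTime)
		return readyFor >= time.Duration(minReadySeconds)*time.Second
	}
	return false
}

func main() {
	now := time.Now()
	conds := []podCondition{{
		Type:               "Ready",
		Status:             true,
		LastTransitionTime: now.Add(-5 * time.Second),
	}}
	fmt.Println(isAvailable(conds, 0, now))  // true: ready is enough
	fmt.Println(isAvailable(conds, 10, now)) // false: only ready for 5s
}
```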

## Changing Deployment mid-way

### Updating

Users can update an ongoing deployment before it is completed.
In this case, the existing deployment will be stalled and the new one will
begin.
For example, consider the following case:
- User creates a deployment to rolling-update 10 pods with image:v1 to
  pods with image:v2.
- User then updates this deployment to create pods with image:v3,
  when the image:v2 RC had been ramped up to 5 pods and the image:v1 RC
  had been ramped down to 5 pods.
- When the DeploymentController observes the new deployment, it will create
  a new RC for creating pods with image:v3. It will then start ramping up this
  new RC to 10 pods and will ramp down both the existing RCs to 0.

### Deleting

Users can pause/cancel a deployment by deleting it before it is completed.
Recreating the same deployment will resume it.
For example, consider the following case:
- User creates a deployment to rolling-update 10 pods with image:v1 to
  pods with image:v2.
- User then deletes this deployment while the old and new RCs are at 5 replicas each.
  The user will end up with 2 RCs with 5 replicas each.
  The user can then create the same deployment again, in which case the DeploymentController will
  notice that the second RC already exists, which it can ramp up while ramping down
  the first one.

### Rollback

We want to allow the user to roll back a deployment. To roll back a
completed (or ongoing) deployment, the user can create (or update) a deployment with
DeploymentSpec.PodTemplateSpec = oldRC.PodTemplateSpec.
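
Below is a minimal sketch of what that rollback amounts to, using abbreviated,
illustrative stand-ins for the API types defined earlier: the deployment's pod
template is pointed back at the old RC's pod template, and the controller then
ramps toward it like any other update.

```go
package main

import "fmt"

// Abbreviated stand-ins for the API types defined above.
type podTemplateSpec struct{ Image string }

type deploymentSpec struct{ Template *podTemplateSpec }

type rcSpec struct{ Template *podTemplateSpec }

// rollback points the deployment at the old RC's pod template; the
// DeploymentController then ramps up the matching (old) RC and ramps the
// newer ones down, exactly as for any other update.
func rollback(d *deploymentSpec, oldRC rcSpec) {
	d.Template = oldRC.Template
}

func main() {
	old := rcSpec{Template: &podTemplateSpec{Image: "app:v1"}}
	d := &deploymentSpec{Template: &podTemplateSpec{Image: "app:v2"}}
	rollback(d, old)
	fmt.Println(d.Template.Image) // app:v1
}
```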

## Deployment Strategies

DeploymentStrategy specifies how the new RC should replace existing RCs.
To begin with, we will support 2 types of deployment:
* Recreate: We kill all existing RCs and then bring up the new one. This results
  in a quick deployment, but there is downtime while the old pods are down and
  the new ones have not yet come up.
* Rolling update: We gradually scale down the old RCs while scaling up the new one.
  This results in a slower deployment, but there is no downtime. At all times
  during the deployment, some pods (old or new) are available. The number
  of available pods, and when a pod is considered "available", can be configured
  using RollingUpdateDeploymentSpec (see the sketch after this list).
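
To make the RollingUpdateDeploymentSpec constraints concrete, here is a small
illustrative calculation (not controller code) of the bounds a rolling update
implies, assuming MaxSurge and MaxUnavailable have already been resolved to
absolute numbers as in the earlier IntOrString sketch.

```go
package main

import "fmt"

// rollingUpdateBounds returns the limits implied by a rolling update:
// maxTotal is the most pods (old + new) that may exist at once, and
// minAvailable is the fewest pods that must stay available throughout.
// maxSurge and maxUnavailable are assumed to already be absolute numbers.
func rollingUpdateBounds(original, maxSurge, maxUnavailable int) (maxTotal, minAvailable int) {
	return original + maxSurge, original - maxUnavailable
}

func main() {
	// 10 pods, with MaxSurge=30% and MaxUnavailable=30% resolving to 3 each:
	maxTotal, minAvailable := rollingUpdateBounds(10, 3, 3)
	fmt.Println(maxTotal)     // 13: at most 130% of the original pods
	fmt.Println(minAvailable) // 7: at least 70% must stay available
}
```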

In the future, we want to support more deployment types.

## Future

Apart from the above, we want to add support for the following:
* Running the deployment process in a pod: In the future, we can run the deployment process in a pod. Users can then define their own custom deployments, and we can run them using the image name.
* More DeploymentTypes: https://github.com/openshift/origin/blob/master/examples/deployment/README.md#deployment-types lists the most commonly used ones.
* Triggers: Deployment will have a trigger field to identify what triggered the deployment. Options are: Manual/UserTriggered, Autoscaler, NewImage.
* Automatic rollback on error: We want to support automatic rollback on error or timeout.

## References

- https://github.com/GoogleCloudPlatform/kubernetes/issues/1743 has most of the
  discussion that resulted in this proposal.