<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

<strong>
The latest 1.0.x release of this document can be found
[here](http://releases.k8s.io/release-1.0/docs/proposals/horizontal-pod-autoscaler.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Horizontal Pod Autoscaling

**Author**: Jerzy Szczepkowski (@jszczepkowski)

## Preface

This document briefly describes the design of the horizontal autoscaler for pods.
The autoscaler (implemented as a Kubernetes control loop) will be responsible for automatically
choosing and setting the number of pods of a given type that run in a Kubernetes cluster.

This proposal supersedes [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md).

## Overview

The usage of a serving application usually varies over time: sometimes the demand for the application rises,
and sometimes it drops.
In Kubernetes version 1.0, a user can only manually set the number of serving pods.
Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on usage statistics.

## Scale Subresource

We are going to introduce a Scale subresource and implement horizontal autoscaling of pods based on it.
The Scale subresource will be supported for replication controllers and deployments.
It will be a virtual resource (i.e., it will not be stored in etcd as a separate object).
It will be present in the API only as an interface for accessing a replication controller or deployment,
and the values of the Scale fields will be inferred from the corresponding replication controller/deployment object.
A HorizontalPodAutoscaler object will be bound to exactly one Scale subresource and will
autoscale the associated replication controller/deployment through it.
The main advantage of such an approach is that whenever we introduce another type we want to auto-scale,
we just need to implement the Scale subresource for it (without modifying the autoscaler code or API).
The wider discussion regarding Scale took place in [#1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629).

The Scale subresource will be present in the API for a replication controller or deployment under the following paths:

```api/vX/replicationcontrollers/myrc/scale```

```api/vX/deployments/mydeployment/scale```

It will have the following structure:
```go
// Scale subresource, applicable to ReplicationControllers and (in future) Deployments.
type Scale struct {
    api.TypeMeta
    api.ObjectMeta

    // Spec defines the behavior of the scale.
    Spec ScaleSpec

    // Status represents the current status of the scale.
    Status ScaleStatus
}

// ScaleSpec describes the attributes of a Scale subresource.
type ScaleSpec struct {
    // Replicas is the number of desired replicas.
    Replicas int
}

// ScaleStatus represents the current status of a Scale subresource.
type ScaleStatus struct {
    // Replicas is the number of actual replicas.
    Replicas int

    // Selector is a label query over pods that should match the replicas count.
    Selector map[string]string
}
```

Writing ```ScaleSpec.Replicas``` will resize the replication controller/deployment associated with
the given Scale subresource.
```ScaleStatus.Replicas``` will report how many pods are currently running in the replication controller/deployment,
and ```ScaleStatus.Selector``` will return the selector for those pods.
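
For illustration, the sketch below constructs the kind of object a GET on ```api/vX/replicationcontrollers/myrc/scale``` might return and prints it as JSON. It is only a sketch: the embedded ```api.TypeMeta```/```api.ObjectMeta``` are collapsed into plain ```Kind```/```Name``` fields so the snippet compiles on its own, and the concrete values (```myrc```, the ```app=myapp``` selector) are made up.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// Simplified stand-ins for the proposed types; the embedded
// api.TypeMeta/api.ObjectMeta are reduced to plain fields here.
type ScaleSpec struct {
    Replicas int `json:"replicas"` // desired replicas
}

type ScaleStatus struct {
    Replicas int               `json:"replicas"` // actual replicas
    Selector map[string]string `json:"selector"` // label query over the pods
}

type Scale struct {
    Kind   string      `json:"kind"`
    Name   string      `json:"name"`
    Spec   ScaleSpec   `json:"spec"`
    Status ScaleStatus `json:"status"`
}

func main() {
    // Spec mirrors the desired replica count of myrc; Status reflects its
    // observed state. Both are inferred from the replication controller itself.
    s := Scale{
        Kind: "Scale",
        Name: "myrc",
        Spec: ScaleSpec{Replicas: 3},
        Status: ScaleStatus{
            Replicas: 3,
            Selector: map[string]string{"app": "myapp"},
        },
    }
    out, _ := json.MarshalIndent(s, "", "  ")
    fmt.Println(string(out))
}
```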

## HorizontalPodAutoscaler Object

We will introduce a HorizontalPodAutoscaler object. It will be accessible under:

```
api/vX/horizontalpodautoscalers/myautoscaler
```

It will have the following structure:
```go
// HorizontalPodAutoscaler represents the configuration of a horizontal pod autoscaler.
type HorizontalPodAutoscaler struct {
    api.TypeMeta
    api.ObjectMeta

    // Spec defines the behavior of the autoscaler.
    Spec HorizontalPodAutoscalerSpec

    // Status represents the current information about the autoscaler.
    Status HorizontalPodAutoscalerStatus
}

// HorizontalPodAutoscalerSpec is the specification of a horizontal pod autoscaler.
type HorizontalPodAutoscalerSpec struct {
    // ScaleRef is a reference to the Scale subresource. HorizontalPodAutoscaler will learn the current
    // resource consumption from its status, and will set the desired number of pods by modifying its spec.
    ScaleRef *SubresourceReference
    // MinCount is the lower limit for the number of pods that can be set by the autoscaler.
    MinCount int
    // MaxCount is the upper limit for the number of pods that can be set by the autoscaler.
    // It cannot be smaller than MinCount.
    MaxCount int
    // Target is the target average consumption of the given resource that the autoscaler will try
    // to maintain by adjusting the desired number of pods.
    // Currently this can be either "cpu" or "memory".
    Target ResourceConsumption
}

// HorizontalPodAutoscalerStatus contains the current status of a horizontal pod autoscaler.
type HorizontalPodAutoscalerStatus struct {
    // CurrentReplicas is the number of replicas of pods managed by this autoscaler.
    CurrentReplicas int

    // DesiredReplicas is the desired number of replicas of pods managed by this autoscaler.
    // It may differ from CurrentReplicas because pod downscaling is sometimes delayed to keep the number
    // of pods stable.
    DesiredReplicas int

    // CurrentConsumption is the current average consumption of the given resource that the autoscaler will
    // try to maintain by adjusting the desired number of pods.
    // Two types of resources are supported: "cpu" and "memory".
    CurrentConsumption ResourceConsumption

    // LastScaleTimestamp is the last time the HorizontalPodAutoscaler scaled the number of pods.
    // This is used by the autoscaler to control how often the number of pods is changed.
    LastScaleTimestamp *util.Time
}

// ResourceConsumption is an object for specifying the average consumption of a particular resource.
type ResourceConsumption struct {
    Resource api.ResourceName
    Quantity resource.Quantity
}
```

```ScaleRef``` will be a reference to the Scale subresource.
```MinCount```, ```MaxCount``` and ```Target``` will define the autoscaler configuration.
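
As a concrete (hypothetical) configuration, the sketch below fills in such a spec for ```myrc```: keep between 2 and 10 replicas while aiming at an average consumption of 0.5 core per pod. The types are simplified stand-ins (```ScaleRef``` as a plain path string, ```Quantity``` as millicores) so the snippet is self-contained; none of the concrete values are part of the proposal.

```go
package main

import "fmt"

// Simplified stand-ins for the proposed types.
type ResourceConsumption struct {
    Resource string // "cpu" or "memory"
    Quantity int64  // millicores for "cpu"
}

type HorizontalPodAutoscalerSpec struct {
    ScaleRef string // stand-in for *SubresourceReference
    MinCount int
    MaxCount int
    Target   ResourceConsumption
}

func main() {
    spec := HorizontalPodAutoscalerSpec{
        // Points at the Scale subresource of the replication controller to autoscale.
        ScaleRef: "api/vX/replicationcontrollers/myrc/scale",
        MinCount: 2,
        MaxCount: 10,
        // Absolute target: 500 millicores (0.5 core) of CPU per pod on average.
        Target: ResourceConsumption{Resource: "cpu", Quantity: 500},
    }
    fmt.Printf("%+v\n", spec)
}
```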

We will also introduce a HorizontalPodAutoscalerList object to enable listing all autoscalers in a cluster:

```go
// HorizontalPodAutoscalerList is a list of horizontal pod autoscalers.
type HorizontalPodAutoscalerList struct {
    api.TypeMeta
    api.ListMeta

    Items []HorizontalPodAutoscaler
}
```

## Autoscaling Algorithm

The autoscaler will be implemented as a control loop.
It will periodically (e.g.: every 1 minute) query the pods described by ```ScaleStatus.Selector``` of the Scale subresource,
and check their average CPU or memory usage from the last 1 minute
(there will be an API on the master for this purpose, see
[#11951](https://github.com/GoogleCloudPlatform/kubernetes/issues/11951)).
Then, it will compare the current CPU or memory consumption with the Target,
and adjust the replica count of the Scale if needed to match the target
(preserving the condition: MinCount <= Count <= MaxCount).

The target number of pods will be calculated from the following formula:

```
TargetNumOfPods = sum(CurrentPodsConsumption) / Target
```

To make scaling more stable, scale-up will happen only when the floor of ```TargetNumOfPods``` is higher than
the current number, while scale-down will happen only when the ceiling of ```TargetNumOfPods``` is lower than
the current number.
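
For example, with Target = 0.5 core and a total consumption of 2.3 cores across 4 pods, ```TargetNumOfPods``` = 4.6: its floor (4) is not higher than 4 and its ceiling (5) is not lower than 4, so the replica count stays at 4. A minimal sketch of this computation follows; the function name and the choice to round the new count up are ours, as the proposal only fixes when scaling triggers.

```go
package main

import (
    "fmt"
    "math"
)

// desiredReplicas applies the formula and the floor/ceiling stability rules,
// clamping the result to [minCount, maxCount]. Consumption and target are in
// the same unit, e.g. millicores for "cpu" or bytes for "memory".
func desiredReplicas(current, minCount, maxCount int, sumConsumption, target float64) int {
    targetNum := sumConsumption / target
    desired := current
    if int(math.Floor(targetNum)) > current {
        desired = int(math.Ceil(targetNum)) // scale up
    } else if int(math.Ceil(targetNum)) < current {
        desired = int(math.Ceil(targetNum)) // scale down, without undershooting
    }
    if desired < minCount {
        desired = minCount
    }
    if desired > maxCount {
        desired = maxCount
    }
    return desired
}

func main() {
    // 4 pods consuming 2300m in total against a 500m-per-pod target:
    // TargetNumOfPods = 4.6 falls inside the deadband, so no change.
    fmt.Println(desiredReplicas(4, 2, 10, 2300, 500)) // prints 4
}
```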

The decision to scale up will be executed instantly.
However, we will execute a scale-down only if sufficient time has passed since the last scale-up (e.g.: 10 minutes);
a sketch of this gating follows the list below.
Such an approach has two benefits:

* The autoscaler works in a conservative way.
If new user load appears, it is important for us to rapidly increase the number of pods,
so that user requests will not be rejected.
Lowering the number of pods is not that urgent.

* The autoscaler avoids thrashing, i.e.: prevents rapid execution of conflicting decisions if the load is not stable.
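
A minimal sketch of the gating, assuming the 10-minute window given as an example above (the function and parameter names are ours; ```lastScaleUp``` corresponds to ```LastScaleTimestamp```):

```go
package main

import (
    "fmt"
    "time"
)

const scaleDownDelay = 10 * time.Minute // example value; not fixed by the proposal

// mayScale reports whether a proposed replica change may be executed now.
// Scale-ups run instantly; scale-downs wait until scaleDownDelay has passed
// since the last scale-up.
func mayScale(current, desired int, lastScaleUp, now time.Time) bool {
    if desired >= current {
        return true // a scale-up (or no-op) is never delayed
    }
    return now.Sub(lastScaleUp) >= scaleDownDelay
}

func main() {
    last := time.Now().Add(-3 * time.Minute)
    fmt.Println(mayScale(6, 4, last, time.Now())) // false: too soon to scale down
    fmt.Println(mayScale(4, 6, last, time.Now())) // true: scale-up is immediate
}
```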

As the CPU consumption of a pod immediately after start may be highly variable due to initialization/startup,
the autoscaler will skip metrics from the first minute of a pod's lifecycle.

## Relative vs. absolute metrics

The question arises whether the values of the target metrics should be absolute (e.g.: 0.6 core, 100MB of RAM)
or relative (e.g.: 110% of the resource request, 90% of the resource limit).
The argument for relative metrics is that when a user changes the resources for a pod,
she will not have to change the definition of the autoscaler object, as the relative metric will still be valid.
However, we want to be able to base autoscaling on custom metrics in the future.
Such metrics will rather be absolute (e.g.: the number of queries-per-second).
Therefore, we decided to use absolute values for the target metrics in the initial version.

Please note that when custom metrics are supported, it will be possible to create additional metrics
in heapster that divide CPU/memory consumption by the resource request/limit.
From the autoscaler's point of view such metrics will be absolute,
although they will bring the benefits of relative metrics to the user.
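
A sketch of such a derived metric (the function is hypothetical; heapster details are out of scope here):

```go
package main

import "fmt"

// relativeCPU turns an absolute per-pod CPU consumption into a fraction of
// the pod's CPU request. Exported as a hypothetical derived heapster metric,
// it looks absolute to the autoscaler while behaving like a relative metric.
func relativeCPU(consumedMillicores, requestedMillicores int64) float64 {
    return float64(consumedMillicores) / float64(requestedMillicores)
}

func main() {
    // A pod consuming 300m of a 500m request is at 0.6 of its request,
    // so a Target of 0.6 on this metric mirrors a "60% of request" policy.
    fmt.Println(relativeCPU(300, 500)) // prints 0.6
}
```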

## Support in kubectl

To make manipulating HorizontalPodAutoscaler objects simpler, we will add support for
creating/updating/deleting/listing of HorizontalPodAutoscaler to kubectl.
In addition, we will add kubectl support for the following use-cases:

* When running an image with ```kubectl run```, there should be an additional option to create
an autoscaler for it.
* When creating a replication controller or deployment with ```kubectl create [-f]```, there should be
a possibility to specify an additional autoscaler object.
(This should work out-of-the-box when creation of autoscalers is supported by kubectl, as we may include
multiple objects in the same config file.)
* We will add a new command ```kubectl autoscale``` that will allow for easy creation of an autoscaler object
for an already existing replication controller/deployment.

## Next steps

We list here some features that will not be supported in the initial version of the autoscaler.
However, we want to keep them in mind, as they will most probably be needed in the future.
Our design is in general compatible with them.

* Autoscale pods based on metrics different than CPU & memory (e.g.: network traffic, qps).
This includes scaling based on a custom metric.
* Autoscale pods based on multiple metrics.
If the target numbers of pods for different metrics are different, choose the largest target number of pods.
* Scale the number of pods starting from 0: all pods can be turned off,
and then turned on when there is a demand for them.
When a request to a service with no pods arrives, kube-proxy will generate an event for the autoscaler
to create a new pod.
Discussed in [#3247](https://github.com/GoogleCloudPlatform/kubernetes/issues/3247).
* When scaling down, make a more educated decision about which pods to kill (e.g.: if two or more pods are on the same node, kill one of them).
Discussed in [#4301](https://github.com/GoogleCloudPlatform/kubernetes/issues/4301).
* Allow rule-based autoscaling: instead of specifying the target value for a metric,
specify a rule, e.g.: "if the average CPU consumption of a pod is higher than 80%, add two more replicas".
This approach was initially suggested in the
[autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md) proposal.
Before doing this, we need to evaluate why the target-based scaling described in this proposal is not sufficient.