From eaedcd0caccbe6199474f20255e2b7faf21bc7bc Mon Sep 17 00:00:00 2001
From: Jerzy Szczepkowski
Date: Tue, 18 Aug 2015 15:25:57 +0200
Subject: [PATCH] Design proposal: Horizontal Pod Autoscaler.

Added design proposal for Horizontal Pod Autoscaler. Related to #12087.
---
 docs/proposals/horizontal-pod-autoscaler.md | 272 ++++++++++++++++++++
 1 file changed, 272 insertions(+)
 create mode 100644 docs/proposals/horizontal-pod-autoscaler.md

diff --git a/docs/proposals/horizontal-pod-autoscaler.md b/docs/proposals/horizontal-pod-autoscaler.md
new file mode 100644
index 0000000000..91211793ed
--- /dev/null
+++ b/docs/proposals/horizontal-pod-autoscaler.md
@@ -0,0 +1,272 @@


# Horizontal Pod Autoscaling

**Author**: Jerzy Szczepkowski (@jszczepkowski)

## Preface

This document briefly describes the design of the horizontal autoscaler for pods.
The autoscaler (implemented as a Kubernetes control loop) will be responsible for automatically
choosing and setting the number of pods of a given type that run in a Kubernetes cluster.

This proposal supersedes [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md).

## Overview

The usage of a serving application usually varies over time: sometimes the demand for the application rises,
and sometimes it drops.
In version 1.0, the user can only set the number of serving pods manually.
Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on usage statistics.

## Scale Subresource

We are going to introduce the Scale subresource and implement horizontal autoscaling of pods on top of it.
The Scale subresource will be supported for replication controllers and deployments.
A HorizontalPodAutoscaler object will be bound to exactly one Scale subresource and will autoscale
the associated replication controller/deployment through it.

The Scale subresource will be available for a replication controller or deployment under the following paths:

```api/vX/replicationcontrollers/myrc/scale```

```api/vX/deployments/mydeployment/scale```

It will have the following structure:

```go
// Scale subresource, applicable to ReplicationControllers and (in the future) Deployments.
type Scale struct {
  api.TypeMeta
  api.ObjectMeta

  // Spec defines the behavior of the scale.
  Spec ScaleSpec

  // Status represents the current status of the scale.
  Status ScaleStatus
}

// ScaleSpec describes the attributes of a Scale subresource.
type ScaleSpec struct {
  // Replicas is the number of desired replicas.
  Replicas int
}

// ScaleStatus represents the current status of a Scale subresource.
type ScaleStatus struct {
  // Replicas is the number of actual replicas.
  Replicas int

  // Selector is a label query over pods that should match the replicas count.
  Selector map[string]string
}
```

Writing ```ScaleSpec.Replicas``` will resize the replication controller/deployment associated with
the given Scale subresource.
```ScaleStatus.Replicas``` will report how many pods are currently running under the replication controller/deployment,
and ```ScaleStatus.Selector``` will return the label selector for those pods.
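
To make this flow concrete, the following minimal Go sketch shows how a client could resize a replication controller through its Scale subresource: it reads ```ScaleStatus```, then writes a new ```ScaleSpec.Replicas```. This is illustrative only, not part of the proposal; the local API server address, the placeholder ```vX``` version and ```myrc``` name, and the simplified structs (metadata and authentication omitted) are assumptions made for the example.

```go
package main

import (
  "bytes"
  "encoding/json"
  "fmt"
  "net/http"
)

// Simplified local mirrors of the proposed types; TypeMeta/ObjectMeta are omitted
// to keep the sketch self-contained.
type ScaleSpec struct {
  Replicas int `json:"replicas"`
}

type ScaleStatus struct {
  Replicas int               `json:"replicas"`
  Selector map[string]string `json:"selector"`
}

type Scale struct {
  Spec   ScaleSpec   `json:"spec"`
  Status ScaleStatus `json:"status"`
}

// resize reads the current Scale of a replication controller and writes back
// a new desired replica count through the same subresource.
func resize(apiServer string, newReplicas int) error {
  url := apiServer + "/api/vX/replicationcontrollers/myrc/scale" // path from the proposal; vX and myrc are placeholders

  resp, err := http.Get(url)
  if err != nil {
    return err
  }
  defer resp.Body.Close()

  var scale Scale
  if err := json.NewDecoder(resp.Body).Decode(&scale); err != nil {
    return err
  }
  fmt.Printf("running pods: %d, selector: %v\n", scale.Status.Replicas, scale.Status.Selector)

  // Writing Spec.Replicas resizes the associated replication controller.
  scale.Spec.Replicas = newReplicas
  body, err := json.Marshal(scale)
  if err != nil {
    return err
  }
  req, err := http.NewRequest("PUT", url, bytes.NewReader(body))
  if err != nil {
    return err
  }
  req.Header.Set("Content-Type", "application/json")
  _, err = http.DefaultClient.Do(req)
  return err
}

func main() {
  if err := resize("http://localhost:8080", 5); err != nil {
    fmt.Println("resize failed:", err)
  }
}
```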

## HorizontalPodAutoscaler Object

We will introduce a HorizontalPodAutoscaler object. It will be accessible under:

```
api/vX/horizontalpodautoscalers/myautoscaler
```

It will have the following structure:

```go
// HorizontalPodAutoscaler represents the configuration of a horizontal pod autoscaler.
type HorizontalPodAutoscaler struct {
  api.TypeMeta
  api.ObjectMeta

  // Spec defines the behaviour of the autoscaler.
  Spec HorizontalPodAutoscalerSpec

  // Status represents the current information about the autoscaler.
  Status HorizontalPodAutoscalerStatus
}

// HorizontalPodAutoscalerSpec is the specification of a horizontal pod autoscaler.
type HorizontalPodAutoscalerSpec struct {
  // ScaleRef is a reference to the Scale subresource. HorizontalPodAutoscaler will learn the current
  // resource consumption from its status, and will set the desired number of pods by modifying its spec.
  ScaleRef *SubresourceReference

  // MinCount is the lower limit for the number of pods that can be set by the autoscaler.
  MinCount int

  // MaxCount is the upper limit for the number of pods that can be set by the autoscaler.
  // It cannot be smaller than MinCount.
  MaxCount int

  // Target is the target average consumption of the given resource that the autoscaler will try
  // to maintain by adjusting the desired number of pods.
  // Currently two types of resources are supported: "cpu" and "memory".
  Target ResourceConsumption
}

// HorizontalPodAutoscalerStatus contains the current status of a horizontal pod autoscaler.
type HorizontalPodAutoscalerStatus struct {
  // CurrentReplicas is the number of replicas of pods managed by this autoscaler.
  CurrentReplicas int

  // DesiredReplicas is the desired number of replicas of pods managed by this autoscaler.
  // The number may be different because pod downscaling is sometimes delayed to keep the number
  // of pods stable.
  DesiredReplicas int

  // CurrentConsumption is the current average consumption of the given resource, as observed
  // by the autoscaler for the managed pods.
  // Two types of resources are supported: "cpu" and "memory".
  CurrentConsumption ResourceConsumption

  // LastScaleTimestamp is the last time the HorizontalPodAutoscaler scaled the number of pods.
  // This is used by the autoscaler to control how often the number of pods is changed.
  LastScaleTimestamp *util.Time
}

// ResourceConsumption is an object for specifying the average consumption of a particular resource.
type ResourceConsumption struct {
  Resource api.ResourceName
  Quantity resource.Quantity
}
```

```ScaleRef``` will be a reference to the Scale subresource.
```MinCount```, ```MaxCount``` and ```Target``` will define the autoscaler configuration.
We will also introduce a HorizontalPodAutoscalerList object to enable listing all autoscalers in the cluster:

```go
// HorizontalPodAutoscalerList is a collection of horizontal pod autoscalers.
type HorizontalPodAutoscalerList struct {
  api.TypeMeta
  api.ListMeta

  Items []HorizontalPodAutoscaler
}
```

## Autoscaling Algorithm

The autoscaler will be implemented as a control loop.
It will periodically (e.g.: every 1 minute) query the pods described by ```Status.Selector``` of the Scale subresource,
and check their average CPU or memory usage from the last 1 minute
(there will be an API on the master for this purpose, see
[#11951](https://github.com/GoogleCloudPlatform/kubernetes/issues/11951)).
Then, it will compare the current consumption with the Target,
and adjust the replica count of the Scale if needed to match the target
(preserving the invariant: MinCount <= Replicas <= MaxCount).

The target number of pods will be calculated from the following formula:

```
TargetNumOfPods = sum(CurrentPodsConsumption) / Target
```

To make scaling more stable, scale-up will happen only when the floor of ```TargetNumOfPods``` is higher than
the current number of pods, while scale-down will happen only when the ceiling of ```TargetNumOfPods``` is lower than
the current number of pods.
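
To make this calculation concrete, here is a small, illustrative Go sketch (not the proposed implementation). The ```float64``` representation of consumption, the example numbers in ```main```, and the choice to round the new count to the floor when scaling up and to the ceiling when scaling down are assumptions made for the sketch; the real autoscaler would consume ```resource.Quantity``` values from the metrics API and would also apply the scale-down delay described below.

```go
package main

import (
  "fmt"
  "math"
)

// desiredReplicas computes the number of pods the autoscaler would aim for:
// TargetNumOfPods = sum(CurrentPodsConsumption) / Target, scale up only if
// floor(TargetNumOfPods) is above the current count, scale down only if
// ceil(TargetNumOfPods) is below it, and always preserve
// minCount <= replicas <= maxCount.
func desiredReplicas(podConsumption []float64, target float64, current, minCount, maxCount int) int {
  sum := 0.0
  for _, c := range podConsumption {
    sum += c
  }
  targetNum := sum / target

  desired := current
  if f := int(math.Floor(targetNum)); f > current {
    desired = f // scale up
  } else if c := int(math.Ceil(targetNum)); c < current {
    desired = c // scale down
  }

  // Clamp to the configured bounds.
  if desired < minCount {
    desired = minCount
  }
  if desired > maxCount {
    desired = maxCount
  }
  return desired
}

func main() {
  // Three pods using 0.9, 0.8 and 0.7 cores against a 0.5-core target:
  // TargetNumOfPods = 2.4 / 0.5 = 4.8, floor(4.8) = 4 > 3, so scale up to 4 pods.
  fmt.Println(desiredReplicas([]float64{0.9, 0.8, 0.7}, 0.5, 3, 2, 10))
}
```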

The decision to scale up will be executed instantly.
However, scale-down will be executed only if sufficient time has passed since the last scale-up (e.g.: 10 minutes).
Such an approach has two benefits:

* The autoscaler works in a conservative way.
  If new user load appears, it is important for us to increase the number of pods rapidly,
  so that user requests are not rejected.
  Lowering the number of pods is not that urgent.

* The autoscaler avoids thrashing, i.e.: it prevents the rapid execution of conflicting decisions when the load is not stable.

As the CPU consumption of a pod immediately after start may be highly variable due to initialization/startup,
the autoscaler will skip metrics from the first minute of a pod's lifecycle.

## Relative vs. absolute metrics

The question arises whether the values of the target metrics should be absolute (e.g.: 0.6 core, 100MB of RAM)
or relative (e.g.: 110% of the resource request, 90% of the resource limit).
The argument for relative metrics is that when the user changes the resources of a pod,
she will not have to change the definition of the autoscaler object, as the relative metric will still be valid.
However, we want to be able to base autoscaling on custom metrics in the future.
Such metrics will most likely be absolute (e.g.: the number of queries-per-second).
Therefore, we decided to use absolute values for the target metrics in the initial version.

Please note that once custom metrics are supported, it will be possible to create additional metrics
in heapster that divide CPU/memory consumption by the resource request/limit.
From the autoscaler's point of view such metrics will be absolute,
although they will bring the benefits of relative metrics to the user.

## Support in kubectl

To make manipulation of HorizontalPodAutoscaler objects simpler, we will add support for creating/updating/deleting/listing HorizontalPodAutoscaler objects to kubectl.
In addition, we will add kubectl support for the following use-cases:

* When running an image with ```kubectl run```, there should be an additional option to create
  an autoscaler for it.
* When creating a replication controller or deployment with ```kubectl create [-f]```, there should be
  a possibility to specify an additional autoscaler object.
* We will add a new command ```kubectl autoscale``` that will allow for easy creation of an autoscaler object
  for an already existing replication controller/deployment.

## Future Features

We list here some features that will not be supported in the initial version of the autoscaler.
However, we want to keep them in mind, as they will most probably be needed in the future.
Our design is in general compatible with them.

* Autoscale pods based on metrics other than CPU & memory (e.g.: network traffic, qps).
  This includes scaling based on a custom metric.
* Autoscale pods based on multiple metrics.
  If the target numbers of pods for different metrics are different, choose the largest target number of pods
  (see the sketch after this list).
* Scale the number of pods starting from 0: all pods can be turned off,
  and then turned on when there is demand for them.
  When a request to a service with no pods arrives, kube-proxy will generate an event for the autoscaler
  to create a new pod.
  Discussed in [#3247](https://github.com/GoogleCloudPlatform/kubernetes/issues/3247).
* When scaling down, make a more educated decision about which pods to kill (e.g.: kill pods that are doubled-up first).
  Discussed in [#4301](https://github.com/GoogleCloudPlatform/kubernetes/issues/4301).
* Allow rule-based autoscaling: instead of specifying the target value for a metric,
  specify a rule, e.g.: “if the average CPU consumption of a pod is higher than 80%, add two more replicas”.
  This approach was initially suggested in the
  [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md) proposal.
  Before doing this, we need to evaluate why the target-based scaling described in this proposal is not sufficient.
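
The multiple-metrics item above implies a simple combination rule. A minimal sketch of that rule, assuming each metric has already produced its own desired replica count via the algorithm described earlier:

```go
package main

import "fmt"

// combineDesiredReplicas picks the final replica count when several metrics each
// produce their own desired count: the largest one wins, so that every metric's
// target is satisfied.
func combineDesiredReplicas(perMetricDesired map[string]int) int {
  combined := 0
  for _, desired := range perMetricDesired {
    if desired > combined {
      combined = desired
    }
  }
  return combined
}

func main() {
  // Hypothetical per-metric results: CPU wants 4 pods, memory wants 6, qps wants 5.
  fmt.Println(combineDesiredReplicas(map[string]int{"cpu": 4, "memory": 6, "qps": 5})) // prints 6
}
```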