# Resource Quota - Scoping resources ## Problem Description ### Ability to limit compute requests and limits The existing `ResourceQuota` API object constrains the total amount of compute resource requests. This is useful when a cluster-admin is interested in controlling explicit resource guarantees such that there would be a relatively strong guarantee that pods created by users who stay within their quota will find enough free resources in the cluster to be able to schedule. The end-user creating the pod is expected to have intimate knowledge on their minimum required resource as well as their potential limits. There are many environments where a cluster-admin does not extend this level of trust to their end-user because user's often request too much resource, and they have trouble reasoning about what they hope to have available for their application versus what their application actually needs. In these environments, the cluster-admin will often just expose a single value (the limit) to the end-user. Internally, they may choose a variety of other strategies for setting the request. For example, some cluster operators are focused on satisfying a particular over-commit ratio and may choose to set the request as a factor of the limit to control for over-commit. Other cluster operators may defer to a resource estimation tool that sets the request based on known historical trends. In this environment, the cluster-admin is interested in exposing a quota to their end-users that maps to their desired limit instead of their request since that is the value the user manages. ### Ability to limit impact to node and promote fair-use The current `ResourceQuota` API object does not allow the ability to quota best-effort pods separately from pods with resource guarantees. For example, if a cluster-admin applies a quota that caps requested cpu at 10 cores and memory at 10Gi, all pods in the namespace must make an explicit resource request for cpu and memory to satisfy quota. This prevents a namespace with a quota from supporting best-effort pods. In practice, the cluster-admin wants to control the impact of best-effort pods to the cluster, but not restrict the ability to run best-effort pods altogether. As a result, the cluster-admin requires the ability to control the max number of active best-effort pods. In addition, the cluster-admin requires the ability to scope a quota that limits compute resources to exclude best-effort pods. ### Ability to quota long-running vs. bounded-duration compute resources The cluster-admin may want to quota end-users separately based on long-running vs. bounded-duration compute resources. For example, a cluster-admin may offer more compute resources for long running pods that are expected to have a more permanent residence on the node than bounded-duration pods. Many batch style workloads tend to consume as much resource as they can until something else applies the brakes. As a result, these workloads tend to operate at their limit, while many traditional web applications may often consume closer to their request if there is no active traffic. An operator that wants to control density will offer lower quota limits for batch workloads than web applications. A classic example is a PaaS deployment where the cluster-admin may allow a separate budget for pods that run their web application vs. pods that build web applications. Another example is providing more quota to a database pod than a pod that performs a database migration. ## Use Cases * As a cluster-admin, I want the ability to quota * compute resource requests * compute resource limits * compute resources for terminating vs. non-terminating workloads * compute resources for best-effort vs. non-best-effort pods ## Proposed Change ### New quota tracked resources Support the following resources that can be tracked by quota. | Resource Name | Description | | ------------- | ----------- | | cpu | total cpu requests (backwards compatibility) | | memory | total memory requests (backwards compatibility) | | requests.cpu | total cpu requests | | requests.memory | total memory requests | | limits.cpu | total cpu limits | | limits.memory | total memory limits | ### Resource Quota Scopes Add the ability to associate a set of `scopes` to a quota. A quota will only measure usage for a `resource` if it matches the intersection of enumerated `scopes`. Adding a `scope` to a quota limits the number of resources it supports to those that pertain to the `scope`. Specifying a resource on the quota object outside of the allowed set would result in a validation error. | Scope | Description | | ----- | ----------- | | Terminating | Match `kind=Pod` where `spec.activeDeadlineSeconds >= 0` | | NotTerminating | Match `kind=Pod` where `spec.activeDeadlineSeconds = nil` | | BestEffort | Match `kind=Pod` where `status.qualityOfService in (BestEffort)` | | NotBestEffort | Match `kind=Pod` where `status.qualityOfService not in (BestEffort)` | A `BestEffort` scope restricts a quota to tracking the following resources: * pod A `Terminating`, `NotTerminating`, `NotBestEffort` scope restricts a quota to tracking the following resources: * pod * memory, requests.memory, limits.memory * cpu, requests.cpu, limits.cpu ## Data Model Impact ``` // The following identify resource constants for Kubernetes object types const ( // CPU request, in cores. (500m = .5 cores) ResourceRequestsCPU ResourceName = "requests.cpu" // Memory request, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024) ResourceRequestsMemory ResourceName = "requests.memory" // CPU limit, in cores. (500m = .5 cores) ResourceLimitsCPU ResourceName = "limits.cpu" // Memory limit, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024) ResourceLimitsMemory ResourceName = "limits.memory" ) // A scope is a filter that matches an object type ResourceQuotaScope string const ( ResourceQuotaScopeTerminating ResourceQuotaScope = "Terminating" ResourceQuotaScopeNotTerminating ResourceQuotaScope = "NotTerminating" ResourceQuotaScopeBestEffort ResourceQuotaScope = "BestEffort" ResourceQuotaScopeNotBestEffort ResourceQuotaScope = "NotBestEffort" ) // ResourceQuotaSpec defines the desired hard limits to enforce for Quota // The quota matches by default on all objects in its namespace. // The quota can optionally match objects that satisfy a set of scopes. type ResourceQuotaSpec struct { // Hard is the set of desired hard limits for each named resource Hard ResourceList `json:"hard,omitempty"` // A collection of filters that must match each object tracked by a quota. // If not specified, the quota matches all objects. Scopes []ResourceQuotaScope `json:"scopes,omitempty"` } ``` ## Rest API Impact None. ## Security Impact None. ## End User Impact The `kubectl` commands that render quota should display its scopes. ## Performance Impact This feature will make having more quota objects in a namespace more common in certain clusters. This impacts the number of quota objects that need to be incremented during creation of an object in admission control. It impacts the number of quota objects that need to be updated during controller loops. ## Developer Impact None. ## Alternatives This proposal initially enumerated a solution that leveraged a `FieldSelector` on a `ResourceQuota` object. A `FieldSelector` grouped an `APIVersion` and `Kind` with a selector over its fields that supported set-based requirements. It would have allowed a quota to track objects based on cluster defined attributes. For example, a quota could do the following: * match `Kind=Pod` where `spec.restartPolicy in (Always)` * match `Kind=Pod` where `spec.restartPolicy in (Never, OnFailure)` * match `Kind=Pod` where `status.qualityOfService in (BestEffort)` * match `Kind=Service` where `spec.type in (LoadBalancer)` * see [#17484](https://github.com/kubernetes/kubernetes/issues/17484) Theoretically, it would enable support for fine-grained tracking on a variety of resource types. While extremely flexible, there are cons to to this approach that make it premature to pursue at this time. * Generic field selectors are not yet settled art * see [#1362](https://github.com/kubernetes/kubernetes/issues/1362) * see [#19084](https://github.com/kubernetes/kubernetes/pull/19804) * Discovery API Limitations * Not possible to discover the set of field selectors supported by kind. * Not possible to discover if a field is readonly, readwrite, or immutable post-creation. The quota system would want to validate that a field selector is valid, and it would only want to select on those fields that are readonly/immutable post creation to make resource tracking work during update operations. The current proposal could grow to support a `FieldSelector` on a `ResourceQuotaSpec` and support a simple migration path to convert `scopes` to the matching `FieldSelector` once the project has identified how it wants to handle `fieldSelector` requirements longer term. This proposal previously discussed a solution that leveraged a `LabelSelector` as a mechanism to partition quota. This is potentially interesting to explore in the future to allow `namespace-admins` to quota workloads based on local knowledge. For example, a quota could match all kinds that match the selector `tier=cache, environment in (dev, qa)` separately from quota that matched `tier=cache, environment in (prod)`. This is interesting to explore in the future, but labels are insufficient selection targets for `cluster-administrators` to control footprint. In those instances, you need fields that are cluster controlled and not user-defined. ## Example ### Scenario 1 The cluster-admin wants to restrict the following: * limit 2 best-effort pods * limit 2 terminating pods that can not use more than 1Gi of memory, and 2 cpu cores * limit 4 long-running pods that can not use more than 4Gi of memory, and 4 cpu cores * limit 6 pods in total, 10 replication controllers This would require the following quotas to be added to the namespace: ``` $ cat quota-best-effort apiVersion: v1 kind: ResourceQuota metadata: name: quota-best-effort spec: hard: pods: "2" scopes: - BestEffort $ cat quota-terminating apiVersion: v1 kind: ResourceQuota metadata: name: quota-terminating spec: hard: pods: "2" memory.limit: 1Gi cpu.limit: 2 scopes: - Terminating - NotBestEffort $ cat quota-longrunning apiVersion: v1 kind: ResourceQuota metadata: name: quota-longrunning spec: hard: pods: "2" memory.limit: 4Gi cpu.limit: 4 scopes: - NotTerminating - NotBestEffort $ cat quota apiVersion: v1 kind: ResourceQuota metadata: name: quota spec: hard: pods: "6" replicationcontrollers: "10" ``` In the above scenario, every pod creation will result in its usage being tracked by `quota` since it has no additional scoping. The pod will then be tracked by at 1 additional quota object based on the scope it matches. In order for the pod creation to succeed, it must not violate the constraint of any matching quota. So for example, a best-effort pod would only be created if there was available quota in `quota-best-effort` and `quota`. ## Implementation ### Assignee @derekwaynecarr ### Work Items * Add support for requests and limits * Add support for scopes in quota-related admission and controller code ## Dependencies None. Longer term, we should evaluate what we want to do with `fieldSelector` as the requests around different quota semantics will continue to grow. ## Testing Appropriate unit and e2e testing will be authored. ## Documentation Impact Existing resource quota documentation and examples will be updated. [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/resource-quota-scoping.md?pixel)]()