Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding tests for ImageLocalityPriority
**What this PR does / why we need it**:
This PR adds tests for ImageLocalityPriority scheduling policy, as follow-ups of [#63842](https://github.com/kubernetes/kubernetes/issues/63842) and [#63345](https://github.com/kubernetes/kubernetes/issues/63345). It includes the unit test for ImageSizes function of NodeInfo in the scheduler cache.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
@resouer
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
minor fix for VolumeZoneChecker predicate
storageclass can be in annotation and spec.
```release-note
minor fix for VolumeZoneChecker predicate, storageclass can be in annotation and spec.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg-scheduler-algorithm-priorities)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```
Automatic merge from submit-queue (batch tested with PRs 62354, 62934, 63502). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Refactor GetResourceRequest and GetResourceLimit
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
/assign @bsalamat
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Supported nodeSelector.matchFields in scheduler.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #61410
**Special notes for your reviewer**:
**Release note**:
```release-note
Supported nodeSelector.matchFields (node's `metadata.node`) in scheduler.
```
Automatic merge from submit-queue (batch tested with PRs 58580, 63120). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
-Remove TODO comment of GetNonzeroRequests function
**What this PR does / why we need it**:
-Remove TODO comment of GetNonzeroRequests function
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add test for scheduler:VolumeCountConflicts
**What this PR does / why we need it**:
Add test for scheduler:VolumeCountConflicts
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 62937, 63105, 63031, 63174). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert "Revert "Revert revert of equivalence class hash calculation i…
…n scheduler""
This reverts commit 4386751b5d.
**What this PR does / why we need it**:
This re-introduces the change from https://github.com/kubernetes/kubernetes/pull/58555 which changes how the scheduler computes equivalence classes of pods. I believe we have fixed the flakiness observed previously (https://github.com/kubernetes/kubernetes/issues/61512, https://github.com/kubernetes/kubernetes/issues/62921). I have run the test in question a few dozen times without a failure.
```bash
make test-integration WHAT="./test/integration/scheduler" KUBE_TEST_ARGS="-run TestPreemptionStarvation" GOFLAGS="-v"
```
/ref https://github.com/kubernetes/kubernetes/issues/58222
**Special notes for your reviewer**:
I had to resolve several merge conflicts. I think I resolved them correctly, but keep an eye out for anything silly.
**Release note**:
```release-note
NONE
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 62982, 63075, 63067, 62877, 63141). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Removed e2e test on empty NodeAffinity.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63027
**Special notes for your reviewer**:
In #62448, we removed the validation on empty `nodeAffinity` which is already handled in scheduler: select no objects.
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 62761, 62715). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix inter-pod anti-affinity check to consider a pod a match when all the anti-affinity terms match
**What this PR does / why we need it**:
Inter-pod anti-affinity check used to incorrectly consider a pod a match when any of the anti-affinity terms matched the pod. This PR fixes the logic to consider a pod a match when all the terms match the pod.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62567
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix inter-pod anti-affinity check to consider a pod a match when all the anti-affinity terms match.
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 62467, 62482, 62211). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve performance of affinity/anti-affinity predicate by 20x in large clusters
**What this PR does / why we need it**:
Improves performance of affinity/anti-affinity predicate by over 20x in large clusters. Performance improvement is smaller in small clusters, but it is still very significant and is about 4x. Also, before this PR, performance of the predicate was dropping quadratically with increasing size of nodes and pods. As the results shows, the slow down is now linear in larger clusters.
Affinity/anti-affinity predicate was checking all pods of the cluster for each node in the cluster to determine feasibility of affinit/anti-affinity terms of the pod being scheduled. This optimization first finds all the pods in a cluster that match the affinity/anti-affinity terms of the pod being scheduled once and stores the metadata. It then only checks the topology of the matching pods for each node in the cluster.
This results in major reduction of the search space per node and improves performance significantly.
Below results are obtained by running scheduler benchmarks:
```
make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=.*BenchmarkSchedulingAntiAffinity"
```
```
AntiAffinity Topology: Hostname
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 37031638 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 10373222 ns/op
before: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12 134205302 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12 12000580 ns/op
befor: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 498439953 ns/op
after: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 24692552 ns/op
AntiAffinity Topology: Region
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 60003672 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 13346400 ns/op
before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 600085491 ns/op
after: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 27783333 ns/op
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
ref/ #56032#47318#25319
**Release note**:
```release-note
improve performance of affinity/anti-affinity predicate of default scheduler significantly.
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 61918, 62180, 62198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use provided node object in volume binding predicate
**What this PR does / why we need it**:
Autoscaler creates fake node objects, so we should use the provided node object instead of looking up the node from the informer.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62178
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62063, 62169, 62155, 62139, 61445). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Schedule even if extender is not available when using extender
**What this PR does / why we need it**:
When using scheduler extender, if the extender is not available scheduling of all pods fail.
We should let the scheduling happen but display error message that extender is failing.
`IsIgnorable()` is added to extender to indicate: if scheduling of all pods should fail when it's unavailable
**Backward compabtiility:**
We use `IsIgnorable` instead of `IsCritical` so that when this flag is not set, the default value will be `false`, i.e. not ignorable, which consistent with the current behavior in existing extenders.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes: #60616
**Special notes for your reviewer**:
kindly cc @ravisantoshgudimetla to see if this meets your expectation
TODO: update the examples in kubernetes/examples, but the strategy there is not clear to me for now
**Release note**:
```release-note
Schedule even if extender is not available when using extender
```
Automatic merge from submit-queue (batch tested with PRs 60073, 58519, 61860). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Resources prefixed with *kubernetes.io/ should remain unscheduled if they are not exposed on the node.
Currently, resources prefixed with `*kubernetes.io/` get scheduled to any
node whether it's exposing that resource or not.
On the other hand, resources prefixed with `someother.domain/` don't get
scheduled to a node until that node is exposing that resource (or if the
resource is ignored because of scheduler extender).
This commit brings the behavior of `*kubernetes.io/` prefixed resources in
line with other extended resources and they will remain unscheduled
until some node exposes these resources.
Fixes#50658
```release-note
Pods requesting resources prefixed with `*kubernetes.io` will remain unscheduled if there are no nodes exposing that resource.
```
/sig scheduling
/assign jiayingz vishh bsalamat ConnorDoyle k82cn
Automatic merge from submit-queue (batch tested with PRs 54997, 61869, 61816, 61909, 60525). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Balanced resource allocation priority to include volume count on nodes.
Scheduler balanced resource allocation priority to include volume count on nodes.
/cc @aveshagarwal @abhgupta
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58232
**Release note**:
```release-note
Balanced resource allocation priority in scheduler to include volume count on node
```
Currently, resources prefixed with *kubernetes.io/ get scheduled to any
node whether it's exposing that resource or not.
On the other hand, resources prefixed with someother.domain/ don't get
scheduled to a node until that node is exposing that resource (or if the
resource is ignored because of scheduler extender).
This commit brings the behavior of *kubernetes.io/ prefixed resources in
line with other extended resources and they will remain unscheduled
until some node exposes these resources.
This also includes renaming IsDefaultNamespaceResource() to
IsNativeResource().
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not consider pods being deleted in the same namespace for spreading purposes for service anti-affinity priority similar to selectorspread priority.
**What this PR does / why we need it**:
Currently for service anti-affinity priority, pods being deleted in the same namespace are being considered in computation for spreading purposes, which should not happen. This PR aligns it with selectorspread priority, which also does spreading and does not consider pods being deleted in the same namespace.
@bsalamat @timothysc @kubernetes/sig-scheduling-bugs
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 60898, 60912, 60753, 61002, 60796). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert revert of equivalence class hash calculation in scheduler
**What this PR does / why we need it**:
NOTE: This is a revert revert of https://github.com/kubernetes/kubernetes/pull/58555
But since the original PR has been changed, I have to copy the original changes and resend this new PR. See: https://github.com/kubernetes/kubernetes/pull/58555#issuecomment-364345972
And I kept @misterikkit 's change as the first commit (by co-author feature of github) in the history.
We decide to do revert revert because #58989 has been fixed, which should help to improve the time consumed by integration test.
**But** we should still pay attention to integration tests to see if there's frequent timeout happen.
**Special notes for your reviewer**:
**Release note**:
```release-note
Improve equivalence class hash calculation in scheduler
```
Automatic merge from submit-queue (batch tested with PRs 60759, 60531, 60923, 60851, 58717). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Implement preemption for extender with a verb and new interface
**What this PR does / why we need it**:
This is an alternative way of implementing #51656
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#51656
**Special notes for your reviewer**:
We will also want to compare with #56296 to see which one is the best solution. See: https://github.com/kubernetes/kubernetes/pull/56296#discussion_r163381235
cc @ravigadde @bsalamat
**Release note**:
```release-note
Implement preemption for extender with a verb and new interface
```