Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding tests for ImageLocalityPriority
**What this PR does / why we need it**:
This PR adds tests for ImageLocalityPriority scheduling policy, as follow-ups of [#63842](https://github.com/kubernetes/kubernetes/issues/63842) and [#63345](https://github.com/kubernetes/kubernetes/issues/63345). It includes the unit test for ImageSizes function of NodeInfo in the scheduler cache.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
@resouer
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61963, 64279, 64130, 64125, 64049). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix TestSchedulerWithVolumeBinding to avoid setting predicate ordering.
It is causing data race condition as predicate ordering is changing global
variable `predicatesOrdering`. Infact this test does not require any special
predicate order and should work on default predicate ordering as far as
VolumeScheduling feature is enabled.
See these logs:
```
==================
==================
WARNING: DATA RACE
Read at 0x00c420894180 by goroutine 156:
k8s.io/kubernetes/pkg/scheduler/core.podFitsOnNode()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/pkg/scheduler/core/generic_scheduler.go:503 +0xbb
k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).findNodesThatFit.func1()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/pkg/scheduler/core/generic_scheduler.go:353 +0x2f0
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.Parallelize.func1()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:47 +0xa3
Previous write at 0x00c420894180 by goroutine 186:
k8s.io/kubernetes/pkg/scheduler.TestSchedulerWithVolumeBinding()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/pkg/scheduler/scheduler_test.go:663 +0x71
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:777 +0x16d
Goroutine 156 (running) created at:
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.Parallelize()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:43 +0x139
k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).findNodesThatFit()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/pkg/scheduler/core/generic_scheduler.go:378 +0xe8a
k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/pkg/scheduler/core/generic_scheduler.go:131 +0x385
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedule()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/pkg/scheduler/scheduler.go:192 +0xcd
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/pkg/scheduler/scheduler.go:447 +0x598
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).(k8s.io/kubernetes/pkg/scheduler.scheduleOne)-fm()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/pkg/scheduler/scheduler.go:182 +0x41
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x61
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xcd
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until()
/home/avagarwa/upstream-code/gocode/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x5a
Goroutine 186 (running) created at:
testing.(*T).Run()
/usr/lib/golang/src/testing/testing.go:824 +0x564
testing.runTests.func1()
/usr/lib/golang/src/testing/testing.go:1063 +0xa4
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:777 +0x16d
testing.runTests()
/usr/lib/golang/src/testing/testing.go:1061 +0x4e1
testing.(*M).Run()
/usr/lib/golang/src/testing/testing.go:978 +0x2cd
main.main()
_testmain.go:52 +0x22a
==================
--- FAIL: TestSchedulerWithVolumeBinding (18.04s)
testing.go:730: race detected during execution of test
FAIL
```
It is pretty easy to reproduce this race by following these steps:
```
cd pkg/scheduler
go test -c -race
stress -p 100 ./scheduler.test
```
Predicate ordering to this unit test was added here: https://github.com/kubernetes/kubernetes/pull/57168
Since the whole scheduler instance uses just one ordering at time, not sure what is the advantage.
@kubernetes/sig-scheduling-bugs @bsalamat @k82cn @frobware @smarterclayton @sjenning
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 63434, 64172, 63975, 64180, 63755). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Optimize the lock which in the RunPredicate
**What this PR does / why we need it**:
Enhance the performance of scheduler
- Change the lock in the RunPredicate from lock to rlock
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Could solve part of #63784
**Special notes for your reviewer**:
_Run benchmark test by scheduler_perf_:
`Before` BenchmarkScheduling/1000Nodes/0Pods-32 1000 11689758 ns/op
`After` BenchmarkScheduling/1000Nodes/0Pods-32 1000 5951510 ns/op
_Run integration (density) test by scheduler_perf_:
Schedule 3000 Pods On 3000 Nodes
`Before` rate 19 per second on average
`After` rate 58 per second on average
_Cpu profile test result_:
`Before` [click](https://cdn.rawgit.com/godliness/files/master/63784_before.svg)
`After` [click](https://cdn.rawgit.com/godliness/files/master/63784_after.svg)
**Release note**:
```release-note
`None`
```
/sig scheduling
/cc @misterikkit
/cc @bsalamat
/cc @ravisantoshgudimetla
/cc @resouer
Automatic merge from submit-queue (batch tested with PRs 64127, 63895, 64066, 64215, 64202). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add warnings about cache invalidation.
Part of https://github.com/kubernetes/kubernetes/pull/63040 is the
assumption that scheduler cache updates must happen before equivalence
cache updates for any given informer event.
The reason for this is that the equivalence cache implementation checks
the main cache for staleness while holding the equiv. cache write lock.
case 1: If an informer invalidates an equiv. cache entry before the
staleness check, then we know that the main cache update completed.
case 2: If an informer blocks trying to grab the equiv. cache lock, then
invalidation will occur right after the potentially stale update is
written.
This patch adds a note to places where we invalidate the equivalence
cache so that hopefully nobody violates this invariant.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/kind cleanup
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 64174, 64187, 64216, 63265, 64223). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not use DeepEqual to compare slices in test.
This wraps DeepEqual with a helper that considers nil slices and empty
slices to be equal.
Scheduler code might use a nil slice or empty slice to represent an
empty list, so tests should not be sensitive to the difference. Tests
could fail because DeepEqual considers nil to be different from an empty
slice.
**What this PR does / why we need it**:
Avoid breaking tests in cases where application behavior is not changed.
**Special notes for your reviewer**:
This brittle test keeps breaking in a number of my PRs. Hoping to get this fix merged independently.
**Release note**:
```release-note
NONE
```
/sig scheduling
/kind cleanup
This wraps DeepEqual with a helper that considers nil slices and empty
slices to be equal.
Scheduler code might use a nil slice or empty slice to represent an
empty list, so tests should not be sensitive to the difference. Tests
could fail because DeepEqual considers nil to be different from an empty
slice.
Automatic merge from submit-queue (batch tested with PRs 63283, 64032, 64159, 64126, 64098). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove unused code of (pkg/scheduler)
**What this PR does / why we need it**:
/kind cleanup
remove unused code
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Part of https://github.com/kubernetes/kubernetes/pull/63040 is the
assumption that scheduler cache updates must happen before equivalence
cache updates for any given informer event.
The reason for this is that the equivalence cache implementation checks
the main cache for staleness while holding the equiv. cache write lock.
case 1: If an informer invalidates an equiv. cache entry before the
staleness check, then we know that the main cache update completed.
case 2: If an informer blocks trying to grab the equiv. cache lock, then
invalidation will occur right after the potentially stale update is
written.
This patch adds a note to places where we invalidate the equivalence
cache so that hopefully nobody violates this invariant.
Automatic merge from submit-queue (batch tested with PRs 63598, 63913, 63459, 63963, 60464). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Check nodeInfo before ecache predicate
**What this PR does / why we need it**:
There's chances during test when nodeInfo is nil which may cause ecache predicate fail with nil pointer.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63427
**Special notes for your reviewer**:
Not sure how to reproduce the original issue yet. i.e. why and when `nodeInfo` will become nil in tests is not clear to me, that's why I label it as WIP.
cc @bsalamat who may have more inputs.
**Release note**:
```release-note
NONE
```
It is causing data race condition as predicate ordering is changing global
variable predicatesOrdering. Infact this test does not require any special
predicate order and should work on default predicate ordering as far as
VolumeScheduling feature is enabled.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
minor fix for VolumeZoneChecker predicate
storageclass can be in annotation and spec.
```release-note
minor fix for VolumeZoneChecker predicate, storageclass can be in annotation and spec.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg-scheduler-algorithm-priorities)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```
Automatic merge from submit-queue (batch tested with PRs 63603, 63557, 62015). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Clean up equiv cache with a simple implementation instead of LRU
**What this PR does / why we need it**:
The original version of equiv cache use pod hash as cache key, also, the predicate order is not fixed. So I used a LRU cache to improve hit rate.
While now we've already refactored it to use predicates as keys, and its order was also fixed in scheduler, we can use a simplest cache instead now.
**Special notes for your reviewer**:
The question is brought up by @misterikkit
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62354, 62934, 63502). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Refactor GetResourceRequest and GetResourceLimit
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
/assign @bsalamat
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Supported nodeSelector.matchFields in scheduler.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #61410
**Special notes for your reviewer**:
**Release note**:
```release-note
Supported nodeSelector.matchFields (node's `metadata.node`) in scheduler.
```
Automatic merge from submit-queue (batch tested with PRs 58580, 63120). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
-Remove TODO comment of GetNonzeroRequests function
**What this PR does / why we need it**:
-Remove TODO comment of GetNonzeroRequests function
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Increase scheduler cache generation number monotonically in order to avoid collision
**What this PR does / why we need it**:
Increments the scheduler cache generation number monotonically to avoid collision of the generation numbers. More context in #63262.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63262.
**Special notes for your reviewer**:
**Release note**:
```release-note
Increase scheduler cache generation number monotonically in order to avoid collision and use of stale information in scheduler.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
scheduler: clean up and simplify equivalence cache locking
**What this PR does / why we need it**:
This is a cleanup of the locking code for equivalence cache. There is no change to the current logic or locking. This PR has a couple of implications, though.
1. It deletes (unreachable) code that could have been used to cache predicate results that consider nominated pods.
2. Callers should no longer lock/unlock the eCache manually, so coordinating that lock with other synchronization is restricted.
**Special notes for your reviewer**:
**Release note**:
<!-- Write your release note:
1. Enter your extended release note in the below block. If the PR requires additional action from users switching to the new release, include the string "**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
action required".
2. If no release note is required, just write "NONE".
-->
```release-note
NONE
```
/sig scheduling
/kind cleanup
This changes two methods in EquivalenceCache to be unexported, because
they should no longer be called by users of this type. (Even users in
the same package!)
The purpose of this map is to combine two predicate results before
writing to the equivalence cache. However, the branch that combines
results is unreachable.
1. Combining results happens in the second iteration of the outer loop.
2. There is only a second iteration when podsAdded is true.
3. We skip equiv. cache when podsAdded is true.
This method combines "lookup" and "update" into one operation. The
benefit is that this method call is very similar to running an ordinary
predicate, so callers can simplify their code.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add test for scheduler:VolumeCountConflicts
**What this PR does / why we need it**:
Add test for scheduler:VolumeCountConflicts
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 62937, 63105, 63031, 63174). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert "Revert "Revert revert of equivalence class hash calculation i…
…n scheduler""
This reverts commit 4386751b5d.
**What this PR does / why we need it**:
This re-introduces the change from https://github.com/kubernetes/kubernetes/pull/58555 which changes how the scheduler computes equivalence classes of pods. I believe we have fixed the flakiness observed previously (https://github.com/kubernetes/kubernetes/issues/61512, https://github.com/kubernetes/kubernetes/issues/62921). I have run the test in question a few dozen times without a failure.
```bash
make test-integration WHAT="./test/integration/scheduler" KUBE_TEST_ARGS="-run TestPreemptionStarvation" GOFLAGS="-v"
```
/ref https://github.com/kubernetes/kubernetes/issues/58222
**Special notes for your reviewer**:
I had to resolve several merge conflicts. I think I resolved them correctly, but keep an eye out for anything silly.
**Release note**:
```release-note
NONE
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 62432, 62868, 63040). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
scheduler: fix race condition in equivalence cache
**What this PR does / why we need it**:
This adds an equivalence cache test to exercise the race condition observed in https://github.com/kubernetes/kubernetes/issues/62921 and then fixes the race.
The `Cache` interface needed a new method to check whether a `NodeInfo` is stale, and `genericScheduler` needed some plumbing to make the `Cache` object available to `podFitsOnNode()`.
The solution is, right before writing to the eCache, check the scheduler cache to see if the current `NodeInfo` object is out of date. If the node is out of date, then don't write to the eCache. If the `NodeInfo` is stale, it is because of a cache update that should also invalidate the eCache entry. That invalidation either happens before `podFitsOnNode()` acquires the eCache lock (original bug, so we don't do the write) or blocks until we release that lock (removing the potentially bad entry).
Fixes#62921
**Special notes for your reviewer**:
**Release note**:
equivalence cache is still alpha, so no release note.
```release-note
NONE
```
/sig scheduling
/assign bsalalamat
/assign resouer
Automatic merge from submit-queue (batch tested with PRs 62982, 63075, 63067, 62877, 63141). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Removed e2e test on empty NodeAffinity.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63027
**Special notes for your reviewer**:
In #62448, we removed the validation on empty `nodeAffinity` which is already handled in scheduler: select no objects.
**Release note**:
```release-note
None
```
Because the scheduler takes a snapshot of cache data at the start of
each scheduling cycle, updates to the equivalence cache should be
skipped if there was a cache update during the cycle.
If the current NodeInfo becomes stale while we evaluate predicates, we
will not write any results into the equivalence cache. We will still use
the results for the current scheduling cycle, though.
This allows scheduler implementations to check if a NodeInfo object
matches the current state of the cache. Useful if the NodeInfo in
question came from a Snapshot() for example.
Automatic merge from submit-queue (batch tested with PRs 63129, 63066, 60009, 63136, 63086). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add node shutdown taint
**What this PR does / why we need it**: we need node stopped taint in order to detach volumes immediately without waiting timeout. More info in issue ticket #58635
**Which issue(s) this PR fixes**
Fixes#58635
**Special notes for your reviewer**: this was reverted, original PR https://github.com/kubernetes/kubernetes/pull/59323 Hopefully now bugs are fixed. However, I will execute more tests manually today.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61324, 62880, 62765). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
-Fix the name could cause a conflict if an object with the same name …
…is created in a different namespace
**What this PR does / why we need it**:
/kind bug
Using the name could cause a conflict if an object with the same name is created in a different namespace
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
#62750
**Special notes for your reviewer**:
/assign @bsalamat
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59592, 62308, 62523, 62635, 62243). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Separate pod priority from preemption
**What this PR does / why we need it**:
Users request to split priority and preemption feature gate so they can use priority separately.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62068
**Special notes for your reviewer**:
~~I kept use `ENABLE_POD_PRIORITY` as ENV name for gce cluster scripts for backward compatibility reason. Please let me know if other approach is preffered.~~
~~This is a potential **break change** as existing clusters will be affected, we may need to include this in 1.11 maybe?~~
TODO: update this doc https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
[Update] Usage: in config file for scheduler:
```yaml
apiVersion: componentconfig/v1alpha1
kind: KubeSchedulerConfiguration
...
disablePreemption: true
```
**Release note**:
```release-note
Split PodPriority and PodPreemption feature gate
```
Automatic merge from submit-queue (batch tested with PRs 62761, 62715). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix inter-pod anti-affinity check to consider a pod a match when all the anti-affinity terms match
**What this PR does / why we need it**:
Inter-pod anti-affinity check used to incorrectly consider a pod a match when any of the anti-affinity terms matched the pod. This PR fixes the logic to consider a pod a match when all the terms match the pod.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62567
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix inter-pod anti-affinity check to consider a pod a match when all the anti-affinity terms match.
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 62467, 62482, 62211). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve performance of affinity/anti-affinity predicate by 20x in large clusters
**What this PR does / why we need it**:
Improves performance of affinity/anti-affinity predicate by over 20x in large clusters. Performance improvement is smaller in small clusters, but it is still very significant and is about 4x. Also, before this PR, performance of the predicate was dropping quadratically with increasing size of nodes and pods. As the results shows, the slow down is now linear in larger clusters.
Affinity/anti-affinity predicate was checking all pods of the cluster for each node in the cluster to determine feasibility of affinit/anti-affinity terms of the pod being scheduled. This optimization first finds all the pods in a cluster that match the affinity/anti-affinity terms of the pod being scheduled once and stores the metadata. It then only checks the topology of the matching pods for each node in the cluster.
This results in major reduction of the search space per node and improves performance significantly.
Below results are obtained by running scheduler benchmarks:
```
make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=.*BenchmarkSchedulingAntiAffinity"
```
```
AntiAffinity Topology: Hostname
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 37031638 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 10373222 ns/op
before: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12 134205302 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12 12000580 ns/op
befor: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 498439953 ns/op
after: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 24692552 ns/op
AntiAffinity Topology: Region
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 60003672 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 13346400 ns/op
before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 600085491 ns/op
after: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 27783333 ns/op
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
ref/ #56032#47318#25319
**Release note**:
```release-note
improve performance of affinity/anti-affinity predicate of default scheduler significantly.
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 61147, 62236, 62018). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
spec.SchedulerName should be spec.schedulerName in kube-scheduler help
**What this PR does / why we need it**:
spec.SchedulerName should be spec.schedulerName in kube-scheduler help
```shell
--scheduler-name string Name of the scheduler, used to select which pods will be processed by this scheduler, based on pod's "spec.SchedulerName". (default "default-scheduler")
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61918, 62180, 62198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use provided node object in volume binding predicate
**What this PR does / why we need it**:
Autoscaler creates fake node objects, so we should use the provided node object instead of looking up the node from the informer.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62178
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62063, 62169, 62155, 62139, 61445). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Schedule even if extender is not available when using extender
**What this PR does / why we need it**:
When using scheduler extender, if the extender is not available scheduling of all pods fail.
We should let the scheduling happen but display error message that extender is failing.
`IsIgnorable()` is added to extender to indicate: if scheduling of all pods should fail when it's unavailable
**Backward compabtiility:**
We use `IsIgnorable` instead of `IsCritical` so that when this flag is not set, the default value will be `false`, i.e. not ignorable, which consistent with the current behavior in existing extenders.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes: #60616
**Special notes for your reviewer**:
kindly cc @ravisantoshgudimetla to see if this meets your expectation
TODO: update the examples in kubernetes/examples, but the strategy there is not clear to me for now
**Release note**:
```release-note
Schedule even if extender is not available when using extender
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update OWNERS labels for cluster-lifecycle and scheduling
**What this PR does / why we need it**:
Updates auto labeling to make everyone's lives easier.
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @kubernetes/sig-scheduling-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove pvc node affinity update check since beta NodeAffinity is immu…
…table
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
xref https://github.com/kubernetes/kubernetes/pull/61816#discussion_r178212208
**Special notes for your reviewer**:
/assign @msau42
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disabled MemoryPressure and DiskPressure predicates if TaintNodesByCondition enabled
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60397
**Release note**:
```release-note
Disabled CheckNodeMemoryPressure and CheckNodeDiskPressure predicates if TaintNodesByCondition enabled
```
Automatic merge from submit-queue (batch tested with PRs 60073, 58519, 61860). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Resources prefixed with *kubernetes.io/ should remain unscheduled if they are not exposed on the node.
Currently, resources prefixed with `*kubernetes.io/` get scheduled to any
node whether it's exposing that resource or not.
On the other hand, resources prefixed with `someother.domain/` don't get
scheduled to a node until that node is exposing that resource (or if the
resource is ignored because of scheduler extender).
This commit brings the behavior of `*kubernetes.io/` prefixed resources in
line with other extended resources and they will remain unscheduled
until some node exposes these resources.
Fixes#50658
```release-note
Pods requesting resources prefixed with `*kubernetes.io` will remain unscheduled if there are no nodes exposing that resource.
```
/sig scheduling
/assign jiayingz vishh bsalamat ConnorDoyle k82cn
Automatic merge from submit-queue (batch tested with PRs 54997, 61869, 61816, 61909, 60525). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Balanced resource allocation priority to include volume count on nodes.
Scheduler balanced resource allocation priority to include volume count on nodes.
/cc @aveshagarwal @abhgupta
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58232
**Release note**:
```release-note
Balanced resource allocation priority in scheduler to include volume count on node
```
Currently, resources prefixed with *kubernetes.io/ get scheduled to any
node whether it's exposing that resource or not.
On the other hand, resources prefixed with someother.domain/ don't get
scheduled to a node until that node is exposing that resource (or if the
resource is ignored because of scheduler extender).
This commit brings the behavior of *kubernetes.io/ prefixed resources in
line with other extended resources and they will remain unscheduled
until some node exposes these resources.
This also includes renaming IsDefaultNamespaceResource() to
IsNativeResource().
Automatic merge from submit-queue (batch tested with PRs 60519, 61099, 61218, 61166, 61714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Automatically add system critical priority classes at cluster boostrapping
**What this PR does / why we need it**:
We had two PriorityClasses that were hardcoded and special cased in our code base. These two priority classes never existed in API server. Priority admission controller had code to resolve these two names. This PR removes the hardcoded PriorityClasses and adds code to create these PriorityClasses automatically when API server starts.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60178
ref/ #57471
**Special notes for your reviewer**:
**Release note**:
```release-note
Automatically add system critical priority classes at cluster boostrapping.
```
/sig scheduling
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not consider pods being deleted in the same namespace for spreading purposes for service anti-affinity priority similar to selectorspread priority.
**What this PR does / why we need it**:
Currently for service anti-affinity priority, pods being deleted in the same namespace are being considered in computation for spreading purposes, which should not happen. This PR aligns it with selectorspread priority, which also does spreading and does not consider pods being deleted in the same namespace.
@bsalamat @timothysc @kubernetes/sig-scheduling-bugs
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove YEAR field of all generated files and fix kubernetes boilerplate checker
**What this PR does / why we need it**:
Remove YEAR field of all generated files and fix kubernetes boilerplate checker
xref: [remove YEAR fileds in gengo #91](https://github.com/kubernetes/gengo/pull/91)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes [#gengo/issues/24](https://github.com/kubernetes/gengo/issues/24)
**Special notes for your reviewer**:
/cc @thockin @lavalamp @sttts
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57871, 61094, 60459, 61089, 61105). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add namespace when we describe pod
**What this PR does / why we need it**:
Add namespace when we describe pod
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 60919, 60953, 61085, 61083, 60971). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Sched cache resync
**What this PR does / why we need it**: Scheduler cache comparer
A debug tool that collects resources from api server and compares it
with the scheduler cache. It currently only compares the node list, but
it should be easy to extend. The compare is triggered by signal USER2,
by doing
kill -12 ${SCHED_PID}
The compare result goes to scheduler log.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Towards #60860
**Special notes for your reviewer**: @bsalamat
**Release note**:
```release-note
None
```