Automatic merge from submit-queue (batch tested with PRs 62467, 62482, 62211). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve performance of affinity/anti-affinity predicate by 20x in large clusters
**What this PR does / why we need it**:
Improves performance of affinity/anti-affinity predicate by over 20x in large clusters. Performance improvement is smaller in small clusters, but it is still very significant and is about 4x. Also, before this PR, performance of the predicate was dropping quadratically with increasing size of nodes and pods. As the results shows, the slow down is now linear in larger clusters.
Affinity/anti-affinity predicate was checking all pods of the cluster for each node in the cluster to determine feasibility of affinit/anti-affinity terms of the pod being scheduled. This optimization first finds all the pods in a cluster that match the affinity/anti-affinity terms of the pod being scheduled once and stores the metadata. It then only checks the topology of the matching pods for each node in the cluster.
This results in major reduction of the search space per node and improves performance significantly.
Below results are obtained by running scheduler benchmarks:
```
make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=.*BenchmarkSchedulingAntiAffinity"
```
```
AntiAffinity Topology: Hostname
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 37031638 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 10373222 ns/op
before: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12 134205302 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12 12000580 ns/op
befor: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 498439953 ns/op
after: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 24692552 ns/op
AntiAffinity Topology: Region
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 60003672 ns/op
after: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 13346400 ns/op
before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 600085491 ns/op
after: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12 27783333 ns/op
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
ref/ #56032#47318#25319
**Release note**:
```release-note
improve performance of affinity/anti-affinity predicate of default scheduler significantly.
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 61147, 62236, 62018). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
spec.SchedulerName should be spec.schedulerName in kube-scheduler help
**What this PR does / why we need it**:
spec.SchedulerName should be spec.schedulerName in kube-scheduler help
```shell
--scheduler-name string Name of the scheduler, used to select which pods will be processed by this scheduler, based on pod's "spec.SchedulerName". (default "default-scheduler")
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61918, 62180, 62198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use provided node object in volume binding predicate
**What this PR does / why we need it**:
Autoscaler creates fake node objects, so we should use the provided node object instead of looking up the node from the informer.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62178
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62063, 62169, 62155, 62139, 61445). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Schedule even if extender is not available when using extender
**What this PR does / why we need it**:
When using scheduler extender, if the extender is not available scheduling of all pods fail.
We should let the scheduling happen but display error message that extender is failing.
`IsIgnorable()` is added to extender to indicate: if scheduling of all pods should fail when it's unavailable
**Backward compabtiility:**
We use `IsIgnorable` instead of `IsCritical` so that when this flag is not set, the default value will be `false`, i.e. not ignorable, which consistent with the current behavior in existing extenders.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes: #60616
**Special notes for your reviewer**:
kindly cc @ravisantoshgudimetla to see if this meets your expectation
TODO: update the examples in kubernetes/examples, but the strategy there is not clear to me for now
**Release note**:
```release-note
Schedule even if extender is not available when using extender
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update OWNERS labels for cluster-lifecycle and scheduling
**What this PR does / why we need it**:
Updates auto labeling to make everyone's lives easier.
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @kubernetes/sig-scheduling-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove pvc node affinity update check since beta NodeAffinity is immu…
…table
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
xref https://github.com/kubernetes/kubernetes/pull/61816#discussion_r178212208
**Special notes for your reviewer**:
/assign @msau42
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disabled MemoryPressure and DiskPressure predicates if TaintNodesByCondition enabled
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60397
**Release note**:
```release-note
Disabled CheckNodeMemoryPressure and CheckNodeDiskPressure predicates if TaintNodesByCondition enabled
```
Automatic merge from submit-queue (batch tested with PRs 60073, 58519, 61860). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Resources prefixed with *kubernetes.io/ should remain unscheduled if they are not exposed on the node.
Currently, resources prefixed with `*kubernetes.io/` get scheduled to any
node whether it's exposing that resource or not.
On the other hand, resources prefixed with `someother.domain/` don't get
scheduled to a node until that node is exposing that resource (or if the
resource is ignored because of scheduler extender).
This commit brings the behavior of `*kubernetes.io/` prefixed resources in
line with other extended resources and they will remain unscheduled
until some node exposes these resources.
Fixes#50658
```release-note
Pods requesting resources prefixed with `*kubernetes.io` will remain unscheduled if there are no nodes exposing that resource.
```
/sig scheduling
/assign jiayingz vishh bsalamat ConnorDoyle k82cn
Automatic merge from submit-queue (batch tested with PRs 54997, 61869, 61816, 61909, 60525). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Balanced resource allocation priority to include volume count on nodes.
Scheduler balanced resource allocation priority to include volume count on nodes.
/cc @aveshagarwal @abhgupta
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58232
**Release note**:
```release-note
Balanced resource allocation priority in scheduler to include volume count on node
```
Currently, resources prefixed with *kubernetes.io/ get scheduled to any
node whether it's exposing that resource or not.
On the other hand, resources prefixed with someother.domain/ don't get
scheduled to a node until that node is exposing that resource (or if the
resource is ignored because of scheduler extender).
This commit brings the behavior of *kubernetes.io/ prefixed resources in
line with other extended resources and they will remain unscheduled
until some node exposes these resources.
This also includes renaming IsDefaultNamespaceResource() to
IsNativeResource().
Automatic merge from submit-queue (batch tested with PRs 60519, 61099, 61218, 61166, 61714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Automatically add system critical priority classes at cluster boostrapping
**What this PR does / why we need it**:
We had two PriorityClasses that were hardcoded and special cased in our code base. These two priority classes never existed in API server. Priority admission controller had code to resolve these two names. This PR removes the hardcoded PriorityClasses and adds code to create these PriorityClasses automatically when API server starts.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60178
ref/ #57471
**Special notes for your reviewer**:
**Release note**:
```release-note
Automatically add system critical priority classes at cluster boostrapping.
```
/sig scheduling
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not consider pods being deleted in the same namespace for spreading purposes for service anti-affinity priority similar to selectorspread priority.
**What this PR does / why we need it**:
Currently for service anti-affinity priority, pods being deleted in the same namespace are being considered in computation for spreading purposes, which should not happen. This PR aligns it with selectorspread priority, which also does spreading and does not consider pods being deleted in the same namespace.
@bsalamat @timothysc @kubernetes/sig-scheduling-bugs
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove YEAR field of all generated files and fix kubernetes boilerplate checker
**What this PR does / why we need it**:
Remove YEAR field of all generated files and fix kubernetes boilerplate checker
xref: [remove YEAR fileds in gengo #91](https://github.com/kubernetes/gengo/pull/91)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes [#gengo/issues/24](https://github.com/kubernetes/gengo/issues/24)
**Special notes for your reviewer**:
/cc @thockin @lavalamp @sttts
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57871, 61094, 60459, 61089, 61105). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add namespace when we describe pod
**What this PR does / why we need it**:
Add namespace when we describe pod
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 60919, 60953, 61085, 61083, 60971). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Sched cache resync
**What this PR does / why we need it**: Scheduler cache comparer
A debug tool that collects resources from api server and compares it
with the scheduler cache. It currently only compares the node list, but
it should be easy to extend. The compare is triggered by signal USER2,
by doing
kill -12 ${SCHED_PID}
The compare result goes to scheduler log.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Towards #60860
**Special notes for your reviewer**: @bsalamat
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 60898, 60912, 60753, 61002, 60796). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change to fix scheduler extender error return message
**What this PR does / why we need it**:
As of now, scheduler always logs extender endpoint without verb like "filter", "prioritize" etc. With this change, we are including the verb as well while logging which helps in debugging
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60898, 60912, 60753, 61002, 60796). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert revert of equivalence class hash calculation in scheduler
**What this PR does / why we need it**:
NOTE: This is a revert revert of https://github.com/kubernetes/kubernetes/pull/58555
But since the original PR has been changed, I have to copy the original changes and resend this new PR. See: https://github.com/kubernetes/kubernetes/pull/58555#issuecomment-364345972
And I kept @misterikkit 's change as the first commit (by co-author feature of github) in the history.
We decide to do revert revert because #58989 has been fixed, which should help to improve the time consumed by integration test.
**But** we should still pay attention to integration tests to see if there's frequent timeout happen.
**Special notes for your reviewer**:
**Release note**:
```release-note
Improve equivalence class hash calculation in scheduler
```
Automatic merge from submit-queue (batch tested with PRs 60759, 60531, 60923, 60851, 58717). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Implement preemption for extender with a verb and new interface
**What this PR does / why we need it**:
This is an alternative way of implementing #51656
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#51656
**Special notes for your reviewer**:
We will also want to compare with #56296 to see which one is the best solution. See: https://github.com/kubernetes/kubernetes/pull/56296#discussion_r163381235
cc @ravigadde @bsalamat
**Release note**:
```release-note
Implement preemption for extender with a verb and new interface
```
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
more concise to merge the slice
**What this PR does / why we need it**:
more concise to merge the slice
**Special notes for your reviewer**:
Pods in scheduler cache contains both the scheduled pods and those not
scheduled yet in scheduling queue. This commit adds the second group of
pods into consideration while comparing the cache.
UID uniquely identifies pods across lifecycles, while namespace/name
could be 2 different pods across lifecycles. This could result in
tricky scheduler bugs.
Fixes#60966
A debug tool that collects resources from api server and compares it
with the scheduler cache. It currently only compares the node list, but
it should be easy to extend. The compare is triggered by signal USER2,
by doing
kill -12 ${SCHED_PID}
The compare result goes to scheduler log.
Towards #60860
Equivalence cache for CheckNodeConditionPred becomes invalid when
Node.Spec.Unschedulable changes. This can happen even if
Node.Status.Conditions does not change, so move the logic around.
This logic is covered by integration test
"test/integration/scheduler".TestUnschedulableNodes but equivalence
cache is currently skipped when test pods have no OwnerReference.
Add benchmark for equivalence hashing.
Change equivalence hash function.
This changes the equivalence class hashing function to use as inputs all
the Pod fields which are read by FitPredicates. Before we used a
combination of OwnerReference and PersistentVolumeClaim info, which was
a close approximation. The new method ensures that hashing remains
correct regardless of controller behavior.
The PVCSet field can be removed from equivalencePod because it is
implicitly included in the Volume list.
Tests are now broken.
Move equivalence class hash code.
This moves the equivalence hashing code from
algorithm/predicates/utils.go to core/equivalence_cache.go.
In the process, making the hashing function and hashing function factory
both injectable dependencies is removed.
Fix equivalence cache hash tests.
Co-authored-by: Jonathan Basseri <misterikkit@google.com>
Co-authored-by: Harry Zhang <resouer@gmail.com>
Add verb and preemption for scheduler extender
Update bazel
Use simple preemption in extender
Use node name instead of v1.Node
Fix support method
Fix preemption dup
Remove uneeded logics
Remove nodeInfo from param to extender
Update bazel for scheduler types
Mock extender cache with nodeInfo
Add nodeInfo as extender cache
Choose node name or node based on cache flag
Always return meta victims in result
Automatic merge from submit-queue (batch tested with PRs 60683, 60386). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added unschedulabe predicate.
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60163
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Critical pod priorityClass addition
**What this PR does / why we need it**:
@bsalamat - Apologies for the delay. This PR is to ensure that all pods with priorityClassName `system-node-critical` and `system-cluster-critical` will be critical pods while preserving backwards compatibility.
**Special notes for your reviewer**:
- Moved some constants and other data structures to scheduler/api/types.go where other constants are present.
- An automatic assignment of critical priorities to pods based on critical pod annotation for backwards compatibility including some unit tests.
xref: https://github.com/kubernetes/kubernetes/issues/57471
**Release note**:
```release-note
Critical pods to use priorityClasses.
```
Automatic merge from submit-queue (batch tested with PRs 60106, 59510, 60263, 60063, 59088). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reuse the `min*Nodes` slices in order to save GC time
**What this PR does / why we need it**:
Reuse the `min*Nodes` slices to save GC time when executing `pickOneNodeForPreemption`.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59748
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use consts as predicate key names in handlers
**What this PR does / why we need it**:
Per discussion in: https://github.com/kubernetes/kubernetes/pull/59335/files#r168351460
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59951
**Special notes for your reviewer**:
**Release note**:
```release-note
Use consts as predicate name in handlers
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve scheduling queue's logic
**What this PR does / why we need it**:
Improves scheduling queue's code based on some recent comments on [the original PR](https://github.com/kubernetes/kubernetes/pull/55109).
This PR does not fix any bugs or make any change of behavior.
**Release note**:
```release-note
NONE
```
/sig scheduling
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move volume scheduling and local storage to beta
**What this PR does / why we need it**:
* Move the feature gates and APIs for volume scheduling and local storage to beta
* Update tests to use the beta fields
@kubernetes/sig-storage-pr-reviews
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59390
**Special notes for your reviewer**:
**Release note**:
```release-note
ACTION REQUIRED: VolumeScheduling and LocalPersistentVolume features are beta and enabled by default. The PersistentVolume NodeAffinity alpha annotation is deprecated and will be removed in a future release.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Taint node when it under PID pressure.
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #54313
**Release note**:
```release-note
If TaintNodesByCondition enabled, taint node when it under PID pressure
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update bazelbuild/rules_go, kubernetes/repo-infra, and gazelle dependencies
**What this PR does / why we need it**: updates our bazelbuild/rules_go dependency in order to bump everything to go1.9.4. I'm separating this effort into two separate PRs, since updating rules_go requires a large cleanup, removing an attribute from most build rules.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add node shutdown taint
**What this PR does / why we need it**: we need node stopped taint in order to detach volumes immediately without waiting timeout. More info in issue ticket #58635
**Which issue(s) this PR fixes**
Fixes#58635
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes volume predicate handler for equiv class
**What this PR does / why we need it**:
Per discussion in #58797 , we are missing some predicate handler in factory.go.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58797
Ref #58222
**Special notes for your reviewer**:
Kindly ping @msau42
**Release note**:
```release-note
Fixes volume predicate handler for equiv class
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Format some import statements in scheduler pkg
**What this PR does / why we need it**:
As the title says, apply `goimports` on some files under `pkg/scheduler` pkg.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve performance of scheduling queue by adding a hash map to track all pods with a nominatedNodeName
**What this PR does / why we need it**:
Our investigations show that there is a performance regression in the new scheduling queue which is not enabled by default and is enabled only if "priority and preemption" which is an alpha feature is enabled. This PR is an important performance improvement for those who want to use priority and preemption in larger clusters.
The PR adds a hash table to track nominated Pods so that finding such Pods will be faster.
Other than improving performance, we don't expect this PR to change behavior of scheduler.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
ref/ #56032
ref/ #57471
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 59479, 59246). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add tests for schedulercache
**What this PR does / why we need it**:
Add tests for `node_info.go` under `schedulercache` package.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Avoid race condition when updating equivalence cache
**What this PR does / why we need it**:
Lock the ecache to update the ecache on each predicate running, to avoid race condition.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fix#58507
**Special notes for your reviewer**:
None
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Subtract local ephemeral storage resource from NodeInfo when removing pod
**What this PR does / why we need it**:
When we are removing pods, we need to subtract local ephemeral storage resource from NodeInfo
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/kind bug
/sig storage
/sig scheduling
/assign @jingxu97 @bsalamat
fix function
fix gofmt
fix function return value
fix tests
skip notimplemented error
remove factory unused
in openstack we should try to find instanceid from all states instead of ACTIVE, all other cloudproviders do this already
fix tests and lint
fix gofmt
fix nodelifecycletest
fix lint errors
shutdowned -> stopped
use shutdown everywhere
use patch in taints api call
use notimplemented in clouds use AddOrUpdateTaintOnNode
correct log text
add fake cloud
try to fix bazel
add shutdown tests
add context
Automatic merge from submit-queue (batch tested with PRs 59010, 59212, 59281, 59014, 59297). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Replace nominateNodeName annotation with PodStatus.NominatedNodeName
**What this PR does / why we need it**:
Replaces nominateNodeName annotation with PodStatus.NominatedNodeName in scheudler's logic. We don't expect any logic/behavior changes.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
ref #57471
/sig scheduling
cc: @k82cn @aveshagarwal @resouer
Automatic merge from submit-queue (batch tested with PRs 58444, 59283, 59437, 59325, 59449). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix to register priority function ResourceLimitsPriority correctly.
**What this PR does / why we need it**:
This PR fixes registration of priority function ResourceLimitsPriority. Previously this function was being registered inside `init()`. Since this priority function ResourceLimitsPriority is behind feature gate `ResourceLimitsPriorityFunction` and if the feature is enabled, it was not visible in `init()` function. So now the registration of this priority function is moved inside `ApplyFeatureGates()` in scheduler where it can be correctly registered after the feature has been enabled.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
@kubernetes/sig-scheduling-pr-reviews @bsalamat @ravisantoshgudimetla
Automatic merge from submit-queue (batch tested with PRs 59394, 58769, 59423, 59363, 59245). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Ensure euqiv hash calculation is per schedule
**What this PR does / why we need it**:
Currently, equiv hash is calculated per schedule, but also, per node. This is a potential cause of dragging integration test, see #58881
We should ensure this only happens once during scheduling of specific pod no matter how many nodes we have.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58989
**Special notes for your reviewer**:
**Release note**:
```release-note
Ensure euqiv hash calculation is per schedule
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make predicate errors more human readable
**What this PR does / why we need it**:
Make predicate errors more human readable
Thanks.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
ref #58546
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove unused func in FakeConfigurator of scheduler
**What this PR does / why we need it**:
Current scheduler `Configurator` interface looks like this:
```
type Configurator interface {
GetPriorityFunctionConfigs(priorityKeys sets.String) ([]algorithm.PriorityConfig, error)
GetPriorityMetadataProducer() (algorithm.PriorityMetadataProducer, error)
GetPredicateMetadataProducer() (algorithm.PredicateMetadataProducer, error)
GetPredicates(predicateKeys sets.String) (map[string]algorithm.FitPredicate, error)
GetHardPodAffinitySymmetricWeight() int32
GetSchedulerName() string
MakeDefaultErrorFunc(backoff *util.PodBackoff, podQueue core.SchedulingQueue) func(pod *v1.Pod, err error)
// Needs to be exposed for things like integration tests where we want to make fake nodes.
GetNodeLister() corelisters.NodeLister
GetClient() clientset.Interface
GetScheduledPodLister() corelisters.PodLister
Create() (*Config, error)
CreateFromProvider(providerName string) (*Config, error)
CreateFromConfig(policy schedulerapi.Policy) (*Config, error)
CreateFromKeys(predicateKeys, priorityKeys sets.String, extenders []algorithm.SchedulerExtender) (*Config, error)
}
```
Funcs `ResponsibleForPod` and `Run` once existed have been removed, so the funcs in `FakeConfigurator` should be removed accordingly.
**Special notes for your reviewer**:
/kind cleanup
/sig scheduling
**Release note**:
```release-note
NONE
```
This moves the equivalence hashing code from
algorithm/predicates/utils.go to core/equivalence_cache.go.
In the process, making the hashing function and hashing function factory
both injectable dependencies is removed.
This changes the equivalence class hashing function to use as inputs all
the Pod fields which are read by FitPredicates. Before we used a
combination of OwnerReference and PersistentVolumeClaim info, which was
a close approximation. The new method ensures that hashing remains
correct regardless of controller behavior.
The PVCSet field can be removed from equivalencePod because it is
implicitly included in the Volume list.
Tests are now broken.
Equivalence cache for CheckNodeConditionPred becomes invalid when
Node.Spec.Unschedulable changes. This can happen even if
Node.Status.Conditions does not change, so move the logic around.
This logic is covered by integration test
"test/integration/scheduler".TestUnschedulableNodes but equivalence
cache is currently skipped when test pods have no OwnerReference.
Automatic merge from submit-queue (batch tested with PRs 58595, 58689). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Checked node.Unscheulable in Toleration predicate.
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58648
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix the wrong err print of assumepod
**What this PR does / why we need it**:
I think the err print is wrong, just opposite the original meaning.
/cc @timothysc
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
-Add scheduler optimization options, short circuit all predicates if …
…one predicate fails
Signed-off-by: Wang Guoliang <iamwgliang@gmail.com>
**What this PR does / why we need it**:
Short circuit all predicates if one predicate fails.
I think we can add a switch to control it, maybe some scenes do not need to know all the causes of failure, but also can get a great performance improvement; if you need to fully understand the reasons for the failure, and accept the current performance requirements, can maintain the current logic. It should expose this switch to the user.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56889 and #48186
**Special notes for your reviewer**:
@davidopp
**Release note**:
```
Allow scheduler set AlwaysCheckAllPredicates, short circuit all predicates if one predicate fails can greatly improve the scheduling performance.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improved readability for messages being logged
**What this PR does / why we need it**:
This improves the readability for messages seen by end-user. /cc @jwforres @bsalamat - For UX
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#57152
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57266, 58187, 58186, 46245, 56509). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added metrics for predicate and priority evaluation
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45972
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
- Remove string decode logic. It's not really helping to find the
conflict ports, and it's expensive to do encoding/decoding
- Not to parse the container ports information in predicate meta, use
straight []*v1.ContainerPort
- Use better data structure to search port conflict based on ip
addresses
- Collect scattered source code into common place
This moves plugin/pkg/scheduler to pkg/scheduler and
plugin/cmd/kube-scheduler to cmd/kube-scheduler.
Bulk of the work was done with gomvpkg, except for kube-scheduler main
package.
Balanced Resource Allocation policy can now be enabled to so that
host(s) with balanced resource usage would be preferred. The score
given by BRA also scales from 0 to 10 with 10 representing that the
resource usage is well balanced.
* Improper format specifier (e.g. %s for bools or %s for ints)
* More or less parameters than format specifiers
* Not calling a formatting function when it should have (e.g. Error() instead of Errorf())
Replaces the client public interface but leaves old references to "minions"
for a later refactor. Selects the path "nodes" for v1beta3 and "minions"
for older versions.
Also rename some to other names that make better reading. There are still a
bunch of "make" functions but they do things like assemble a string from parts
or build an array of things. It seemed that "make" there seemed fine. "New"
is for "constructors".