mirror of https://github.com/k3s-io/k3s
Automatic merge from submit-queue (batch tested with PRs 62467, 62482, 62211). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Improve performance of affinity/anti-affinity predicate by 20x in large clusters

**What this PR does / why we need it**:
Improves performance of the affinity/anti-affinity predicate by over 20x in large clusters. The improvement is smaller in small clusters, but at about 4x it is still very significant. Also, before this PR, performance of the predicate degraded quadratically as the number of nodes and pods increased. As the results show, the slowdown is now linear in larger clusters.

Previously, the affinity/anti-affinity predicate checked all pods of the cluster for each node in the cluster to determine the feasibility of the affinity/anti-affinity terms of the pod being scheduled. This optimization first finds, once, all the pods in the cluster that match the affinity/anti-affinity terms of the pod being scheduled and stores the metadata. It then checks only the topology of the matching pods for each node in the cluster. This greatly reduces the search space per node and improves performance significantly.
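The optimization described above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the scheduler's actual code: the `Pod`, `Node`, and `AntiAffinityTerm` types and the `selectorMatches`, `matchingPods`, `nodeFits`, and `feasibleNodes` functions are hypothetical simplifications, label selectors are reduced to exact key/value matches, and only a single anti-affinity term is modeled.

```go
package main

import "fmt"

// Pod, Node and AntiAffinityTerm are hypothetical, simplified stand-ins
// for the scheduler's real types.
type Pod struct {
	Name     string
	Labels   map[string]string
	NodeName string // node the pod is already running on
}

type Node struct {
	Name   string
	Labels map[string]string // topology labels, e.g. "hostname", "region"
}

// AntiAffinityTerm says: do not place the incoming pod in the same topology
// domain (identified by TopologyKey) as any pod matching Selector.
type AntiAffinityTerm struct {
	Selector    map[string]string
	TopologyKey string
}

// selectorMatches reports whether labels contain every key/value in selector.
func selectorMatches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// matchingPods scans all pods of the cluster once and keeps only those that
// match the term. This is the precomputed metadata.
func matchingPods(term AntiAffinityTerm, pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if selectorMatches(term.Selector, p.Labels) {
			out = append(out, p)
		}
	}
	return out
}

// nodeFits is the per-node check. It looks only at the precomputed matching
// pods and compares topology values, instead of rescanning every pod.
func nodeFits(term AntiAffinityTerm, candidate Node, matching []Pod, byName map[string]Node) bool {
	domain, ok := candidate.Labels[term.TopologyKey]
	if !ok {
		return true // candidate has no such topology label, nothing to conflict with
	}
	for _, p := range matching {
		host, found := byName[p.NodeName]
		if !found {
			continue
		}
		if v, has := host.Labels[term.TopologyKey]; has && v == domain {
			return false
		}
	}
	return true
}

// feasibleNodes precomputes the matching pods once, then checks each node.
func feasibleNodes(term AntiAffinityTerm, pods []Pod, nodes []Node) []string {
	byName := make(map[string]Node, len(nodes))
	for _, n := range nodes {
		byName[n.Name] = n
	}
	matching := matchingPods(term, pods)
	var names []string
	for _, n := range nodes {
		if nodeFits(term, n, matching, byName) {
			names = append(names, n.Name)
		}
	}
	return names
}

func main() {
	nodes := []Node{
		{Name: "n1", Labels: map[string]string{"hostname": "n1", "region": "a"}},
		{Name: "n2", Labels: map[string]string{"hostname": "n2", "region": "a"}},
		{Name: "n3", Labels: map[string]string{"hostname": "n3", "region": "b"}},
	}
	pods := []Pod{
		{Name: "web-1", Labels: map[string]string{"app": "web"}, NodeName: "n1"},
		{Name: "db-1", Labels: map[string]string{"app": "db"}, NodeName: "n3"},
	}
	sel := map[string]string{"app": "web"}
	fmt.Println(feasibleNodes(AntiAffinityTerm{Selector: sel, TopologyKey: "hostname"}, pods, nodes)) // [n2 n3]
	fmt.Println(feasibleNodes(AntiAffinityTerm{Selector: sel, TopologyKey: "region"}, pods, nodes))   // [n3]
}
```

With this structure the full pod list is scanned once per scheduling attempt, and the per-node cost depends only on the number of matching pods, which is the search-space reduction the description refers to.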
The results below were obtained by running the scheduler benchmarks:

```
make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=.*BenchmarkSchedulingAntiAffinity"
```

```
AntiAffinity Topology: Hostname
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12         37031638 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12         10373222 ns/op

before: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12        134205302 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12        12000580 ns/op

before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12      498439953 ns/op
after:  BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12      24692552 ns/op

AntiAffinity Topology: Region
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12         60003672 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12         13346400 ns/op

before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12      600085491 ns/op
after:  BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12      27783333 ns/op
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

ref/ #56032 #47318 #25319

**Release note**:

```release-note
improve performance of affinity/anti-affinity predicate of default scheduler significantly.
```

/sig scheduling

| Name |
|---|
| algorithm |
| algorithmprovider |
| api |
| core |
| factory |
| metrics |
| schedulercache |
| testing |
| util |
| volumebinder |
| BUILD |
| OWNERS |
| scheduler.go |
| scheduler_test.go |
| testutil.go |