k3s/pkg/scheduler
Kubernetes Submit Queue 3cdf5eecd7
Merge pull request #62211 from bsalamat/affinity_performance
Automatic merge from submit-queue (batch tested with PRs 62467, 62482, 62211). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Improve performance of affinity/anti-affinity predicate by 20x in large clusters

**What this PR does / why we need it**:
Improves the performance of the affinity/anti-affinity predicate by over 20x in large clusters. The improvement is smaller in small clusters, but it is still significant at about 4x. Also, before this PR, the predicate's performance degraded quadratically as the number of nodes and pods grew. As the results show, the slowdown is now linear in larger clusters.

The affinity/anti-affinity predicate was checking all pods of the cluster for each node in the cluster to determine the feasibility of the affinity/anti-affinity terms of the pod being scheduled. This optimization first finds, in a single pass, all the pods in the cluster that match the affinity/anti-affinity terms of the pod being scheduled and stores the metadata. It then checks only the topology of the matching pods for each node in the cluster.
This results in a major reduction of the search space per node and improves performance significantly.

The results below were obtained by running the scheduler benchmarks:
```
make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=.*BenchmarkSchedulingAntiAffinity"
```
```
AntiAffinity Topology: Hostname
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12         	     	  37031638 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12         	     	  10373222 ns/op

before: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12        	     	 134205302 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12        	     	  12000580 ns/op

before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12         	     	 498439953 ns/op
after: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12         	     	  24692552 ns/op


AntiAffinity Topology: Region
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12         	     	  60003672 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12         	     	  13346400 ns/op

before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12         	     	 600085491 ns/op
after: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12         	     	  27783333 ns/op
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

ref/ #56032 #47318 #25319

**Release note**:

```release-note
improve performance of affinity/anti-affinity predicate of default scheduler significantly.
```

/sig scheduling
2018-04-13 07:25:21 -07:00
..
algorithm Merge pull request #62211 from bsalamat/affinity_performance 2018-04-13 07:25:21 -07:00
algorithmprovider Merge pull request #60398 from k82cn/k8s_60397 2018-04-04 15:06:19 -07:00
api Add Ignorable flag to extender 2018-03-30 15:10:31 -07:00
core Add test to verify preempt ignore 2018-04-04 16:28:15 -07:00
factory spec.SchedulerName should be spec.schedulerName in kube-scheduler help 2018-04-07 18:06:17 +08:00
metrics Fix golint errors in `pkg/scheduler` based on golint check 2018-02-08 15:22:47 +08:00
schedulercache Delete in-tree support for NVIDIA GPUs. 2018-04-02 20:17:01 -07:00
testing Scheduler cache comparer 2018-03-09 15:10:22 -08:00
util Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
volumebinder Use provided node object in volume binding predicate 2018-04-05 14:35:55 -07:00
BUILD Use pod UID as cache key instead of namespace/name 2018-03-13 10:25:37 -07:00
OWNERS Update OWNERS labels for cluster-lifecycle and scheduling 2018-04-05 16:25:04 -05:00
scheduler.go add one placeholder for err in scheduelr.go 2018-04-08 14:14:13 +08:00
scheduler_test.go Use pod UID as cache key instead of namespace/name 2018-03-13 10:25:37 -07:00
testutil.go remove unused func in FakeConfigurator of scheduler 2018-01-25 16:08:13 +08:00