Commit Graph

69 Commits (70f923ed057de4078394314cd2c51992af1a5151)

Author SHA1 Message Date
Bobby (Babak) Salamat 70f923ed05 Avoid copying PriorityConfig struct while running priority functions 2018-12-04 11:28:25 -08:00
Bobby (Babak) Salamat f74b30868c Add plugin invocation for 'reserve' and 'prebind' plugins to the scheduler. 2018-11-30 16:03:47 -08:00
k8s-ci-robot 527d1c34cc
Merge pull request #70947 from Adirio/nodetree-thread-safety
Scheduler internal NodeTree thread-safe NumNodes
2018-11-29 07:36:48 -08:00
k8s-ci-robot 7e621ccb08
Merge pull request #71063 from Huang-Wei/nodeinfo-clone-panic
fix a scheduler panic due to internal cache inconsistency
2018-11-16 20:27:44 -08:00
Wei Huang b4fd11512a
ensure scheduler preemptor behaves in an efficient/correct path
- don't update nominatedMap cache when Pop() an element from activeQ
- instead, delete the nominated info from cache when it's "assumed"
- unit test behavior adjusted
- expose SchedulingQueue in factory.Config
2018-11-16 14:22:15 -08:00
Wei Huang a86ba8b3c4
fix a scheduler panic due to internal cache inconsistency 2018-11-16 13:02:13 -08:00
Adrián Orive c7cba7370f Scheduler internal NodeTree thread-safe NumNodes
Signed-off-by: Adrián Orive <adrian.orive.oneca@gmail.com>
2018-11-13 08:40:48 +01:00
Davanum Srinivas 954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has an explicit InitFlags(), so we add calls to it as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
Mike Danese 62c3ec969d Fix a race in the scheduler.
Loop over priorityConfigs separately. The node loop can only safely
modify result[i][index]. Before this change it sometimes modified
result[i] concurrently with other loops.

Fixes: 7164967662

==================== Test output for //pkg/scheduler/core:go_default_test:
==================
WARNING: DATA RACE
Read at 0x00c0005e8ed0 by goroutine 22:
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
      pkg/scheduler/core/generic_scheduler.go:667 +0x2ea
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e

Previous write at 0x00c0005e8ed0 by goroutine 21:
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
      pkg/scheduler/core/generic_scheduler.go:668 +0x450
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e

Goroutine 22 (running) created at:
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
      pkg/scheduler/core/generic_scheduler.go:682 +0x592
  k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
      pkg/scheduler/core/generic_scheduler.go:186 +0x77d
  k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
      pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
  testing.tRunner()
      GOROOT/src/testing/testing.go:827 +0x162

Goroutine 21 (running) created at:
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
      pkg/scheduler/core/generic_scheduler.go:682 +0x592
  k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
      pkg/scheduler/core/generic_scheduler.go:186 +0x77d
  k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
      pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
  testing.tRunner()
      GOROOT/src/testing/testing.go:827 +0x162
==================
--- FAIL: TestGenericScheduler (0.01s)
    --- FAIL: TestGenericScheduler/test_6 (0.00s)
        testing.go:771: race detected during execution of test
    testing.go:771: race detected during execution of test
FAIL
2018-11-09 15:21:22 -08:00
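The rule stated in commit 62c3ec969d (a worker for node `index` may only write `result[i][index]`) can be illustrated with a minimal sketch. This is not the scheduler's code: `scoreFunc` and `scoreNodes` are hypothetical names, and plain goroutines with a `sync.WaitGroup` stand in for `workqueue.ParallelizeUntil`. The point is that every row is allocated before any worker starts, so workers never assign to `results[i]` itself.

```go
package main

import (
	"fmt"
	"sync"
)

// scoreFunc is a hypothetical per-node scoring function.
type scoreFunc func(node string) int

// scoreNodes fills results[i][index] for every scoring function i and node index.
// All rows are allocated serially up front, so each worker only writes its own
// column and never modifies the outer slice concurrently.
func scoreNodes(nodes []string, funcs []scoreFunc) [][]int {
	results := make([][]int, len(funcs))
	for i := range funcs {
		results[i] = make([]int, len(nodes))
	}

	var wg sync.WaitGroup
	for index := range nodes {
		wg.Add(1)
		go func(index int) {
			defer wg.Done()
			for i, f := range funcs {
				// Safe: no other goroutine writes results[i][index].
				results[i][index] = f(nodes[index])
			}
		}(index)
	}
	wg.Wait()
	return results
}

func main() {
	nodes := []string{"a", "bb", "ccc"}
	funcs := []scoreFunc{
		func(n string) int { return len(n) },
		func(string) int { return 1 },
	}
	fmt.Println(scoreNodes(nodes, funcs)) // prints [[1 2 3] [1 1 1]]
}
```

Running a variant of this under `go test -race` is one way to check that no worker touches a shared slice header.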
Jun Gong 9fc369dd0d Add debug info: scheduler extender's score and its name for each pod 2018-11-08 13:02:57 +08:00
k8s-ci-robot c0daab0e03
Merge pull request #70274 from zhangmingld/combinesimilercode
combine similar code that calculates schedule priority
2018-11-05 08:14:05 -08:00
zhangmingld 7164967662 combine similar code that calculates schedule priority 2018-10-31 08:59:53 +08:00
zhangmingld 429e67a12f duplicated glog.V(10) when there is already an if glog.V(10) 2018-10-29 11:30:16 +08:00
k8s-ci-robot c00f19bd15
Merge pull request #68403 from wgliang/master.deprecate-Parallelize
Replace Parallelize with function ParallelizeUntil and formally depre…
2018-10-06 09:40:07 -07:00
Guoliang Wang 187e2e01c9 Move scheduler cache interface and implementation to pkg/scheduler/internal/cache 2018-10-06 20:48:59 +08:00
Guoliang Wang c2622dd9d8 Replace Parallelize with function ParallelizeUntil and formally deprecate the Parallelize 2018-10-05 17:56:56 +08:00
Wei Huang 9da576f03c
move SchedulingQueue to pkg/scheduler/internal/queue 2018-09-28 11:51:02 -07:00
k8s-ci-robot a6bc5aa49e
Merge pull request #68563 from DylanBLE/dev
fix scheduler crash when Prioritize Map function failed
2018-09-26 22:59:04 -07:00
Bobby (Babak) Salamat f340f8baf8 Remove PDB and its event handlers from the scheduler cache 2018-09-26 14:22:21 -07:00
hongjian.sun f33c2c11f2 fix scheduler crash when Prioritize Map function failed 2018-09-26 20:16:05 +08:00
Yecheng Fu 2f46bc8a18 Use sequence number to represent generation of equivalence cache.
- snapshot equivalence cache generation numbers before snapshotting the
scheduler cache
- skip update when generation does not match live generation
- keep the node and increment its generation to invalidate it instead of
deletion
- use predicates order ID as key to improve performance
2018-09-22 12:08:21 +08:00
Guoliang Wang 6c63dcfffe Do not split nodes when searching for nodes; do it all at once 2018-09-04 14:07:24 +08:00
Bobby (Babak) Salamat abb70aee98 Add a scheduler config argument to set the percentage of nodes to score 2018-08-17 11:18:51 -07:00
Chao Wang 895b6d441d add space for output 2018-08-01 18:08:31 +08:00
Harry Zhang d644162a29 Extender preemption should respect IsInterested()
Co-authored-by: Harry Zhang <resouer@gmail.com>
Co-authored-by: Chun Chen <ramichen@tencent.com>
2018-07-23 10:13:38 +08:00
Harry Zhang e5a7a4caf7 First-level ecache for nodeMap
Use new cache map in scheduler

Add an integration test

Move init before scheduling

Add lock for first level cache
2018-07-18 15:11:59 +08:00
tanshanshan 06fb64cdf8 fix glogformat 2018-07-14 10:22:12 +08:00
Kubernetes Submit Queue f0311d8232
Merge pull request #65396 from bsalamat/sched_no_sort
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Improve scheduler's performance by eliminating sorting of nodes by their score

**What this PR does / why we need it**:
While profiling the scheduler, I noticed that it spends a significant amount of time sorting the nodes after scoring them in order to find the nodes with the highest score. Finding the nodes with the highest score does not require sorting the array. This PR replaces the sort with a linear scan.

Eliminating the sort results in over 10% improvement in throughput of the scheduler.

Before (3 runs for 5000 nodes, scheduling 1000 pods in a cluster running 2000 pods):
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  20682552 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  20464729 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  21188906 ns/op

After:
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18485866 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18457749 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18418200 ns/op

**Release note**:

```release-note
Improve scheduler's performance by eliminating sorting of nodes by their score.
```
2018-06-23 20:12:01 -07:00
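The idea behind f0311d8232 (find the top-scoring nodes with one pass rather than a sort) can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `hostScore` and `maxScoreHosts` are hypothetical names. A single scan is O(n), whereas sorting the score list first is O(n log n).

```go
package main

import "fmt"

// hostScore is a hypothetical stand-in for a per-node score entry.
type hostScore struct {
	Host  string
	Score int
}

// maxScoreHosts returns the indexes of all entries that share the highest
// score, using a single pass instead of sorting the whole slice.
func maxScoreHosts(scores []hostScore) []int {
	if len(scores) == 0 {
		return nil
	}
	best := scores[0].Score
	idx := []int{0}
	for i := 1; i < len(scores); i++ {
		switch s := scores[i].Score; {
		case s > best:
			// New maximum: discard earlier candidates.
			best = s
			idx = append(idx[:0], i)
		case s == best:
			// Tie with the current maximum: keep both.
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	scores := []hostScore{{"node-a", 3}, {"node-b", 7}, {"node-c", 7}, {"node-d", 1}}
	fmt.Println(maxScoreHosts(scores)) // prints [1 2]
}
```

A caller could then choose among the returned indexes (for example round-robin or at random) to break ties; that selection policy is outside what this sketch assumes.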
Bobby (Babak) Salamat ffc8cc2f50 Improve scheduler's performance by eliminating sorting when finding the host with the highest score 2018-06-23 11:24:43 -07:00
Bobby (Babak) Salamat fab26e470c Add more unresolvable conditions to optimize preemption logic 2018-06-22 17:04:55 -07:00
Shyam Jeedigunta b9ae20c99e Split scheduler latency metric to fine-grained steps 2018-06-21 14:19:39 +02:00
Jonathan Basseri b571065bc4 Clean up names in equivalence package.
Remove stutter from names and provide more idiomatic patterns.

This makes call sites that use equivalence cache easier to read.
2018-06-20 10:52:33 -07:00
Jonathan Basseri 31c746d960 Move equivalence cache into new package.
This moves the equivalence cache implementation out of the 'core'
package and into k8s.io/kubernetes/pkg/scheduler/core/equivalence.

Separating the equiv. cache from the genericScheduler implementation
makes their interaction points easier to follow, and prevents us from
accidentally accessing unexported fields.
2018-06-20 10:52:32 -07:00
Kubernetes Submit Queue dd040d6010
Merge pull request #63942 from misterikkit/ecache-cleanup
Automatic merge from submit-queue (batch tested with PRs 64142, 64426, 62910, 63942, 64548). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

scheduler: further cleanup of equivalence cache

**What this PR does / why we need it**:
This improves comments and simplifies some names/logic in equivalence_cache.go, as well as changing the order of some items in the file.


**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
/kind cleanup
2018-06-20 00:05:18 -07:00
Kubernetes Submit Queue 53d03c58cd
Merge pull request #64179 from wgliang/master.scheduler-unused-para
Automatic merge from submit-queue (batch tested with PRs 64252, 64307, 64163, 64378, 64179). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Remove unused parameter (pod) in `pkg/scheduler/core/generic_scheduler`

**What this PR does / why we need it**:

Remove unused parameter (pod) in `pkg/scheduler/core/generic_scheduler`

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-06-19 21:45:21 -07:00
Guoliang Wang 761cf41427 Move pkg/scheduler/schedulercache -> pkg/scheduler/cache 2018-05-31 22:55:34 +08:00
Jonathan Basseri 9b06870620 Clean up names and comments in equivalence cache. 2018-05-29 11:22:02 -07:00
Guoliang Wang 097094e5fa Remove unused parameter (pod) 2018-05-23 13:56:17 +08:00
Jonathan Basseri 55662f26f1 Simplify logic in podFitsOnNode.
Use new (*EquivalenceCache).RunPredicate to simplify how we read and
update the equivalence cache items.
2018-04-27 15:55:10 -07:00
Jonathan Basseri e67b3225a4 Remove predicateResults map from podFitsOnNode.
The purpose of this map is to combine two predicate results before
writing to the equivalence cache. However, the branch that combines
results is unreachable.

1. Combining results happens in the second iteration of the outer loop.
2. There is only a second iteration when podsAdded is true.
3. We skip equiv. cache when podsAdded is true.
2018-04-27 15:55:10 -07:00
Da K. Ma 2c10d15ae5 Do not schedule pod to the node under PID pressure.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-04-26 10:07:42 +08:00
Jonathan Basseri dacc1a8d52 Check for old NodeInfo when updating equiv. cache.
Because the scheduler takes a snapshot of cache data at the start of
each scheduling cycle, updates to the equivalence cache should be
skipped if there was a cache update during the cycle.

If the current NodeInfo becomes stale while we evaluate predicates, we
will not write any results into the equivalence cache. We will still use
the results for the current scheduling cycle, though.
2018-04-25 10:18:40 -07:00
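The staleness check described in dacc1a8d52 can be sketched roughly as below. The commit message does not say how a stale NodeInfo is detected, so the `generation` counter here is an assumption, and `nodeInfo`, `equivCache`, and `update` are hypothetical simplified names rather than the scheduler's types.

```go
package main

import "fmt"

// nodeInfo is a hypothetical simplification; generation is assumed to be
// bumped whenever the live cache entry for the node changes.
type nodeInfo struct {
	generation int64
}

// equivCache is a hypothetical simplification of a predicate result cache.
type equivCache struct {
	results map[string]bool
}

// update stores a predicate result only if the snapshot taken at the start of
// the scheduling cycle still matches the live node. It reports whether the
// result was written.
func (c *equivCache) update(key string, fit bool, snapshot, live *nodeInfo) bool {
	if snapshot.generation != live.generation {
		// Stale snapshot: the caller may still use fit for this cycle,
		// but it must not be cached.
		return false
	}
	c.results[key] = fit
	return true
}

func main() {
	c := &equivCache{results: map[string]bool{}}
	snapshot := &nodeInfo{generation: 1}

	fmt.Println(c.update("pod-a", true, snapshot, &nodeInfo{generation: 1})) // prints true
	fmt.Println(c.update("pod-b", true, snapshot, &nodeInfo{generation: 2})) // prints false
}
```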
Harry Zhang 4f0bd4121e Disable pod preemption by config 2018-04-12 21:11:51 -07:00
Harry Zhang 083684d771 Add test to verify preempt ignore 2018-04-04 16:28:15 -07:00
Harry Zhang 7f04129736 Add Ignorable flag to extender
Ignore extender in generic scheduler

Add test to verify the ignorable flag

Fix warning msg
2018-03-30 15:10:31 -07:00
Harry Zhang 202c6b68ee Use inline func to ensure unlock is executed 2018-03-25 11:54:16 -07:00
Bobby (Babak) Salamat 4386751b5d
Revert "Revert revert of equivalence class hash calculation in scheduler" 2018-03-23 17:34:49 -07:00
Kubernetes Submit Queue 9a3b0bd74f
Merge pull request #60753 from resouer/equiv-hash
Automatic merge from submit-queue (batch tested with PRs 60898, 60912, 60753, 61002, 60796). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Revert revert of equivalence class hash calculation in scheduler

**What this PR does / why we need it**:

NOTE: This is a revert revert of https://github.com/kubernetes/kubernetes/pull/58555

But since the original PR has been changed, I have to copy the original changes and resend this new PR. See: https://github.com/kubernetes/kubernetes/pull/58555#issuecomment-364345972

And I kept @misterikkit's change as the first commit (via GitHub's co-author feature) in the history.

We decided to revert the revert because #58989 has been fixed, which should help reduce the time consumed by integration tests.

**But** we should still pay attention to the integration tests to see whether timeouts happen frequently.

**Special notes for your reviewer**:

**Release note**:

```release-note
Improve equivalence class hash calculation in scheduler
```
2018-03-20 17:37:14 -07:00
Harry Zhang 5cc841a337 Use inline func to fix deadlock 2018-03-09 10:57:03 -08:00
Harry Zhang b62d82422d Fix golints in extender 2018-03-02 17:12:02 -08:00