Commit Graph

684 Commits (750881c0ab62a9f59d53582bc33826caa167aa83)

Author SHA1 Message Date
Jordan Liggitt 2498ca7606 drop VerifyFeatureGatesUnchanged 2018-11-21 11:51:33 -05:00
Yecheng Fu 8fc00ebda6 Clear pod binding cache. 2018-11-21 11:24:53 +08:00
k8s-ci-robot 7e621ccb08
Merge pull request #71063 from Huang-Wei/nodeinfo-clone-panic
fix a scheduler panic due to internal cache inconsistency
2018-11-16 20:27:44 -08:00
k8s-ci-robot 1f3057b7fb
Merge pull request #70898 from Huang-Wei/preemption-issue
ensure scheduler preemptor behaves in an efficient/correct path
2018-11-16 20:27:35 -08:00
Wei Huang b4fd11512a
ensure scheduler preemptor behaves in an efficient/correct path
- don't update nominatedMap cache when Pop() an element from activeQ
- instead, delete the nominated info from cache when it's "assumed"
- unit test behavior adjusted
- expose SchedulingQueue in factory.Config
2018-11-16 14:22:15 -08:00
Wei Huang a86ba8b3c4
fix a scheduler panic due to internal cache inconsistency 2018-11-16 13:02:13 -08:00
Jordan Liggitt 733dd9dfd7 Add tests to ensure feature gate changes don't escape kubelet/scheduler packages 2018-11-16 10:52:53 -05:00
Jordan Liggitt de8bf9b63d fix scheduler and kubelet unit tests leaking feature flag changes 2018-11-16 10:52:53 -05:00
Jordan Liggitt 358c092abe fix storage unit tests leaking feature flag changes 2018-11-16 10:52:52 -05:00
PingWang 9e760732c3 Refactor New function
Signed-off-by: PingWang <wang.ping5@zte.com.cn>

add comments for InitPolicyFromFile

Signed-off-by: PingWang <wang.ping5@zte.com.cn>

make the methods package private

Signed-off-by: PingWang <wang.ping5@zte.com.cn>
2018-11-15 14:30:19 +08:00
Adrián Orive c7cba7370f Scheduler internal NodeTree thread-safe NumNodes
Signed-off-by: Adrián Orive <adrian.orive.oneca@gmail.com>
2018-11-13 08:40:48 +01:00
Davanum Srinivas 954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
k8s-ci-robot b8fece50f5
Merge pull request #70892 from mikedanese/schedrace
Fix a race in the scheduler.
2018-11-09 23:01:15 -08:00
Mike Danese 62c3ec969d Fix a race in the scheduler.
Loop over priorityConfigs seperately. The node loop can only safely
modify result[i][index]. Before this change it sometimes modified
result[i] concurrently with other loops.

Fixes: 7164967662

==================== Test output for //pkg/scheduler/core:go_default_test:
==================
WARNING: DATA RACE
Read at 0x00c0005e8ed0 by goroutine 22:
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
      pkg/scheduler/core/generic_scheduler.go:667 +0x2ea
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e

Previous write at 0x00c0005e8ed0 by goroutine 21:
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
      pkg/scheduler/core/generic_scheduler.go:668 +0x450
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e

Goroutine 22 (running) created at:
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
      pkg/scheduler/core/generic_scheduler.go:682 +0x592
  k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
      pkg/scheduler/core/generic_scheduler.go:186 +0x77d
  k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
      pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
  testing.tRunner()
      GOROOT/src/testing/testing.go:827 +0x162

Goroutine 21 (running) created at:
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
      pkg/scheduler/core/generic_scheduler.go:682 +0x592
  k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
      pkg/scheduler/core/generic_scheduler.go:186 +0x77d
  k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
      pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
  testing.tRunner()
      GOROOT/src/testing/testing.go:827 +0x162
==================
--- FAIL: TestGenericScheduler (0.01s)
    --- FAIL: TestGenericScheduler/test_6 (0.00s)
        testing.go:771: race detected during execution of test
    testing.go:771: race detected during execution of test
FAIL
2018-11-09 15:21:22 -08:00
Babak "Bobby" Salamat a2c0958428
Revert "Hold mutex lock shorter when processing inter-pod affinity/anti-affin…" 2018-11-08 18:26:26 -08:00
k8s-ci-robot be800e623a
Merge pull request #69663 from sttts/sttts-scheduler-secure-serving
scheduler: enable secure port and authn/z
2018-11-08 17:36:14 -08:00
Dr. Stefan Schimanski d91feb6d18 kube-scheduler: move stopCh creation out of scheduler factory code
Enforces clean ownership of the channel.
2018-11-08 16:43:59 +01:00
tanshanshan cb95edafe8 kube-scheduler: enable secure ports 10259 2018-11-08 16:43:59 +01:00
k8s-ci-robot c5b453b717
Merge pull request #70783 from hex108/debug_extender
Add debug info: scheduler extenders's score and its name for each pod
2018-11-08 02:51:12 -08:00
Jun Gong 9fc369dd0d Add debug info: scheduler extenders's score and its name for each pod 2018-11-08 13:02:57 +08:00
Babak "Bobby" Salamat 2c8e73a16b
Revert "Activate unschedulable pods only if the node became more schedulable" 2018-11-07 16:57:47 -08:00
k8s-ci-robot ed06cbe3e3
Merge pull request #70500 from bsalamat/scheduler_debuger
Add a scheduler cache dumper for debugging purposes
2018-11-06 16:12:54 -08:00
Bobby (Babak) Salamat 48557a163a fixup! Add a scheduler cache dumper 2018-11-06 10:08:22 -08:00
k8s-ci-robot 7984a2bf60
Merge pull request #70564 from KevinWang15/master
Fix typos
2018-11-05 19:04:45 -08:00
Bobby (Babak) Salamat 4bb57c440e Autogenerated files 2018-11-05 13:31:51 -08:00
Bobby (Babak) Salamat 7ce3245ca9 Add a scheduler cache dumper 2018-11-05 13:31:51 -08:00
k8s-ci-robot c0daab0e03
Merge pull request #70274 from zhangmingld/combinesimilercode
combine similar code where calucate schedule priority
2018-11-05 08:14:05 -08:00
Ke Wang 946c701b05 Fix Typo: mataData -> metaData; masquared -> masquerade 2018-11-05 21:19:25 +08:00
k8s-ci-robot 774b18491f
Merge pull request #70605 from bsalamat/affinity_lock_opt
Hold mutex lock shorter when processing inter-pod affinity/anti-affin…
2018-11-04 11:59:05 -08:00
Bobby (Babak) Salamat aa8b5b431b Hold mutex lock shorter when processing inter-pod affinity/anti-affinity priority function 2018-11-02 20:58:07 -07:00
Bobby (Babak) Salamat 7a352b2b92 Do not allocate memory for pods that do not have inter-pod affinity/anti-affinity 2018-11-02 15:15:45 -07:00
k8s-ci-robot b53edbc695
Merge pull request #70348 from zhangmingld/unnecessaryglogv10
duplicated glog.V(10) when had a if glog.V(10)
2018-10-31 01:07:32 -07:00
zhangmingld 7164967662 combine similar code where calucate schedule priority 2018-10-31 08:59:53 +08:00
k8s-ci-robot fda41d14c4
Merge pull request #70366 from mlmhl/scheduler_optimization
Activate unschedulable pods only if the node became more schedulable
2018-10-30 04:57:12 -07:00
k8s-ci-robot 2f175c1b41
Merge pull request #70290 from tossmilestone/scheduler-test-refactor
Refactor scheduler_test.go to use Clientset
2018-10-29 22:07:52 -07:00
mlmhl c50f89dd43 activate unschedulable pods only if the node became more schedulable 2018-10-30 10:45:59 +08:00
zhangmingld 429e67a12f duplicated glog.V(10) when had a if glog.V(10) 2018-10-29 11:30:16 +08:00
He Xiaoxi 12634bf136 Refactor scheduler_test.go to use Clientset
Signed-off-by: He Xiaoxi <xxhe@alauda.io>
2018-10-29 09:51:07 +08:00
k8s-ci-robot 7b5705c619
Merge pull request #70203 from ravisantoshgudimetla/fix-e2e-resource-limits
Fix e2e resource limits
2018-10-26 19:13:41 -07:00
ravisantoshgudimetla fad6b326e3 Fix default algorithm provider priority insertion 2018-10-26 13:48:44 -04:00
zhangmingld cbfaf3856f fix typo in predicates_test.go 2018-10-26 09:59:40 +08:00
k8s-ci-robot 101d26c613
Merge pull request #59529 from wackxu/addmetricvol
Add metrics for volume scheduling operations
2018-10-23 13:52:29 -07:00
wackxu d5edcd3dc3 Add metrics to volume scheduling operations 2018-10-23 20:59:12 +08:00
k8s-ci-robot 060218a862
Merge pull request #69412 from tossmilestone/scheduler-factory-test
Refactor scheduler factory test
2018-10-15 13:52:41 -07:00
Bobby (Babak) Salamat 141b55abf5 Fix a bug in node tree when all nodes in a zone are removed 2018-10-12 21:39:38 -07:00
tanshanshan b7c7966b9f Move pkg/scheduler/algorithm/well_known_labels.go out 2018-10-13 09:10:00 +08:00
He Xiaoxi a96a390d92 Refactor scheduler factory test
Use `k8s.io/client-go/kubernetes/fake.Clientset` as the fake k8s client.

Signed-off-by: He Xiaoxi <xxhe@alauda.io>
2018-10-12 14:39:08 +08:00
k8s-ci-robot 94306c12f5
Merge pull request #69057 from denkensk/create-a-new-scheduler-constructor
create-a-new-scheduler-constructor
2018-10-11 13:45:02 -07:00
k8s-ci-robot 539bdbc355
Merge pull request #69495 from wgliang/feature/movenodoinfofunctions
[scheduler cleanup phase 1]: Move NodeInfo utils into pkg/scheduler/cache
2018-10-11 07:12:40 -07:00
Guoliang Wang a50404d441 [scheduler cleanup phase 1]: Move NodeInfo utils into pkg/scheduler/cache 2018-10-11 11:04:23 +08:00
Guoliang Wang f44f55cdd7 [scheduler cleanup phase 1]: Move CacheComparer to pkg/scheduler/internal/cache/comparer 2018-10-10 21:28:14 +08:00
wangqingcan 608911d5ac add test for new constructor 2018-10-10 17:15:10 +08:00
wangqingcan a74fd15e62 create a new scheduler constructor 2018-10-10 17:10:10 +08:00
k8s-ci-robot d3fe0ea7ff
Merge pull request #69318 from wgliang/feature/move-fakecache
[scheduler cleanup phase 1]: Move FakeCache to pkg/scheduler/internal…
2018-10-09 23:21:49 -07:00
Bobby (Babak) Salamat 5f2555e3ad Remove pod status.phase check from pod event handlers 2018-10-09 13:39:04 -07:00
k8s-ci-robot c00f19bd15
Merge pull request #68403 from wgliang/master.deprecate-Parallelize
Replace Parallelize with function ParallelizeUntil and formally depre…
2018-10-06 09:40:07 -07:00
Guoliang Wang 558e1960b8 [scheduler cleanup phase 1]: Move FakeCache to pkg/scheduler/internal/cache/fake 2018-10-06 23:58:53 +08:00
Guoliang Wang 187e2e01c9 Move scheduler cache interface and implementation to pkg/scheduler/internal/cache 2018-10-06 20:48:59 +08:00
Christoph Blecker 97b2992dc1
Update gofmt for go1.11 2018-10-05 12:59:38 -07:00
Guoliang Wang c2622dd9d8 Replace Parallelize with function ParallelizeUntil and formally deprecate the Parallelize 2018-10-05 17:56:56 +08:00
Bobby (Babak) Salamat 2d9d8c405d Add validation for percentage-of-nodes-to-score of the scheduler config 2018-10-02 17:09:47 -07:00
k8s-ci-robot 14949b302f
Merge pull request #68724 from krmayankk/sched1
scheduler: improve readability of bind function
2018-10-01 18:04:04 -07:00
k8s-ci-robot 5b658bb65e
Merge pull request #69136 from zhangmingld/deletecommentgetPodsMatchingAffinity
func getPodsMatchingAffinity no longer exists,dont need it in comment
2018-09-30 02:37:03 -07:00
Mayank Kumar 1ae12337db scheduler: add error handling for bind 2018-09-28 21:13:21 -07:00
Wei Huang 9da576f03c
move SchedulingQueue to pkg/scheduler/internal/queue 2018-09-28 11:51:02 -07:00
Wei Huang 2e7461c087
auto-generated files 2018-09-28 11:51:01 -07:00
k8s-ci-robot db1d1c8674
Merge pull request #68700 from Huang-Wei/schedulingQ-graceful-shutdown
shutdown schedulingQueue gracefully
2018-09-28 00:46:14 -07:00
Wei Huang be661fddb4
shutdown schedulingQueue gracefully
- add Close() to interface SchedulingQueue
- implement Close() for FIFO and PriorityQueue
- add unit test
2018-09-27 14:32:58 -07:00
Wei Huang 56fcfc5fee
cleanup duplicate assignment logic in scheduler 2018-09-27 11:38:05 -07:00
k8s-ci-robot 762ddf792d
Merge pull request #68984 from k82cn/k8s_68899_m
Keep backward compatibility for 'node.Spec.Unschedulable'.
2018-09-27 04:48:49 -07:00
zhangmingld 061e0dbb7e func getPodsMatchingAffinity no longer exists,dont need it in comment 2018-09-27 14:42:49 +08:00
k8s-ci-robot 823530d4db
Merge pull request #68917 from zhangmingld/fixwordserr
Fix some typo err
2018-09-26 22:59:32 -07:00
k8s-ci-robot a6bc5aa49e
Merge pull request #68563 from DylanBLE/dev
fix scheduler crash when Prioritize Map function failed
2018-09-26 22:59:04 -07:00
Bobby (Babak) Salamat c051f0d31a autogenerated files 2018-09-26 14:22:21 -07:00
Bobby (Babak) Salamat f340f8baf8 Remove PDB and its event handlers from the scheduler cache 2018-09-26 14:22:21 -07:00
hongjian.sun f33c2c11f2 fix scheduler crash when Prioritize Map function failed 2018-09-26 20:16:05 +08:00
k8s-ci-robot 28d86ac47d
Merge pull request #67308 from cofyc/fix67260
Use monotonically increasing generation to prevent equivalence cache race
2018-09-25 00:18:00 -07:00
Jonathan Basseri b0a8dbbc9d Add scheduler throughput metric.
This adds a counter to the scheduler that can be used to calculate
throughput and error ratio. Pods which fail to schedule are not counted
as errors, but can still be tracked separately from successes.

We already measure scheduler latency, but throughput was missing. This
should be considered a key metric for the scheduler.
2018-09-24 14:38:39 -07:00
Da K. Ma 78f6484e14 Keep backward compatibility for 'node.Spec.Unschedulable'.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-09-23 10:33:51 +08:00
Yecheng Fu b3f1e1200b Update notes to document why invalidation order is important. 2018-09-22 12:09:24 +08:00
Yecheng Fu 2f46bc8a18 Use seqeuence number to represent generation of equivalence cache.
- snapshot equivalence cache generation numbers before snapshotting the
scheduler cache
- skip update when generation does not match live generation
- keep the node and increment its generation to invalidate it instead of
deletion
- use predicates order ID as key to improve performance
2018-09-22 12:08:21 +08:00
Yecheng Fu a2cc1b1a20 Revert "Use sync.map to scale ecache better"
This reverts commit 17d0190706.
2018-09-22 11:33:06 +08:00
zhangmingld 6aaeb209eb fix some typo 2018-09-21 10:19:35 +08:00
Wei Huang 7490542156
fix PodAntiAffinity issues
- update logic of verifying incoming pod's anti-affinity
- rename podMatchesAffinityTermProperties to podMatchesAllAffinityTermProperties
- add podMatchesAnyAffinityTermProperties which is used in some PodAntiAffinity cases
- rename some functions to make it more readable
- add unit tests to verify correctness of PodAffinity and PodAntiAffinity
  - verifying "Existing pod anti-affinity"
  - verifying "incoming pod's anti-affinity"
  - verifying "incoming pod's affinity"
2018-09-13 18:32:33 -07:00
Kubernetes Submit Queue a6eb49f0dc
Merge pull request #68195 from luxas/consolidate_componentconfig_code_standards
Automatic merge from submit-queue (batch tested with PRs 67950, 68195). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Consolidate componentconfig code standards

**What this PR does / why we need it**:

This PR fixes a bunch of very small misalignments in ComponentConfig packages:
 - Add sane comments to all functions/variables in componentconfig `register.go` files
 - Make the `register.go` files of componentconfig pkgs follow the same pattern and not differ from each other like they do today.
 - Register the `openapi-gen` tag in all `doc.go` files where the pkg contains _external_ types.
 - Add the `groupName` tag where missing
 - Fix cases where `addKnownTypes` was registered twice in the `SchemeBuilder`
 - Add `Readme` and `OWNERS` files to `Godeps` directories if missing.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:


**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```

/assign @sttts @thockin
2018-09-07 11:19:40 -07:00
Lucas Käldström 83d53ea1c2
Standardize componentconfig code/comment patterns 2018-09-06 13:42:02 +03:00
Stephen Cuppett d85daf0f4c Resolves #59015, extends existing regex to cover t3, r5(d) & z1d instance types
From current AWS documentation:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html

T3, C5, C5d, M5, M5d, R5, R5d, and z1d instances support a maximum of
28 attachments, and every instance has at least one network interface
attachment. If you have no additional network interface attachments on
these instances, you could attach 27 EBS volumes.
2018-09-05 21:24:09 -04:00
Kubernetes Submit Queue ca43f007a3
Merge pull request #67731 from gnufied/fix-csi-attach-limit
Automatic merge from submit-queue (batch tested with PRs 68161, 68023, 67909, 67955, 67731). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix csi attach limit

Add support for volume limits for CSI.

xref: https://github.com/kubernetes/community/pull/2051

```release-note
Add support for volume attach limits for CSI volumes
```
2018-09-05 14:51:55 -07:00
Hemant Kumar fc61620db5 Fix compatibility tests for scheduler 2018-09-05 12:29:00 -04:00
Michelle Au e124159990 Add scheduler option for bind timeout 2018-09-04 17:25:23 -07:00
Michelle Au ce2dfac296 generated files 2018-09-04 16:47:43 -07:00
Michelle Au 01d83fa104 Scheduler changes to assume volume and pod together, and then bind
volume and pod asynchronously afterwards. This will also make it easier
to migrate to the scheduler framework.
2018-09-04 16:30:14 -07:00
Kubernetes Submit Queue 6b02edd369
Merge pull request #68196 from losipiuk/lo/revert-delte-predicate-funs
Automatic merge from submit-queue (batch tested with PRs 67555, 68196). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Add back predicate related accessors to scheduler.Configurator

```release-note
NONE
```
2018-09-04 11:41:37 -07:00
Kubernetes Submit Queue a0b457d0e5
Merge pull request #67555 from wgliang/opt/improve-performance
Automatic merge from submit-queue (batch tested with PRs 67555, 68196). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Not split nodes when searching for nodes but doing it all at once

**What this PR does / why we need it**:
Not split nodes when searching for nodes but doing it all at once.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
@bsalamat 

This is a follow up PR of #66733.

https://github.com/kubernetes/kubernetes/pull/66733#discussion_r205932531

**Release note**:

```release-note
Not split nodes when searching for nodes but doing it all at once.
```
2018-09-04 11:41:34 -07:00
Guoliang Wang 6c63dcfffe Not split nodes when searching for nodes but doing it all at once 2018-09-04 14:07:24 +08:00
Kubernetes Submit Queue f3b98a08b0
Merge pull request #66799 from noqcks/master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Add validation for kube-scheduler configuration options

**What this PR does / why we need it**: This adds validation to the kube-scheduler so that we're not accepting bogus values to the kube-scheduler. As requested by @bsalamat in issue https://github.com/kubernetes/kubernetes/issues/66743

**Which issue(s) this PR fixes**:
Fixes #66743

**Special notes for your reviewer**:
- Not sure if this validation is too heavy handed. Would love some feedback. 
- I started working on this before I realized @islinwb was also working on this same problem... https://github.com/kubernetes/kubernetes/pull/66787 I put this PR up anyways since I'm sure good code exists in both. I wasn't aware of the /assign command so didn't assign myself before starting work. 
- I didn't have time to work on adding validation to deprecated cli options. If the rest of this looks ok, I can finish that up.
- I hope the location of IsValidSocketAddr is correct. Lmk if it isn't. 

**Release note**:
```release-note
Adding validation to kube-scheduler at the API level
```
2018-09-03 17:17:49 -07:00
Kubernetes Submit Queue 7548764f96
Merge pull request #67788 from mohamed-mehany/inter-pod-affinity-optimization
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Affinity/Anti-Affinity Optimization of Pod Being Scheduled

**What this PR does / why we need it**:
Following #66948, it was noticed that the applied optimizations for anti-affinity rules lookup of existing pods could be further applied to checking affinity and anti-affinity terms of the Pod being scheduled. This is done by mapping topology pairs to pods that potentially match the pod being scheduled instead of mapping nodes to matching pods, and accordingly the search space is reduced.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #67738

**Special notes for your reviewer**:
/sig scheduling
/sig scalability

**Release note**:

```release-note
Improve performance of Pod affinity/anti-affinity in the scheduler
```
2018-09-03 11:40:09 -07:00
Łukasz Osipiuk fa39053412 Add back predicate related accessors to scheduler.Configurator 2018-09-03 19:12:01 +02:00
Kubernetes Submit Queue da25aaa39e
Merge pull request #68081 from silveryfu/image-locality-tests-new
Automatic merge from submit-queue (batch tested with PRs 63437, 68081). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Enable ImageLocalityPriority by default with integration tests

**What this PR does / why we need it**:

This PR is a follow-up to [#63842](https://github.com/kubernetes/kubernetes/issues/63842). It moves the ImageLocalityPriority function to default priority functions of the default algorithm provider and adds integration tests for the updated scheduling policy.

- Compared to [#64662](https://github.com/kubernetes/kubernetes/pull/64662), this PR does note provide e2e test due to concerns about a large image may add too much overhead to the testing infrastructure and pipeline. We should add e2e tests in the future with the use of large enough image(s) in following PRs. 

- Compared to [#64662](https://github.com/kubernetes/kubernetes/pull/64662), this PR simplifies the code changes and keeps code changes under test/integration/scheduler/.

- The PR contains a bug fix for [#65745](https://github.com/kubernetes/kubernetes/pull/65745) - caught by the integration test - where the image states are not properly cloned to the scheduler's cachedNodeInfoMap. We might split this fix into a separate PR.

The integration test covers what follows: a pod requiring a large image (~= 3GB) is submitted to the cluster and there is a single node in the cluster has the same large image; the pod should get scheduled to that node. We might also consider whether more scenarios are desired.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

Kindly ping @resouer and @bsalamat 

**Release note**:

```release-note
None
```
2018-09-01 16:58:33 -07:00
noqcks 0334a34e4a
Add validation for kube-scheduler
adding validation for componentconfig

adding validation to cmd kube-scheduler

Add support for ipv6 in IsValidSocketAddr function

updating copyright date in componentconfig/validation/validation.go

updating copyright date in componentconfig/validation/validation_test.go

adding validation for cli options

adding BUILD files

updating validate function to return []errors in cmd/kube-scheduler

ok, really returning []error this time

adding comments for exported componentconfig Validation functions

silly me, not checking structs along the way :'(

refactor to avoid else statement

moving policy nil check up one function

rejigging some deprecated cmd validations

stumbling my way around validation slowly but surely

updating according to review from @bsalamat

- not validating leader election config unless leader election is enabled
- leader election time values cannot be zero
- removing validation for KubeConfigFile
- removing validation for scheduler policy

leader elect options should be non-negative

adding test cases for renewDeadline and leaseDuration being zero

fixing logic in componentconfig validation 😅

removing KubeConfigFile reference from tests as it was removed in master

2ff9bd6699

removing bogus space after var assignment

adding more tests for componentconfig based on feedback

making updates to validation because types were moved on master

update bazel build

adding validation for staging/apimachinery

adding validation for staging/apiserver

adding fieldPaths for staging validations

moving staging validations out of componentconfig

updating test case scenario for staging/apimachinery

./hack/update-bazel.sh

moving kube-scheduler validations from componentconfig

./hack/update-bazel.sh

removing non-negative check for QPS

resourceLock required

adding HardPodAffinitySymmetricWeight 0-100 range to cmd flag help section
2018-08-31 22:29:19 -04:00
Klaus Ma 26142e614c Wait for Scheduler cache empty.
Signed-off-by: Klaus Ma <klaus1982.cn@gmail.com>
2018-08-31 19:42:10 +08:00
Silvery Fu 74fbbe8e52 Fix cloning image states from node info 2018-08-31 00:58:29 -07:00
Silvery Fu c06dcb2d6a Move image locality to default priority functions 2018-08-31 00:58:09 -07:00
Ahmad Diaa ee393b4e06 addressed reviewer comments 2018-08-30 01:48:55 +02:00
Ahmad Diaa ac4a082b33 use topologyPairsMaps for inter pod affinity/anti-affinity maps 2018-08-30 01:48:54 +02:00
Kubernetes Submit Queue f07d0955e9
Merge pull request #67642 from tizhou86/newUnitTest7
Automatic merge from submit-queue (batch tested with PRs 67766, 67642, 67772). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add unit test cases for scheduler/algorithm/predicates.

**What this PR does / why we need it**:
Add unit test cases for scheduler/algorithm/predicates for more code coverage.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE

**Special notes for your reviewer**:
NONE

**Release note**:

```release-note
NONE
```
2018-08-27 06:14:14 -07:00
Kubernetes Submit Queue 459b537885
Merge pull request #67750 from tizhou86/newUnitTest9
Automatic merge from submit-queue (batch tested with PRs 66257, 67750). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add unit test cases for scheduler/util.

**What this PR does / why we need it**:
Add unit test cases for scheduler/util for more code coverage.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE

**Special notes for your reviewer**:
NONE

**Release note**:

```release-note
NONE
```
2018-08-26 23:01:13 -07:00
tizhou86 3e0fdc28c9 Add unit test cases for scheduler/algorithm/predicates. 2018-08-27 10:57:04 +08:00
Kubernetes Submit Queue 66fa85c837
Merge pull request #67760 from houjun41544/20180823
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

 Complement unit test case TestNodesWherePreemptionMightHelp for scheduler/core

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2018-08-26 19:39:06 -07:00
Kubernetes Submit Queue 92f2319379
Merge pull request #67759 from tizhou86/newPR3
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix golint error under scheduler/factory.

**What this PR does / why we need it**:
Fix golint error under scheduler/factory.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE

**Special notes for your reviewer**:
NONE

**Release note**:

```release-note
NONE
```
2018-08-26 00:06:26 -07:00
Kubernetes Submit Queue a697d71cb5
Merge pull request #66916 from dixudx/kubeadm_scheduler_api
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Moving KubeSchedulerConfiguration from ComponentConfig API types to staging repos

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#528

**Special notes for your reviewer**:
/cc luxas timothysc 
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews 
**Release note**:

```release-note
Moving KubeSchedulerConfiguration from ComponentConfig API types to staging repos
```
2018-08-24 14:38:58 -07:00
Di Xu 7f8a59162b auto-generated 2018-08-24 10:58:09 +08:00
Di Xu 9f506da2d4 add new GVK kubescheduler.config.k8s.io/v1alpha1.KubeSchedulerConfiguration 2018-08-24 10:58:09 +08:00
houjun 08e5f4573a Complement unit test case TestNodesWherePreemptionMightHelp for scheduler/core 2018-08-23 18:54:23 +08:00
tizhou86 2a2c0c37d4 Fix golint error under scheduler/factory. 2018-08-23 17:24:32 +08:00
tizhou86 6fbc7d51be Fix golint error under scheduler/algorithm/priorities. 2018-08-23 17:20:55 +08:00
tizhou86 d971778844 Add unit test cases for scheduler/util. 2018-08-23 15:26:27 +08:00
Hemant Kumar 8e4b33d1a8 Move volume limit feature to beta 2018-08-22 19:36:01 -04:00
Hemant Kumar 4b17a48def Implement support for updating volume limits
Create a new predicate to count CSI volumes
2018-08-22 19:36:00 -04:00
Kubernetes Submit Queue 4cca6a89a0
Merge pull request #66862 from resouer/sync-map
Automatic merge from submit-queue (batch tested with PRs 66862, 67618). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use sync.map to scale equiv class cache better

**What this PR does / why we need it**:

Change the current lock in first level ecache into  `sync.Map`, which is known for scaling better than `sync. Mutex ` on machines with >8 CPUs

ref: https://golang.org/pkg/sync/#Map
 
And the code is much cleaner in this way.

5k Nodes, 10k Pods benchmark with ecache enabled in 64 cores VM:

```bash
// before
BenchmarkScheduling/5000Nodes/0Pods-64             10000          17550089 ns/op

// after
BenchmarkScheduling/5000Nodes/0Pods-64             10000          16975098 ns/op
```
Comparing to current implementation, the improvement after this change is noticeable, and the test is stable in 8, 16, 64 cores VM.

**Special notes for your reviewer**:

**Release note**:

```release-note
Use sync.map to scale ecache better
```
2018-08-21 00:24:01 -07:00
Kubernetes Submit Queue ef388fee53
Merge pull request #66948 from mohamed-mehany/anti-affinity-optimization
Automatic merge from submit-queue (batch tested with PRs 67041, 66948). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Anti affinity optimization

**What this PR does / why we need it**:
This pull request aims to optimize the performance of anti-affinity rules lookup of existing pods
This optimization maps the topology values to a list of pods running on nodes that match this value and store that map in the pod metadata. Accordingly, when validating anti-affinity rules of existing pods we will only check those running on nodes with similar topology values to the current candidate (node) for scheduling.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #63937

**Special notes for your reviewer**:
/sig scalability
/sig scheduling
**Release note**:

```release-note
improve performance of anti-affinity predicate of default scheduler.
```
2018-08-17 19:14:08 -07:00
Ahmad Diaa b4c7d190cd using set instead of lists for topologyPairsMaps attributes 2018-08-18 01:02:48 +02:00
Ahmad Diaa 0f4c3064fd created struct for topologyPairs maps 2018-08-18 01:02:48 +02:00
Ahmad Diaa f6659e4543 further enhancements removing matchingTerms from metadata 2018-08-18 01:02:47 +02:00
Mohamed Mehany 3fb6912d08 add topologyValue map to reduce search space 2018-08-18 01:02:46 +02:00
Bobby (Babak) Salamat 2860743c86 Autogenerated files 2018-08-17 11:18:52 -07:00
Bobby (Babak) Salamat abb70aee98 Add a scheduler config argument to set the percentage of nodes to score 2018-08-17 11:18:51 -07:00
Bobby (Babak) Salamat a5045d107e Add NodeTree to the scheduler cache 2018-08-17 09:56:51 -07:00
Bobby (Babak) Salamat c1896c97ea Add a node tree that allows iterating over nodes in regions and zones 2018-08-17 09:56:51 -07:00
Kubernetes Submit Queue eeb3389f3b
Merge pull request #63260 from misterikkit/ecache-metrics
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

scheduler: add metrics to equivalence cache

This adds counters to equiv. cache reads & writes. Reads are labeled by
hit/miss, while writes are labeled to indicate whether the write was
discarded.

This will give us visibility into,
- hit rate of cache reads
- ratio of reads to writes
- rate of discarded writes



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/63259

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-08-17 01:10:51 -07:00
Kubernetes Submit Queue 825548df95
Merge pull request #67464 from misterikkit/deadcode
Automatic merge from submit-queue (batch tested with PRs 67461, 67464, 67416). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Delete dead code in pkg/scheduler

**What this PR does / why we need it**:
This is just some cleanup. I found some unused code while evaluating the scheduler code.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
/kind cleanup
/sig scheduling
2018-08-15 20:09:09 -07:00
Jonathan Basseri fbf3d2b84c Delete dead code in pkg/scheduler.
This deletes some unused functions from the `Configurator` interface.
2018-08-15 17:14:38 -07:00
Jonathan Basseri a77e3bd16b Delete dead code.
This removes a fake Cache implementation that is not used anywhere
(anymore).
2018-08-15 17:14:37 -07:00
Jonathan Basseri b874d2789b Add metrics to equivalence cache.
This adds counters to equiv. cache reads & writes. Reads are labeled by
hit/miss, while writes are labeled to indicate whether the write was
discarded.

This will give us visibility into,
- hit rate of cache reads
- ratio of reads to writes
- rate of discarded writes
2018-08-15 15:51:13 -07:00
Wei Huang 976797c0b8
fix an issue in NodeInfo.Clone()
- usedPorts is a map-in-map struct, add fix to ensure it's deep copied
- updated unit test
2018-08-15 13:31:16 -07:00
Kubernetes Submit Queue d7634dcf23
Merge pull request #66856 from charrywanganthony/scheduler_space
Automatic merge from submit-queue (batch tested with PRs 66491, 66587, 66856, 66657, 66923). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

add space for output

**Release note**:
```release-note
NONE
```
2018-08-14 17:55:11 -07:00
Kubernetes Submit Queue 6274590518
Merge pull request #66656 from wackxu/fixappversion
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

 use apps/v1 version for scheduler

/kind cleanup

**Release note**:

```release-note
NONE
```
2018-08-11 23:25:33 -07:00
Avesh Agarwal be741feb1a Ouput volumes (total capacity and requests) too along with cpu and memory
when the feature BalanceAttachedNodeVolumes is used.
2018-08-07 15:40:33 -04:00
Avesh Agarwal ea7f711ae2 Fix incorrect reporting of total request including current pod in the
resource allocation priority function.
2018-08-07 15:37:55 -04:00
Harry Zhang 17d0190706 Use sync.map to scale ecache better 2018-08-07 14:06:09 +08:00
Chao Wang 895b6d441d add space for output 2018-08-01 18:08:31 +08:00
Kubernetes Submit Queue f4d8220df5
Merge pull request #65616 from cofyc/fix56163
Automatic merge from submit-queue (batch tested with PRs 65570, 65616). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Retry scheduling on StorageClass events

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56163

**Special notes for your reviewer**:

I have taken over #60006.
It's hard to test in e2e, because we cannot know reschedule of pod is triggered by which event (periodically service/node events will move pods to active queue too). ~~I'll add integration tests for this functionality after [this PR](https://github.com/kubernetes/kubernetes/pull/65296) get merged.~~ (already added)

**Release note**:

```release-note
NONE
```
2018-07-31 19:18:00 -07:00
Kubernetes Submit Queue 0e9b1dd20f
Merge pull request #66671 from hanxiaoshuai/cleanup07261
Automatic merge from submit-queue (batch tested with PRs 63955, 66685, 66671). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

remove unused code in pkg/scheduler/algorithm/scheduler_interface_test.go

**What this PR does / why we need it**:
remove unused code in pkg/scheduler/algorithm/scheduler_interface_test.go
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-07-26 21:05:11 -07:00
Kubernetes Submit Queue fea4ad2783
Merge pull request #66670 from foxyriver/fix-log
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix error log

**What this PR does / why we need it**:

fix error log



**Release note**:

```release-note
NONE
```
2018-07-26 19:43:19 -07:00
Mayank Kumar a5b6d805ea Use GetControllerOf from apimachinery and remove kubernetes copy 2018-07-26 12:20:35 -07:00
hangaoshuai f3fb9e0f33 remove unused code in pkg/scheduler/algorithm/scheduler_interface_test.go 2018-07-26 21:01:50 +08:00
foxyriver 3b4f250c4a fix error log 2018-07-26 19:48:48 +08:00
wackxu ab35fa0414 update bazel 2018-07-26 17:37:29 +08:00
xushiwei 00425595 fed8572745 use apps/v1 version for scheduler 2018-07-26 17:37:29 +08:00
Kubernetes Submit Queue e4465b6e2f
Merge pull request #66599 from cofyc/fixfeaturegate
Automatic merge from submit-queue (batch tested with PRs 66540, 66599). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Invalidate CheckVolumeBinding predicate only when VolumeScheduling feature is enabled

**What this PR does / why we need it**:

Invalidate CheckVolumeBinding predicate only when VolumeScheduling feature is enabled.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-07-26 01:55:17 -07:00
Kubernetes Submit Queue 84a15d0291
Merge pull request #66540 from hanxiaoshuai/fixut0724
Automatic merge from submit-queue (batch tested with PRs 66540, 66599). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

replace predicates string with corresponding const in TestDefaultPredicates

**What this PR does / why we need it**:
replace predicates string with corresponding const in TestDefaultPredicates. Unify with the const in func defaultPredicates().
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-07-26 01:55:14 -07:00
Bobby (Babak) Salamat be55371ff2 minor cleanup of selector_spreading priority function 2018-07-25 13:43:37 -07:00
Yecheng Fu d2fc875489 Invalidate CheckVolumeBinding predicate only when VolumeScheduling
feature is enabled.
2018-07-25 15:11:23 +08:00
Kubernetes Submit Queue 4dbcf32b3c
Merge pull request #66471 from islinwb/improve_TestZeroRequest
Automatic merge from submit-queue (batch tested with PRs 66291, 66471, 66499). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Improve unit test TestZeroRequest

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66468

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-07-24 13:59:58 -07:00
Kubernetes Submit Queue 2119d349b0
Merge pull request #66291 from resouer/fix-extender
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Extender preemption should respect IsInterested()

**What this PR does / why we need it**:

Extender preemption should respect IsInterested()

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66289 

**Special notes for your reviewer**:

The bug is reported and the first commit is co-authored by: @chenchun

**Release note**:

```release-note
Extender preemption should respect IsInterested()
```
2018-07-24 13:48:38 -07:00
hangaoshuai 2c59a683a2 replace predicates string with corresponding const in TestDefaultPredicates 2018-07-24 14:27:36 +08:00
Weibin Lin 972e78748a add pod UID 2018-07-23 10:44:31 +08:00
Harry Zhang d644162a29 Extender preemption should respect IsInterested()
Co-authored-by: Harry Zhang <resouer@gmail.com>
Co-authored-by: Chun Chen <ramichen@tencent.com>
2018-07-23 10:13:38 +08:00
Weibin Lin 5449d153bb Improve unit test TestZeroRequest 2018-07-23 09:15:19 +08:00
Kubernetes Submit Queue 4797c8df8f
Merge pull request #63665 from xchapter7x/pkg-scheduler-core
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

use subtest for table units (pkg/scheduler/core)

**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267

**Release note**:

```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.

```
2018-07-21 01:52:30 -07:00
Kubernetes Submit Queue 827aa934ac
Merge pull request #66397 from gnufied/fix-default-max-volume-ebs
Automatic merge from submit-queue (batch tested with PRs 66410, 66398, 66061, 66397, 65558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix volume limit for EBS on m5 and c5 instances

This is a fix for lower volume limits on m5 and c5 instance types while we wait for https://github.com/kubernetes/features/issues/554 to land GA.

This problem became urgent because many of our users are trying to migrate to those instance types in light of spectre/meltdown vulnerability but  lower volume limit on those instance types often causes cluster instability. Yes they can workaround by configuring the scheduler with lower limit but often this becomes somewhat difficult to do when cluster is mixed. 

The newer default limits were picked from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html

Text about spectre/meltdown is available on - https://community.bitnami.com/t/spectre-variant-2/54961/5

/sig storage
/sig scheduling

```release-note
Fix volume limit for EBS on m5 and c5 instance types
```
2018-07-20 18:51:11 -07:00
John Calabrese ad234e58be use subtest for table units
remove duplicate testname from error msg

remove subtest for test setup loop

do not break on test failure

  https://github.com/kubernetes/kubernetes/pull/63665#discussion_r203571355

remove duplicate test.name in output

  https://github.com/kubernetes/kubernetes/pull/63665#discussion_r203574001
  https://github.com/kubernetes/kubernetes/pull/63665#discussion_r203574012
2018-07-20 16:02:50 -04:00
Yecheng Fu 8f0373792f Retry scheduling on various events. 2018-07-20 09:54:34 +08:00
Kubernetes Submit Queue 795b7da8b0
Merge pull request #65714 from resouer/fix-63784
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Re-design equivalence class cache to two level cache

**What this PR does / why we need it**:

The current ecache introduced a global lock across all the nodes, and this patch tried to assign ecache per node to eliminate that global lock. The improvement of scheduling performance and throughput are both significant.

**CPU Profile Result** 

Machine: 32-core 60GB GCE VM

1k nodes 10k pods bench test (we've highlighted the critical function):

1. Current default scheduler with ecache enabled:
![equivlance class cache bench test 001](https://user-images.githubusercontent.com/1701782/42196992-51b0a32a-7eb3-11e8-89ee-f13383091a00.jpeg)
2. Current default scheduler with ecache disabled:
![equivlance class cache bench test 002](https://user-images.githubusercontent.com/1701782/42196993-51eb0c68-7eb3-11e8-9326-1a7762072863.jpeg)
3. Current default scheduler with this patch and ecache enabled:
![equivlance class cache bench test 003](https://user-images.githubusercontent.com/1701782/42196994-52280ed8-7eb3-11e8-8100-690e2af2cf2f.jpeg)

**Throughput Test Result** 

1k nodes 3k pods `scheduler_perf` test: 

Current default scheduler, ecache is disabled:
```bash
Minimal observed throughput for 3k pod test: 200
PASS
ok      k8s.io/kubernetes/test/integration/scheduler_perf    30.091s
```
With this patch, ecache is enabled:
```bash
Minimal observed throughput for 3k pod test: 556
PASS
ok      k8s.io/kubernetes/test/integration/scheduler_perf    11.119s
```

**Design and implementation:**

The idea is: we re-designed ecache into a "two level cache". 

The first level cache holds the global lock across nodes and sync is needed only when node is added or deleted, which is of much lower frequency. 

The second level cache is assigned per node and its lock is restricted to per node level, thus there's no need to bother the global lock during whole predicate process cycle. For more detail, please check [the original discussion](https://github.com/kubernetes/kubernetes/issues/63784#issuecomment-399848349).

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #63784

**Special notes for your reviewer**:

~~Tagged as WIP to make sure this does not break existing code and tests, we can start review after CI is happy.~~

**Release note**:

```release-note
Re-design equivalence class cache to two level cache
```
2018-07-19 16:16:02 -07:00
Hemant Kumar 45b8107378 Fix volume limit for EBS on m5 and c5 instances 2018-07-19 16:27:52 -04:00
Kubernetes Submit Queue 357decc9db
Merge pull request #63666 from xchapter7x/pkg-scheduler-factory
Automatic merge from submit-queue (batch tested with PRs 58487, 63666). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

use subtest for table units (pkg/scheduler/factory)

**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267

**Release note**:

```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.

```
2018-07-19 02:09:06 -07:00
Harry Zhang e5a7a4caf7 Fist level ecache for nodeMap
Use new cache map in scheduler

Add a integration test

Move init before schedudling

Add lock for first level cache
2018-07-18 15:11:59 +08:00
Harry Zhang 17977478e7 RWLock for cache 2018-07-18 15:11:59 +08:00
Nikhita Raghunath c166743272 scheduler: fix panic while removing node from imageStates cache 2018-07-16 11:42:28 +05:30
tanshanshan 06fb64cdf8 fix glogformat 2018-07-14 10:22:12 +08:00
Kubernetes Submit Queue b883f4cff8
Merge pull request #65745 from silveryfu/image-locality-scoring
Automatic merge from submit-queue (batch tested with PRs 66011, 66111, 66106, 66039, 65745). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Enable adaptive scoring in ImageLocalityPriority

**What this PR does / why we need it**:

This PR replaces the original, pure image-size based scoring to an adaptive scoring scheme. The new scoring scheme considers not only the image size but also its `"spread" `- the definition of `"spread"` is described in what follows: 

> Given an image`i`, `spread_i = num_node_has_i / total_num_nodes`  

And the image receives the score: `score_i = size_i * spread_i`, as proposed by @resouer. The final node score is the summation of image scores for all images found existing on the node that are mentioned in the pod spec.

The goal of this heuristic is to better _balance image locality with other scheduling policies_. In particular, it aims to mitigate and prevent the undesirable "node heating problem", _i.e._, pods get assigned to the same or a few nodes due to preferred image locality. Given an image, the larger `spread` it has the more image locality we can consider for it - since we can expect more nodes having this image.

The new image state information in scheduler cache, enabled in this PR, allows other potential heuristics to be explored.

**Special notes for your reviewer**:

@resouer 

Additional unit tests are WIP. 

**Release note**:

```release-note
NONE
```
2018-07-12 17:57:16 -07:00
Silvery Fu 2003a0db97 Rework image locality with spread-based scoring 2018-07-11 23:58:23 -07:00
Silvery Fu c3f111f74a Add image states to scheduler cache 2018-07-11 23:58:02 -07:00
Silvery Fu 05293233cf Update generated bazel 2018-07-11 23:57:34 -07:00
Yecheng Fu b841b15e27 Invalidate CheckVolumeBinding predicate cache on PV update. 2018-07-12 14:55:30 +08:00
Kubernetes Submit Queue f2db955b9d
Merge pull request #64363 from idealhack/sub-benchmarks/scheduler/schedulercache
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

scheduler: update tests to use sub-benchmarks (pkg/scheduler/cache)

**What this PR does / why we need it**:

Go 1.7 added the subtest feature which can make table-driven tests much easier to run and debug. Some tests are not using this feature.

Further reading: [Using Subtests and Sub-benchmarks](https://blog.golang.org/subtests)

/kind cleanup

**Release note**:

```release-note
NONE
```
2018-07-01 19:04:19 -07:00
Yang Li d7e12ce453 scheduler: update tests to use sub-benchmarks (pkg/scheduler/cache) 2018-07-01 00:51:42 +08:00
Kubernetes Submit Queue ea3451f83e
Merge pull request #65188 from aveshagarwal/master-rhbz-1555057
Automatic merge from submit-queue (batch tested with PRs 65188, 65541, 65534). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Increase glog level of some scheduling errors.

In our production environments, we are noticing that for every scheduling error, we are logging 3 errors at following lines:

1. https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/scheduler.go#L194

2. https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/factory/factory.go#L1416

3. https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/factory/factory.go#L1323

This PR increases log levels of the last 2 errors to V(3).Infof. We can discuss if it would be helpful to increase the log level of the first error too.

@kubernetes/sig-scheduling-pr-reviews 
@bsalamat @k82cn @liggitt @sjenning 

```release-note
None.
```
2018-06-29 21:42:07 -07:00
Dr. Stefan Schimanski f8de7cea40 Update generated files 2018-06-29 20:36:17 +02:00
Avesh Agarwal c0cffb8a34 Increase glog level of some scheduling errors. 2018-06-28 23:34:29 -04:00
Kubernetes Submit Queue a13fe4d15d
Merge pull request #65424 from liggitt/scheduler-config
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix scheduler config decoding

Fixes #65413

Implements a custom unmarshaler for a single scheduler config type which did not correctly specify JSON tags until https://github.com/kubernetes/kubernetes/issues/65414 is resolved

Adds missing compatibility tests for scheduler extenders back to 1.7

```release-note
Fixes incompatibility with custom scheduler extender configurations specifying `bindVerb`
```
2018-06-25 00:21:35 -07:00
Jordan Liggitt fcaaf59359
Fix scheduler config decoding 2018-06-24 23:28:56 -04:00
Kubernetes Submit Queue f0311d8232
Merge pull request #65396 from bsalamat/sched_no_sort
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Improve scheduler's performance by eliminating sorting of nodes by their score

**What this PR does / why we need it**:
Profiling scheduler, I noticed that scheduler spends a significant amount of time in sorting the nodes after we score them to find nodes with the highest score. Finding nodes with the highest score does not need sorting the array. This PR replaces the sort with a linear scan.

Eliminating the sort results in over 10% improvement in throughput of the scheduler.

Before (3 runs for 5000 nodes, scheduling 1000 pods in a cluster running 2000 pods):
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  20682552 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  20464729 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  21188906 ns/op

After:
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18485866 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18457749 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18418200 ns/op

**Release note**:

```release-note
Improve scheduler's performance by eliminating sorting of nodes by their score.
```
2018-06-23 20:12:01 -07:00
Bobby (Babak) Salamat ffc8cc2f50 Improve scheduler's performance by eliminating sorting when finding the host with the highest score 2018-06-23 11:24:43 -07:00
Kubernetes Submit Queue 582b88c879
Merge pull request #64995 from bsalamat/preempt_opt
Automatic merge from submit-queue (batch tested with PRs 65388, 64995). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add more conditions to the list of predicate failures that won't be resolved by preemption

**What this PR does / why we need it**:
Adds more conditions to the list of predicate failures that won't be resolved by preemption. This change can potentially improve performance of preemption by avoiding the nodes that won't be able to schedule the pending pod no matter how many other pods are removed from them.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Add more conditions to the list of predicate failures that won't be resolved by preemption.
```

/sig scheduling
2018-06-23 05:52:07 -07:00
Bobby (Babak) Salamat 8cdf83ed1e Add tests to cover newly added unresolvable failures 2018-06-22 17:06:19 -07:00
Bobby (Babak) Salamat fab26e470c Add more unresolvable conditions to optimize preemption logic 2018-06-22 17:04:55 -07:00
Jeff Grafton 23ceebac22 Run hack/update-bazel.sh 2018-06-22 16:22:57 -07:00
Jeff Grafton a725660640 Update to gazelle 0.12.0 and run hack/update-bazel.sh 2018-06-22 16:22:18 -07:00
Kubernetes Submit Queue b45ba959c0
Merge pull request #64693 from xiechengsheng/fix-typos
Automatic merge from submit-queue (batch tested with PRs 65024, 65287, 65345, 64693, 64941). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix some typos in code comments.

Signed-off-by: xiechengsheng <XIE1995@whut.edu.cn>



**What this PR does / why we need it**:
Fix some typos in code comments.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-06-22 06:10:21 -07:00
xiechengsheng 66e0b53c3c fix some typos
Signed-off-by: xiechengsheng <XIE1995@whut.edu.cn>
2018-06-22 09:26:14 +08:00
Kubernetes Submit Queue 23b4690d00
Merge pull request #65306 from shyamjvs/fine-grained-scheduler-metric
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Split scheduler latency metric to fine-grained steps

This splits the summary metric we recently added into finer steps. It should be very useful for performance experiments.

/cc @wojtek-t 
fyi - @bsalamat @misterikkit 

Strictly speaking this is a breaking change, but since this metric was added only ~week ago I think it should fine (we should port this change to 1.11).

```release-note
Split 'scheduling_latency_seconds' metric into finer steps (predicate, priority, premption)
```
2018-06-21 09:11:58 -07:00
Shyam Jeedigunta b9ae20c99e Split scheduler latency metric to fine-grained steps 2018-06-21 14:19:39 +02:00
Kubernetes Submit Queue d1f5cb2348
Merge pull request #65050 from sttts/sttts-deepcopy-update
Automatic merge from submit-queue (batch tested with PRs 64895, 64938, 63700, 65050, 64957). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump gengo to include uniform pointer deepcopy

This bumps k8s.io/gengo with uniform pointer support in deepcopy-gen.

Fixes https://github.com/kubernetes/code-generator/issues/45.
2018-06-21 04:15:16 -07:00
Kubernetes Submit Queue 15902d9113
Merge pull request #63662 from xchapter7x/pkg-scheduler-algorithmprovider
Automatic merge from submit-queue (batch tested with PRs 64285, 63660, 63661, 63662, 64883). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

use subtest for table units (pkg/scheduler/algorithmprovider)

**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267

**Release note**:

```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.

```
2018-06-21 01:19:26 -07:00
Kubernetes Submit Queue 58574021a7
Merge pull request #63661 from xchapter7x/pkg-scheduler
Automatic merge from submit-queue (batch tested with PRs 64285, 63660, 63661, 63662, 64883). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

use subtest for table units (pkg/scheduler)

**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267

**Release note**:

```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.

```
2018-06-21 01:19:22 -07:00
Kubernetes Submit Queue 1712073165
Merge pull request #63660 from xchapter7x/pkg-scheduler-algorithm-predicates
Automatic merge from submit-queue (batch tested with PRs 64285, 63660, 63661, 63662, 64883). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

use subtest for table units (pkg/scheduler/algorithm/predicates)

**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267

**Release note**:

```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.

```
2018-06-21 01:19:18 -07:00
Jonathan Basseri 56b941f3df scheduler: fix equiv. cache logging.
Change the log messages to accurately reflect the cache behavior.
2018-06-20 10:52:33 -07:00
Jonathan Basseri c24842d806 Add a package docstring to core/equivalence. 2018-06-20 10:52:33 -07:00
Jonathan Basseri b571065bc4 Clean up names in equivalence package.
Remove stutter from names and provide more idiomatic patterns.

This makes call sites that use equivalence cache easier to read.
2018-06-20 10:52:33 -07:00