k3s/pkg/scheduler
Kubernetes Submit Queue b883f4cff8
Merge pull request #65745 from silveryfu/image-locality-scoring
Automatic merge from submit-queue (batch tested with PRs 66011, 66111, 66106, 66039, 65745). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Enable adaptive scoring in ImageLocalityPriority

**What this PR does / why we need it**:

This PR replaces the original scoring, which was based purely on image size, with an adaptive scoring scheme. The new scheme considers not only an image's size but also its `spread`, defined as follows:

> Given an image `i`, `spread_i = num_node_has_i / total_num_nodes`, where `num_node_has_i` is the number of nodes that already have image `i`.

Each image then receives the score `score_i = size_i * spread_i`, as proposed by @resouer. A node's final score is the sum of the scores of all images that are named in the pod spec and already present on that node.
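
The formulas above can be sketched in a few lines. This is a minimal illustration in Python, not the Go code in this PR: the function names, the dictionary layout, and the use of megabytes as the size unit are all assumptions made for the example, and any normalization of the raw sum into the scheduler's priority range is left out.

```python
from typing import Dict, List, Set


def image_spread(image: str, nodes_with_image: Dict[str, Set[str]], total_nodes: int) -> float:
    """spread_i = num_node_has_i / total_num_nodes (0.0 for an empty cluster)."""
    if total_nodes <= 0:
        return 0.0
    return len(nodes_with_image.get(image, set())) / total_nodes


def node_score(
    node: str,
    pod_images: List[str],
    image_sizes: Dict[str, int],            # hypothetical: image name -> size in MB
    nodes_with_image: Dict[str, Set[str]],  # hypothetical: image name -> nodes holding it
    total_nodes: int,
) -> float:
    """Sum size_i * spread_i over the pod's images that are already on `node`."""
    total = 0.0
    for image in pod_images:
        if node not in nodes_with_image.get(image, set()):
            continue  # image not present on this node, so it contributes nothing
        total += image_sizes[image] * image_spread(image, nodes_with_image, total_nodes)
    return total


if __name__ == "__main__":
    sizes = {"db": 800, "base": 400}
    holders = {"db": {"n1"}, "base": {"n1", "n2", "n3", "n4"}}
    pod = ["db", "base"]
    for n in ("n1", "n2"):
        print(n, node_score(n, pod, sizes, holders, total_nodes=4))
```

In this made-up four-node cluster, the 800 MB image found only on `n1` contributes 800 * 0.25 = 200, while the 400 MB image found everywhere contributes 400 * 1.0 = 400 on every node. `n1` scores 600 against 400 for `n2`, a much smaller gap than the 1200 versus 400 that pure size-based scoring would give.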

The goal of this heuristic is to better _balance image locality with other scheduling policies_. In particular, it aims to mitigate the undesirable "node heating problem", _i.e._, pods being assigned to the same node or a small set of nodes because image locality is preferred. The larger an image's `spread`, the more weight image locality can safely carry for it, since more nodes are expected to have that image already.

The image state information that this PR adds to the scheduler cache also allows other heuristics to be explored.
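
As a rough picture of what such image state could look like, the sketch below keeps, per image, its size and the set of nodes that hold it. It is written in Python for brevity and is an assumption-laden illustration: `ImageState`, `ImageStateCache`, and their methods are hypothetical names, not the types this PR adds to the scheduler cache.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ImageState:
    """Hypothetical per-image record: size plus the nodes that have the image."""
    size: int
    nodes: Set[str] = field(default_factory=set)


class ImageStateCache:
    """Hypothetical cluster-wide view, keyed by image name."""

    def __init__(self) -> None:
        self._images: Dict[str, ImageState] = {}

    def add_node_image(self, node: str, image: str, size: int) -> None:
        # Create the record on first sight, then remember that `node` has the image.
        state = self._images.setdefault(image, ImageState(size=size))
        state.nodes.add(node)

    def remove_node(self, node: str) -> None:
        # Forget the node everywhere; drop images that no node holds any more.
        for name in list(self._images):
            state = self._images[name]
            state.nodes.discard(node)
            if not state.nodes:
                del self._images[name]

    def num_nodes(self, image: str) -> int:
        state = self._images.get(image)
        return len(state.nodes) if state else 0
```

With a structure like this, `num_nodes(image)` divided by the cluster's node count gives the `spread` value directly, and other heuristics could read the same per-image records.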

**Special notes for your reviewer**:

@resouer 

Additional unit tests are WIP. 

**Release note**:

```release-note
NONE
```
2018-07-12 17:57:16 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| algorithm | Rework image locality with spread-based scoring | 2018-07-11 23:58:23 -07:00 |
| algorithmprovider | Fix scheduler config decoding | 2018-06-24 23:28:56 -04:00 |
| api | Update generated files | 2018-06-29 20:36:17 +02:00 |
| cache | Merge pull request #65745 from silveryfu/image-locality-scoring | 2018-07-12 17:57:16 -07:00 |
| core | Merge pull request #65396 from bsalamat/sched_no_sort | 2018-06-23 20:12:01 -07:00 |
| factory | Invalidate CheckVolumeBinding predicate cache on PV update. | 2018-07-12 14:55:30 +08:00 |
| metrics | Split scheduler latency metric to fine-grained steps | 2018-06-21 14:19:39 +02:00 |
| testing | Run hack/update-bazel.sh | 2018-06-22 16:22:57 -07:00 |
| util | Run hack/update-bazel.sh | 2018-06-22 16:22:57 -07:00 |
| volumebinder | Run hack/update-bazel.sh | 2018-06-22 16:22:57 -07:00 |
| BUILD | Run hack/update-bazel.sh | 2018-06-22 16:22:57 -07:00 |
| OWNERS | Update OWNERS labels for cluster-lifecycle and scheduling | 2018-04-05 16:25:04 -05:00 |
| scheduler.go | Increase glog level of some scheduling errors. | 2018-06-28 23:34:29 -04:00 |
| scheduler_test.go | use subtest for table units | 2018-06-08 10:52:28 -04:00 |
| testutil.go | | |