Merge pull request #66480 from Huang-Wei/stateless-MatchNodeSelectorTerms
Automatic merge from submit-queue (batch tested with PRs 67042, 66480, 67053). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

ensure MatchNodeSelectorTerms() runs statelessly

**What this PR does**:

Fix the sorting behavior in `selector.go`:

- move the sorting out of NewRequirement() and into String() (sketched below)
- add related unit tests
- add unit tests in one of the outer callers (pkg/apis/core/v1/helper)
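
A minimal sketch of the fix's idea, using a much-simplified `Requirement` type rather than the real one in `k8s.io/apimachinery/pkg/labels/selector.go`: `String()` sorts a private copy, so the caller's slice is never mutated.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Requirement is a simplified stand-in for the real type in selector.go.
type Requirement struct {
	key      string
	operator string
	values   []string
}

// NewRequirement no longer sorts: it stores the values exactly as
// the caller passed them in.
func NewRequirement(key, op string, vals []string) *Requirement {
	return &Requirement{key: key, operator: op, values: vals}
}

// String sorts a private copy, keeping the output deterministic
// while the caller's slice keeps its original order.
func (r *Requirement) String() string {
	vals := append([]string(nil), r.values...) // copy before sorting
	sort.Strings(vals)
	return fmt.Sprintf("%s %s (%s)", r.key, r.operator, strings.Join(vals, ","))
}

func main() {
	input := []string{"127.0.0.2", "127.0.0.1"}
	r := NewRequirement("kubernetes.io/hostname", "in", input)
	fmt.Println(r)     // kubernetes.io/hostname in (127.0.0.1,127.0.0.2)
	fmt.Println(input) // [127.0.0.2 127.0.0.1] -- untouched
}
```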

**Why we need it**:
- Without this fix, the scheduler and the DaemonSet controller do not work correctly in some corner cases

**Which issue(s) this PR fixes**:
Fixes #66298

**Special notes for your reviewer**:
Parameter `nodeSelectorTerms` of MatchNodeSelectorTerms() is a slice, which is fundamentally a `{pointer-to-elements, len, cap}` tuple, so passing it in effectively passes a pointer to the shared backing array. Inside that method, NodeSelectorRequirementsAsSelector() -> NewRequirement() is invoked, and `matchExpressions[*].values` is passed down and **modified** via `sort.Strings(vals)`.
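
A self-contained illustration of the aliasing (the function name `process` is made up for this example; it is not the selector.go code):

```go
package main

import (
	"fmt"
	"sort"
)

// process receives a copy of the slice header {ptr, len, cap}, but
// sort.Strings reorders the shared backing array in place.
func process(vals []string) {
	sort.Strings(vals)
}

func main() {
	values := []string{"127.0.0.2", "127.0.0.1"}
	process(values)     // looks read-only from the call site
	fmt.Println(values) // [127.0.0.1 127.0.0.2] -- caller's data mutated
}
```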

This causes pods of the following DaemonSet to fall into an infinite create/delete loop:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: problem
spec:
  selector:
    matchLabels:
      app: sleeper
  template:
    metadata:
      labels:
        app: sleeper
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - 127.0.0.2
                - 127.0.0.1
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sleep", "7200"]
```

(the problem can be reliably reproduced on a local cluster started by `hack/local-up-cluster.sh`)

The first time the DaemonSet YAML is handled by the apiserver, it is persisted in etcd in its original form (the original order of the values is kept: 127.0.0.2, 127.0.0.1). Afterwards, the DaemonSet controller tries to schedule a pod and reuses the predicates logic of the scheduler component, where the values are **sorted** deep inside. This not only causes the pod to be created with the values in sorted order (127.0.0.1, 127.0.0.2), but also breaks DaemonSet updates: internally the controller hashes the raw bytes of the object to compute the `controller-revision-hash` that drives rolling updates and rollbacks, so it keeps killing the "old" pod and spawning a "new" pod back and forth, falling into an infinite loop.
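
A toy sketch of why the in-place sort flips the revision hash; the `spec` type and `hashOf` helper below are illustrative stand-ins, not the real DaemonSet controller code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
	"sort"
)

// spec stands in for a serialized pod template.
type spec struct {
	Values []string `json:"values"`
}

// hashOf mimics deriving a revision hash from an object's raw bytes.
func hashOf(s spec) uint32 {
	raw, _ := json.Marshal(s)
	h := fnv.New32a()
	h.Write(raw)
	return h.Sum32()
}

func main() {
	s := spec{Values: []string{"127.0.0.2", "127.0.0.1"}}
	before := hashOf(s)

	sort.Strings(s.Values) // the side effect the predicates code used to have

	fmt.Println(before == hashOf(s)) // false: same semantics, different hash
}
```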

The issue exists in `master`, `release-1.11` and `release-1.10`.

**Release note**:
```release-note
NONE
```
README.md

External Repository Staging Area

This directory is the staging area for packages that have been split to their own repository. The content here will be periodically published to respective top-level k8s.io repositories.

Repositories currently staged here:

The code in the staging/ directory is authoritative, i.e. the only copy of the code. You can directly modify such code.

Using staged repositories from Kubernetes code

Kubernetes code uses the repositories in this directory via symlinks in the `vendor/k8s.io` directory into this staging area. For example, when Kubernetes code imports a package from the k8s.io/client-go repository, that import is resolved to `staging/src/k8s.io/client-go` relative to the project root:

```go
// pkg/example/some_code.go
package example

import (
  "k8s.io/client-go/dynamic" // resolves to staging/src/k8s.io/client-go/dynamic
)
```

Once the change-over to external repositories is complete, these repositories will actually be vendored from `k8s.io/<package-name>`.