Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes a flakiness in GPUDevicePlugin e2e test.
Waits till nvidia gpu disappears from all nodes after deleting the
device plug DaemonSet to make sure its pods are deleted from all nodes.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/53281
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 50988, 50509, 52660, 52663, 52250). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added device plugin e2e kubelet failure test
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
**What this PR does / why we need it**:
This is part of issue #52859 (fixes#52859)
This PR adds a e2e_node test for the device plugin.
Specifically it implements testing of failure handling by the device plugin components in case Kubelet restart / crashes.
I might try to refactor the GPU tests in a later PR.
**Special notes for your reviewer**:
@jiayingz @vishh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
bazel: build/test almost everything
**What this PR does / why we need it**: Miscellaneous cleanups and bug fixes. The main motivating idea here was to make `bazel build //...` and `bazel test //...` mostly work. (There's a few reasons these still don't work, but we're a lot closer.)
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @BenTheElder @mikedanese @spxtr
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)
Add more tests for pod preemption
**What this PR does / why we need it**:
Adds more e2e and integration tests for pod preemption.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
This PR is based on #50949. Only the last commit is new.
**Release note**:
```release-note
NONE
```
ref/ #47604
@kubernetes/sig-scheduling-pr-reviews @davidopp
Automatic merge from submit-queue (batch tested with PRs 52316, 52289, 52375)
Extends GPUDevicePlugin e2e test to exercise device plugin restarts.
**What this PR does / why we need it**:
This is part of issue #52189 but does not fix it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Version gates the ephemeral storage e2e test
Version gates the ephemeral storage e2e test.
**Release note**:
```
NONE
```
@kubernetes/sig-testing-pr-reviews
Automatic merge from submit-queue
Extend nvidia-gpus e2e test to include a device plugin based test
**What this PR does / why we need it**:
This is needed to verify device plugin feature.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/features/issues/368
**Special notes for your reviewer**:
Related test_infra PR: https://github.com/kubernetes/test-infra/pull/4265
**Release note**:
Add an e2e test for nvidia gpu device plugin
Automatic merge from submit-queue
Add pod preemption to the scheduler
**What this PR does / why we need it**:
This is the last of a series of PRs to add priority-based preemption to the scheduler. This PR connects the preemption logic to the scheduler workflow.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48646
**Special notes for your reviewer**:
This PR includes other PRs which are under review (#50805, #50405, #50190). All the new code is located in 43627afdf9.
**Release note**:
```release-note
Add priority-based preemption to the scheduler.
```
ref/ #47604
/assign @davidopp
@kubernetes/sig-scheduling-pr-reviews
Automatic merge from submit-queue
Use the right image for the right platform in the e2e tests
**What this PR does / why we need it**:
This PR is for enabling kubernetes tests for multi architecture platform
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#38067
**Special notes for your reviewer**:
This will enable conformance tests for all the supported architectures.
**Release note**:
```release-note
Make all e2e tests lookup image to use from a centralized place. In that centralized place, add support for multiple platforms.
```
x-ref #38067
Automatic merge from submit-queue (batch tested with PRs 49961, 50005, 50738, 51045, 49927)
Add cluster e2es to verify scheduler local storage support
Add cluster e2es to verify scheduler local storage support and remove some unused private functions
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
part of #50818
**Release note**:
```release-note
Add cluster e2es to verify scheduler local ephemeral storage support
```
/assign @jingxu97
/cc @ddysher
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200)
Re-enable OIR e2e tests.
Re-enabling test skeleton for opaque integer resources originally submitted as part of #41870. The e2e was disabled since it was flaky. This is the first step toward re-enabling them. Currently all cases are skipped, so this exercises only the BeforeEach behavior and the deferred removal of OIRs from a node.
cc @timothysc
Automatic merge from submit-queue (batch tested with PRs 50257, 50247, 50665, 50554, 51077)
Replace hard-code "cpu" and "memory" to consts
**What this PR does / why we need it**:
There are many places using hard coded "cpu" and "memory" as resource name. This PR replace them to consts.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
/kind cleanup
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50277, 50823, 50376, 50867)
Move e2e taints test file to sig-scheduling
**What this PR does / why we need it**:
Move taint test file to e2e scheduling and add sig-scheduling prefix.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Ref Umbrella issue #49161
**Special notes for your reviewer**:
**Release note**:
none
Automatic merge from submit-queue (batch tested with PRs 49358, 49253)
Remove hostname label condition in SchedulerPredicates
**What this PR does / why we need it**:
```
validates that NodeSelector is respected if matching [Conformance]
validates that required NodeAffinity setting is respected if matching
```
The two tests above make the assumption that the node names are equal to the `kubernetes.io/hostname` labels. Unfortunately, this is not necessarily true all the time. For instance, when using the AWS Cloud Provider + Container Linux:
- The node name is set using the AWS SDK's `ec2.Instance.PrivateDnsName` and has the form `ip-10-0-35-57.ca-central-1.compute.internal` [[1](https://github.com/kubernetes/kubernetes/blob/v1.7.1/pkg/cloudprovider/providers/aws/aws.go#L3343-L3346)] [[2](https://raw.githubusercontent.com/aws/aws-sdk-go/master/service/ec2/api.go)]
- The node's hostname, however, is a simple call to `os.Hostname()`, itself reading `/proc/sys/kernel/hostname`, which contains what the AWS DHCP assigned to the instance, typically the hostname short-form: `ip-10-0-16-137`. [[1](https://github.com/kubernetes/kubernetes/blob/v1.7.1/pkg/util/node/node.go#L43-L54)]
Consequently, we are trying to assign a pod to a node having the following label: `kubernetes.io/hostname=ip-10-0-35-57.ca-central-1.compute.internal` (in addition to the randomly generated label), whereas the actual label on the node is `kubernetes.io/hostname=ip-10-0-35-57`.
Furthermore, this inaccurate `kubernetes.io/hostname=<nodename>` condition is actually useless given we already match over a random label, that was assigned to that node. Later, the test ensures that the scheduled pod was scheduled to the right node by comparing the pod's node name and the node name we expected the pod to be on:
```
framework.ExpectNoError(framework.WaitForPodNotPending(cs, ns, labelPodName))
labelPod, err := cs.Core().Pods(ns).Get(labelPodName, metav1.GetOptions{})
framework.ExpectNoError(err)
Expect(labelPod.Spec.NodeName).To(Equal(nodeName))
```
The `k8s.io/apimachinery/pkg/types/nodename` data structure actually [warns](55bee3ad21/staging/src/k8s.io/apimachinery/pkg/types/nodename.go (L40-L43)) about the fact that the node name might be different than the hostname on AWS.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add an integration test library and some integration tests for scheduler
**What this PR does / why we need it**:
1. Add an integration test library (utils.go) for scheduler testing.
2. Cleaned up some of the tests in scheduler_test.go with the new integration test library.
3. Add priority_test.go with a couple of examples on how to test scheduler priority function in integration tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
ref/ #48176
@kubernetes/sig-scheduling-pr-reviews
@davidopp @k82cn @vikaschoudhary16
Automatic merge from submit-queue (batch tested with PRs 48043, 48200, 49139, 36238, 49130)
Implement equivalence cache by caching and re-using predicate result
The last part of #30844, I opened a new PR instead of overwrite the old one because we changed some basic assumption by allowing invalidating equivalence cache item by individual predicate.
The idea of this PR is based on discussion in https://github.com/kubernetes/kubernetes/issues/32024
- [x] Pods belong to same controllerRef considered to be equivalent
- [x] ` podFitsOnNode` will use cached predicate result if it's available
- [x] Equivalence cache will be updated when if a fresh new predicate is done
- [x] `factory.go` will invalid specific predicate cache(s) based on the object change
- [x] Since `schedule` and `bind` are async, we need to optimistically invalid affected cache(s) before `bind`
- [x] Fully unit test of affected files
- [x] e2e test to verify cache update/invalid workflow
- [x] performance test results
- [x] Some nits fixes related but expected to result in `needs-rebase` so they are split to: #36060#35968#37512
cc @wojtek-t @davidopp
Automatic merge from submit-queue (batch tested with PRs 49055, 49128, 49132, 49134, 49110)
Remove affinity annotations leftover
**What this PR does / why we need it**:
This is a further cleanup for affinity annotations, following #47869.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
ref: #47869
**Special notes for your reviewer**:
- I remove the commented test cases and just leave TODOs instead. I think converting these untestable test cases for now is not necessary. We can add new test cases in future.
- I remove the e2e test case `validates that embedding the JSON PodAffinity and PodAntiAffinity setting as a string in the annotation value work` because we have a test case `validates that InterPod Affinity and AntiAffinity is respected if matching` to test the same thing.
/cc @aveshagarwal @bsalamat @gyliu513 @k82cn @timothysc
**Release note**:
```release-note
NONE
```