Commit Graph

10083 Commits (0f0c754eb433acb501fe64fa46007a922aa68715)

Author SHA1 Message Date
Shyam Jeedigunta 0f0c754eb4 Get rid of duplicate VerifyPodStartupLatency util in node density tests 2018-03-21 16:58:31 +01:00
Shyam Jeedigunta b0dd166fa3 Capture different parts of pod-startup latency as metrics 2018-03-21 16:58:25 +01:00
Kubernetes Submit Queue 01e67e2808
Merge pull request #61169 from muhongwei/mhw_branch1
Automatic merge from submit-queue (batch tested with PRs 59536, 61104, 61030, 59013, 61169). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Correct spelling
2018-03-21 03:43:24 -07:00
Kubernetes Submit Queue bd745a4080
Merge pull request #59013 from crimsonfaith91/ds-integration-3
Automatic merge from submit-queue (batch tested with PRs 59536, 61104, 61030, 59013, 61169). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

add rolling update daemonset existing pod adoption integration test

**What this PR does / why we need it**:
This PR adds rolling update DaemonSet existing pod adoption integration test. It also shifts all helper functions to a new utility file.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
xref #52191

**Release note**:
```release-note
NONE
```
2018-03-21 03:43:21 -07:00
Kubernetes Submit Queue a5a3bc93f1
Merge pull request #61104 from hzxuzhonghu/AlwaysAdmit-stop-using
Automatic merge from submit-queue (batch tested with PRs 59536, 61104, 61030, 59013, 61169). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

stop using AlwaysAdmit admission

`AlwaysAdmit` was deprecated, so stop using it in tests.

**Release note**:

```release-note
NONE
```
2018-03-21 03:43:14 -07:00
Kubernetes Submit Queue dbea6f6372
Merge pull request #61085 from hzxuzhonghu/unversioned-cleanup
Automatic merge from submit-queue (batch tested with PRs 60919, 60953, 61085, 61083, 60971). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

remove unused `pkg/api/unversioned`

**What this PR does / why we need it**:

clean code, see #61084 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61084

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-03-20 20:34:31 -07:00
Kubernetes Submit Queue 14e3efe26a
Merge pull request #58717 from resouer/extender-interface
Automatic merge from submit-queue (batch tested with PRs 60759, 60531, 60923, 60851, 58717). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Implement preemption for extender with a verb and new interface

**What this PR does / why we need it**:

This is an alternative way of implementing #51656

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #51656

**Special notes for your reviewer**:

We will also want to compare with #56296 to see which one is the best solution. See: https://github.com/kubernetes/kubernetes/pull/56296#discussion_r163381235

cc @ravigadde @bsalamat 

**Release note**:

```release-note
Implement preemption for extender with a verb and new interface
```
2018-03-20 15:34:41 -07:00
Kubernetes Submit Queue c5d4a032d7
Merge pull request #60547 from brahmaroutu/conf_kubectl
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Adding details to Conformance Tests using RFC 2119 standards.

This PR is part of the conformance documentation. It provides a more formal specification, using RFC 2119 keywords to describe each test, so that whoever runs the conformance tests does not have to read the code to understand what is tested and why.
The documentation added to each test eventually results in a document, currently checked in at https://github.com/cncf/k8s-conformance/blob/master/docs/KubeConformance-1.9.md

I would like to have this PR reviewed for v1.10 as I consider it important to strengthen the conformance documents.
2018-03-20 11:35:23 -07:00
Kubernetes Submit Queue 7ab554ce43
Merge pull request #60666 from immutableT/kms_mock_flake_issue
Automatic merge from submit-queue (batch tested with PRs 60574, 60666, 60831, 60877, 60357). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove potential sources of flakes for kms_transformation_test.go.

**What this PR does / why we need it**:
Remove potential sources for flakes in TestKMSPlugin test.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60614
**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-03-20 08:34:35 -07:00
Kubernetes Submit Queue 2cb4297c96
Merge pull request #59637 from hanxiaoshuai/bugfix0209
Automatic merge from submit-queue (batch tested with PRs 59637, 60611, 60788, 60489, 60687). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix todo: use a better way to keep this label unique in the test

**What this PR does / why we need it**:
fix todo: use a better way to keep this label unique in the test in test/e2e/apimachinery/garbage_collector.go
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-03-20 04:34:31 -07:00
Kubernetes Submit Queue c0db49c2cb
Merge pull request #60331 from jennybuckley/watch-e2e-test
Automatic merge from submit-queue (batch tested with PRs 60457, 60331, 54970, 58731, 60562). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add e2e test for watch

**What this PR does / why we need it**:
Currently watch is only tested by kubectl tests, but all clients of kubernetes should be able to reliably watch resources for changes. This test should be able to accommodate testing watch for custom resources without many changes.

/sig api-machinery

```release-note
Added e2e test for watch
```
2018-03-19 23:42:11 -07:00
Kubernetes Submit Queue 8c3b5541e5
Merge pull request #60457 from sjenning/fix-websocket-e2e-test
Automatic merge from submit-queue (batch tested with PRs 60457, 60331, 54970, 58731, 60562). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

tests: e2e: empty msg from channel other than stdout should be non-fatal

Currently, if the exec websocket encounters a message that is not in the stdout stream, it immediately fails. However, it also currently requests the stderr stream in the query params. There doesn't seem to be any guarantee that we don't get an empty message on the stderr stream.

Requesting the stderr stream in the query is desirable if, for some reason, something in the container fails and writes to stderr.

However, we do not need to fail the test if we get an empty message on another stream. If the message is not empty, then that _does_ indicate an error and we should fail.

This is the situation we are currently observing with docker 1.13 in the origin CI https://github.com/openshift/origin/issues/18726

@derekwaynecarr @smarterclayton @gabemontero @liggitt @deads2k 

/sig node
2018-03-19 23:42:07 -07:00
Kubernetes Submit Queue 9df57d2812
Merge pull request #60086 from dougm/vsphere-finder
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

vSphere: Minimize property collection via Finder

The 'All' parameter of the 'NewFinder' function controls property collection while searching the inventory.
When 'All' is set to 'false', Finder collects the minimal set of object properties required to search inventory.
When 'All' is set to 'true', Finder collects *all* object properties, which are *not* required to search inventory.
Setting 'All' to 'true' is only useful when inspecting all properties of an object,
such as by certain govc commands when the '-json' or '-dump' flags are specified.

Changing All=false in VCP minimizes the SOAP payload size and marshalling required on both sides, without impacting any functionality.



**What this PR does / why we need it**:

Changing All=false in VCP minimizes the SOAP payload size and marshalling required on both sides, without impacting any functionality.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-03-19 21:34:36 -07:00
Kubernetes Submit Queue c64f19dd1b
Merge pull request #59728 from wgliang/master.append
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

more concise to merge the slice

**What this PR does / why we need it**:
more concise to merge the slice
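
The PR body does not show the change itself. As a guess at the kind of simplification meant, the sketch below contrasts an element-by-element loop with a single variadic `append`; the function names are hypothetical.

```go
package main

import "fmt"

// mergeLoop is the verbose form: append the elements one at a time.
func mergeLoop(dst, src []string) []string {
	for _, s := range src {
		dst = append(dst, s)
	}
	return dst
}

// mergeVariadic expands src with "..." in a single append call.
func mergeVariadic(dst, src []string) []string {
	return append(dst, src...)
}

func main() {
	a := []string{"a", "b"}
	b := []string{"c"}
	fmt.Println(mergeLoop(a, b), mergeVariadic(a, b))
}
```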

**Special notes for your reviewer**:
2018-03-19 21:34:30 -07:00
Kubernetes Submit Queue 37b2edd855
Merge pull request #54300 from jsafrane/fix-test-images
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix test images

These commits fix volume_io tests for iSCSI and Ceph RBD. Both server images were quite old (Fedora 22), so I updated them to ~~something more stable (CentOS 7) and to newer Ceph (Jewel, 10.2.7).~~ something newer (Fedora 26).

The most important fix is that the test volumes are now 120 MB, so the volume_io tests can actually run: they write a 100 MB file to the volume to check its persistence.

When mount containers in #53440 are merged I'll try to run the tests regularly with every PR (or merge) so we catch regressions quickly.

```release-note
NONE
```

/sig testing
/sig storage

/assign @jeffvance 

Fixes: #56725
2018-03-19 20:34:22 -07:00
Shyam Jeedigunta e5dc6c88eb Wait for only enough no. of RC replicas to be running in testutil 2018-03-19 14:22:18 +01:00
Kubernetes Submit Queue ebae09e741
Merge pull request #61234 from nikhiljindal/kubemciTest
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fail the ingress test if it timesout getting address

Update the test to fail if it times out getting the IP address for the ingress, rather than silently ignoring that error.
Also improve some logging to print more information.

This is to help in debugging tests added in https://github.com/kubernetes/kubernetes/pull/59234

cc @madhusudancs @MrHohn @nicksardo 

Ref https://github.com/GoogleCloudPlatform/k8s-multicluster-ingress/issues/131

```release-note
NONE
```
2018-03-18 15:33:43 -07:00
Kubernetes Submit Queue f125152212
Merge pull request #61284 from jsafrane/fix-fsgroup-subpath
Automatic merge from submit-queue (batch tested with PRs 61284, 61119, 61201). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix creation of subpath with SUID/SGID directories.

SafeMakeDir() should apply SUID/SGID/sticky bits to the directory it creates.

Fixes #61283 

**Release note**:

```release-note
NONE
```
2018-03-16 16:55:57 -07:00
Hemant Kumar 0600f7ee22 Fix e2e tests for emptydir 2018-03-16 15:14:42 -04:00
Kubernetes Submit Queue ca02c11887
Merge pull request #61161 from k82cn/k8s_59194_4
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Added unschedulable taint

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #59194; fixes #61050

**Release note**:

```release-note
When `TaintNodesByCondition` is enabled, the `node.kubernetes.io/unschedulable:NoSchedule`
taint is added to the node if `spec.Unschedulable` is true.

When `ScheduleDaemonSetPods` is enabled, the `node.kubernetes.io/unschedulable:NoSchedule`
toleration is added automatically to DaemonSet Pods, so the `unschedulable` field of
a node is not respected by the DaemonSet controller.
```
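
The interaction in the release note can be modelled roughly as below. The `Taint` and `Toleration` structs are simplified stand-ins, not the Kubernetes API types, and the matching rule (key and effect must be equal) is a deliberate simplification; `taintsForNode` and `tolerated` are hypothetical names.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for illustration only.
type Taint struct{ Key, Effect string }
type Toleration struct{ Key, Effect string }

const unschedulableKey = "node.kubernetes.io/unschedulable"

// taintsForNode returns the taint from the release note when the node
// is marked unschedulable, and no taints otherwise.
func taintsForNode(unschedulable bool) []Taint {
	if !unschedulable {
		return nil
	}
	return []Taint{{Key: unschedulableKey, Effect: "NoSchedule"}}
}

// tolerated reports whether every taint is matched by some toleration
// with the same key and effect (a simplified matching rule).
func tolerated(taints []Taint, tolerations []Toleration) bool {
	for _, taint := range taints {
		matched := false
		for _, tol := range tolerations {
			if tol.Key == taint.Key && tol.Effect == taint.Effect {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	taints := taintsForNode(true)
	daemonPod := []Toleration{{Key: unschedulableKey, Effect: "NoSchedule"}}
	fmt.Println("ordinary pod schedulable:", tolerated(taints, nil))
	fmt.Println("DaemonSet pod schedulable:", tolerated(taints, daemonPod))
}
```

In this model an ordinary pod is kept off the unschedulable node, while a DaemonSet pod that carries the automatic toleration is not.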
2018-03-16 11:22:05 -07:00
Da K. Ma b23db30765 Added unscheduable taint.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-03-16 09:13:08 +08:00
Cheng Xing fe76c9f779 Fixes 'Zone is empty' errors in PD upgrade tests; skips pd tests with inline volume in multizone clusters 2018-03-15 15:00:13 -07:00
nikhiljindal cdfbb54db2 Fail the ingress test if it timesout getting address for IP address 2018-03-15 14:46:17 -07:00
Jun Xiang Tee 92070eba3d add rolling update daemonset existing pod adoption integration test 2018-03-14 14:00:38 -07:00
Kubernetes Submit Queue 05ec0a77b4
Merge pull request #61118 from shyamjvs/bump-apiserver-mem-threshold
Automatic merge from submit-queue (batch tested with PRs 61118, 60579). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Increase apiserver mem-threshold in density test

Ref: https://github.com/kubernetes/kubernetes/issues/60500#issuecomment-372682659 (fixes part of that issue)

/sig scalability
/kind bug
/priority important-soon
/cc @wojtek-t
/cc @crassirostris (for the release-note)

```release-note
Audit logging with buffering enabled can increase apiserver memory usage (e.g. up to 200MB in 100-node cluster). The increase is bounded by the buffer size (configurable). Ref: issue #60500
```
2018-03-14 09:49:48 -07:00
muhongwei e153a0d9cb Correct spelling 2018-03-14 18:03:42 +08:00
Kubernetes Submit Queue 32343b7f3d
Merge pull request #61111 from jsafrane/fix-subpath-multizone
Automatic merge from submit-queue (batch tested with PRs 61111, 61069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix subpath e2e tests on multizone cluster.

Use dynamically provisioned PV to run GCE PD tests. This will make sure that the pod is scheduled to the right zone and GCE PD can be attached to a node.

**Which issue(s) this PR fixes**:
Fixes #61101 


**Release note**:

```release-note
NONE
```
/sig storage
@msau42 @verult
2018-03-13 14:06:47 -07:00
Kubernetes Submit Queue ae990bb5a9
Merge pull request #60968 from loburm/fix_gke_logging_test
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix broken gke regional logging test.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60882

```release-note
NONE
```
2018-03-13 12:27:04 -07:00
Kubernetes Submit Queue 8313fc0dac
Merge pull request #61080 from liggitt/subpath-test
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Detect backsteps correctly in base path detection

Avoids false positives with atomic writer `..<timestamp>` directories
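
A minimal sketch of the distinction, assuming slash-separated paths: a path contains a backstep only when a whole component equals `..`, so a directory name that merely starts with `..` is not flagged. `hasBackstep` is a hypothetical name, and the timestamped directory name in the example is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// hasBackstep reports whether any component of a slash-separated path
// is exactly "..". A substring check for ".." would also flag names
// such as "..2018_03_13", which are ordinary directory names.
func hasBackstep(path string) bool {
	for _, part := range strings.Split(path, "/") {
		if part == ".." {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasBackstep("a/../b"))                // true
	fmt.Println(hasBackstep("vol/..2018_03_13/key")) // false
}
```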

Fixes #61076

/assign @msau42 @jsafrane

```release-note
Fix a regression that prevented using `subPath` volume mounts with secret, configMap, projected, and downwardAPI volumes
```
2018-03-13 12:27:00 -07:00
Kubernetes Submit Queue b651ed5ea7
Merge pull request #60998 from jpbetz/etcd-3.1.12
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump to etcd 3.1.12 to pick up critical fix

etcd [3.1.12](https://github.com/coreos/etcd/releases/tag/v3.1.12) (as well as 3.2.17 and 3.3.2) was released yesterday to fix a bug critical to kubernetes:

Fix [mvcc "unsynced" watcher restore operation](https://github.com/coreos/etcd/pull/9297).
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).

This will be backported to 1.9 as well.

Release note:
```release-note
Upgrade the default etcd server version to 3.1.12 to pick up critical etcd "mvcc "unsynced" watcher restore operation" fix.
```

cc @gyuho @wojtek-t @shyamjvs @timothysc @jdumars
2018-03-13 09:11:10 -07:00
Shyam JVS b43b621690
Increase apiserver mem-threshold in density test 2018-03-13 16:47:14 +01:00
Jan Safranek c44e135442 Fix subpath e2e tests on multizone cluster.
Use dynamically provisioned PV to run GCE PD tests. This will make sure
that the pod is scheduled to the right zone and GCE PD can be attached
to a node.
2018-03-13 14:26:37 +01:00
Jordan Liggitt 806f6772c6
Add atomic writer subpath e2e tests 2018-03-13 08:53:50 -04:00
xuzhonghu 70d5af6e7b stop using AlwaysAdmit admission 2018-03-13 20:02:56 +08:00
hzxuzhonghu f12647e16d pkg/api/unversioned related cleanup 2018-03-13 17:20:16 +08:00
Kubernetes Submit Queue 3d1331f297
Merge pull request #61044 from liggitt/subpath-master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

subpath fixes

fixes #60813 for master / 1.10

```release-note
Fixes CVE-2017-1002101 - See https://issue.k8s.io/60813 for details
```
2018-03-12 11:51:59 -07:00
jennybuckley 3b2472a305 Add e2e test for watch 2018-03-12 10:48:43 -07:00
Kubernetes Submit Queue a3f40dd8df
Merge pull request #60856 from jiayingz/race-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes the races around devicemanager Allocate() and endpoint deletion.

There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.

Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.

Tested:
Ran GPUDevicePlugin e2e_node test 100 times and all passed now.



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/60176

**Special notes for your reviewer**:

**Release note**:

```release-note
Fixes the races around devicemanager Allocate() and endpoint deletion.
```
2018-03-12 02:50:13 -07:00
Kubernetes Submit Queue 36058cb0c3
Merge pull request #60997 from MrHohn/e2e-fix-cleanup-svc-regional
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[e2e service] Fix CleanupGCEResources for regional test

**What this PR does / why we need it**:
According to https://k8s-testgrid.appspot.com/google-gke-staging#gke-staging-1-8-1-9-upgrade-regional-cluster&width=20, the regional cluster test is failing because the GCE resource cleanup function attempts to parse the region from `--gcp-zone`, while regional clusters only set `--gcp-region`.

This PR pipes region into the cleanup function as well. This will need to be cherrypicked to 1.8 and 1.9.
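
One way to picture the fix, as a sketch with hypothetical helper names rather than the e2e framework's code: prefer an explicitly supplied region, and derive it from the zone only as a fallback. The sketch assumes GCE zone names of the form `<region>-<letter>`, such as `us-central1-a`.

```go
package main

import (
	"fmt"
	"strings"
)

// regionFromZone drops the trailing "-<letter>" of a zone name,
// e.g. "us-central1-a" becomes "us-central1".
func regionFromZone(zone string) (string, error) {
	i := strings.LastIndex(zone, "-")
	if i <= 0 {
		return "", fmt.Errorf("cannot derive region from zone %q", zone)
	}
	return zone[:i], nil
}

// resolveRegion (hypothetical helper) uses the explicit region when one
// is given and falls back to parsing the zone otherwise.
func resolveRegion(region, zone string) (string, error) {
	if region != "" {
		return region, nil
	}
	return regionFromZone(zone)
}

func main() {
	fmt.Println(resolveRegion("", "us-central1-a"))
	fmt.Println(resolveRegion("europe-west1", ""))
	fmt.Println(resolveRegion("", ""))
}
```

With only a zone-based lookup, the regional case (empty zone) fails, which matches the failure described above.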

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #NONE 

**Special notes for your reviewer**:
/assign @bowei @wojtek-t 
cc @nikhiljindal to see if there is anything that should be fixed for federation.

**Release note**:

```release-note
NONE
```
2018-03-12 00:03:38 -07:00
Kubernetes Submit Queue 9ad5ea2d61
Merge pull request #60993 from MrHohn/e2e-restart-apiserver-refine-followup
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[e2e service] Move apiserver restart validation logic into util

**What this PR does / why we need it**:
Follow-up to #60906: on GKE the apiserver pod is not visible in k8s, so the test is failing.

This PR moves the restart validation logic into the util function so that it can be environment-aware.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60761

**Special notes for your reviewer**:
Sorry for the noise.
/assign @rramkumar1 @bowei 
cc @krzyzacy 

**Release note**:

```release-note
NONE
```
2018-03-09 18:33:05 -08:00
Jiaying Zhang 5514a1f4dd Fixes the races around devicemanager Allocate() and endpoint deletion.
There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.

Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.
2018-03-09 17:00:57 -08:00
Kubernetes Submit Queue 36fd62eed8
Merge pull request #60972 from wojtek-t/fix_upgrade_test
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix upgrade tests for GKE Regional Clusters
2018-03-09 15:44:46 -08:00
Joe Betz e2a25f9b54 Bump to etcd 3.1.12 to pick up critical fix 2018-03-09 14:28:23 -08:00
Zihong Zheng 9bb962e238 [e2e service] Fix CleanupGCEResources for regional test 2018-03-09 13:29:30 -08:00
Zihong Zheng e7c673086f [e2e service] Fix gke failure: move apiserver restart validation logic into util 2018-03-09 10:56:46 -08:00
Shyam Jeedigunta 34e7a7cf06 Revert "Use quotas in default performance tests"
This reverts commit c3c10208bd.
2018-03-09 18:18:18 +01:00
Kubernetes Submit Queue 7c9293e1c3
Merge pull request #60973 from shyamjvs/revert-accidental-load-test-remove
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert "[Test change - don't merge] Skip load test"

This reverts commit ba6bb999f7.

This was accidentally merged as part of 60891.

/cc @wojtek-t 
/sig scalability
/kind bug
/priority important-soon

```release-note
NONE
```
2018-03-09 05:34:18 -08:00
Kubernetes Submit Queue b13105d43b
Merge pull request #60421 from gmarek/quotas
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use quotas in default performance tests

Better to use more features in default tests if possible.

LGTM whenever you think we're ready.

```release-note
NONE
```
2018-03-09 04:39:31 -08:00
Shyam Jeedigunta 62f62fc93a Revert "[Test change - don't merge] Skip load test"
This reverts commit ba6bb999f7.
2018-03-09 13:06:50 +01:00
wojtekt 875c1a7053 Fix upgrade tests for GKE Regional Clusters 2018-03-09 12:23:29 +01:00