Commit Graph

5554 Commits (22b2da2378081daf6f4d4f4c2e4684963dd55445)

Author SHA1 Message Date
Shyam Jeedigunta 240a1ae5ab Make threshold for glbc mem-usage scale with nodes in density test 2017-08-28 13:24:24 +02:00
Kubernetes Submit Queue 877ee91930 Merge pull request #51082 from caesarxuchao/repair-null-pending-initializer
Automatic merge from submit-queue (batch tested with PRs 50953, 51082)

Fix mergekey of initializers; Repair invalid update of initializers

Fix https://github.com/kubernetes/kubernetes/issues/51131

The PR did two things to make parallel patching `metadata.initializers.pending` possible:
* Add mergekey to initializers.pending
* Let the initializer admission plugin set the `metadata.intializers` to nil if an update makes the `pending` and the `result` both nil, instead of returning a validation error. Otherwise if multiple initializer controllers sending the patch removing themselves from `pending` at the same time, one of them will get a validation error.


```release-note
The patch to remove the last initializer from metadata.initializer.pending will result in metadata.initializer to be set to nil (assuming metadata.initializer.result is also nil), instead of resulting in an validation error.
```
2017-08-26 23:03:01 -07:00
Antoine Pelisse 281630b0b0 Revert "Re-enable OIR e2e tests." 2017-08-26 13:09:21 -07:00
Kubernetes Submit Queue 4b7135513f Merge pull request #51382 from nicksardo/revert-51038-gce-netproj
Automatic merge from submit-queue (batch tested with PRs 51174, 51363, 51087, 51382, 51388)

Revert "GCE: Consume new config value for network project id"

Reverts kubernetes/kubernetes#51038

Broke GKE tests
2017-08-26 06:43:33 -07:00
Kubernetes Submit Queue 1e5d85a0bb Merge pull request #51174 from caesarxuchao/fix-resourcequota
Automatic merge from submit-queue

Let the quota evaluator handle mutating specs of pod & pvc

### Background
The final goal is to address https://github.com/kubernetes/kubernetes/issues/47837, which aims to allow more mutation for uninitialized objects.

To do that, we [decided](https://github.com/kubernetes/kubernetes/issues/47837#issuecomment-321462433) to let the admission controllers to handle mutation of uninitialized objects.

### Issue
#50399 attempted to fix all admission controllers so that can handle mutating uninitialized objects. It was incomplete. I didn't realize although the resourcequota admission plugin handles the update operation, the underlying evaluator didn't. This PR updated the evaluators to handle updates of uninitialized pods/pvc.

### TODO
We still miss another piece. The [quota replenish controller](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourcequota/replenishment_controller.go) uses the sharedinformer, which doesn't observe the deletion of uninitialized pods at the moment. So there is a quota leak if a pod is deleted before it's initialized. It will be addressed with https://github.com/kubernetes/kubernetes/issues/48893.
2017-08-26 06:07:29 -07:00
Kubernetes Submit Queue b65d665b99 Merge pull request #51264 from m1093782566/e2e-maxTries
Automatic merge from submit-queue (batch tested with PRs 50889, 51347, 50582, 51297, 51264)

Fix e2e network util wrong output message

**What this PR does / why we need it**:

See https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/networking_utils.go#L217

and 

https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/networking_utils.go#L273

I assume it should be `minTries` -> `MaxTries`

This PR fixes the wrong output message.

**Which issue this PR fixes**: fixes #51265

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-08-25 22:43:37 -07:00
Kubernetes Submit Queue 65da3ce246 Merge pull request #51235 from cheftako/aggregator
Automatic merge from submit-queue

Fixed gke auth update wait condition.

Lookup whoami on gke using gcloud auth list.
Make sure we do not run the test on any cluster older than 1.7.

**What this PR does / why we need it**: Fixes issue with aggregator e2e test on GKE

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50945 

**Special notes for your reviewer**: There is a TODO, follow up will be provided when the immediate problem is resolved.

**Release note**: ```release-note
NONE
```
2017-08-25 18:52:46 -07:00
Nick Sardo 0d55f6bdcb Revert "GCE: Consume new config value for network project id" 2017-08-25 18:02:10 -07:00
Kubernetes Submit Queue a235ba4e49 Merge pull request #51327 from wasylkowski/ensure-ca-is-on
Automatic merge from submit-queue (batch tested with PRs 51134, 51122, 50562, 50971, 51327)

Made the tests ensure that Cluster Autoscaler is on before running.

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-08-25 14:01:36 -07:00
Walter Fender 3b9485bba3 Fixed gke auth update wait condition.
Lookup whoami on gke using gcloud auth list.
Make sure we do not run the test on any cluster older than 1.7.
Fix for Mehdy
Fixes for LavaLamp
2017-08-25 11:11:59 -07:00
Nick Sardo 0028385e20 Consume new config value for network project id 2017-08-25 08:42:28 -07:00
Kubernetes Submit Queue 11299e363c Merge pull request #51282 from shyamjvs/new-allowed-not-ready-semantics
Automatic merge from submit-queue

AllowedNotReadyNodes allowed to be not ready for absolutely *any* reason

It's as good as we allow those many nodes to be not part of the cluster at all, ever.

Btw - currently our 5k-node correctness test fails if "kubelet stopped posting node status" or "route not created", etc (ref: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/3/build-log.txt)

cc @kubernetes/sig-scalability-misc
2017-08-25 05:00:32 -07:00
Kubernetes Submit Queue d1783e0bd6 Merge pull request #51194 from bskiba/run_on_each_node
Automatic merge from submit-queue (batch tested with PRs 51244, 50559, 49770, 51194, 50901)

Distribute pods efficiently in CA scalability tests

**What this PR does / why we need it**:
Instead of using runReplicatedPodOnEachNode method
which is suited to a small number of nodes,
distribute pods on the nodes with desired load
using RCs that eat up all the space we want to be
empty after distribution.

**Release note**:
```
NONE
```
2017-08-25 04:11:13 -07:00
Andrzej Wasylkowski 6e1fbf32b0 Made the tests ensure that Cluster Autoscaler is on before running. 2017-08-25 11:11:38 +02:00
Kubernetes Submit Queue 81363abc20 Merge pull request #51230 from enisoc/sts-deflake-exec
Automatic merge from submit-queue (batch tested with PRs 50213, 50707, 49502, 51230, 50848)

StatefulSet: Deflake e2e `kubectl exec` commands.

This may help with another source of flakiness found while investigating #48031.

We seem to get a lot of flakes due to "connection refused" while running `kubectl exec`. I can't find any reason this would be caused by the test flow, so I'm adding retries to see if that helps.
2017-08-25 01:10:35 -07:00
Kubernetes Submit Queue ce3e2d9b10 Merge pull request #51224 from enisoc/sts-deflake-restart
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)

StatefulSet: Deflake e2e "restart" phase.

This addresses another source of flakiness found while investigating #48031.

The test used to scale the StatefulSet down to 0, wait for ListPods to return 0 matching Pods, and then scale the StatefulSet back up.

This was prone to a race in which StatefulSet was told to scale back up before it had observed its own deletion of the last Pod, as evidenced by logs showing the creation of Pod ss-1 prior to the creation of the replacement Pod ss-0.

Instead, we now wait for the controller to observe all deletions before scaling it back up. This should fix flakes of the form:

```
Too many pods scheduled, expected 1 got 2
```
2017-08-24 22:59:28 -07:00
Chao Xu 4928c8d1bf let resourcequota evaluator handle uninitialid pod and pvc 2017-08-24 14:50:03 -07:00
Anthony Yeh 05d6c8a6c2
StatefulSet: Deflake e2e `kubectl exec` commands.
We seem to get a lot of flakes due to "connection refused" while running
`kubectl exec`. I can't find any reason this would be caused by the test
flow, so I'm adding retries to see if that helps.
2017-08-24 11:42:05 -07:00
Chao Xu fcd646d80e Let the initializer admission plugin set the metadata.intializers to nil
if an update makes the pendings and the result both nil
2017-08-24 11:23:51 -07:00
Shyam Jeedigunta b374416807 AllowedNotReadyNodes allowed to be not ready for absolutely *any* reason 2017-08-24 19:39:26 +02:00
Beata Skiba 6e08007ce1 Distribute pods efficiently in CA scalability tests
Instead of using runReplicatedPodOnEachNode method
which is suited to a small number of nodes,
distribute pods on the nodes with desired load
using RCs that eat up all the space we want to be
empty after distribution.
2017-08-24 15:21:32 +02:00
m1093782566 b8edd9b885 fix e2e network wrong output message 2017-08-24 19:39:42 +08:00
Kubernetes Submit Queue ce3b118959 Merge pull request #42689 from intelsdi-x/enable-oir-e2e
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200)

Re-enable OIR e2e tests.

Re-enabling test skeleton for opaque integer resources originally submitted as part of #41870. The e2e was disabled since it was flaky. This is the first step toward re-enabling them. Currently all cases are skipped, so this exercises only the BeforeEach behavior and the deferred removal of OIRs from a node.

cc @timothysc
2017-08-24 04:38:07 -07:00
Kubernetes Submit Queue db928095a0 Merge pull request #50947 from shyamjvs/clusterIpRange-ginkgo
Automatic merge from submit-queue (batch tested with PRs 51108, 51035, 50539, 51160, 50947)

Auto-calculate CLUSTER_IP_RANGE based on cluster size

In preparation for eliminating CLUSTER_IP_RANGE env var from job configs, making it less error prone while folks try to start their own large cluster tests (https://github.com/kubernetes/kubernetes/issues/50907).

/cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-08-24 02:32:14 -07:00
Kubernetes Submit Queue 14cc8cdfa4 Merge pull request #50397 from bdbauer/statefulTesting
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)

Add statefulset upgrade tests to cluster_upgrade

**What this PR does / why we need it**:
Adds already created statefulset upgrade tests to cluster_upgrade.go. With further test infra changes, this will allow them to be continuously run, giving better signals.

Detect and prevent issues like https://github.com/kubernetes/kubernetes/issues/48327

**Release note**:

```release-note
NONE
```
2017-08-23 23:16:30 -07:00
Kubernetes Submit Queue c041567b5a Merge pull request #46597 from dixudx/implement_proposal_34058
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)

implement proposal 34058: hostPath volume type

**What this PR does / why we need it**:
implement proposal #34058

**Which issue this PR fixes** : fixes #46549

**Special notes for your reviewer**:
cc @thockin @luxas @euank PTAL
2017-08-23 23:16:27 -07:00
Kubernetes Submit Queue ea3a8a7570 Merge pull request #51047 from apelisse/remove-gke-test
Automatic merge from submit-queue

Skip "Simple pod should support exec through kubectl proxy" test

As reported in https://github.com/kubernetes/kubernetes/issues/50466,
this test doesn't work in GKE because it uses a bearer token and the feature only works with client certs.

As the feature that is broken in GKE is new and didn't work before, it
is safe to juste ignore the test and consider the feature as "still not
working" in GKE.

**What this PR does / why we need it**: Fixes the broken test in https://k8s-testgrid.appspot.com/release-master-blocking#gke

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: works-around #50466

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2017-08-23 17:41:58 -07:00
Anthony Yeh ce3fad326f
StatefulSet: Deflake e2e "restart" phase.
The test used to scale the StatefulSet down to 0, wait for ListPods to
return 0 matching Pods, and then scale the StatefulSet back up.

This was prone to a race in which StatefulSet was told to scale back up
before it had observed its own deletion of the last Pod, as evidenced by
logs showing the creation of Pod ss-1 prior to the creation of the
replacement Pod ss-0.

We now wait for the controller to observe all deletions before
scaling it back up. This should fix flakes of the form:

```
Too many pods scheduled, expected 1 got 2
```
2017-08-23 15:08:58 -07:00
Kubernetes Submit Queue 178a5ff314 Merge pull request #50665 from xiangpengzhao/hardcode-to-const
Automatic merge from submit-queue (batch tested with PRs 50257, 50247, 50665, 50554, 51077)

Replace hard-code "cpu" and "memory" to consts

**What this PR does / why we need it**:
There are many places using hard coded "cpu" and "memory" as resource name. This PR replace them to consts.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
/kind cleanup

**Release note**:

```release-note
NONE
```
2017-08-23 02:35:09 -07:00
Di Xu 6f74af94ef update e2e tests and yaml files 2017-08-23 14:05:21 +08:00
Kubernetes Submit Queue a44e538dbc Merge pull request #51039 from enisoc/deflake-sts-saturate
Automatic merge from submit-queue

StatefulSet: Deflake e2e "Saturate" phase.

This should reduce one source of flakiness found while investigating #48031.

The "Saturate" phase of StatefulSet e2e tests verifies orderly startup by controlling when each Pod is allowed to report Ready. If a Pod unexepectedly goes down during the test, the replacement Pod
created by the controller will forget if it was already allowed to report Ready.

After this change, the signal that allows each Pod to report Ready is persisted in the Pod's PVC. Thus, the replacement Pod will remember that it was already told to proceed to a Ready state.
2017-08-22 21:13:13 -07:00
Kubernetes Submit Queue 36b5e0eca6 Merge pull request #51037 from MrHohn/sig-network-e2e-fix-describe
Automatic merge from submit-queue (batch tested with PRs 51102, 50712, 51037, 51044, 51059)

[sig-network-e2e] Remove redundant sig prefix from tests

**What this PR does / why we need it**:
Remove redundant sig prefix from:
```
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for endpoint-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for endpoint-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for node-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for node-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for pod-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for pod-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update endpoints: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update endpoints: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update nodePort: http [Slow]
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update nodePort: udp [Slow]
[sig-network] Loadbalancing: L7 [sig-network] GCE [Slow] [Feature:Ingress] should conform to Ingress spec
[sig-network] Loadbalancing: L7 [sig-network] GCE [Slow] [Feature:Ingress] should create ingress with given static-ip
```

Umbrella issue #49161

**Special notes for your reviewer**:
cc @xiangpengzhao 

**Release note**:

```release-note
NONE
```
2017-08-22 12:28:02 -07:00
Antoine Pelisse 3bc6ceac38 Skip "Simple pod should support exec through kubectl proxy" test
As reported in https://github.com/kubernetes/kubernetes/issues/50466,
this test doesn't work in GKE because the transport layer doesn't work
with dialing.

As the feature that is broken in GKE is new and didn't work before, it
is safe to juste ignore the test and consider the feature as "still not
working" in GKE.
2017-08-22 10:30:16 -07:00
Kubernetes Submit Queue fdf14b8218 Merge pull request #50913 from shyamjvs/list-call-slo
Automatic merge from submit-queue (batch tested with PRs 50893, 50913, 50963, 50629, 50640)

Increase latency threshold for list api calls

This is only a short-term solution to make our density test green. In the long-term, we should measure as per our new SLIs.
From @wojtek-t's [doc](https://docs.google.com/document/d/1Q5qxdeBPgTTIXZxdsFILg7kgqWhvOwY8uROEf0j5YBw) on the new SLIs/SLOs, we have the following SLO for list calls:

```
SLO1: In default Kubernetes installation, 99th percentile of SLI2 per cluster-day:
<= 1s if total number of objects of the same type as resource in the system <= X
<= 5s if total number of objects of the same type as resource in the system <= Y
<= 30s if total number of objects of the same types as resource in the system <= Z
```

I would guess that 170,000 pods would fall into the 2nd bracket (at least) and hence the new value of 5s. WDYT?

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-08-22 05:31:07 -07:00
Anthony Yeh 3bc7676024
StatefulSet: Deflake e2e "Saturate" phase.
The "Saturate" phase of StatefulSet e2e tests verifies orderly startup
by controlling when each Pod is allowed to report Ready.
If a Pod unexepectedly goes down during the test, the replacement Pod
created by the controller will forget if it was already allowed to
report Ready.

After this change, the signal that allows each Pod to report Ready is
persisted in the Pod's PVC. Thus, the replacement Pod will remember that
it was already told to proceed to a Ready state.
2017-08-21 13:52:15 -07:00
Zihong Zheng e5349e8a90 [sig-network-e2e] Remove redundant sig prefix from tests 2017-08-21 11:17:02 -07:00
Kubernetes Submit Queue b2b079b95a Merge pull request #51005 from wasylkowski/preparation-timeout
Automatic merge from submit-queue (batch tested with PRs 47896, 50678, 50620, 50631, 51005)

Made the difference between scale-up timeout and cluster set-up timeout explicit.

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-08-21 08:26:29 -07:00
Shyam Jeedigunta bacc01f729 Auto-calculate CLUSTER_IP_RANGE based on no. of nodes 2017-08-21 14:21:43 +02:00
Andrzej Wasylkowski 6a41614342 Made the difference between scale-up timeout and cluster set-up timeout explicit. 2017-08-21 13:21:50 +02:00
Kubernetes Submit Queue b59ad9cbff Merge pull request #50146 from gmarek/deepcopyinto
Automatic merge from submit-queue (batch tested with PRs 46512, 50146)

Make metav1.(Micro)?Time functions take pointers

Is there any reason for those functions not to be on pointers?
2017-08-19 11:28:15 -07:00
Shyam Jeedigunta 70123e71bb Increase latency threshold for list api calls 2017-08-19 00:55:35 +02:00
Benjamin Bauer 014b765988 Refactor cluster_upgrade to include statefulset upgrade tests. 2017-08-18 10:27:47 -07:00
Kubernetes Submit Queue 26eb7c94ea Merge pull request #50904 from crassirostris/sd-logging-e2e-system-logs
Automatic merge from submit-queue (batch tested with PRs 50904, 50691)

Stackdriver Logging e2e: Explicitly check for docker and kubelet logs presence

Check for kubelet and docker logs explicitly in the Stackdriver Logging e2e tests
2017-08-18 07:29:36 -07:00
Mik Vyatskov cd23c5d1b0 Stackdriver Logging e2e: Explicitly check for docker and kubelet logs presence 2017-08-18 14:43:04 +02:00
Kubernetes Submit Queue e553d6eb5b Merge pull request #50575 from dixudx/CollisionCount_int64_to_int32
Automatic merge from submit-queue

CollisionCount should have type int32 across controllers that use it for collision avoidance

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50530

**Special notes for your reviewer**:
/cc @liyinan926
/assign @kow3ns @thockin @janetkuo 

**Release note**:

```release-note
Change CollisionCount from int64 to int32 across controllers
```
2017-08-17 23:21:11 -07:00
Kubernetes Submit Queue bf69aafde8 Merge pull request #50376 from guangxuli/add_sig_prefix_to_e2e_test
Automatic merge from submit-queue (batch tested with PRs 50277, 50823, 50376, 50867)

Move e2e taints test file to sig-scheduling

**What this PR does / why we need it**:
Move taint test file to e2e scheduling and add sig-scheduling prefix.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Ref Umbrella issue #49161

**Special notes for your reviewer**:

**Release note**:

none
2017-08-17 22:24:30 -07:00
Di Xu 85602fd542 CollisionCount should have type int32 across controllers that use it for collision avoidance 2017-08-18 10:48:12 +08:00
Kubernetes Submit Queue 12ce4151ce Merge pull request #50861 from crimsonfaith91/e2e-scale
Automatic merge from submit-queue

Change API version of statefulset scale subresource e2e test to v1beta2

**What this PR does / why we need it**:
This PR changes API version of statefulset scale subresource e2e test from `v1beta1` to `v1beta2`.
`apps/v1beta2` has been enabled.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #50109

**Release note**:

```release-note
NONE
```
2017-08-17 19:00:50 -07:00
crimsonfaith91 71b477bd6b Change API version of statefulset scale subresource e2e test to v1beta2 2017-08-17 15:11:10 -07:00
Walter Fender cb28f0f34f Add e2e aggregator test.
What this PR does / why we need it:
This adds an e2e test for aggregation based on the sample-apiserver.
Currently is uses a sample-apiserver built as of 1.7.
This should ensure that the aggregation system works end-to-end.
It will also help detect if we break "old" extension api servers.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes
fixes #43714
Fixed bazel for the change.
Fixed # of args issue from govet.
Added code to test dynamic.Client.
2017-08-17 10:56:43 -07:00