Automatic merge from submit-queue (batch tested with PRs 51134, 51122, 50562, 50971, 51327)
Made the tests ensure that Cluster Autoscaler is on before running.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Configuration is based on:
https://coreos.com/os/docs/latest/customizing-sshd.html
The specific SSHD config is:
# Use most defaults for sshd configuration.
UsePrivilegeSeparation sandbox
Subsystem sftp internal-sftp
ClientAliveInterval 180
UseDNS no
UsePAM yes
PrintLastLog no # handled by PAM
PrintMotd no # handled by PAM
AuthenticationMethods publickey
This will prevent security scanners from triggering.
Automatic merge from submit-queue
AllowedNotReadyNodes allowed to be not ready for absolutely *any* reason
It's as good as we allow those many nodes to be not part of the cluster at all, ever.
Btw - currently our 5k-node correctness test fails if "kubelet stopped posting node status" or "route not created", etc (ref: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/3/build-log.txt)
cc @kubernetes/sig-scalability-misc
Automatic merge from submit-queue (batch tested with PRs 51244, 50559, 49770, 51194, 50901)
Distribute pods efficiently in CA scalability tests
**What this PR does / why we need it**:
Instead of using runReplicatedPodOnEachNode method
which is suited to a small number of nodes,
distribute pods on the nodes with desired load
using RCs that eat up all the space we want to be
empty after distribution.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50213, 50707, 49502, 51230, 50848)
StatefulSet: Deflake e2e `kubectl exec` commands.
This may help with another source of flakiness found while investigating #48031.
We seem to get a lot of flakes due to "connection refused" while running `kubectl exec`. I can't find any reason this would be caused by the test flow, so I'm adding retries to see if that helps.
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)
Enable overlay2 on cos-m60 in node e2e tests
Ref: https://github.com/kubernetes/kubernetes/issues/42926
- Restart docker with `-s overlay2` in cloud-init before running all node e2e tests. I have to copy the systemd unit file to `/etc/systemd/system` because the `/usr/lib/systemd/system/` is read only.
- Updated node e2e tests to use the new cos-m60 image.
- The name of the cloud init file (`cos-init-live-restore.yaml`) does not indicate overlay2 will be enabled, but I can't just change the name in this PR, since it's referenced in test-infra.
**Release note**:
```
None
```
/assign @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)
StatefulSet: Deflake e2e "restart" phase.
This addresses another source of flakiness found while investigating #48031.
The test used to scale the StatefulSet down to 0, wait for ListPods to return 0 matching Pods, and then scale the StatefulSet back up.
This was prone to a race in which StatefulSet was told to scale back up before it had observed its own deletion of the last Pod, as evidenced by logs showing the creation of Pod ss-1 prior to the creation of the replacement Pod ss-0.
Instead, we now wait for the controller to observe all deletions before scaling it back up. This should fix flakes of the form:
```
Too many pods scheduled, expected 1 got 2
```
We seem to get a lot of flakes due to "connection refused" while running
`kubectl exec`. I can't find any reason this would be caused by the test
flow, so I'm adding retries to see if that helps.
Instead of using runReplicatedPodOnEachNode method
which is suited to a small number of nodes,
distribute pods on the nodes with desired load
using RCs that eat up all the space we want to be
empty after distribution.
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200)
Re-enable OIR e2e tests.
Re-enabling test skeleton for opaque integer resources originally submitted as part of #41870. The e2e was disabled since it was flaky. This is the first step toward re-enabling them. Currently all cases are skipped, so this exercises only the BeforeEach behavior and the deferred removal of OIRs from a node.
cc @timothysc
Automatic merge from submit-queue (batch tested with PRs 51108, 51035, 50539, 51160, 50947)
Auto-calculate CLUSTER_IP_RANGE based on cluster size
In preparation for eliminating CLUSTER_IP_RANGE env var from job configs, making it less error prone while folks try to start their own large cluster tests (https://github.com/kubernetes/kubernetes/issues/50907).
/cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)
Add statefulset upgrade tests to cluster_upgrade
**What this PR does / why we need it**:
Adds already created statefulset upgrade tests to cluster_upgrade.go. With further test infra changes, this will allow them to be continuously run, giving better signals.
Detect and prevent issues like https://github.com/kubernetes/kubernetes/issues/48327
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)
implement proposal 34058: hostPath volume type
**What this PR does / why we need it**:
implement proposal #34058
**Which issue this PR fixes** : fixes#46549
**Special notes for your reviewer**:
cc @thockin @luxas @euank PTAL
Automatic merge from submit-queue (batch tested with PRs 50489, 51070, 51011, 51022, 51141)
update to rbac v1 in yaml file
**What this PR does / why we need it**:
ref to https://github.com/kubernetes/kubernetes/pull/49642
ref https://github.com/kubernetes/features/issues/2
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
cc @liggitt
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Skip "Simple pod should support exec through kubectl proxy" test
As reported in https://github.com/kubernetes/kubernetes/issues/50466,
this test doesn't work in GKE because it uses a bearer token and the feature only works with client certs.
As the feature that is broken in GKE is new and didn't work before, it
is safe to juste ignore the test and consider the feature as "still not
working" in GKE.
**What this PR does / why we need it**: Fixes the broken test in https://k8s-testgrid.appspot.com/release-master-blocking#gke
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: works-around #50466
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
The test used to scale the StatefulSet down to 0, wait for ListPods to
return 0 matching Pods, and then scale the StatefulSet back up.
This was prone to a race in which StatefulSet was told to scale back up
before it had observed its own deletion of the last Pod, as evidenced by
logs showing the creation of Pod ss-1 prior to the creation of the
replacement Pod ss-0.
We now wait for the controller to observe all deletions before
scaling it back up. This should fix flakes of the form:
```
Too many pods scheduled, expected 1 got 2
```
Automatic merge from submit-queue (batch tested with PRs 50257, 50247, 50665, 50554, 51077)
Replace hard-code "cpu" and "memory" to consts
**What this PR does / why we need it**:
There are many places using hard coded "cpu" and "memory" as resource name. This PR replace them to consts.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
/kind cleanup
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50980, 46902, 51051, 51062, 51020)
Remove seemingly obsolete binaries
It's hard to tell if these are safe to remove. Let CI tell me.
Automatic merge from submit-queue (batch tested with PRs 51039, 50512, 50546, 50965, 50467)
Kubectl: Plumb openapi validation (disabled by default)
**What this PR does / why we need it**: Creates a new flag '--openapi' and plumb in the validation code so that it can be used by default to validate objects against the openapi schema.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: partially https://github.com/kubernetes/kubectl/issues/49
**Special notes for your reviewer**:
This is not complete, the name of the variable must change for example.
**Release note**:
```release-note
Kubectl uses openapi for validation. If OpenAPI is not available on the server, it defaults back to the old Swagger.
```
Automatic merge from submit-queue
StatefulSet: Deflake e2e "Saturate" phase.
This should reduce one source of flakiness found while investigating #48031.
The "Saturate" phase of StatefulSet e2e tests verifies orderly startup by controlling when each Pod is allowed to report Ready. If a Pod unexepectedly goes down during the test, the replacement Pod
created by the controller will forget if it was already allowed to report Ready.
After this change, the signal that allows each Pod to report Ready is persisted in the Pod's PVC. Thus, the replacement Pod will remember that it was already told to proceed to a Ready state.
Automatic merge from submit-queue (batch tested with PRs 51102, 50712, 51037, 51044, 51059)
[sig-network-e2e] Remove redundant sig prefix from tests
**What this PR does / why we need it**:
Remove redundant sig prefix from:
```
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for endpoint-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for endpoint-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for node-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for node-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for pod-Service: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should function for pod-Service: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update endpoints: http
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update endpoints: udp
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update nodePort: http [Slow]
[sig-network] Networking [sig-network] Granular Checks: Services [Slow] should update nodePort: udp [Slow]
[sig-network] Loadbalancing: L7 [sig-network] GCE [Slow] [Feature:Ingress] should conform to Ingress spec
[sig-network] Loadbalancing: L7 [sig-network] GCE [Slow] [Feature:Ingress] should create ingress with given static-ip
```
Umbrella issue #49161
**Special notes for your reviewer**:
cc @xiangpengzhao
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50967, 50505, 50706, 51033, 51028)
Fix GC integration test race
During TestCreateWithNonExistentOwner, when creating a pod with a
non-existent owner, assume it's possible the pod will be deleted before
we start checking for the pod's existence. Assuming that the pod still
exists immediately after Create returns is flaky if the GC reacts very
quickly.
```release-note
NONE
```
Might fix https://github.com/kubernetes/kubernetes/issues/50943; without the additional test context provided by this PR, it's not entirely possible to assess the root cause of the reported failure (as we don't know whether the original assertion failure was due to there being 0 or >1 pods).
/cc @caesarxuchao
Automatic merge from submit-queue (batch tested with PRs 50967, 50505, 50706, 51033, 51028)
Revert "Merge pull request #51008 from kubernetes/revert-50789-fix-scheme"
I'm spinning up a cluster right now to test this fix, but I'm pretty sure this was the problem.
There doesn't seem to be a way to confirm from logs, because AFAICT the logs from the hollow kubelet containers are not collected as part of the kubemark test.
**What this PR does / why we need it**:
This reverts commit f4afdecef8, reversing
changes made to e633a1604f.
This also fixes a bug where Kubemark was still using the core api scheme
to manipulate the Kubelet's types, which was the cause of the initial
revert.
**Which issue this PR fixes**: fixes#51007
**Release note**:
```release-note
NONE
```
/cc @shyamjvs @wojtek-t
As reported in https://github.com/kubernetes/kubernetes/issues/50466,
this test doesn't work in GKE because the transport layer doesn't work
with dialing.
As the feature that is broken in GKE is new and didn't work before, it
is safe to juste ignore the test and consider the feature as "still not
working" in GKE.
Automatic merge from submit-queue
Should generate files before scheduler perf
**What this PR does / why we need it**:
For a newly cloned project, generated files are not included. Then scheduler_perf will fail:
```
undefined: openapi.GetOpenAPIDefinitions
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes: #51090
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 50893, 50913, 50963, 50629, 50640)
Increase latency threshold for list api calls
This is only a short-term solution to make our density test green. In the long-term, we should measure as per our new SLIs.
From @wojtek-t's [doc](https://docs.google.com/document/d/1Q5qxdeBPgTTIXZxdsFILg7kgqWhvOwY8uROEf0j5YBw) on the new SLIs/SLOs, we have the following SLO for list calls:
```
SLO1: In default Kubernetes installation, 99th percentile of SLI2 per cluster-day:
<= 1s if total number of objects of the same type as resource in the system <= X
<= 5s if total number of objects of the same type as resource in the system <= Y
<= 30s if total number of objects of the same types as resource in the system <= Z
```
I would guess that 170,000 pods would fall into the 2nd bracket (at least) and hence the new value of 5s. WDYT?
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Revert #50362.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #50884
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 50693, 50831, 47506, 49119, 50871)
Add instance metadata from flag even when using image config.
Also add instance metadata from flag even when we are using image config.
* Sometimes we need to dynamically generate instance metadata, it's troublesome to put them into image config.
* Sometimes we want to apply instance metadata to all images, it's duplicated to add them to each image in the image config.
/assign @yguo0905 Could you help me review this?
The "Saturate" phase of StatefulSet e2e tests verifies orderly startup
by controlling when each Pod is allowed to report Ready.
If a Pod unexepectedly goes down during the test, the replacement Pod
created by the controller will forget if it was already allowed to
report Ready.
After this change, the signal that allows each Pod to report Ready is
persisted in the Pod's PVC. Thus, the replacement Pod will remember that
it was already told to proceed to a Ready state.
This reverts commit f4afdecef8, reversing
changes made to e633a1604f.
This also fixes a bug where Kubemark was still using the core api scheme
to manipulate the Kubelet's types, which was the cause of the initial
revert.
Automatic merge from submit-queue (batch tested with PRs 47896, 50678, 50620, 50631, 51005)
Made the difference between scale-up timeout and cluster set-up timeout explicit.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
During TestCreateWithNonExistentOwner, when creating a pod with a
non-existent owner, assume it's possible the pod will be deleted before
we start checking for the pod's existence. Assuming that the pod still
exists immediately after Create returns is flaky if the GC reacts very
quickly.
Automatic merge from submit-queue (batch tested with PRs 46512, 50146)
Make metav1.(Micro)?Time functions take pointers
Is there any reason for those functions not to be on pointers?
Automatic merge from submit-queue (batch tested with PRs 50904, 50691)
Stackdriver Logging e2e: Explicitly check for docker and kubelet logs presence
Check for kubelet and docker logs explicitly in the Stackdriver Logging e2e tests
Automatic merge from submit-queue
Add enj to OWNERS for test/integration/etcd/etcd_storage_path_test.go
@deads2k is the bot smart enough to not spam me with every test change? Perhaps I should create an `OWNERS` file in `test/integration/etcd`?
**Release note**:
```release-note
NONE
```
@kubernetes/sig-api-machinery-pr-reviews
Automatic merge from submit-queue
CollisionCount should have type int32 across controllers that use it for collision avoidance
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50530
**Special notes for your reviewer**:
/cc @liyinan926
/assign @kow3ns @thockin @janetkuo
**Release note**:
```release-note
Change CollisionCount from int64 to int32 across controllers
```
Automatic merge from submit-queue (batch tested with PRs 50277, 50823, 50376, 50867)
Move e2e taints test file to sig-scheduling
**What this PR does / why we need it**:
Move taint test file to e2e scheduling and add sig-scheduling prefix.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Ref Umbrella issue #49161
**Special notes for your reviewer**:
**Release note**:
none
Automatic merge from submit-queue
Change API version of statefulset scale subresource e2e test to v1beta2
**What this PR does / why we need it**:
This PR changes API version of statefulset scale subresource e2e test from `v1beta1` to `v1beta2`.
`apps/v1beta2` has been enabled.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #50109
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50281, 50747, 50347, 50834, 50852)
Add e2e aggregator test.
What this PR does / why we need it:
This adds an e2e test for aggregation based on the sample-apiserver.
Currently is uses a sample-apiserver built as of 1.7.
This should ensure that the aggregation system works end-to-end.
It will also help detect if we break "old" extension api servers.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes
fixes#43714
**Special notes for your reviewer**:
**Release note**: NONE
Automatic merge from submit-queue (batch tested with PRs 50563, 50698, 50796)
Add ControllerRevision to apps/v1beta2
**What this PR does / why we need it**:
This PR added `ControllerRevision` currently in `apps/v1beta1` to `apps/v1beta2`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50696.
**Special notes for your reviewer**:
@kow3ns @janetkuo
**Release note**:
```release-note
Add ControllerRevision to apps/v1beta2
```
What this PR does / why we need it:
This adds an e2e test for aggregation based on the sample-apiserver.
Currently is uses a sample-apiserver built as of 1.7.
This should ensure that the aggregation system works end-to-end.
It will also help detect if we break "old" extension api servers.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes
fixes#43714
Fixed bazel for the change.
Fixed # of args issue from govet.
Added code to test dynamic.Client.
Automatic merge from submit-queue
Migrate sig-apimachinery and sig-servicecatalog e2e tests
**What this PR does / why we need it**:
Migrate sig-apimachinery and sig-servicecatalog e2e tests
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Ref Umbrella issue #49161
1. Move generated_clientset.go to sig-apimachinary
2. Move podpreset.go to sig-servicecatalog by creating new directory.
**Special notes for your reviewer**:
**Release note**:
none
/cc @liggitt
Automatic merge from submit-queue (batch tested with PRs 50550, 50768)
Don't SSH to master for metrics in case of GKE
cc @kubernetes/sig-scalability-misc @crassirostris
Automatic merge from submit-queue
add some e2e for node authz
**What this PR does / why we need it**:
fix#47174
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47174
**Special notes for your reviewer**:
**Release note**:
```
None
```
Automatic merge from submit-queue
Promote CronJobs to batch/v1beta1 - just the API
This PR promotes CronJobs to beta.
@erictune @kubernetes/sig-apps-api-reviews @kubernetes/api-approvers ptal
This builds on top of #41890 and needs #40932 as well
```release-note
Promote CronJobs to batch/v1beta1.
```
Automatic merge from submit-queue (batch tested with PRs 50711, 50742, 50204)
fix panic in e2e
**What this PR does / why we need it**:
fix#50660
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
no
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 50670, 50332)
e2e test for local storage mount point
**What this PR does / why we need it**:
We discovered that kubernetes can treat local directories and actual mountpoints differently. For example, https://github.com/kubernetes/kubernetes/issues/48331. The current local storage e2e tests use directories.
This PR introduces a test that creates a tmpfs and mounts it, and runs one of the local storage e2e tests.
**Which issue this PR fixes**: fixes https://github.com/kubernetes/kubernetes/issues/49126
**Special notes for your reviewer**:
I cherrypicked PR https://github.com/kubernetes/kubernetes/pull/50177, since local storage e2e tests are broken in master on 2017-08-08 due to "no such host" error. This PR replaces NodeExec with SSH commands.
You can run the tests using the following commands:
```
$ NUM_NODES=1 KUBE_FEATURE_GATES="PersistentLocalVolumes=true" go run hack/e2e.go -- -v --up
$ go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Feature:LocalPersistentVolumes\]"
```
Here are the summary of results from my test run:
```
Ran 9 of 651 Specs in 387.905 seconds
SUCCESS! -- 9 Passed | 0 Failed | 0 Pending | 642 Skipped PASS
Ginkgo ran 1 suite in 6m29.369318483s
Test Suite Passed
2017/08/08 11:54:01 util.go:133: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Feature:LocalPersistentVolumes\]' finished in 6m32.077462612s
```
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 50198, 49051, 48432)
move KubeletConfiguration out of componentconfig API group
I'm splitting #44252 into more manageable steps. This step moves the types and updates references.
To reviewers: the most important changes are the removals from pkg/apis/componentconfig and additions to pkg/kubelet/apis/kubeletconfig. Almost everything else is an import or name update.
I have one unanswered question: Should I create a whole new api scheme for Kubelet APIs rather than register e.g. a kubeletconfig group with the default runtime.Scheme instance? This feels like the right thing, as the Kubelet should be exposing its own API, but there's a big fat warning not to do this in `pkg/api/register.go`. Can anyone answer this?
Automatic merge from submit-queue (batch tested with PRs 50198, 49051, 48432)
Add prefix to common networking e2e tests
**What this PR does / why we need it**:
Common networking e2e tests shared by node and cluster suites should also have prefix `[sig-network]`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Umbrella issue #49161
**Special notes for your reviewer**:
/cc @bowei
**Release note**:
```release-note
NONE
```