For LoadBalancer type service:
- Verifies corresponding firewall rule has correct sourceRanges, ports
& protocols, target tags.
- Verifies requests can reach all expected instances.
- Verifies requests can not reach instances that are not included.
For Ingress resource:
- Verifies the ingress firewall rule has correct sourceRanges, target
tags and tcp ports.
For general e2e cluster:
- Verifies all required firewall rules have correct sourceRange, ports
& protocols, source tags and target tags.
- Verifies well-known ports on master and nodes are not
exposed externally
Automatic merge from submit-queue
Fix Recreate for Deployments and stop using events in e2e tests
Fixes https://github.com/kubernetes/kubernetes/issues/36453 by removing events from the deployment tests. The test about events during a Rolling deployment is redundant so I just removed it (we already have another test specifically for Rolling deployments).
Closes https://github.com/kubernetes/kubernetes/issues/32567 (preferred to use pod LISTs instead of a new status API field for replica sets that would add many more writes to replica sets).
@kubernetes/deployment
Automatic merge from submit-queue (batch tested with PRs 38830, 38750)
[Federation] Stop cleaning federation namespace in e2e tests
When the `--clean-start=true` flag is passed to the e2e tests, they clean up all leftover namespaces except `default` and `kube-system`. As a result, when we run e2e tests in the federation soak test job, the federation control plane is destroyed before the tests run and all tests start to fail.
This PR adds `federation-system` to the list of namespaces to be left intact and also changes the default federation namespace name from `federation` to `federation-system`, to be consistent with the newer method of deploying federation using kubefed.
@madhusudancs @nikhiljindal
Automatic merge from submit-queue (batch tested with PRs 38830, 38750)
Remove the ReadyReplica version guard
**What this PR does / why we need it**: Removes outlived version guards.
**Which issue this PR fixes**: fixes #37310
Automatic merge from submit-queue
Add a package for handling version numbers (including non-"Semantic" versions)
As noted in #32401, we are using Semantic Version-parsing libraries to parse version numbers that aren't necessarily "Semantic". Although, contrary to what I'd said there, it turns out that this wasn't actually currently a problem for the iptables code, because the regexp used to extract the version number out of the "iptables --version" output only pulled out three components, so given "iptables v1.4.19.1", it would have extracted just "1.4.19". Still, it could be a problem if they later release "1.5" rather than "1.5.0", or if we eventually need to _compare_ against a 4-digit version number.
Also, as noted in #23854, we were also using two different semver libraries in different parts of the code (plus a wrapper around one of them in pkg/version).
This PR adds pkg/util/version, with code to parse and compare both semver and non-semver version strings, and then updates kubernetes to use it everywhere (including getting rid of a bunch of code duplication in kubelet by making utilversion.Version implement the kubecontainer.Version interface directly).
Ironically, this does not actually allow us to get rid of either of the vendored semver libraries, because we still have other dependencies that depend on each of them. (cadvisor uses blang/semver and etcd uses coreos/go-semver)
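The concern about non-"Semantic" versions can be illustrated with a minimal sketch. This is not the code added in pkg/util/version; `parseGeneric` and `compare` are hypothetical names, and the sketch only handles dotted numeric components with an optional leading `v`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGeneric is a hypothetical helper: it splits a dotted version such as
// "1.4.19.1" or "1.5" into numeric components, with no limit on their count.
func parseGeneric(s string) ([]uint64, error) {
	s = strings.TrimPrefix(s, "v")
	parts := strings.Split(s, ".")
	out := make([]uint64, 0, len(parts))
	for _, p := range parts {
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid component %q in version %q", p, s)
		}
		out = append(out, n)
	}
	return out, nil
}

// compare returns -1, 0, or 1. Missing trailing components count as zero,
// so "1.5" and "1.5.0" compare equal.
func compare(a, b []uint64) int {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		var x, y uint64
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

func main() {
	a, _ := parseGeneric("v1.4.19.1")
	b, _ := parseGeneric("1.4.19")
	c, _ := parseGeneric("1.5")
	d, _ := parseGeneric("1.5.0")
	fmt.Println(compare(a, b), compare(c, d))
}
```

A three-component semver parser would reject or truncate the first input, which is the case described above.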
fixes #32401, #23854
Automatic merge from submit-queue
Add an option to run Job in Density/Load config
cc @timothysc @jeremyeder
@erictune @soltysh - I ran this test and it seems to me that Job has noticeably worse performance than Deployment. I'll create an issue for this, but this PR is for easy repro.
Automatic merge from submit-queue (batch tested with PRs 37325, 38313, 38141, 38321, 38333)
Fix running e2e with 'Completed' kube-system pods
As of now, e2e runner keeps waiting for pods in `kube-system` namespace to be "Running and Ready" if there are any pods in `Completed` state in that namespace.
This happens, for example, after following the [Kubernetes Hosted Installation](http://docs.projectcalico.org/v2.0/getting-started/kubernetes/installation/#kubernetes-hosted-installation) instructions for Calico, making it impossible to run conformance tests against the cluster. It's also possible to reproduce the problem like this:
```
$ cat testjob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tst
  namespace: kube-system
spec:
  template:
    metadata:
      name: tst
    spec:
      containers:
      - name: tst
        image: busybox
        command: ["echo", "test"]
      restartPolicy: Never
$ kubectl create -f testjob.yaml
$ go run hack/e2e.go -v --test --test_args='--ginkgo.focus=existing\s+RC'
```
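The intended readiness rule can be sketched as follows. This is a simplified illustration, not the e2e framework code: `pod` is a stand-in struct and `allSettled` is a hypothetical name. It assumes a pod shown as `Completed` has phase `Succeeded`.

```go
package main

import "fmt"

// pod is a simplified stand-in for the real Pod object (assumption).
type pod struct {
	Name  string
	Phase string // "Pending", "Running", "Succeeded", "Failed"
	Ready bool
}

// allSettled reports whether every pod is either Running and Ready or has
// already run to completion (phase Succeeded).
func allSettled(pods []pod) bool {
	for _, p := range pods {
		if p.Phase == "Succeeded" {
			continue // a finished Job pod will never become Ready
		}
		if p.Phase != "Running" || !p.Ready {
			return false
		}
	}
	return true
}

func main() {
	pods := []pod{
		{Name: "kube-dns", Phase: "Running", Ready: true},
		{Name: "tst", Phase: "Succeeded", Ready: false},
	}
	fmt.Println(allSettled(pods))
}
```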
Automatic merge from submit-queue
Delete regional static-ip instead of global for type=lb
Global vs region is the difference between
```
$ gcloud compute addresses delete foo --global
$ gcloud compute addresses delete foo --region us-central1
```
Type=LoadBalancer uses the second type and we were doing the first.
Also adds some logging.
Automatic merge from submit-queue (batch tested with PRs 38173, 38151, 38197, 38221)
test: wait for ready replica set before adopting
Reworked version of https://github.com/kubernetes/kubernetes/pull/36439 which was reverted in https://github.com/kubernetes/kubernetes/pull/38049. This PR doesn't use any of the new status API added in replica sets so it should cause no trouble with upgrade tests.
@kubernetes/deployment @smarterclayton
Automatic merge from submit-queue (batch tested with PRs 37032, 38119, 38186, 38200, 38139)
New ns param for NewClusterVerification
**What this PR does / why we need it**: Allows the test to specify alternate namespaces when waiting for pods to be in a specific state.
**Which issue this PR fixes**: fixes #38138
**Special notes for your reviewer**: Minor fix
**Release note**: None
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)
Fixes flake: wait for dns pods terminating after test completed
From #37194. Based on #36600. Please only look at the second commit.
As mentioned in [comment](https://github.com/kubernetes/kubernetes/issues/37194#issuecomment-262007174), "DNS horizontal autoscaling" test does not wait for the additional pods to be terminated and this may lead to the failure of later tests.
This fix adds a wait loop at the end of the serial test to ensure the cluster recovers to its original state. The non-serial test does not wait for the additional pods to terminate, because they will not affect other tests, given that those tests can run simultaneously. Waiting for pods to terminate also takes a certain amount of time.
Note this only fixes certain cases of #37194. I noticed there are other failures unrelated to the dns autoscaler, like [this one](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-serial/34/).
@bprashanth @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 36352, 36538, 37976, 36374)
test: update deployment helper to return better error messages
@kubernetes/deployment the problem with https://github.com/kubernetes/kubernetes/issues/36270 is that the selector key is never added in the deployment but this change would make it clearer.
Automatic merge from submit-queue (batch tested with PRs 37094, 37663, 37442, 37808, 37826)
Moved gobindata, refactored ReadOrDie refs
**What this PR does / why we need it**: Having gobindata inside of test/e2e/framework prevents external projects from importing the framework. Moving it out and managing refs fixes this problem.
**Which issue this PR fixes**: fixes #37007
Automatic merge from submit-queue (batch tested with PRs 37997, 37939, 37990, 36700, 37258)
Add cluster-level AppArmor E2E test
My goal is to reuse this test for an automated cluster upgrade test.
Automatic merge from submit-queue
test: update rollover test to wait for available rs before adopting
Scenario that happened in https://github.com/kubernetes/kubernetes/issues/35355#issuecomment-257808460
-- Replica set that is about to be adopted has 2 out of 4 ready replicas
-- Deployment is created with 4 replicas, adopts pre-existing replica set, creates a new one, and starts rolling replicas over to the new replica set.
```
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled down replica set test-rollover-controller to 3
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled up replica set test-rollover-deployment-2505289747 to 1
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment-2505289747: {replicaset-controller } SuccessfulCreate: Created pod: test-rollover-deployment-2505289747-iuiei
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment-2505289747-iuiei: {default-scheduler } Scheduled: Successfully assigned test-rollover-deployment-2505289747-iuiei to gke-jenkins-e2e-default-pool-33c0400e-6q5m
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:05 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled up replica set test-rollover-deployment-2505289747 to 2
```
At this point there is no minimum availability for the Deployment (maxUnavailable is 1 meaning desired minimum available is 3 but we only have 2), and the new replica set uses a non-existent image. New replica set is scaled up to 1 (maxSurge is 1), then old replica set is scaled down by one, because cleanupUnhealthyReplicas observes that it has 2 unhealthy replicas - it can only scale down one though because the [maximum replicas it can cleanup is one](d87dfa2723/pkg/controller/deployment/rolling.go (L125)) (4+1-3-1). New replica set is scaled to 2. Available replicas are still 2 (third replica from the old replica set has yet to come up).
-- Deployment is rolled over with a new update. Test reaches for the WaitForDeploymentStatus check but there are only 2 availableReplicas (maxUnavailable is still violated).
This change makes the test wait for a healthy replica set before proceeding thus it should never hit the scenario described above.
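The `4+1-3-1` arithmetic in the scenario can be worked through with a small sketch. The mapping of the four numbers to total pods, minimum available, and unavailable new pods is my reading of the description above, and `maxCleanup` is a hypothetical name rather than the controller's function.

```go
package main

import "fmt"

// maxCleanup computes how many old replicas may be removed:
// total pods - minimum available - unavailable pods in the new replica set.
func maxCleanup(oldReplicas, newReplicas, desired, maxUnavailable, newAvailable int) int {
	total := oldReplicas + newReplicas
	minAvailable := desired - maxUnavailable
	newUnavailable := newReplicas - newAvailable
	n := total - minAvailable - newUnavailable
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	// 4 old replicas, 1 new replica that is not available,
	// 4 desired replicas, maxUnavailable of 1: 4+1-3-1.
	fmt.Println(maxCleanup(4, 1, 4, 1, 0))
}
```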
@kubernetes/deployment
- Remaining spaghetti untangled
- Missed bazel update and a few hardcoded refs
- New instance of framework.ReadOrDie reference removed post rebase
- Resolve new clientset rebase
- Fixed e2e/generated BUILD dep
- A space
- Missed gobindata ref in golang.sh
Automatic merge from submit-queue
Build vendored copy of go-bindata and use that in go generate step
**What this PR does / why we need it**: as the title says, uses the vendored version of `go-bindata` rather than expecting developers to `go get` it (when building outside docker).
**Which issue this PR fixes**: fixes #34067, partially addresses #36655
**Special notes for your reviewer**: we still call `go generate` far too many times:
```console
~/.../src/k8s.io/kubernetes $ which go-bindata
~/.../src/k8s.io/kubernetes $ make
+++ [1116 17:35:28] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:29] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:30] Building go targets for linux/amd64:
cmd/libs/go2idl/deepcopy-gen
+++ [1116 17:35:35] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:35] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:36] Building go targets for linux/amd64:
cmd/libs/go2idl/defaulter-gen
+++ [1116 17:35:41] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:41] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:42] Building go targets for linux/amd64:
cmd/libs/go2idl/conversion-gen
+++ [1116 17:35:47] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:47] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:48] Building go targets for linux/amd64:
cmd/libs/go2idl/openapi-gen
+++ [1116 17:35:56] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:56] Generating bindata:
test/e2e/framework/gobindata_util.go
```
Fixing that is a separate effort, though.
cc @sebgoa @ZhangBanger
Automatic merge from submit-queue
Fix package aliases to follow golang convention
Some package aliases do not align with the golang convention https://blog.golang.org/package-names. This PR fixes them. Also adds a verify script and presubmit checks.
Fixes #35070.
cc/ @timstclair @Random-Liu
Automatic merge from submit-queue
Skip rather than fail networking tests on single node
**What this PR does / why we need it**:
Needed for the general e2e tidying we need to do for flakey slow tests, imo pre 1.5, see #31402 and so on.
**Which issue this PR fixes**:
Don't fail multinode tests on a single-node cluster; skip instead.
Checking the result.Code prior to err in the if statement causes a panic
if result is nil. It turns out the formatting of the error is already in
IssueSSHCommandWithResult, so removing redundant code is enough to fix
the issue. Logging the SSH result was also redundant, so I removed that
as well.
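A minimal sketch of the ordering problem, assuming a pointer result that is nil on failure. `sshResult`, `runSSH`, and `check` are hypothetical stand-ins, not the framework's types or the actual IssueSSHCommandWithResult signature.

```go
package main

import (
	"errors"
	"fmt"
)

// sshResult is a simplified stand-in for the framework's SSH result type.
type sshResult struct {
	Code   int
	Stdout string
}

// runSSH is a hypothetical command runner; it returns a nil result on failure.
func runSSH(cmd string) (*sshResult, error) {
	if cmd == "" {
		return nil, errors.New("empty command")
	}
	return &sshResult{Code: 0, Stdout: "ok"}, nil
}

// check inspects err first, so result is only dereferenced when it is non-nil.
// Writing `if result.Code != 0 || err != nil` instead would panic on a nil result.
func check(cmd string) error {
	result, err := runSSH(cmd)
	if err != nil {
		return err
	}
	if result.Code != 0 {
		return fmt.Errorf("command %q exited with code %d", cmd, result.Code)
	}
	return nil
}

func main() {
	fmt.Println(check(""))
	fmt.Println(check("uptime"))
}
```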
Automatic merge from submit-queue
Guard the ready replica checking by server version
I fixed replica readiness checking for 1.4->1.5 upgrades by using a field that only exists in versions >=1.4.0 in #36924
This fixed a lot of issues in 1.4->1.5 upgrade testing, but did not fix 1.3->1.5 upgrade tests. I've disabled replica checking for 1.3 masters as the old logic was broken anyway.
This will not affect the 1.3 CI tests. Just 1.3 -> {1.4, 1.5} upgrade tests.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke-container_vm-1.3-container_vm-1.5-upgrade-cluster-new/330?log
is an example of this breakage. These are the tell-tale logs:
```console
Nov 22 09:40:50.469: INFO: 11 / 11 pods in namespace 'kube-system' are running and ready (506 seconds elapsed)
Nov 22 09:40:50.469: INFO: expected 5 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Nov 22 09:40:50.469: INFO: POD NODE PHASE GRACE CONDITIONS
```
Automatic merge from submit-queue
Use netexec container in http lifecycle hook test.
Fixes https://github.com/kubernetes/kubernetes/issues/33636.
The original test uses `"echo -e \"HTTP/1.1 200 OK\n\" | nc -l -p 1234"` as a simple http server.
However, it seems that this is not very reliable: it may respond before golang thinks it should.
So we get the error:
```
I1106 06:14:13.325397 2096 logs.go:41] Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 200 OK\n\n"; err=<nil>
```
This PR changes the test to use the `netexec` container, which is a simple http server written in Go and used in many of our networking e2e tests. It should be more reliable.
Mark 1.5 since this is fixing a 1.5 release blocking issue. Mark P0 to match the original issue.
@dchen1107
This disables ready replica checking for 1.3 masters, but only from 1.4
or 1.5 clients. The old logic was broken anyway due to overlapping
labels with replica sets.
Automatic merge from submit-queue
Retry job update after failure to prevent modification conflict
This fixes the #34585 flake.
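The retry-on-conflict pattern can be sketched as follows. `job`, `store`, and `updateWithRetries` are hypothetical stand-ins with an in-memory resource version check; they are not the helper added by this PR.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified")

// job and store are simplified in-memory stand-ins (assumptions).
type job struct {
	ResourceVersion int
	Parallelism     int
}

type store struct{ current job }

func (s *store) get() job { return s.current }

// update rejects writes that were based on a stale resource version.
func (s *store) update(j job) error {
	if j.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	j.ResourceVersion++
	s.current = j
	return nil
}

// updateWithRetries re-reads the job and reapplies the change after each conflict.
func updateWithRetries(s *store, attempts int, mutate func(*job)) error {
	for i := 0; i < attempts; i++ {
		j := s.get()
		mutate(&j)
		err := s.update(j)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d attempts", attempts)
}

func main() {
	s := &store{current: job{ResourceVersion: 1, Parallelism: 1}}
	err := updateWithRetries(s, 3, func(j *job) { j.Parallelism = 5 })
	fmt.Println(err, s.current.Parallelism)
}
```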
@janetkuo || @kubernetes/sig-apps ptal
I've been getting too many emails recently wrt to that issue, so I wanted to "clean" my inbox a bit 😉
Automatic merge from submit-queue
Add e2e test for statefulset updates
Verify that one can (manually) update statefulset template
cc @erictune @foxish @kow3ns @kubernetes/sig-apps
Automatic merge from submit-queue
Cleanup pod in MatchContainerOutput
MatchContainerOutput always creates a pod and does not cleanup. We need
to fix this to be better at re-trying the scenarios.
When there is an error say in the first attempt of ExpectNoErrorWithRetries
(for example in "Pods should contain environment variables for services" test)
the retries logic calls MatchContainerOutput another time and the
podClient.create fails correctly since the pod was not cleaned up the
first time MatchContainerOutput was called.
Fixes #35089
Automatic merge from submit-queue
Node Conformance & E2E: Get node name from node object.
This PR changes the node e2e test framework to get node name from apiserver instead of test flags.
When a user tried out the node conformance test, they found that it does not work properly if kubelet is started with `hostname-override`.
The reason is that the node conformance test uses [the default node name - `os.Hostname`](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/e2e_node_suite_test.go#L124), which may differ from `hostname-override`. This causes test pods not to be scheduled, and eventually the test times out.
We could expose a flag from the node conformance test and let users set the node name themselves if they are using `hostname-override` on kubelet. However, letting the framework detect it automatically from the apiserver is more user friendly.
/cc @kubernetes/sig-node
This PR 1) only changes node e2e test framework; 2) fixes a problem in node conformance test which is a 1.5 feature. @saad-ali Can we have this in 1.5?
Automatic merge from submit-queue
Node Conformance Test: Containerize the node e2e test
For #30122, #30174.
Based on #32427, #32454.
**Please only review the last 3 commits.**
This PR packages the node e2e test into a docker image:
- 1st commit: Add `NodeConformance` flag in the node e2e framework to avoid starting kubelet and collecting system logs. We do this because:
- There are all kinds of ways to manage kubelet and system logs. For different situations we need to mount different things into the container and run different commands. It is hard and unnecessary to handle that complexity inside the test suite.
- 2nd commit: Remove all `sudo` in the test container. We do this because:
- In most containers, there is no `sudo` command, and there is no need to use `sudo` inside the container.
- It introduces some complexity to use `sudo` inside the test. (https://github.com/kubernetes/kubernetes/issues/29211, https://github.com/kubernetes/kubernetes/issues/26748) In fact we just need to run the test suite with `sudo`.
- 3rd commit: Package the test into a docker container with corresponding `Makefile` and `Dockerfile`. We also added a `run_test.sh` script to start kubelet and run the test container. The script is only for demonstration purposes, and we'll also use it in our node e2e framework. In the future, we should update the script to start kubelet in a production way (maybe with `systemd` or `supervisord`).
@dchen1107 @vishh
/cc @kubernetes/sig-node @kubernetes/sig-testing
**Release note**:
``` release-note
Release alpha version node test container gcr.io/google_containers/node-test-ARCH:0.1 for users to verify their node setup.
```
Automatic merge from submit-queue
Controller changes for perma failed deployments
This PR adds support for reporting failed deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined as progressDeadlineSeconds then the deployment will be
marked as failed by a Progressing condition with a ProgressDeadlineExceeded
reason.
Follow-up to https://github.com/kubernetes/kubernetes/pull/19343
Docs at kubernetes/kubernetes.github.io#1337
Fixes https://github.com/kubernetes/kubernetes/issues/14519
@kubernetes/deployment @smarterclayton
The functionality used to exist entirely in the NC which would
previously clean up pods and nodes together. Now, we simply
wait for the PodGC to see that the node is now deleted and clean up the
pods. This may take a while and hence we set a 1 minute timeout.
Automatic merge from submit-queue
[Kubelet] Use the custom mounter script for Nfs and Glusterfs only
This patch reduces the scope for the containerized mounter to NFS and GlusterFS on GCE + GCI clusters
This patch also enabled the containerized mounter on GCI nodes
Shepherding multiple PRs through the submit queue is painful. Hence I combined them into this PR. Please review each commit individually.
cc @jingxu97 @saad-ali
https://github.com/kubernetes/kubernetes/pull/35652 has also been reverted as part of this PR
Automatic merge from submit-queue
Node controller to not force delete pods
Fixes https://github.com/kubernetes/kubernetes/issues/35145
- [x] e2e tests to test Petset, RC, Job.
- [x] Remove and cover other locations where we force-delete pods within the NodeController.
**Release note**:
``` release-note
Node controller no longer force-deletes pods from the api-server.
* For StatefulSet (previously PetSet), this change means creation of replacement pods is blocked until old pods are definitely not running (indicated either by the kubelet returning from partitioned state, or deletion of the Node object, or deletion of the instance in the cloud provider, or force deletion of the pod from the api-server). This has the desirable outcome of "fencing" to prevent "split brain" scenarios.
* For all other existing controllers except StatefulSet, this has no effect on the ability of the controller to replace pods because the controllers do not reuse pod names (they use generate-name).
* User-written controllers that reuse names of pod objects should evaluate this change.
```
This changes framework.GetReadySchedulableNodesOrDie and
framework.GetMasterAndWorkerNodesOrDie so that nodes that can't take a
generic fake pod due to a taint/toleration mismatch aren't returned.
This is a rehash of #35210, but pulls in the scheduler code.
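The filtering idea can be sketched as follows. This is a simplified illustration: the types are stand-ins, `schedulableFor` is a hypothetical name, and taints are matched on key and effect only, which is far coarser than the scheduler code this change pulls in.

```go
package main

import "fmt"

type taint struct{ Key, Effect string }
type toleration struct{ Key, Effect string }

type node struct {
	Name   string
	Taints []taint
}

// tolerated reports whether any toleration matches the taint's key and effect.
func tolerated(t taint, tols []toleration) bool {
	for _, tol := range tols {
		if tol.Key == t.Key && tol.Effect == t.Effect {
			return true
		}
	}
	return false
}

// schedulableFor keeps only nodes whose taints are all tolerated.
func schedulableFor(nodes []node, tols []toleration) []node {
	var out []node
	for _, n := range nodes {
		ok := true
		for _, t := range n.Taints {
			if !tolerated(t, tols) {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []node{
		{Name: "node-a"},
		{Name: "node-b", Taints: []taint{{Key: "dedicated", Effect: "NoSchedule"}}},
	}
	// A generic pod has no tolerations, so only the untainted node remains.
	for _, n := range schedulableFor(nodes, nil) {
		fmt.Println(n.Name)
	}
}
```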
Automatic merge from submit-queue
remove the non-generated client
Removes the non-generated client from kube. The package has a few methods left, but nothing that needs updating when adding new groups.
@ingvagabund
Automatic merge from submit-queue
Set done to true & return error if RestartPolicy not Always in test framework
Found a small issue with https://github.com/kubernetes/kubernetes/pull/34632, it returns an error if the RestartPolicy is not Always, but the user will never see it because done isn't set to true & they will timeout instead.
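The effect can be sketched with a polling loop that only looks at the error once the condition reports done. `waitFor` and `restartPolicyCheck` are hypothetical names, and the loop shape is an assumption about the caller rather than a copy of the framework code.

```go
package main

import (
	"errors"
	"fmt"
)

// condition has the (done, err) shape used by polling helpers.
type condition func() (done bool, err error)

// waitFor is a hypothetical polling loop: err is only returned once done is true.
func waitFor(attempts int, c condition) error {
	for i := 0; i < attempts; i++ {
		done, err := c()
		if done {
			return err
		}
	}
	return errors.New("timed out waiting for the condition")
}

// restartPolicyCheck returns done=true together with the error, so the caller
// sees the real cause instead of a timeout.
func restartPolicyCheck(policy string, running bool) condition {
	return func() (bool, error) {
		if policy != "Always" {
			return true, fmt.Errorf("pod restart policy is %q, expected Always", policy)
		}
		return running, nil
	}
}

func main() {
	fmt.Println(waitFor(3, restartPolicyCheck("Never", false)))
}
```

Returning `false` with the error in `restartPolicyCheck` would make this loop run out its attempts and report only the timeout.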
@Random-Liu because you wrote that PR
Automatic merge from submit-queue
Adding cascading deletion support to federated namespaces
Ref https://github.com/kubernetes/kubernetes/issues/33612
With this change, whenever a federated namespace is deleted with `DeleteOptions.OrphanDependents = false`, then federation namespace controller first deletes the corresponding namespaces from all underlying clusters before deleting the federated namespace.
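The ordering can be sketched as follows. `cluster` and `deleteFederatedNamespace` are hypothetical in-memory stand-ins, not the federation controller's types. The sketch assumes the federated object is kept when any underlying delete fails.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a stand-in for an underlying cluster's namespace API.
type cluster struct {
	name       string
	namespaces map[string]bool
	failDelete bool
}

func (c *cluster) deleteNamespace(ns string) error {
	if c.failDelete {
		return errors.New("delete failed in cluster " + c.name)
	}
	delete(c.namespaces, ns)
	return nil
}

// deleteFederatedNamespace returns true when the federated object may be removed.
// With orphanDependents=false, the underlying namespaces are deleted first.
func deleteFederatedNamespace(ns string, orphanDependents bool, clusters []*cluster) (bool, error) {
	if !orphanDependents {
		for _, c := range clusters {
			if err := c.deleteNamespace(ns); err != nil {
				return false, fmt.Errorf("keeping federated namespace %q: %w", ns, err)
			}
		}
	}
	return true, nil
}

func main() {
	clusters := []*cluster{
		{name: "c1", namespaces: map[string]bool{"demo": true}},
		{name: "c2", namespaces: map[string]bool{"demo": true}},
	}
	ok, err := deleteFederatedNamespace("demo", false, clusters)
	fmt.Println(ok, err, len(clusters[0].namespaces), len(clusters[1].namespaces))
}
```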
cc @kubernetes/sig-cluster-federation @caesarxuchao
```release-note
Adding support for DeleteOptions.OrphanDependents for federated namespaces. Setting it to false while deleting a federated namespace also deletes the corresponding namespace from all registered clusters.
```
Automatic merge from submit-queue
Speed up some networking tests in large clusters
Since we are moving towards testing larger and larger clusters (hopefully 5000-node ones soon-ish), I'm trying to keep the number of super long tests to a minimum.
This should significantly reduce the time taken by those in test/e2e/networking.go.
@gmarek
Automatic merge from submit-queue
always clean gce resources in service e2e
@bprashanth the previous PR was closed when I squashed my commits.
Here is the new change set, please help to review again.
1). Only the following two It() blocks create LBs. I added a string array to persist the LB names so that they can be cleaned up in AfterEach(); the array is reset after cleanup.
```
"should be able to change the type and ports of a service [Slow]"
"should be able to create services of type LoadBalancer and externalTraffic=localOnly"
```
2). Directly call gce api to delete the resource and ignore any error returned.
Automatic merge from submit-queue
e2e: stop tracking resource usage for the "misc" container
There is an e2e test checking the resource usage of "misc", and it is not
supported on GCI.