Automatic merge from submit-queue
AllowedNotReadyNodes allowed to be not ready for absolutely *any* reason
It's as good as we allow those many nodes to be not part of the cluster at all, ever.
Btw - currently our 5k-node correctness test fails if "kubelet stopped posting node status" or "route not created", etc (ref: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/3/build-log.txt)
cc @kubernetes/sig-scalability-misc
We seem to get a lot of flakes due to "connection refused" while running
`kubectl exec`. I can't find any reason this would be caused by the test
flow, so I'm adding retries to see if that helps.
Automatic merge from submit-queue
Migrate to controller references helpers in meta/v1
**What this PR does / why we need it**:
This is a follow up for #48319 that migrates all method usages to new methods in meta/v1.
**Special notes for your reviewer**:
Looking at each commit individually might be easier.
**Release note**:
```release-note
NONE
```
/sig api-machinery
/kind cleanup
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
improve detectability of deleted pods
**What this PR does / why we need it**:
Adds comment to `waitForPodTerminatedInNamespace` to better explain how it's implemented.
~~It improves pod deletion detection in the e2e framework as follows:~~
~~1. the `waitForPodTerminatedInNamespace` func looks for pod.Status.Phase == _PodFailed_ or _PodSucceeded_ since both values imply that all containers have terminated.~~
~~2. the `waitForPodTerminatedInNamespace` func also ignores the pod's Reason if the passed-in `reason` parm is "". Reason is not really relevant to the pod being deleted or not, but if the caller passes a non-blank `reason` then it will be lower-cased, de-blanked and compared to the pod's Reason (also lower-cased and de-blanked). The idea is to make Reason checking more flexible and to prevent a pod from being considered running when all of its containers have terminated just because of a Reason mis-match.~~
Releated to pr [49597](https://github.com/kubernetes/kubernetes/pull/49597) and issue [49529](https://github.com/kubernetes/kubernetes/issues/49529).
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45813, 49594, 49443, 49167, 47539)
Add node e2e tests for GKE environment
Ref: https://github.com/kubernetes/kubernetes/issues/46891
This PR adds node e2e tests for validating images used on GKE.
- We pass the `SYSTEM_SPEC_NAME` to the node e2e test process via the flag `--system-spec-name` so that we can skip the environment specific tests using `RunIfSystemSpecNameIs()`.
- Also added `SkipIfContainerRuntimeIs()` as the opposite of `RunIfContainerRuntimeIs()`.
**Release note**:
```
None
```
Automatic merge from submit-queue (batch tested with PRs 49619, 49598, 47267, 49597, 49638)
improve log for pod deletion poll loop
**What this PR does / why we need it**:
It improves some logging related to waiting for a pod to reach a passed-in condition. Specifically, related to issue [49529](https://github.com/kubernetes/kubernetes/issues/49529) where better logging may help to debug the root cause.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49420, 49296, 49299, 49371, 46514)
Refactoring taint functions to reduce sprawl
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45060
**Special notes for your reviewer**:
@gmarek @timothysc @k82cn @jayunit100 - I moved some fn's to helpers and some to utils. LMK, if you are ok with this change.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48377, 48940, 49144, 49062, 49148)
fixit: break sig-cluster-lifecycle tests into subpackage
this is part of fixit week. ref #49161
@kubernetes/sig-cluster-lifecycle-misc
Automatic merge from submit-queue (batch tested with PRs 48139, 48042, 47645, 48054, 48003)
Pipe clusterID into gce_loadbalancer_external.go
**What this PR does / why we need it**: Small cleanup for GCE ELB codes.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48002
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47915, 47856, 44086, 47575, 47475)
deprecate created-by annotation for e2e test framework
**What this PR does / why we need it**: This PR deprecates created-by annotation for e2e test framework. This is needed as we now have ControllerRef.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #44407
**Special notes for your reviewer**: This is the third PR for deprecating created-by annotation. Other PRs can be found here: #47469, #47471
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 47470, 47260, 47411, 46852, 46135)
Write reports for each upgrade test
Due to the way Ginkgo runs individual test cases and the level of coordination required for the upgrade tests, they were all run under a single Ginkgo test case. This PR generates and auxiliary report that break out the results of each upgrade test. This is accomplished by:
1) Wrapping `ginkgo.Fail` and `ginkgo.Skip` to get the actual failure or skip messages.
2) Recovering that info in the upgrade test to generate an auxiliary report.
I suggest reviewing commit by commit.
Sample report: https://storage.googleapis.com/krouseytestreports/logs/results/1/artifacts/junit_upgrades.xmlFixes: #47371
Automatic merge from submit-queue (batch tested with PRs 46835, 46856)
Made WaitForReplicas and EnsureDesiredReplicas use PollImmediate and improved logging.
**What this PR does / why we need it**: Most importantly, this results in better logging: timeout is logged at the level of the caller, not the helper function, helping debugging.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46252, 45524, 46236, 46277, 46522)
Make GCE load-balancers create health checks for nodes
From #14661. Proposal on kubernetes/community#552. Fixes#46313.
Bullet points:
- Create nodes health check and firewall (for health checking) for non-OnlyLocal service.
- Create local traffic health check and firewall (for health checking) for OnlyLocal service.
- Version skew:
- Don't create nodes health check if any nodes has version < 1.7.0.
- Don't backfill nodes health check on existing LBs unless users explicitly trigger it.
**Release note**:
```release-note
GCE Cloud Provider: New created LoadBalancer type Service now have health checks for nodes by default.
An existing LoadBalancer will have health check attached to it when:
- Change Service.Spec.Type from LoadBalancer to others and flip it back.
- Any effective change on Service.Spec.ExternalTrafficPolicy.
```
Automatic merge from submit-queue
move from daemon_restart.go to framework/util.go
**What this PR does / why we need it**:
Moves the func `nodeExec` from daemon_restart.go to framework/util.go. This is the correct file for this func and is a more intuitive pkg for other callers to use. This is a small step of the larger effort of restructuring e2e tests to be more logically structured and easier for newcomers to understand.
```release-note
NONE
```
cc @timothysc @copejon
Automatic merge from submit-queue
util.go: format for
**What this PR does / why we need it**:
format for.
delete redundant para.
make code clean.
**Release note**:
```release-note
NONE
```
Not all CI systems support ssh keys to be present on the node. This
supports the case where "local" provider is being used when running
e2e test, but the environment does not have a SSH key.
Automatic merge from submit-queue (batch tested with PRs 44424, 44026, 43939, 44386, 42914)
Try in cluster config when input KubeConfig is empty
**What this PR does / why we need it**:
Allows for downstream providers to run e2es "incluster" sans kubeconfig
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
/cc @kubernetes/sig-testing-pr-reviews @marun
Automatic merge from submit-queue (batch tested with PRs 44447, 44456, 43277, 41779, 43942)
Clean up pre-ControllerRef compatibility logic
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43323
**Special notes for your reviewer**:
No
**Release note**:
```
NONE
```
Automatic merge from submit-queue
fix wrong error return
Signed-off-by: Crazykev <crazykev@zju.edu.cn>
**What this PR does / why we need it**: The err return here is wrong, correct it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41728, 42231)
Adding new tests to e2e/vsphere_volume_placement.go
**What this PR does / why we need it**:
Adding new tests to e2e/vsphere_volume_placement.go
Below is the tests description and test steps.
**Test Back-to-back pod creation/deletion with different volume sources on the same worker node**
1. Create volumes - vmdk2, vmdk1 is created in the test setup.
2. Create pod Spec - pod-SpecA with volume path of vmdk1 and NodeSelector set to label assigned to node1.
3. Create pod Spec - pod-SpecB with volume path of vmdk2 and NodeSelector set to label assigned to node1.
4. Create pod-A using pod-SpecA and wait for pod to become ready.
5. Create pod-B using pod-SpecB and wait for POD to become ready.
6. Verify volumes are attached to the node.
7. Create empty file on the volume to make sure volume is accessible. (Perform this step on pod-A and pod-B)
8. Verify file created in step 5 is present on the volume. (perform this step on pod-A and pod-B)
9. Delete pod-A and pod-B
10. Repeatedly (5 times) perform step 4 to 9 and verify associated volume's content is matching.
11. Wait for vmdk1 and vmdk2 to be detached from node.
12. Delete vmdk1 and vmdk2
**Test multiple volumes from different datastore within the same pod**
1. Create volumes - vmdk2 on non default shared datastore.
2. Create pod Spec with volume path of vmdk1 (vmdk1 is created in test setup on default datastore) and vmdk2.
3. Create pod using spec created in step-2 and wait for pod to become ready.
4. Verify both volumes are attached to the node on which pod are created. Write some data to make sure volume are accessible.
5. Delete pod.
6. Wait for vmdk1 and vmdk2 to be detached from node.
7. Create pod using spec created in step-2 and wait for pod to become ready.
8. Verify both volumes are attached to the node on which PODs are created. Verify volume contents are matching with the content written in step 4.
9. Delete POD.
10. Wait for vmdk1 and vmdk2 to be detached from node.
11. Delete vmdk1 and vmdk2
**Test multiple volumes from same datastore within the same pod**
1. Create volumes - vmdk2, vmdk1 is created in testsetup
2. Create pod Spec with volume path of vmdk1 (vmdk1 is created in test setup) and vmdk2.
3. Create pod using spec created in step-2 and wait for pod to become ready.
4. Verify both volumes are attached to the node on which pod are created. Write some data to make sure volume are accessible.
5. Delete pod.
6. Wait for vmdk1 and vmdk2 to be detached from node.
7. Create pod using spec created in step-2 and wait for pod to become ready.
8. Verify both volumes are attached to the node on which PODs are created. Verify volume contents are matching with the content written in step 4.
9. Delete POD.
10. Wait for vmdk1 and vmdk2 to be detached from node.
11. Delete vmdk1 and vmdk2
**Which issue this PR fixes**
fixes #
**Special notes for your reviewer**:
Executed tests against K8S v1.5.3 release
**Release note**:
```release-note
NONE
```
cc: @kerneltime @abrarshivani @BaluDontu @tusharnt @pdhamdhere
Automatic merge from submit-queue (batch tested with PRs 42237, 42297, 42279, 42436, 42551)
Reword PVC polling message to log a more readable message.
**What this PR does / why we need it**:
Previous message used to report an error is misleading and poorly written. This PR changes the log to be more readable.
```release-note
NONE
```
In particular, we should not assume ControllerRefs are necessarily set.
However, we can still use ControllerRefs that do exist to avoid
interfering with controllers that do use it.
Automatic merge from submit-queue
Fix Deployment upgrade test.
**What this PR does / why we need it**:
When the upgrade test operates on Deployments in a pre-1.6 cluster (i.e. during the Setup phase), it needs to use the v1.5 deployment/util logic. In particular, the v1.5 logic does not filter children to only those with a matching ControllerRef.
**Which issue this PR fixes**:
Fixes#42738
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
When the upgrade test operates on Deployments in a pre-1.6 cluster
(i.e. during the Setup phase), it needs to use the v1.5 deployment/util
logic. In particular, the v1.5 logic does not filter children to only
those with a matching ControllerRef.
Automatic merge from submit-queue
e2e test: Log container output on TestContainerOutput error
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
This would help us to see what wrong in https://github.com/kubernetes/kubernetes/issues/40811 - a container is running there for 3 minutes and dies and we want to see what it did for these 3 minutes.
```release-note
NONE
```
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
Automatic merge from submit-queue (batch tested with PRs 42652, 42681, 42708, 42730)
e2e: fix restarting the apiserver
The string used to match the image name of the apiserver (e.g., `gcr.io/google_containers/kube-apiserver:3be...`),
but this no longer works. Change the test to locate the kube-apiserver container by name.
The list functions in deployment/util are used outside the Deployment
controller itself. Therefore, they don't do actual adoption/orphaning.
However, they still need to avoid listing things that don't belong.
Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373)
Move pvutil.go from e2e package to framework package
**What this PR does / why we need it**:
This PR moves pvutil.go to the e2e/framework package.
I am working on a PV upgrade test, and would like to use some of the wrapper functions in pvutil.go. However, the upgrade test is in the upgrade package, and not the e2e package, and it cannot import the e2e package because it would create a circular dependency. So pvutil.go needs to be moved out of e2e in order to break the circular dependency. This is a refactoring name change, no logic has been modified.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 41421, 41440, 36765, 41722)
Use watch param instead of deprecated /watch/ prefix
Switches clients to use watch param instead of /watch/ prefix
```release-note
Clients now use the `?watch=true` parameter to make watch API calls, instead of the `/watch/` path prefix
```
Automatic merge from submit-queue (batch tested with PRs 40345, 38183, 40236, 40861, 40900)
Remove checks for pods responding in deployment e2e tests
Fixes#39879
Remove it because it caused deployment e2e tests sometimes timed out waiting for pods responding, and pods responding isn't related to deployment controller and is not a prerequisite of deployment e2e tests.
@kargakis
Remove it because it caused deployment e2e tests sometimes timed out
waiting for pods responding, and pods responding isn't related to
deployment controller and is not a prerequisite of deployment e2e tests.
Automatic merge from submit-queue
Add an upgrade test for secrets.
**What this PR does / why we need it**: This PR adds an upgrade test for secrets. It creates a secret and makes sure that pods can consume it before an after an upgrade.
Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)
Forgiveness library changes
**What this PR does / why we need it**:
Splited from #34825, contains library changes that are needed to implement forgiveness:
1. ~~make taints-tolerations matching respect timestamps, so that one toleration can just tolerate a taint for only a period of time.~~ As TaintManager is caching taints and observing taint changes, time-based checking is now outside the library (in TaintManager). see #40355.
2. make tolerations respect wildcard key.
3. add/refresh some related functions to wrap taints-tolerations operation.
**Which issue this PR fixes**:
Related issue: #1574
Related PR: #34825, #39469
~~Please note that the first 2 commits in this PR come from #39469 .~~
**Special notes for your reviewer**:
~~Since currently we have `pkg/api/helpers.go` and `pkg/api/v1/helpers.go`, there are some duplicated periods of code laying in these two files.~~
~~Ideally we should move taints-tolerations related functions into a separate package (pkg/util/taints), and make it a unified set of implementations. But I'd just suggest to do it in a follow-up PR after Forgiveness ones done, in case of feature Forgiveness getting blocked to long.~~
**Release note**:
```release-note
make tolerations respect wildcard key
```
Automatic merge from submit-queue (batch tested with PRs 40584, 40319)
ssh support for local
**What this PR does / why we need it**: adds local deployment support for e2e tests. Useful for non-cloud, simple testing.
**Special notes for your reviewer**: Formerly this pr was part of #38214
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 38592, 39949, 39946, 39882)
move api/errors to apimachinery
`pkg/api/errors` is a set of helpers around `meta/v1.Status` that help to create and interpret various apiserver errors. Things like `.NewNotFound` and `IsNotFound` pairings. This pull moves it into apimachinery for use by the clients and servers.
@smarterclayton @lavalamp First commit is the move plus minor fitting. Second commit is straight replace and generation.
Automatic merge from submit-queue (batch tested with PRs 38592, 39949, 39946, 39882)
Add optional per-request context to restclient
**What this PR does / why we need it**: It adds per-request contexts to restclient's API, and uses them to add timeouts to all proxy calls in the e2e tests. An entire e2e shouldn't hang for hours on a single API call.
**Which issue this PR fixes**: #38305
**Special notes for your reviewer**:
This adds a feature to the low-level rest client request feature that is entirely optional. It doesn't affect any requests that don't use it. The api of the generated clients does not change, and they currently don't take advantage of this.
I intend to patch this in to 1.5 as a mostly test only change since it's not going to affect any controller, generated client, or user of the generated client.
cc @kubernetes/sig-api-machinery
cc @saad-ali
Automatic merge from submit-queue
replace global registry in apimachinery with global registry in k8s.io/kubernetes
We'd like to remove all globals, but our immediate problem is that a shared registry between k8s.io/kubernetes and k8s.io/client-go doesn't work. Since client-go makes a copy, we can actually keep a global registry with other globals in pkg/api for now.
@kubernetes/sig-api-machinery-misc @lavalamp @smarterclayton @sttts
Automatic merge from submit-queue (batch tested with PRs 38426, 38917, 38891, 38935)
Support different image during GCE node upgrade
**What this PR does / why we need it**: It lets GCE upgrade tests upgrade to a GCI node image.
**Which issue this PR fixes**: fixes#37855
Automatic merge from submit-queue
Fix Recreate for Deployments and stop using events in e2e tests
Fixes https://github.com/kubernetes/kubernetes/issues/36453 by removing events from the deployment tests. The test about events during a Rolling deployment is redundant so I just removed it (we already have another test specifically for Rolling deployments).
Closes https://github.com/kubernetes/kubernetes/issues/32567 (preferred to use pod LISTs instead of a new status API field for replica sets that would add many more writes to replica sets).
@kubernetes/deployment
Automatic merge from submit-queue (batch tested with PRs 38830, 38750)
[Federation] Stop cleaning federation namespace in e2e tests
when --clean-start=true flag is provided to e2e tests it would cleanup all the leftover namespaces except `default` and `kube-system` and because of this when we run e2e tests in federation soak test job, the federation control plane is destroyed before it runs the tests and all tests start to fail.
So adding federation-system to the list of namespace to be left intact and also changed the default federation namespace name from `federation` to `federation-system` to be consistent with the newer method of deploying federation using kubefed.
@madhusudancs @nikhiljindal
Automatic merge from submit-queue (batch tested with PRs 38830, 38750)
Remove the ReadyReplica version guard
**What this PR does / why we need it**: Removes outlived version guards.
**Which issue this PR fixes**: fixes#37310
Automatic merge from submit-queue
Add a package for handling version numbers (including non-"Semantic" versions)
As noted in #32401, we are using Semantic Version-parsing libraries to parse version numbers that aren't necessarily "Semantic". Although, contrary to what I'd said there, it turns out that this wasn't actually currently a problem for the iptables code, because the regexp used to extract the version number out of the "iptables --version" output only pulled out three components, so given "iptables v1.4.19.1", it would have extracted just "1.4.19". Still, it could be a problem if they later release "1.5" rather than "1.5.0", or if we eventually need to _compare_ against a 4-digit version number.
Also, as noted in #23854, we were also using two different semver libraries in different parts of the code (plus a wrapper around one of them in pkg/version).
This PR adds pkg/util/version, with code to parse and compare both semver and non-semver version strings, and then updates kubernetes to use it everywhere (including getting rid of a bunch of code duplication in kubelet by making utilversion.Version implement the kubecontainer.Version interface directly).
Ironically, this does not actually allow us to get rid of either of the vendored semver libraries, because we still have other dependencies that depend on each of them. (cadvisor uses blang/semver and etcd uses coreos/go-semver)
fixes#32401, #23854
Automatic merge from submit-queue
Add an option to run Job in Density/Load config
cc @timothysc @jeremyeder
@erictune @soltysh - I run this test and it seems to me that Job has noticeably worse performance than Deployment. I'll create an issue for this, but this PR is for easy repro.
Automatic merge from submit-queue (batch tested with PRs 37325, 38313, 38141, 38321, 38333)
Fix running e2e with 'Completed' kube-system pods
As of now, e2e runner keeps waiting for pods in `kube-system` namespace to be "Running and Ready" if there are any pods in `Completed` state in that namespace.
This for example happens after following [Kubernetes Hosted Installation](http://docs.projectcalico.org/v2.0/getting-started/kubernetes/installation/#kubernetes-hosted-installation) instructions for Calico, making it impossible to run conformance tests against the cluster. It's also to possible to reproduce the problem like that:
```
$ cat testjob.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: tst
namespace: kube-system
spec:
template:
metadata:
name: tst
spec:
containers:
- name: tst
image: busybox
command: ["echo", "test"]
restartPolicy: Never
$ kubectl create -f testjob.yaml
$ go run hack/e2e.go -v --test --test_args='--ginkgo.focus=existing\s+RC'
```
Automatic merge from submit-queue
Delete regional static-ip instead of global for type=lb
Global vs region is the difference between
```
$ gcloud compute addresses delete foo --global
$ gcloud compute addresses delete foo --region us-central1
```
Type=LoadBalancer users the second type and were were doing the first.
Also adds some logging.
Automatic merge from submit-queue (batch tested with PRs 38173, 38151, 38197, 38221)
test: wait for ready replica set before adopting
Reworked version of https://github.com/kubernetes/kubernetes/pull/36439 which was reverted in https://github.com/kubernetes/kubernetes/pull/38049. This PR doesn't use any of the new status API added in replica sets so it should cause no trouble with upgrade tests.
@kubernetes/deployment @smarterclayton
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)
Fixes flake: wait for dns pods terminating after test completed
From #37194. Based on #36600. Please only look at the second commit.
As mentioned in [comment](https://github.com/kubernetes/kubernetes/issues/37194#issuecomment-262007174), "DNS horizontal autoscaling" test does not wait for the additional pods to be terminated and this may lead to the failure of later tests.
This fix adds a wait loop at the end of the serial test to ensure the cluster recovers to the original state. In the non-serial test it does not wait for the additional pods terminating because it will not affect other tests, given they are able to be run simultaneously. Plus wait for pods terminating will take certain amount of time.
Note this only fixes certain case of #37194. I noticed there are other failures irrelevant to dns autoscaler. LIke [this one](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-serial/34/).
@bprashanth @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 36352, 36538, 37976, 36374)
test: update deployment helper to return better error messages
@kubernetes/deployment the problem with https://github.com/kubernetes/kubernetes/issues/36270 is that the selector key is never added in the deployment but this change would make it clearer.
Automatic merge from submit-queue (batch tested with PRs 37094, 37663, 37442, 37808, 37826)
Moved gobindata, refactored ReadOrDie refs
**What this PR does / why we need it**: Having gobindata inside of test/e2e/framework prevents external projects from importing the framework. Moving it out and managing refs fixes this problem.
**Which issue this PR fixes**: fixes#37007
Automatic merge from submit-queue (batch tested with PRs 37997, 37939, 37990, 36700, 37258)
Add cluster-level AppArmor E2E test
My goal is to reuse this test for an automated cluster upgrade test.
Automatic merge from submit-queue
test: update rollover test to wait for available rs before adopting
Scenario that happened in https://github.com/kubernetes/kubernetes/issues/35355#issuecomment-257808460
-- Replica set that is about to be adopted has 2 out of 4 ready replicas
-- Deployment is created with 4 replicas, adopts pre-existing replica set, creates a new one, and starts rolling replicas over to the new replica set.
```
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled down replica set test-rollover-controller to 3
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled up replica set test-rollover-deployment-2505289747 to 1
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment-2505289747: {replicaset-controller } SuccessfulCreate: Created pod: test-rollover-deployment-2505289747-iuiei
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment-2505289747-iuiei: {default-scheduler } Scheduled: Successfully assigned test-rollover-deployment-2505289747-iuiei to gke-jenkins-e2e-default-pool-33c0400e-6q5m
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:05 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled up replica set test-rollover-deployment-2505289747 to 2
```
At this point there is no minimum availability for the Deployment (maxUnavailable is 1 meaning desired minimum available is 3 but we only have 2), and the new replica set uses a non-existent image. New replica set is scaled up to 1 (maxSurge is 1), then old replica set is scaled down by one, because cleanupUnhealthyReplicas observes that it has 2 unhealthy replicas - it can only scale down one though because the [maximum replicas it can cleanup is one](d87dfa2723/pkg/controller/deployment/rolling.go (L125)) (4+1-3-1). New replica set is scaled to 2. Available replicas are still 2 (third replica from the old replica set has yet to come up).
-- Deployment is rolled over with a new update. Test reaches for the WaitForDeploymentStatus check but there are only 2 availableReplicas (maxUnavailable is still violated).
This change makes the test wait for a healthy replica set before proceeding thus it should never hit the scenario described above.
@kubernetes/deployment
- Remaining spaghetti untangled
- Missed bazel update and a few hardcoded refs
- New instance of framework.ReadOrDie reference removed post rebase
- Resolve new clientset rebase
- Fixed e2e/generated BUILD dep
- A space
- Missed gobindata ref in golang.sh
Automatic merge from submit-queue
Fix package aliases to follow golang convention
Some package aliases are not not align with golang convention https://blog.golang.org/package-names. This PR fixes them. Also adds a verify script and presubmit checks.
Fixes#35070.
cc/ @timstclair @Random-Liu
Checking the result.Code prior to err in the if statement causes a panic
if result is nil. It turns out the formatting of the error is already in
IssueSSHCommandWithResult, so removing redundant code is enough to fix
the issue. Logging the SSH result was also redundant, so I removed that
as well.
This disables ready replica checking for 1.3 masters, but only from 1.4
or 1.5 clients. The old logic was broken anyway due to overlapping
labels with replica sets.
Automatic merge from submit-queue
Retry job update after failure to prevent modification conflict
This fixes#34585 flake.
@janetkuo || @kubernetes/sig-apps ptal
I've been getting too many emails recently wrt to that issue, so I wanted to "clean" my inbox a bit 😉
Automatic merge from submit-queue
Add e2e test for statefulset updates
Verify that one can (manually) update statefulset template
cc @erictune @foxish @kow3ns @kubernetes/sig-apps
MatchContainerOutput always creates a pod and does not cleanup. We need
to fix this to be better at re-trying the scenarios.
When there is an error say in the first attempt of ExpectNoErrorWithRetries
(for example in "Pods should contain environment variables for services" test)
the retries logic calls MatchContainerOutput another time and the
podClient.create fails correctly since the pod was not cleaned up the
first time MatchContainerOutput was called.
Fixes#35089
Automatic merge from submit-queue
Node Conformance & E2E: Get node name from node object.
This PR changes the node e2e test framework to get node name from apiserver instead of test flags.
When a user tried out the node conformance test, he found that node conformance test will not work properly if kubelet is started with `hostname-override`.
The reason is that node conformance test is using [the default node name - `os.Hostname`](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/e2e_node_suite_test.go#L124), which may be different from `hostname-override`. This will cause test pods not scheduled, and eventually test timeout.
We can expose a flag from node conformance test, and let user set node name themselves if they are using `hostname-override` on kubelet. However, let the framework automatically detect it from apiserver is more user friendly.
/cc @kubernetes/sig-node
This PR 1) only changes node e2e test framework; 2) fixes a problem in node conformance test which is a 1.5 feature. @saad-ali Can we have this in 1.5?
Automatic merge from submit-queue
Controller changes for perma failed deployments
This PR adds support for reporting failed deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined as progressDeadlineSeconds then the deployment will be
marked as failed by a Progressing condition with a ProgressDeadlineExceeded
reason.
Follow-up to https://github.com/kubernetes/kubernetes/pull/19343
Docs at kubernetes/kubernetes.github.io#1337
Fixes https://github.com/kubernetes/kubernetes/issues/14519
@kubernetes/deployment @smarterclayton