Automatic merge from submit-queue (batch tested with PRs 44406, 41543, 44071, 44374, 44299)
Decouple remotecommand
Refactored unversioned/remotecommand to decouple it from undesirable dependencies:
- the term package is no longer required; the functionality needed to resize the terminal can be plugged in directly in kubectl
- to remove the dependency on the kubelet package, constants from kubelet/server/remotecommand were moved to a separate util package (pkg/util/remotecommand)
- remotecommand_test.go moved to pkg/client/tests module
Automatic merge from submit-queue (batch tested with PRs 44447, 44456, 43277, 41779, 43942)
Clean up pre-ControllerRef compatibility logic
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #43323
**Special notes for your reviewer**:
No
**Release note**:
```
NONE
```
Automatic merge from submit-queue
add kubelet tests to verify host clean up
**What this PR does / why we need it**:
Increasingly we're seeing more failures in persistent volume e2e tests where pv tests are run in parallel with disruptive tests. The quick solution is to tag the pv tests as Flaky. This pr addresses one cause of the flakiness and adds a disruptive kubelet test.
Once this pr is shown to not produce flakes the [Flaky] tag for the "HostCleanup" tests will be removed in a separate pr.
+ Adds volume tests to _kubelet.go_ motivated by issues [31272](https://github.com/kubernetes/kubernetes/issues/31272) and [37657](https://github.com/kubernetes/kubernetes/issues/37657)
+ Addresses reverted pr [41178](https://github.com/kubernetes/kubernetes/pull/41178) and negates the need for pr [41229](https://github.com/kubernetes/kubernetes/pull/41229)
**Which issue this PR fixes**
Adds regression tests to cover issues: #31272 and #37657
**Special notes for your reviewer**:
It's possible that one of the new tests, which relies on the existence of _/usr/sbin/rpc.nfsd_ in the nfs-server pod, will not work in the GCI container env. If this turns out to be true then I will add a `SkipIfProviderIs("gke")` to the `It` block.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
[e2e] Bump up pod deletion time for source pod IP test
From #44225.
The source pod IP e2e test has been pretty flaky lately, and most of the failures seem to be timeouts waiting for pod "kube-proxy-mode-detector" to disappear. Didn't find anything else suspicious.
This PR bumps pod deletion timeout to DefaultPodDeletionTimeout, which is 3 minutes. Hopefully it will mitigate the flakes.
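A minimal sketch of the kind of wait this timeout governs, assuming a simple poll loop. The helper `waitUntilGone` and the simulated pod are hypothetical and are not the e2e framework's actual code; only the idea of polling until a deadline comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitUntilGone polls exists() every interval until it reports false
// or the timeout elapses. It is a hypothetical stand-in for the
// framework's pod-deletion wait.
func waitUntilGone(exists func() bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if !exists() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for pod to disappear")
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated pod that is still present for the first three polls.
	polls := 0
	exists := func() bool { polls++; return polls <= 3 }

	err := waitUntilGone(exists, 10*time.Millisecond, time.Second)
	fmt.Println("err:", err, "polls:", polls)
}
```

A longer timeout only changes how long the loop keeps polling; a pod that disappears quickly still returns as soon as the next poll sees it gone.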
/assign @freehan
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Adding load balancer src cidrs to GCE cloudprovider
**What this PR does / why we need it**:
As of January 31st, 2018, GCP will be sending health checks and L7 traffic from two CIDRs and legacy health checks from three CIDRs. This PR moves them into the cloudprovider package and provides a flag for override.
Another PR will need to address firewall rule creation for external L4 network load balancing (#40778).
**Which issue this PR fixes**
Step one of #40778
Step one of https://github.com/kubernetes/ingress/issues/197
**Release note**:
```release-note
Add flags to GCE cloud provider to override known L4/L7 proxy & health check source cidrs
```
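A rough sketch of the "defaults plus override flag" pattern described above. The flag name `lb-src-cidrs`, the helper `parseCIDRs`, and the default values shown are assumptions for illustration only; the real list and flag live in the GCE cloudprovider package.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"strings"
)

// defaultLBSrcCIDRs holds illustrative defaults for this sketch only.
// Take the authoritative list from the cloudprovider package.
var defaultLBSrcCIDRs = []string{"130.211.0.0/22", "35.191.0.0/16"}

// parseCIDRs validates a comma-separated CIDR list and falls back to
// the defaults when the override is empty.
func parseCIDRs(override string) ([]*net.IPNet, error) {
	raw := defaultLBSrcCIDRs
	if strings.TrimSpace(override) != "" {
		raw = strings.Split(override, ",")
	}
	out := make([]*net.IPNet, 0, len(raw))
	for _, s := range raw {
		_, n, err := net.ParseCIDR(strings.TrimSpace(s))
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %v", s, err)
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	// "lb-src-cidrs" is a hypothetical flag name used only here.
	override := flag.String("lb-src-cidrs", "", "comma-separated CIDRs overriding the defaults")
	flag.Parse()

	cidrs, err := parseCIDRs(*override)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range cidrs {
		fmt.Println(c.String())
	}
}
```

Validating the override up front means a malformed CIDR is rejected at startup rather than when a firewall rule is later built from it.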
Automatic merge from submit-queue
Add test for provisioning with storage class
This PR re-introduces e2e test for dynamic provisioning with storage classes.
It adds the same test that was merged in PR #32485, with an extra patch adding region to AWS calls. It works well on my AWS setup; however, I'm using a shared company account and I can't run kube-up.sh and run the tests in the "official" way.
@zmerlynn, can you please try to run tests that led to #34961?
@justinsb, you're my AWS guru, is there a way to introduce a fully initialized AWS cloud provider into the e2e test framework? It would simplify everything. GCE has it there, but it's easier to initialize, I guess. See https://github.com/kubernetes/kubernetes/blob/master/test/e2e/pd.go#L486 for example - IMO tests should not talk to AWS directly.
Automatic merge from submit-queue
fix wrong error return
Signed-off-by: Crazykev <crazykev@zju.edu.cn>
**What this PR does / why we need it**: The err return here is wrong, correct it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41189, 43818)
Reduce deployment replicas in e2e tests
Fixes https://github.com/kubernetes/kubernetes/issues/41063
There are still two tests that run multiple replicas (testScaledRolloutDeployment, testIterativeDeployments); we may want to revisit them at some point.
Automatic merge from submit-queue
Extract e2e utility code into framework
**What this PR does / why we need it**:
There's a growing dependency on Volume e2e utilities related to creating and testing against NFS volumes. For this reason, it's useful to relocate the relevant functions to the `framework` pkg. Doing so makes these utility functions available to e2e tests outside the `e2e` package.
This PR only moves code from the `e2e` package to `framework` and handles the relevant changes in calls. It does not change any logic.
```release-note
NONE
```
@jingxu97 I think there's value here in reducing duplicate code in the `common` package, given that these functions have been copied down to it. However, there's been some divergence. Can you PTAL and let me know if there's any reason we can't remove the duplicate `common` code?
cc @jeffvance
Automatic merge from submit-queue (batch tested with PRs 44097, 42772, 43880, 44031, 44066)
[Federation] Improve e2e test setup
This PR improves federation e2e test setup:
- reuses e2e framework setup (``NewDefaultFramework``) instead of duplicating it
- ensures ``FederationAfterEach`` is called if an error occurs in ``FederationBeforeEach`` (as per the [example](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/framework.go#L161) of the e2e framework)
- skips creation of a test namespace in the hosting cluster (not used for a federation e2e test)
cc: @kubernetes/sig-federation-pr-reviews @kubernetes/sig-testing-pr-reviews
Moved remaining util functions
moved cinder specific function back to volumes.go, will have to be extracted later when a cinder e2e package is created.
remove dupe code from common/volume.go
Moved [Volume] tags to KubeDescribe
Automatic merge from submit-queue
test/e2e_node: prepull images with CRI
Part of https://github.com/kubernetes/kubernetes/issues/40739
- This PR builds on top of #40525 (and contains one commit from #40525)
- The second commit contains a tiny change in the `Makefile`.
- The third commit is a patch to be able to prepull images using the CRI (as opposed to running `docker` to pull images, which doesn't make sense if you're using CRI most of the time)
Marked WIP till #40525 makes its way into master
@Random-Liu @lucab @yujuhong @mrunalp @rhatdan
Automatic merge from submit-queue (batch tested with PRs 42667, 43923)
Adding test to perform volume operations storm
**What this PR does / why we need it**:
Adding new test to perform volume operations storm
Test Steps
1. Create storage class for thin Provisioning.
2. Create 30 PVCs using above storage class in annotation, requesting 2 GB files.
3. Wait until all disks are ready and all PVs and PVCs are bound. (**CreateVolume** storm)
4. Create pod to mount volumes using PVCs created in step 2. (**AttachDisk** storm)
5. Wait for pod status to be running.
6. Verify all volumes accessible and available in the pod.
7. Delete pod.
8. Wait until volumes get detached. (**DetachDisk** storm)
9. Delete all PVCs. This should delete all Disks. (**DeleteVolume** storm)
10. Delete storage class.
This test will help validate issue reported at https://github.com/vmware/kubernetes/issues/71
**Which issue this PR fixes**
fixes #
**Special notes for your reviewer**:
executed test on 1.5.3 release with `VOLUME_OPS_SCALE` set to `5`
Will execute test with the changes made on PR - https://github.com/kubernetes/kubernetes/pull/42422, with the `VOLUME_OPS_SCALE` set to `30`
**Release note**:
```release-note
None
```
cc: @abrarshivani @BaluDontu @tusharnt @pdhamdhere @luomiao @kerneltime
Automatic merge from submit-queue (batch tested with PRs 42617, 43247, 43509, 43644, 43820)
Adding a SSH tunnel check to GKE upgrades
**What this PR does / why we need it**: After an upgrade, it takes some time for the master to re-establish SSH tunnels to the nodes. This adds a wait for GKE since it runs in this configuration.
**Which issue this PR fixes**: fixes #43611, #43612
Comments, log lines, exported MakePersistentVolume
Moved pv/pvcConfig assignment out of diskName check, nil ptrs and configs afterEach
change label/selector code to use k8s defined types from generic string:string maps
adjust for gce refactor
Automatic merge from submit-queue (batch tested with PRs 41728, 42231)
Adding new tests to e2e/vsphere_volume_placement.go
**What this PR does / why we need it**:
Adding new tests to e2e/vsphere_volume_placement.go
Below are the test descriptions and test steps.
**Test Back-to-back pod creation/deletion with different volume sources on the same worker node**
1. Create volume vmdk2 (vmdk1 is created in the test setup).
2. Create pod Spec - pod-SpecA with volume path of vmdk1 and NodeSelector set to label assigned to node1.
3. Create pod Spec - pod-SpecB with volume path of vmdk2 and NodeSelector set to label assigned to node1.
4. Create pod-A using pod-SpecA and wait for pod to become ready.
5. Create pod-B using pod-SpecB and wait for pod to become ready.
6. Verify volumes are attached to the node.
7. Create an empty file on the volume to make sure the volume is accessible. (Perform this step on pod-A and pod-B.)
8. Verify the file created in step 7 is present on the volume. (Perform this step on pod-A and pod-B.)
9. Delete pod-A and pod-B.
10. Repeatedly (5 times) perform steps 4 to 9 and verify the associated volume's content matches.
11. Wait for vmdk1 and vmdk2 to be detached from node.
12. Delete vmdk1 and vmdk2
**Test multiple volumes from different datastore within the same pod**
1. Create volume vmdk2 on a non-default shared datastore.
2. Create pod Spec with volume path of vmdk1 (vmdk1 is created in test setup on default datastore) and vmdk2.
3. Create pod using spec created in step-2 and wait for pod to become ready.
4. Verify both volumes are attached to the node on which the pod is created. Write some data to make sure the volumes are accessible.
5. Delete pod.
6. Wait for vmdk1 and vmdk2 to be detached from node.
7. Create pod using spec created in step-2 and wait for pod to become ready.
8. Verify both volumes are attached to the node on which the pod is created. Verify the volume contents match the content written in step 4.
9. Delete pod.
10. Wait for vmdk1 and vmdk2 to be detached from node.
11. Delete vmdk1 and vmdk2
**Test multiple volumes from same datastore within the same pod**
1. Create volume vmdk2 (vmdk1 is created in the test setup).
2. Create pod Spec with volume path of vmdk1 (vmdk1 is created in test setup) and vmdk2.
3. Create pod using spec created in step-2 and wait for pod to become ready.
4. Verify both volumes are attached to the node on which the pod is created. Write some data to make sure the volumes are accessible.
5. Delete pod.
6. Wait for vmdk1 and vmdk2 to be detached from node.
7. Create pod using spec created in step-2 and wait for pod to become ready.
8. Verify both volumes are attached to the node on which the pod is created. Verify the volume contents match the content written in step 4.
9. Delete pod.
10. Wait for vmdk1 and vmdk2 to be detached from node.
11. Delete vmdk1 and vmdk2
**Which issue this PR fixes**
fixes #
**Special notes for your reviewer**:
Executed tests against K8S v1.5.3 release
**Release note**:
```release-note
NONE
```
cc: @kerneltime @abrarshivani @BaluDontu @tusharnt @pdhamdhere
Automatic merge from submit-queue (batch tested with PRs 42237, 42297, 42279, 42436, 42551)
Cleanup federation_util.go in e2e/framework
The only function in test/e2e/framework/federation_util.go, GetValidDNSSubdomainName, has not been used for some time now, so this PR removes it.
cc @kubernetes/sig-federation-pr-reviews @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 42237, 42297, 42279, 42436, 42551)
Reword PVC polling message to log a more readable message.
**What this PR does / why we need it**:
Previous message used to report an error is misleading and poorly written. This PR changes the log to be more readable.
```release-note
NONE
```
In particular, we should not assume ControllerRefs are necessarily set.
However, we can still use ControllerRefs that do exist to avoid
interfering with controllers that do use them.
Automatic merge from submit-queue
Fix Deployment upgrade test.
**What this PR does / why we need it**:
When the upgrade test operates on Deployments in a pre-1.6 cluster (i.e. during the Setup phase), it needs to use the v1.5 deployment/util logic. In particular, the v1.5 logic does not filter children to only those with a matching ControllerRef.
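To make the difference concrete, here is a small sketch contrasting selector-only listing with listing that also checks the ControllerRef. The `object` type and the `listV15`/`listV16` helpers are hypothetical simplifications and do not reproduce the real deployment/util code.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a child ReplicaSet: labels plus
// the UID of its controller ("" means no ControllerRef is set).
type object struct {
	Name          string
	Labels        map[string]string
	ControllerUID string
}

// matches reports whether every selector key/value is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// listV15 mimics the pre-ControllerRef behavior: selector match only.
func listV15(selector map[string]string, all []object) []object {
	var out []object
	for _, o := range all {
		if matches(selector, o.Labels) {
			out = append(out, o)
		}
	}
	return out
}

// listV16 additionally requires the ControllerRef to point at ownerUID.
func listV16(ownerUID string, selector map[string]string, all []object) []object {
	var out []object
	for _, o := range listV15(selector, all) {
		if o.ControllerUID == ownerUID {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	sel := map[string]string{"app": "web"}
	all := []object{
		{Name: "rs-owned", Labels: map[string]string{"app": "web"}, ControllerUID: "d1"},
		{Name: "rs-legacy", Labels: map[string]string{"app": "web"}},
	}
	fmt.Println("v1.5 style:", len(listV15(sel, all)), "v1.6 style:", len(listV16("d1", sel, all)))
}
```

In a pre-1.6 cluster the children have no ControllerRef yet, so the stricter filter would hide them from the test; that is why the Setup phase needs the older logic.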
**Which issue this PR fixes**:
Fixes #42738
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 43018, 42713)
Log instead of fail on GLBC's tendency to leak resources
**What this PR does / why we need it**:
Stops upgrade tests from flaking because the GLBC does not clean up all resources due to a race condition.
**Which issue this PR fixes**: fixes #38569
**Special notes for your reviewer**:
To be reviewed by @mml
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
Initial breakout of scheduling e2es to help assist in assignment and refactoring
**What this PR does / why we need it**:
This PR segregates the scheduling-specific e2es to isolate the library, which will assist both refactoring and auto-assignment of issues.
**Which issue this PR fixes**
xref: https://github.com/kubernetes/kubernetes/issues/42691#issuecomment-285563265
**Special notes for your reviewer**:
All this change does is shuffle code around and quarantine. Behavioral and other cleanup changes will be in follow-on PRs. As of today, the e2es are a monolith and there is massive symbol pollution; this first step allows us to segregate the e2es and tease apart the dependency mess.
**Release note**:
```
NONE
```
/cc @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-testing-pr-reviews @marun @skriss
/cc @gmarek - same trick for load + density, etc.
Automatic merge from submit-queue
e2e test: Log container output on TestContainerOutput error
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails for an unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
This would help us to see what went wrong in https://github.com/kubernetes/kubernetes/issues/40811 - a container runs there for 3 minutes and then dies, and we want to see what it did during those 3 minutes.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)
Implement automated downgrade testing.
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
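The ordering constraint can be sketched as a tiny planning function. The `version` type and `rolloutOrder` helper are hypothetical and exist only to illustrate the rule that nodes must never run a newer version than the master.

```go
package main

import "fmt"

// version is a hypothetical major.minor pair used only in this sketch.
type version struct {
	Major, Minor int
}

// less reports whether v is older than o.
func (v version) less(o version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// rolloutOrder returns which component to switch first so that nodes
// never end up newer than the master: nodes first on a downgrade,
// master first otherwise.
func rolloutOrder(current, target version) []string {
	if target.less(current) {
		return []string{"nodes", "master"}
	}
	return []string{"master", "nodes"}
}

func main() {
	cur := version{Major: 1, Minor: 6}
	tgt := version{Major: 1, Minor: 5}
	fmt.Println("downgrade order:", rolloutOrder(cur, tgt))
	fmt.Println("upgrade order:", rolloutOrder(tgt, cur))
}
```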
Automatic merge from submit-queue (batch tested with PRs 42652, 42681, 42708, 42730)
e2e: fix restarting the apiserver
The test used to match on a string in the apiserver's image name (e.g., `gcr.io/google_containers/kube-apiserver:3be...`),
but this no longer works. Change the test to locate the kube-apiserver container by name.
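A minimal sketch of why a name lookup is more robust than an image-string match. The `container` type, both helper functions, and the sample data are made up for illustration; the actual test inspects containers on the master differently.

```go
package main

import (
	"fmt"
	"strings"
)

// container is a hypothetical, trimmed-down view of a running container.
type container struct {
	Name  string
	Image string
}

// findByImage is the brittle lookup: it depends on the exact text of
// the image reference.
func findByImage(cs []container, substr string) (container, bool) {
	for _, c := range cs {
		if strings.Contains(c.Image, substr) {
			return c, true
		}
	}
	return container{}, false
}

// findByName looks the container up by its declared name instead.
func findByName(cs []container, name string) (container, bool) {
	for _, c := range cs {
		if c.Name == name {
			return c, true
		}
	}
	return container{}, false
}

func main() {
	// Made-up data: the image reference no longer contains the text the
	// old match expected, but the container name is unchanged.
	cs := []container{
		{Name: "kube-apiserver", Image: "registry.example/apiserver@sha256:abc"},
		{Name: "etcd", Image: "registry.example/etcd:3"},
	}
	_, okImage := findByImage(cs, "kube-apiserver:")
	_, okName := findByName(cs, "kube-apiserver")
	fmt.Println("found by image:", okImage, "found by name:", okName)
}
```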
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
Remove everything that is not new from batch/v2alpha1
Fixes #37166.
@lavalamp you've asked for it
@erictune this is a prereq for moving CronJobs to beta. I initially planned to put all in one PR, but after I did that I figured out it'll be easier to review separately. ptal
@kubernetes/api-approvers @kubernetes/sig-api-machinery-pr-reviews ptal
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
Fix resource cleanup in ingress_utils.go within e2e/framework
**What this PR does / why we need it**:
The GLBC is failing to delete resources during the etcd rollback tests and the e2e cleanup is leaking them. After a short while, tests are failing to create new resources.
This PR addresses the e2e/framework's ability to delete GLBC-created resources and adds more logging.
**Which issue this PR fixes**:
Helps #38569 but does not completely close this flake
**Special notes for your reviewer**:
Resources were not being deleted because resource names were being truncated, while the cleanup code decided whether a resource could be deleted by checking that the entire cluster id appeared in its name. Truncated names also have an extra '0' appended to the end of their name (unknown origin). This PR tries to match on a common prefix.
Minor changes were made to improve log readability.
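The prefix idea can be sketched as follows. This is not the PR's actual matching code: the helper `ownedByCluster` and its handling of the trailing '0' are assumptions, written against the resource-name shapes that appear in the log output in this description.

```go
package main

import (
	"fmt"
	"strings"
)

// ownedByCluster reports whether the part of name after the last "--"
// looks like a (possibly truncated) copy of clusterID. It is a
// hypothetical helper for illustration only.
func ownedByCluster(name, clusterID string) bool {
	i := strings.LastIndex(name, "--")
	if i < 0 {
		return false
	}
	suffix := name[i+2:]
	if suffix == "" {
		return false
	}
	// Untruncated or cleanly truncated cluster id.
	if strings.HasPrefix(clusterID, suffix) {
		return true
	}
	// Truncated names were observed with one extra trailing '0'.
	trimmed := strings.TrimSuffix(suffix, "0")
	return trimmed != "" && strings.HasPrefix(clusterID, trimmed)
}

func main() {
	clusterID := "5f38ac0e2426f796"
	names := []string{
		"k8s-ig--5f38ac0e2426f796",
		"k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260",
		"k8s-ig--aaaabbbbccccdddd",
	}
	for _, n := range names {
		fmt.Println(n, ownedByCluster(n, clusterID))
	}
}
```

A prefix match is looser than an exact match, so very short suffixes could match the wrong cluster; a real implementation would likely want a minimum prefix length.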
**Testing this PR**:
This was tested by running a master upgrade test and by adding a second forwarding-rule mid-run. This forwarding rule referenced the same url-map used by the first forwarding-rule created by the GLBC. Therefore, the GLBC will be able to delete its own forwarding-rule but no further L7 resources. The second forwarding rule's name was nearly identical to the first forwarding rule's name so that the cleanup code will find it.
As you can see from the test run below, the cleanup code deleted all the resources that the GLBC could not.
```log
...
Mar 5 18:35:53.112: INFO: Monitoring glbc's cleanup of gce resources:
k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420 (forwarding rule)
k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (target-https-proxy)
k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260 (url-map)
k8s-be-32331--5f38ac0e2426f796 (backend-service)
k8s-be-32613--5f38ac0e2426f796 (backend-service)
k8s-be-32331--5f38ac0e2426f796 (http-health-check)
k8s-be-32613--5f38ac0e2426f796 (http-health-check)
k8s-ig--5f38ac0e2426f796 (instance-group)
k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (ssl-certificate)
STEP: Performing final delete of any remaining resources
Mar 5 18:35:54.055: INFO: Deleting forwarding-rules: k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420
Mar 5 18:36:06.945: INFO: Deleting target-https-proxies: k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
Mar 5 18:36:14.301: INFO: Deleting url-map: k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260
Mar 5 18:36:18.309: INFO: Deleting backed-service: k8s-be-32331--5f38ac0e2426f796
Mar 5 18:36:22.112: INFO: Deleting backed-service: k8s-be-32613--5f38ac0e2426f796
Mar 5 18:36:26.192: INFO: Deleting http-health-check: k8s-be-32331--5f38ac0e2426f796
Mar 5 18:36:29.846: INFO: Deleting http-health-check: k8s-be-32613--5f38ac0e2426f796
Mar 5 18:36:33.722: INFO: Deleting instance-group: k8s-ig--5f38ac0e2426f796
Mar 5 18:36:37.762: INFO: Deleting ssl-certificate: k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
STEP: No resources leaked.
Mar 5 18:36:46.441: INFO: Deleting addresses: e2e-tests-ingress-upgrade-0px85-static-ip
Mar 5 18:36:53.902: INFO: L7 controller failed to delete all cloud resources on time. timed out waiting for the condition
...
```
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
StatefulSet: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings StatefulSet into full compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
Fixes #36859
**Special notes for your reviewer**:
**Release note**:
```release-note
StatefulSet now respects ControllerRef to avoid fighting over Pods. At the time of upgrade, **you must not have StatefulSets with selectors that overlap** with any other controllers (such as ReplicaSets), or else [ownership of Pods may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```
cc @erictune @kubernetes/sig-apps-pr-reviews
The list functions in deployment/util are used outside the Deployment
controller itself. Therefore, they don't do actual adoption/orphaning.
However, they still need to avoid listing things that don't belong.
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)
Fix StatefulSet e2e flake
**What this PR does / why we need it**:
Fixes StatefulSet e2e flake by ensuring that the StatefulSet controller has observed the unreadiness of Pods prior to attempting to exercise scale functionality.
**Which issue this PR fixes**
fixes #41889
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41306, 42187, 41666, 42275, 42266)
Bump test timeouts to make secret tests work in large clusters
The previous Get/Update pattern with no retry on resource version mismatch
would flake with the following error:
"the object has been modified; please apply your changes to the latest
version and try again"
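As an illustration of the retry that the old Get/Update pattern lacked, here is a stdlib-only sketch. The in-memory `store`, the `errConflict` sentinel, and `retryOnConflict` are hypothetical stand-ins for an API object with a resource version; they are not the test's real code.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's resource-version conflict.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// store is a hypothetical object with optimistic concurrency control.
type store struct {
	version int
	data    string
}

func (s *store) get() (int, string) { return s.version, s.data }

// update succeeds only if the caller saw the latest version.
func (s *store) update(seenVersion int, data string) error {
	if seenVersion != s.version {
		return errConflict
	}
	s.version++
	s.data = data
	return nil
}

// retryOnConflict reruns fn (which must re-read the object) up to
// attempts times while it keeps returning errConflict.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if !errors.Is(err, errConflict) {
			return err // success or a non-retriable error
		}
	}
	return err
}

func main() {
	s := &store{}
	interfere := true
	err := retryOnConflict(3, func() error {
		v, _ := s.get()
		if interfere { // simulate another writer sneaking in once
			interfere = false
			_ = s.update(v, "other")
		}
		return s.update(v, "mine")
	})
	fmt.Println("err:", err, "data:", s.data)
}
```

The important detail is that the Get happens inside the retried function, so each attempt works from the latest version.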
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)
Ingress-glbc upgrade tests
Basically #41676 but with some fixes and added comments. @bprashanth has been away this week and it's desirable to have this in before code freeze.
Automatic merge from submit-queue
Extend experimental support to multiple Nvidia GPUs
Extended from #28216
```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with support for multiple Nvidia GPUs.
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```
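For readers unfamiliar with the `--feature-gates` format mentioned in the release note, the following sketch parses a `Name=bool` list. The function `parseFeatureGates` is a hypothetical illustration and not the kubelet's own parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates parses a "Name=bool,Name=bool" string into a map.
// It is a simplified, hypothetical parser for illustration only.
func parseFeatureGates(s string) (map[string]bool, error) {
	gates := map[string]bool{}
	for _, kv := range strings.Split(s, ",") {
		kv = strings.TrimSpace(kv)
		if kv == "" {
			continue
		}
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("missing '=' in %q", kv)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(parts[1]))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %q: %v", parts[0], err)
		}
		gates[strings.TrimSpace(parts[0])] = enabled
	}
	return gates, nil
}

func main() {
	gates, err := parseFeatureGates("Accelerators=true")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Accelerators enabled:", gates["Accelerators"])
}
```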
1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```
TODO:
- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package