Automatic merge from submit-queue
e2e test: Log container output on TestContainerOutput error
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
This would help us to see what wrong in https://github.com/kubernetes/kubernetes/issues/40811 - a container is running there for 3 minutes and dies and we want to see what it did for these 3 minutes.
```release-note
NONE
```
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)
Implement automated downgrade testing.
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
Automatic merge from submit-queue (batch tested with PRs 42652, 42681, 42708, 42730)
e2e: fix restarting the apiserver
The string used to match the image name of the apiserver (e.g., `gcr.io/google_containers/kube-apiserver:3be...`),
but this no longer works. Change the test to locate the kube-apiserver container by name.
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
Remove everything that is not new from batch/v2alpha1
Fixes#37166.
@lavalamp you've asked for it
@erictune this is a prereq for moving CronJobs to beta. I initially planned to put all in one PR, but after I did that I figured out it'll be easier to review separately. ptal
@kubernetes/api-approvers @kubernetes/sig-api-machinery-pr-reviews ptal
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
Fix resource cleanup in ingress_utils.go within e2e/framework
**What this PR does / why we need it**:
The GLBC is failing to delete resources during the etcd rollback tests and the e2e cleanup is leaking them. After a short while, tests are failing to create new resources.
This PR addresses the e2e/framework's ability to delete GLBC-created resources and adds more logging.
**Which issue this PR fixes**:
Helps #38569 but does not completely close this flake
**Special notes for your reviewer**:
Resources were not being deleted because resource names were being truncated and then their ability to be deleted was determined by the entire cluster id existing in the name. Truncated names also have an extra '0' append to the end of their name (unknown origin). This PR tries to match on a common prefix.
Minor changes were made to improve log readability.
**Testing this PR**:
This was tested by running a master upgrade test and by adding a second forwarding-rule mid-run. This forwarding rule referenced the same url-map used by the first forwarding-rule created by the GLBC. Therefore, the GLBC will be able to delete the forwarding-rule but not anymore L7 resources. This second forwarding rule's name was nearly identical to the first forwarding rule so that the cleanup code will find it.
As you can see from the test run below, the cleanup code deleted all the resources that the GLBC could not.
```log
...
Mar 5 18:35:53.112: INFO: Monitoring glbc's cleanup of gce resources:
k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420 (forwarding rule)
k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (target-https-proxy)
k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260 (url-map)
k8s-be-32331--5f38ac0e2426f796 (backend-service)
k8s-be-32613--5f38ac0e2426f796 (backend-service)
k8s-be-32331--5f38ac0e2426f796 (http-health-check)
k8s-be-32613--5f38ac0e2426f796 (http-health-check)
k8s-ig--5f38ac0e2426f796 (instance-group)
k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (ssl-certificate)
STEP: Performing final delete of any remaining resources
Mar 5 18:35:54.055: INFO: Deleting forwarding-rules: k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420
Mar 5 18:36:06.945: INFO: Deleting target-https-proxies: k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
Mar 5 18:36:14.301: INFO: Deleting url-map: k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260
Mar 5 18:36:18.309: INFO: Deleting backed-service: k8s-be-32331--5f38ac0e2426f796
Mar 5 18:36:22.112: INFO: Deleting backed-service: k8s-be-32613--5f38ac0e2426f796
Mar 5 18:36:26.192: INFO: Deleting http-health-check: k8s-be-32331--5f38ac0e2426f796
Mar 5 18:36:29.846: INFO: Deleting http-health-check: k8s-be-32613--5f38ac0e2426f796
Mar 5 18:36:33.722: INFO: Deleting instance-group: k8s-ig--5f38ac0e2426f796
Mar 5 18:36:37.762: INFO: Deleting ssl-certificate: k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
STEP: No resources leaked.
Mar 5 18:36:46.441: INFO: Deleting addresses: e2e-tests-ingress-upgrade-0px85-static-ip
Mar 5 18:36:53.902: INFO: L7 controller failed to delete all cloud resources on time. timed out waiting for the condition
...
```
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
StatefulSet: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings StatefulSet into full compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
Fixes#36859
**Special notes for your reviewer**:
**Release note**:
```release-note
StatefulSet now respects ControllerRef to avoid fighting over Pods. At the time of upgrade, **you must not have StatefulSets with selectors that overlap** with any other controllers (such as ReplicaSets), or else [ownership of Pods may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```
cc @erictune @kubernetes/sig-apps-pr-reviews
The list functions in deployment/util are used outside the Deployment
controller itself. Therefore, they don't do actual adoption/orphaning.
However, they still need to avoid listing things that don't belong.
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)
Fix StatefulSet e2e flake
**What this PR does / why we need it**:
Fixes StatefulSet e2e flake by ensuring that the StatefulSet controller has observed the unreadiness of Pods prior to attempting to exercise scale functionality.
**Which issue this PR fixes**
fixes#41889
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41306, 42187, 41666, 42275, 42266)
Bump test timeouts to make secret tests work in large clusters
The previous Get/Update pattern with no retry on resource version mismatch
would flake with the following error:
"the object has been modified; please apply your changes to the latest
version and try again"
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)
Ingress-glbc upgrade tests
Basically #41676 but with some fixes and added comments. @bprashanth has been away this week and it's desirable to have this in before code freeze.
Automatic merge from submit-queue
Extend experimental support to multiple Nvidia GPUs
Extended from #28216
```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with support for multiple Nvidia GPUs.
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```
1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```
TODO:
- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)
PV upgrade test
**What this PR does / why we need it**:
This PR adds a PV upgrade test to the new upgrade test framework. Before, this test had to be done manually. Currently the upgrade test framework only works on the GCE environment, so I plan to add support for other providers later. In order to write the test, I had to modify and refactor some volume test util libraries. I reran the impacted tests to make sure they still passed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
It's probably easier to review the two commits separately. I split it up into the refactor changes, and the upgrade test changes.
**Release note**:
NONE
cc @saad-ali @krousey
Automatic merge from submit-queue (batch tested with PRs 35408, 41915, 41992, 41964, 41925)
e2e/upgrade: add sysctls
Add sysctl upgrade tests.
How can these be effectively tested?
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)
Guaranteed admission for Critical Pods
This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.
cc: @vishh
Fix bash command's execution in MakePod().
Add isPriviledged as a parameter to MakePod().
Move PD utils to pv_util.go
Ran all the tests in pd.go, persistent_volumes.go,
persistent_volumes-disruptive.go.
These changes are needed for the PV upgrade test I am working on.