Automatic merge from submit-queue (batch tested with PRs 53227, 53120). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
E2E test to verify clean up of stale dummy VM for vSphere dynamic provisioning
Verify if the dummy stale VM's created during dynamic provisioning are deleted by the clean up routine in vSphere cloud Provider.
**Testing Done:**
- Create a storage class with invalid policy on a VSAN datastore.
- Create a PVC using the above storage class
- Verify if the PVC is not bound.
- Delete the PVC.
- Sleep for 6 minutes so that vSphere Cloud Provider clean up routine can delete the stale dummy VM's.
- Verify if the VM is not present. Otherwise fail the test.
@rohitjogvmw @divyenpatel
```release-note
None
```
e2e tests provide only an (IPv4) ping test for external connectivity.
We need a way to conditionally run a ping6 external connectivity check,
and disable the (IPv4) ping-based external connectivity check,
for end-to-end testing on IPv6-only clusters.
This feature will be needed for creating gating IPv6 CI tests.
fixes#53383
RunCmd uses Go's os/exec library to run commands directly. Since these
are not run through a shell, we can't use shell syntax for piping for
file redirection. The proper way to do that is to create a Command
object and set the Std{in,out,err} pipes appropriately. Luckily sed
can handle the behavior we need without having to manually set this up.
Automatic merge from submit-queue (batch tested with PRs 50988, 50509, 52660, 52663, 52250). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added device plugin e2e kubelet failure test
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
**What this PR does / why we need it**:
This is part of issue #52859 (fixes#52859)
This PR adds a e2e_node test for the device plugin.
Specifically it implements testing of failure handling by the device plugin components in case Kubelet restart / crashes.
I might try to refactor the GPU tests in a later PR.
**Special notes for your reviewer**:
@jiayingz @vishh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52721, 53057, 52493, 52998, 52896). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move deployment collision avoidance e2e test to integration
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: ref #52113
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51067, 52319, 52803, 52961, 51972). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add support for skeleton in GetSigner
Adding support for skeleton to GetSigner to be able to run
e2e tests against a bare metal multinode cluster.
Closes#35613
Automatic merge from submit-queue (batch tested with PRs 52905, 52766). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Refactor parsing cluster autoscaler status, add logging error
Minor improvements to autoscaling test suite and e2e framework.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix GCE LB resource cleanup for service e2e tests.
**What this PR does / why we need it**: Fix GCE LB resource cleanup logic.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52347
**Special notes for your reviewer**:
/assign @shyamjvs @nicksardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52485, 52443, 52597, 52450, 51971). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Removing PrometheusPushGateway --prom-push-gateway flag from e2e tests.
**What this PR does / why we need it**: Removing obsolete PrometheusPushGateway --prom-push-gateway flag from e2e tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45947
**Special notes for your reviewer**:
**Release note**:
```release-note
Removing `--prom-push-gateway` flag from e2e tests
```
Automatic merge from submit-queue (batch tested with PRs 50392, 52108, 52083, 52134, 51526). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
e2e: minor changes to network/service testing utils
Add more logging to help debug. Also refactor several functions to improve
reusability.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Support kubernetes-anywhere provider
**What this PR does / why we need it**:
Implements a new `kubernetes-anywhere` provider to allow upgrade testing in the e2e binary. This is the final step to allow https://github.com/kubernetes/test-infra/pull/4495 and https://github.com/kubernetes/kubernetes-anywhere/pull/450.
**Which issue this PR fixes**:
https://github.com/kubernetes/kubeadm/issues/311
**Special notes for your reviewer**:
Some questions I had
- Does the `--provider` flag specified [here](dbbf6261e0/jobs/config.json (L8587)) get sent to the flag defined [here](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/test_context.go#L219)? Or should I add another `--provider` flag inside `--upgrade_args` like this: `--upgrade_args=... --provider=kubernetes-anywhere`?
- Is it necessary to add waiting logic after the `make` command, or will it implicitly handle that by itself?
Some other points:
- I chose `sed` to manipulate the current kubernetes-anywhere `.config` rather than duplicating another [`anywhere.go`](https://github.com/kubernetes/test-infra/blob/master/kubetest/anywhere.go). One suggestion was to use `jq` but since the config on disk is not serialized to JSON yet, I'm not sure how that'd work.
- Since I don't have a GCE/GKE account or vCenter, I can't actually verify the e2e binary works. I've managed to build it, but if somebody could quickly run a smoke test, I'd appreciate it. This is my first poke around test-infra and e2e, so there might be some plumbing missing
/cc @jessicaochen @luxas @pipejakob @roberthbailey
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)
Improve apiserver metrics reporting
Normalize "WATCHLIST" to "WATCH", add "scope" to the other metrics (listing 50k pods is != listing pods in a namespace), and add a new scope "resource" to cover individual resource calls.
This roughly aligns metrics with our ACL model (technically resource scope is GET, but POST to a subresource and POST to a namespace are not the same thing).
```release-note
WATCHLIST calls are now reported as WATCH verbs in prometheus for the apiserver_request_* series. A new "scope" label is added to all apiserver_request_* values that is either 'cluster', 'resource', or 'namespace' depending on which level the query is performed at.
```
Automatic merge from submit-queue (batch tested with PRs 52376, 52439, 52382, 52358, 52372)
Remove the conversion of client config
It was needed because the clientset code in client-go was a copy of the clientset code in Kubernetes.. client-go is authoritative now, so we can remove the nasty copy.
It turns out that at some points while the Node is recovering from a
reboot, we get a different kind of error ("unable to upgrade
connection"). Since we can't distinguish these transient errors from an
error encountered after successfully executing the remote command,
let's just retry all errors for 5min. If this doesn't work, I'm gonna
blame it on sig-node.
Automatic merge from submit-queue
Pipe in upgrade image target for kube-proxy migration tests
**What this PR does / why we need it**:
https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-upgrade-kube-proxy-ds&width=20
and
https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-downgrade-kube-proxy-ds&width=20
are still failing.
Reproduced it locally and found node image is being default to debian during upgrade (it was gci before upgrade) because we don't pass in `gci` via `--upgrade--target`. And for some reasons (haven't figured out yet), the upgraded node uses debian image with gci startupscripts...
This PR pipes in `--upgrade-target` for kube-proxy migration tests, hopefully in conjunction with https://github.com/kubernetes/test-infra/pull/4447 it will bring the tests back to normal.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
Sorry for bothering again.
/assign @krousey
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52097, 52054)
Move paused deployment e2e tests to integration
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #52113
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
StatefulSet: Deflake e2e RunHostCmd.
The initial retry up to 20s was giving up too soon. I'm seeing this test flake because the Node rebooted and it takes ~2min to recover. Now StatefulSet RunHostCmd calls will use the same 5min timeout as with other Pod state checks.
ref #48031
The initial retry up to 20s was giving up too soon.
I'm seeing this test flake because the Node rebooted and it takes ~2min
to recover.
Now StatefulSet RunHostCmd calls will use the same 5min timeout as with
other Pod state checks.
The feature is still Alpha and at times, the IP address previously used
by the load balancer in the test will not completely freed even after
the load balancer is long gone. In this case, the test URL with the IP
would return a 404 response. Tolerate this error and retry until the new
load balancer is fully established.
Job failure policy integration in JobController. From the
JobSpec.BackoffLimit the JobController will define the backoff
duration between Job retry.
It use the ```workqueue.RateLimitingInterface``` to store the number of
"retry" as "requeue" and the default Job backoff initial duration is set
during the initialization of the ```workqueue.RateLimiter.
Since the number of retry for each job is store in a local structure
"JobController.queue" if the JobController restarts the number of retries
will be lost and the backoff duration will be reset to 0.
Add e2e test for Job backoff failure policy