Automatic merge from submit-queue (batch tested with PRs 50550, 50768)
Don't SSH to master for metrics in case of GKE
cc @kubernetes/sig-scalability-misc @crassirostris
Automatic merge from submit-queue (batch tested with PRs 49904, 50484, 50214)
Adding support for internal IP for e2e tests
Currently IssueSSHComand in util.go only checks for External IP address
to ssh, this PR adds check for internal IP too.
Closes#50630
Automatic merge from submit-queue (batch tested with PRs 50537, 49699, 50160, 49025, 50205)
When not using a CloudProvider, set both InternalIP and ExternalIP on Nodes
#36095 changed all of the cloudproviders to set both InternalIP and ExternalIP on Nodes, but the non-cloudprovider fallback code now only sets InternalIP.
This causes the test "should be able to create a functioning NodePort service" in test/e2e/service.go to fail on cloud-provider-less clusters, because (with LegacyHostIP gone), it now will only try to work with ExternalIPs, and will fail if the node has only an InternalIP.
There isn't much other code that assumes that ExternalIP will always be set (there's something in pkg/master/master.go, but I don't know what it's doing, so maybe it's only useful in the case where InternalIP != ExternalIP anyway). But given that several of the cloudproviders (mesos, ovirt, rackspace) now explicitly set both InternalIP and ExternalIP to the same value always, it seemed right to do that in the fallback case too.
@deads2k FYI
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove deprecated ESIPP beta annotations
**What this PR does / why we need it**:
Remove deprecated ESIPP beta annotations.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50187
**Special notes for your reviewer**:
/assign @MrHohn
/sig network
**Release note**:
```release-note
Beta annotations `service.beta.kubernetes.io/external-traffic` and `service.beta.kubernetes.io/healthcheck-nodeport` have been removed. Please use fields `service.spec.externalTrafficPolicy` and `service.spec.healthCheckNodePort` instead.
```
Automatic merge from submit-queue
Migrate to controller references helpers in meta/v1
**What this PR does / why we need it**:
This is a follow up for #48319 that migrates all method usages to new methods in meta/v1.
**Special notes for your reviewer**:
Looking at each commit individually might be easier.
**Release note**:
```release-note
NONE
```
/sig api-machinery
/kind cleanup
Automatic merge from submit-queue (batch tested with PRs 45186, 50440)
Add functionality needed by Cluster Autoscaler to Kubemark Provider.
Make adding nodes asynchronous. Add method for getting target
size of node group. Add method for getting node group for node.
Factor out some common code.
**Release note**:
```
NONE
```
Make adding nodes asynchronous. Add method for getting target
size of node group. Add method for getting node group for node.
Factor out some common code.
Automatic merge from submit-queue
Add waitForFailure for e2e test framework
**What this PR does / why we need it**:
Add waitForFailure for e2e test framework, this could reduce the reliance on logs.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Part of #44118. Refer https://github.com/kubernetes/kubernetes/pull/48858#discussion_r128331726
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add a simple cloud provider for e2e tests on kubemark
**What this PR does / why we need it**:
Adds a simplified cloud provider for kubemark. This enables us to add and
remove nodes and operate on nodegroups while running tests on kubemark.
This is needed to run scalability tests for cluster autoscaler on kubemark.
See https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/kubemark_integration.md
**Release note**:
```
NONE
```
Automatic merge from submit-queue
StatefulSet scale subresource
**What this PR does / why we need it**: This PR implements scale subresource for StatefulSet.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46005
**Special notes for your reviewer**:
**Release note**:
```release-note
StatefulSet uses scale subresource when scaling in accord with ReplicationController, ReplicaSet, and Deployment implementations.
```
**Feature Checklist**:
- [x] Introduce Registry interface for storage purpose
- [x] Introduce `ScaleREST New(), Get() and Update()` utility functions
- [x] Create a `ScaleREST` object at `NewREST()` and return it
- [x] Enable scale subresource by adding `/scale` field to the storage map
**Testing Checklist**:
- Unit testing
- [x] Modify `newStorage()` to call `NewStorage()`, and change all unit tests accordingly
- [x] Add unit tests for `ScaleREST Get() and Update()` utility functions
- [x] Add missing unit test for `ShortNames`
- Manual testing
- [x] Verify existence of the subresource using `kubectl proxy` command
- [x] Modify the subresource using `curl` via `POST`
- e2e testing
- [x] Add e2e tests using `RESTClient`
Automatic merge from submit-queue (batch tested with PRs 49524, 46760, 50206, 50166, 49603)
Handled taints on node in batch.
**What this PR does / why we need it**:
Enhanced helpers to handled taints on node in batch.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#49522
**Release note**:
```release-note
None
```
This is needed for cluster autoscaler e2e test to
run on kubemark. We need the ability to add and
remove nodes and operate on nodegroups. Kubemark
does not provide this at the moment.
Automatic merge from submit-queue (batch tested with PRs 50091, 50231, 50238, 50236, 50243)
Fix storage tests for multizone test configuration.
**What this PR does / why we need it**:
This PR modifies "[sig-storage] Volumes PD should be mountable with (ext3|ext4)" tests to schedule pods in zone, where PD is created.
This is to make the test work in multizone environment.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 50091, 50231, 50238, 50236, 50243)
Move the sig-instrumentation test to a dedicated folder
Move the last remaining test to sig-instrumentation folder, also move "metrics" package to the "framework" folder
Related issue: https://github.com/kubernetes/kubernetes/issues/49161
/cc @xiangpengzhao @piosz
Automatic merge from submit-queue (batch tested with PRs 50091, 50231, 50238, 50236, 50243)
Modify e2e.go to arbitrarily pick one of zones we have nodes in for multizone tests.
**What this PR does / why we need it**:
When e2e runs in multizone configuration, zone config property can be empty.
This PR, in that case, overrides an empty value with arbitrarily chosen zone that we have nodes in.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)
Add basic install and mount flexvolumes e2e tests
fixes https://github.com/kubernetes/kubernetes/issues/47010
These two tests install a skeleton "dummy" flex driver, attachable and non-attachable respectively, then test that a pod can successfully use the flex driver. They are labeled disruptive because kubelet and controller-manager get restarted as part of the flex install. IMO it's important to keep this install procedure as part of the test to isolate any bugs with the startup plugin probe code.
There is a bit of an ugly dependency on cluster/gce/config-test.sh because --flex-volume-plugin-dir must be set to a dir that's readable from controller-manager container and writable by the flex e2e test. The default path is not writable on GCE masters with read-only root so I picked a location that looks okay.
In the "dummy" drivers I trick kubelet into thinking there is a mount point by doing "mount -t tmpfs none ${MNTPATH} >/dev/null 2>&1", hope that is okay.
I have only tested on GCE and theoretically they may work on AWS but I don't think there is a need to test on multiple cloudproviders.
-->
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50000, 49954, 49943, 50018, 49607)
Update the DeleteReplicaSet in rs_util.go to use server side reaper
**What this PR does / why we need it**:
fix#47832
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47832
**Special notes for your reviewer**:
**Release note**:
```
None
```
Automatic merge from submit-queue (batch tested with PRs 49990, 49997, 44278, 49936, 49891)
Allow mode in e2e-framework to gather metrics only from master
This should enable getting metrics for our 5k-node clusters.
cc @kubernetes/sig-scalability-misc @gmarek
Automatic merge from submit-queue (batch tested with PRs 49898, 49897, 49919, 48860, 49491)
Fix usage a make(struct, len()) followed by append()
A couple of places in the code we allocate with make() but then use
append(), instead of copy() or direct assignment. This results in a
slice with len() zero elements at the front followed by the expected
data. The correct form for such usage is `make(struct, 0, len())`.
I found these by running:
```
$ git grep -EI -A7 'make\([^,]*, len\(' | grep 'append(' -B7 | grep -v vendor
```
And then manually looking through the results. I'm sure something better
could exist.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
improve detectability of deleted pods
**What this PR does / why we need it**:
Adds comment to `waitForPodTerminatedInNamespace` to better explain how it's implemented.
~~It improves pod deletion detection in the e2e framework as follows:~~
~~1. the `waitForPodTerminatedInNamespace` func looks for pod.Status.Phase == _PodFailed_ or _PodSucceeded_ since both values imply that all containers have terminated.~~
~~2. the `waitForPodTerminatedInNamespace` func also ignores the pod's Reason if the passed-in `reason` parm is "". Reason is not really relevant to the pod being deleted or not, but if the caller passes a non-blank `reason` then it will be lower-cased, de-blanked and compared to the pod's Reason (also lower-cased and de-blanked). The idea is to make Reason checking more flexible and to prevent a pod from being considered running when all of its containers have terminated just because of a Reason mis-match.~~
Releated to pr [49597](https://github.com/kubernetes/kubernetes/pull/49597) and issue [49529](https://github.com/kubernetes/kubernetes/issues/49529).
**Release note**:
```release-note
NONE
```
A couple of places in the code we allocate with make() but then use
append(), instead of copy() or direct assignment. This results in a
slice with len() zero elements at the front followed by the expected
data. The correct form for such usage is `make(struct, 0, len())`.
I found these by running:
```
$ git grep -EI -A7 'make\([^,]*, len\(' | grep 'append(' -B7 | grep -v vendor
```
And then manually looking through the results. I'm sure something better
could exist.
Automatic merge from submit-queue (batch tested with PRs 49538, 49708, 47665, 49750, 49528)
Add a support for GKE regional clusters in e2e tests.
**What this PR does / why we need it**:
Add a support for GKE regional clusters in e2e tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45813, 49594, 49443, 49167, 47539)
Add node e2e tests for GKE environment
Ref: https://github.com/kubernetes/kubernetes/issues/46891
This PR adds node e2e tests for validating images used on GKE.
- We pass the `SYSTEM_SPEC_NAME` to the node e2e test process via the flag `--system-spec-name` so that we can skip the environment specific tests using `RunIfSystemSpecNameIs()`.
- Also added `SkipIfContainerRuntimeIs()` as the opposite of `RunIfContainerRuntimeIs()`.
**Release note**:
```
None
```
Automatic merge from submit-queue (batch tested with PRs 49619, 49598, 47267, 49597, 49638)
improve log for pod deletion poll loop
**What this PR does / why we need it**:
It improves some logging related to waiting for a pod to reach a passed-in condition. Specifically, related to issue [49529](https://github.com/kubernetes/kubernetes/issues/49529) where better logging may help to debug the root cause.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49420, 49296, 49299, 49371, 46514)
Refactoring taint functions to reduce sprawl
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45060
**Special notes for your reviewer**:
@gmarek @timothysc @k82cn @jayunit100 - I moved some fn's to helpers and some to utils. LMK, if you are ok with this change.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add a new API version apps/v1beta2
xref: #49135
This PR adds a new API version `apps/v1beta2` which contains a copy (of types, conversions, and defaults) of `apps/v1beta1` StatefulSet, Deployment, and their subresources. Note that `apps/v1beta2` is still WIP and we will make breaking changes to it before releasing 1.8.
Moving core controllers (StatefulSet, Deployment, ReplicaSet, DaemonSet) to `apps/v1beta2` is the first step of moving them to `apps/v1` (GA).
This PR is a starting point for DaemonSet and ReplicaSet to move from `/extensions` to `/apps` and for Deployment and StatefulSet to make some breaking changes (e.g. new defaults and/or remove deprecated fields).
```release-note
Add a new API version apps/v1beta2
```
Automatic merge from submit-queue (batch tested with PRs 48377, 48940, 49144, 49062, 49148)
fixit: break sig-cluster-lifecycle tests into subpackage
this is part of fixit week. ref #49161
@kubernetes/sig-cluster-lifecycle-misc
Automatic merge from submit-queue (batch tested with PRs 47509, 46821, 45319, 49121, 49125)
Tolerate a missing MasterName (for GKE)
When testing, GKE created clusters don't provide a `MasterName`, so don't throw a warning and give up when that happens.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47509, 46821, 45319, 49121, 49125)
volume i/o tests for storage plugins
**What this PR does / why we need it**:
Addresses issues [25268](https://github.com/kubernetes/kubernetes/issues/25268) and [28367](https://github.com/kubernetes/kubernetes/issues/28367), though it may be weak re. the streaming i/o issue. @matchstick
**Special notes for your reviewer**:
This is a new file. Plugins other than NFS, GlusterFS, iSCSI, and Ceph-RBD code will need to be supported in a separate PR.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48043, 48200, 49139, 36238, 49130)
Implement equivalence cache by caching and re-using predicate result
The last part of #30844, I opened a new PR instead of overwrite the old one because we changed some basic assumption by allowing invalidating equivalence cache item by individual predicate.
The idea of this PR is based on discussion in https://github.com/kubernetes/kubernetes/issues/32024
- [x] Pods belong to same controllerRef considered to be equivalent
- [x] ` podFitsOnNode` will use cached predicate result if it's available
- [x] Equivalence cache will be updated when if a fresh new predicate is done
- [x] `factory.go` will invalid specific predicate cache(s) based on the object change
- [x] Since `schedule` and `bind` are async, we need to optimistically invalid affected cache(s) before `bind`
- [x] Fully unit test of affected files
- [x] e2e test to verify cache update/invalid workflow
- [x] performance test results
- [x] Some nits fixes related but expected to result in `needs-rebase` so they are split to: #36060#35968#37512
cc @wojtek-t @davidopp
Automatic merge from submit-queue
Azure PD (Managed/Blob)
This is exactly the same code as this [PR](https://github.com/kubernetes/kubernetes/pull/41950). It has a clean set of generated items. We created a separate PR to accelerate the accept/merge the PR
CC @colemickens
CC @brendandburns
**What this PR does / why we need it**:
1. Adds K8S support for Azure Managed Disks.
2. Adds support for dedicated blob disks (1:1 to storage account) in addition to shared blob disks (n:1 to storage account).
3. Automatically manages the underlying storage accounts. New storage accounts are created at 50% utilization. Max is 100 disks, 60 disks per storage account.
2. Addresses the current issues with Blob Disks:
..* Significantly faster attach process. Disks are now usually available for pods on nodes under 30 sec if formatted, under a min if not formatted.
..* Adds support to move disks between nodes.
..* Adds consistent attach/detach behavior, checks if the disk is leased/attached on a different node before attempting to attach to target nodes.
..* Fixes a random hang behavior on Azure VMs during mount/format (for both blob + managed disks).
..* Fixes a potential conflict by avoiding the use of disk names for mount paths. The new plugin uses hashed disk uri for mount path.
The existing AzureDisk is used as is. Additional "kind" property was added allowing the user to decide if the pd will be shared, dedicated or managed (Azure Managed Disks are used).
Due to the change in mounting paths, existing PDs need to be recreated as PV or PVCs on the new plugin.
Automatic merge from submit-queue (batch tested with PRs 48864, 48651, 47703)
Enable logexporter mechanism to dump logs from k8s nodes to GCS directly
Ref https://github.com/kubernetes/kubernetes/issues/48513
This adds support for logexporter from k8s side. Next I'll send a PR adding support from test-infra side.
/cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @fejta @wojtek-t @gmarek
Automatic merge from submit-queue
add dockershim checkpoint node e2e test
Add a bunch of disruptive cases to test kubelet/dockershim's checkpoint work flow.
Some steps are quite hacky. Not sure if there is better ways to do things.
Automatic merge from submit-queue (batch tested with PRs 48555, 48849)
GCE: Fix panic when service loadbalancer has static IP address
Fixes#48848
```release-note
Fix service controller crash loop when Service with GCP LoadBalancer uses static IP (#48848, @nicksardo)
```
Automatic merge from submit-queue (batch tested with PRs 48594, 47042, 48801, 48641, 48243)
Prepare to introduce websockets for exec and portforward
Refactor the code in remotecommand to better represent the structure of
what is common between portforward and exec.
Ref #48633
Automatic merge from submit-queue
[e2e-ingress] Get node tag from instance under GKE
**What this PR does / why we need it**: Making ingress CI green again.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48167
**Special notes for your reviewer**:
/assign @nicksardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47918, 47964, 48151, 47881, 48299)
Add ApiEndpoint support to GCE config.
**What this PR does / why we need it**:
Add the ability to change ApiEndpoint for GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)
Do not fail on error when deleting ingress
Fixes#48239
If the api server or master is unavailable, the test should manually teardown load balancer resources.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)
Allocate clusterIP when change service type from ExternalName to ClusterIP
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#35354#46190
**Special notes for your reviewer**:
/cc @smarterclayton @thockin
**Release note**:
```release-note
NONE
```