Automatic merge from submit-queue (batch tested with PRs 49420, 49296, 49299, 49371, 46514)
Refactoring taint functions to reduce sprawl
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45060
**Special notes for your reviewer**:
@gmarek @timothysc @k82cn @jayunit100 - I moved some fn's to helpers and some to utils. LMK, if you are ok with this change.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add a new API version apps/v1beta2
xref: #49135
This PR adds a new API version `apps/v1beta2` which contains a copy (of types, conversions, and defaults) of `apps/v1beta1` StatefulSet, Deployment, and their subresources. Note that `apps/v1beta2` is still WIP and we will make breaking changes to it before releasing 1.8.
Moving core controllers (StatefulSet, Deployment, ReplicaSet, DaemonSet) to `apps/v1beta2` is the first step of moving them to `apps/v1` (GA).
This PR is a starting point for DaemonSet and ReplicaSet to move from `/extensions` to `/apps` and for Deployment and StatefulSet to make some breaking changes (e.g. new defaults and/or remove deprecated fields).
```release-note
Add a new API version apps/v1beta2
```
Automatic merge from submit-queue (batch tested with PRs 48377, 48940, 49144, 49062, 49148)
fixit: break sig-cluster-lifecycle tests into subpackage
this is part of fixit week. ref #49161
@kubernetes/sig-cluster-lifecycle-misc
Automatic merge from submit-queue (batch tested with PRs 47509, 46821, 45319, 49121, 49125)
Tolerate a missing MasterName (for GKE)
When testing, GKE created clusters don't provide a `MasterName`, so don't throw a warning and give up when that happens.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47509, 46821, 45319, 49121, 49125)
volume i/o tests for storage plugins
**What this PR does / why we need it**:
Addresses issues [25268](https://github.com/kubernetes/kubernetes/issues/25268) and [28367](https://github.com/kubernetes/kubernetes/issues/28367), though it may be weak re. the streaming i/o issue. @matchstick
**Special notes for your reviewer**:
This is a new file. Plugins other than NFS, GlusterFS, iSCSI, and Ceph-RBD code will need to be supported in a separate PR.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48043, 48200, 49139, 36238, 49130)
Implement equivalence cache by caching and re-using predicate result
The last part of #30844, I opened a new PR instead of overwrite the old one because we changed some basic assumption by allowing invalidating equivalence cache item by individual predicate.
The idea of this PR is based on discussion in https://github.com/kubernetes/kubernetes/issues/32024
- [x] Pods belong to same controllerRef considered to be equivalent
- [x] ` podFitsOnNode` will use cached predicate result if it's available
- [x] Equivalence cache will be updated when if a fresh new predicate is done
- [x] `factory.go` will invalid specific predicate cache(s) based on the object change
- [x] Since `schedule` and `bind` are async, we need to optimistically invalid affected cache(s) before `bind`
- [x] Fully unit test of affected files
- [x] e2e test to verify cache update/invalid workflow
- [x] performance test results
- [x] Some nits fixes related but expected to result in `needs-rebase` so they are split to: #36060#35968#37512
cc @wojtek-t @davidopp
Automatic merge from submit-queue
Azure PD (Managed/Blob)
This is exactly the same code as this [PR](https://github.com/kubernetes/kubernetes/pull/41950). It has a clean set of generated items. We created a separate PR to accelerate the accept/merge the PR
CC @colemickens
CC @brendandburns
**What this PR does / why we need it**:
1. Adds K8S support for Azure Managed Disks.
2. Adds support for dedicated blob disks (1:1 to storage account) in addition to shared blob disks (n:1 to storage account).
3. Automatically manages the underlying storage accounts. New storage accounts are created at 50% utilization. Max is 100 disks, 60 disks per storage account.
2. Addresses the current issues with Blob Disks:
..* Significantly faster attach process. Disks are now usually available for pods on nodes under 30 sec if formatted, under a min if not formatted.
..* Adds support to move disks between nodes.
..* Adds consistent attach/detach behavior, checks if the disk is leased/attached on a different node before attempting to attach to target nodes.
..* Fixes a random hang behavior on Azure VMs during mount/format (for both blob + managed disks).
..* Fixes a potential conflict by avoiding the use of disk names for mount paths. The new plugin uses hashed disk uri for mount path.
The existing AzureDisk is used as is. Additional "kind" property was added allowing the user to decide if the pd will be shared, dedicated or managed (Azure Managed Disks are used).
Due to the change in mounting paths, existing PDs need to be recreated as PV or PVCs on the new plugin.
Automatic merge from submit-queue (batch tested with PRs 48864, 48651, 47703)
Enable logexporter mechanism to dump logs from k8s nodes to GCS directly
Ref https://github.com/kubernetes/kubernetes/issues/48513
This adds support for logexporter from k8s side. Next I'll send a PR adding support from test-infra side.
/cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @fejta @wojtek-t @gmarek
Automatic merge from submit-queue
add dockershim checkpoint node e2e test
Add a bunch of disruptive cases to test kubelet/dockershim's checkpoint work flow.
Some steps are quite hacky. Not sure if there is better ways to do things.
Automatic merge from submit-queue (batch tested with PRs 48555, 48849)
GCE: Fix panic when service loadbalancer has static IP address
Fixes#48848
```release-note
Fix service controller crash loop when Service with GCP LoadBalancer uses static IP (#48848, @nicksardo)
```
Automatic merge from submit-queue (batch tested with PRs 48594, 47042, 48801, 48641, 48243)
Prepare to introduce websockets for exec and portforward
Refactor the code in remotecommand to better represent the structure of
what is common between portforward and exec.
Ref #48633
Automatic merge from submit-queue
[e2e-ingress] Get node tag from instance under GKE
**What this PR does / why we need it**: Making ingress CI green again.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48167
**Special notes for your reviewer**:
/assign @nicksardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47918, 47964, 48151, 47881, 48299)
Add ApiEndpoint support to GCE config.
**What this PR does / why we need it**:
Add the ability to change ApiEndpoint for GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)
Do not fail on error when deleting ingress
Fixes#48239
If the api server or master is unavailable, the test should manually teardown load balancer resources.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)
Allocate clusterIP when change service type from ExternalName to ClusterIP
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#35354#46190
**Special notes for your reviewer**:
/cc @smarterclayton @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48214, 48154)
Adding a retry and traceroute to the master version checking
This is hitting a lot of connection refused errors in the e2e upgrade tests. We should make this more robust in case this is intermittent network errors. In the event of an error, attempt to log a traceroute to the master.
cc @kubernetes/sig-cluster-lifecycle-bugs @dchen1107
#47379
Automatic merge from submit-queue (batch tested with PRs 48139, 48042, 47645, 48054, 48003)
Pipe clusterID into gce_loadbalancer_external.go
**What this PR does / why we need it**: Small cleanup for GCE ELB codes.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48002
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47915, 47856, 44086, 47575, 47475)
deprecate created-by annotation for e2e test framework
**What this PR does / why we need it**: This PR deprecates created-by annotation for e2e test framework. This is needed as we now have ControllerRef.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #44407
**Special notes for your reviewer**: This is the third PR for deprecating created-by annotation. Other PRs can be found here: #47469, #47471
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 47470, 47260, 47411, 46852, 46135)
Write reports for each upgrade test
Due to the way Ginkgo runs individual test cases and the level of coordination required for the upgrade tests, they were all run under a single Ginkgo test case. This PR generates and auxiliary report that break out the results of each upgrade test. This is accomplished by:
1) Wrapping `ginkgo.Fail` and `ginkgo.Skip` to get the actual failure or skip messages.
2) Recovering that info in the upgrade test to generate an auxiliary report.
I suggest reviewing commit by commit.
Sample report: https://storage.googleapis.com/krouseytestreports/logs/results/1/artifacts/junit_upgrades.xmlFixes: #47371
Automatic merge from submit-queue (batch tested with PRs 46835, 46856)
Made WaitForReplicas and EnsureDesiredReplicas use PollImmediate and improved logging.
**What this PR does / why we need it**: Most importantly, this results in better logging: timeout is logged at the level of the caller, not the helper function, helping debugging.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46885, 47197)
Fix e2e ns deletion message for flake analysis
**What this PR does / why we need it**:
Let's us know when pods have a missing deletion timestamp.
**Special notes for your reviewer**:
helps https://github.com/kubernetes/kubernetes/issues/47135
Automatic merge from submit-queue (batch tested with PRs 47065, 47157, 47143)
Use actual hostname when creating network e2e test pod
**What this PR does / why we need it**:
This changes a e2e framework network test Pod use the actual hostname value to match the `kubernetes.io/hostname` label in it's `NodeSelector`. Currently it assumes the Node name will match that hostname label which is not true in all environments.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Fixescoreos/tectonic-installer#1018
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46235, 44786, 46833, 46756, 46669)
implements StatefulSet update
**What this PR does / why we need it**:
1. Implements rolling update for StatefulSets
2. Implements controller history for StatefulSets.
3. Makes StatefulSet status reporting consistent with DaemonSet and ReplicaSet.
https://github.com/kubernetes/features/issues/188
**Special notes for your reviewer**:
**Release note**:
```release-note
Implements rolling update for StatefulSets. Updates can be performed using the RollingUpdate, Paritioned, or OnDelete strategies. OnDelete implements the manual behavior from 1.6. status now tracks
replicas, readyReplicas, currentReplicas, and updatedReplicas. The semantics of replicas is now consistent with DaemonSet and ReplicaSet, and readyReplicas has the semantics that replicas did prior to this release.
```
Automatic merge from submit-queue (batch tested with PRs 46997, 47021)
Don't parse human-readable output from gcloud in tests
This is the reason `[k8s.io] Services should be able to change the type and ports of a service [Slow]` is currently failing on GKE e2e tests. For GKE jobs we run a prerelease version of gcloud, in which the default command output was changed.
gcloud's default output for commands is human readable, and is subject to change. Anything scripting against gcloud should always pass `--format=json|yaml|value(...)` so you get standardized output.
fixes: #46918
Implements history utilities for ControllerRevision in the controller/history package
StatefulSetStatus now has additional fields for consistency with DaemonSet and Deployment
StatefulSetStatus.Replicas now represents the current number of createdPods and StatefulSetStatus.ReadyReplicas is the current number of ready Pods
Automatic merge from submit-queue (batch tested with PRs 46972, 42829, 46799, 46802, 46844)
Multizone static pv test
**What this PR does / why we need it**:
Adds an e2e test for checking that pods get scheduled to the same zone as statically created PVs. This tests the PersistentVolumeLabel admission controller, which adds zone and region labels when PVs are created. As part of this, I also had to make changes to volume test utility code to pass in a zone parameter for creating PDs, and also had to add an argument to the e2e test program to accept a list of zones.
Fixes#46995
**Special notes for your reviewer**:
It's probably easier to review each commit separately.
**Release note**:
NONE
Automatic merge from submit-queue
Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test.
This is still a WIP... needs to be squashed at least, and I don't think it's currently passing until I increase the scale of the RC, but please have a look at the general outline. Thanks!
Fixes#38336
@kow3ns @bdbauer @krousey @erictune @maisem @davidopp
```
On GCE, node upgrades will now respect PodDisruptionBudgets, if present.
```
Respect PDBs during node upgrades and add test coverage to the
ServiceTest upgrade test. Modified that test so that we include pod
anti-affinity constraints and a PDB.
Automatic merge from submit-queue (batch tested with PRs 46076, 43879, 44897, 46556, 46654)
Local storage plugin
**What this PR does / why we need it**:
Volume plugin implementation for local persistent volumes. Scheduler predicate will direct already-bound PVCs to the node that the local PV is at. PVC binding still happens independently.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Part of #43640
**Release note**:
```
Alpha feature: Local volume plugin allows local directories to be created and consumed as a Persistent Volume. These volumes have node affinity and pods will only be scheduled to the node that the volume is at.
```
Automatic merge from submit-queue
Support grabbing test suite metrics
**What this PR does / why we need it**:
Add support for grabbing metrics that cover the entire test suite's execution.
Update the "interesting" controller-manager metrics to match the
current names for the garbage collector, and add namespace controller
metrics to the list.
If you enable `--gather-suite-metrics-at-teardown`, the metrics file is written to a file with a name such as `MetricsForE2ESuite_2017-05-25T20:25:57Z.json` in the `--report-dir`. If you don't specify `--report-dir`, the metrics are written to the test log output.
I'd like to enable this for some of the `pull-*` CI jobs, which will require a separate PR to test-infra.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-testing-pr-reviews @smarterclayton @wojtek-t @gmarek @derekwaynecarr @timothysc
Automatic merge from submit-queue (batch tested with PRs 45534, 37212, 46613, 46350)
[e2e]Fix define redundant parameter
When timeout to reach HTTP service, redundant parameter make the
error is nil.
Automatic merge from submit-queue (batch tested with PRs 46407, 46457)
GCE - Refactor API for firewall and backend service creation
**What this PR does / why we need it**:
- Currently, firewall creation function actually instantiates the firewall object; this is inconsistent with the rest of GCE api calls. The API normally gets passed in an existing object.
- Necessary information for firewall creation, (`computeHostTags`,`nodeTags`,`networkURL`,`subnetworkURL`,`region`) were private to within the package. These now have public getters.
- Consumers might need to know whether the cluster is running on a cross-project network. A new `OnXPN` func will make that information available.
- Backend services for regions have been added. Global ones have been renamed to specify global.
- NamedPort management of instance groups has been changed from an `AddPortsToInstanceGroup` func (and missing complementary `Remove...`) to a single, simple `SetNamedPortsOfInstanceGroup`
- Addressed nitpick review comments of #45524
ILB needs the regional backend services and firewall refactor. The ingress controller needs the new `OnXPN` func to decide whether to create a firewall.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Apply KubeProxyEndpointLagTimeout to ESIPP tests
Fixes#46533.
The previous construction of ESIPP tests is weird, so I redo it a bit.
A 30 seconds `KubeProxyEndpointLagTimeout` is introduced, as these tests ain't verifying performance, may be better to not make it too tight.
/assign @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46252, 45524, 46236, 46277, 46522)
Make GCE load-balancers create health checks for nodes
From #14661. Proposal on kubernetes/community#552. Fixes#46313.
Bullet points:
- Create nodes health check and firewall (for health checking) for non-OnlyLocal service.
- Create local traffic health check and firewall (for health checking) for OnlyLocal service.
- Version skew:
- Don't create nodes health check if any nodes has version < 1.7.0.
- Don't backfill nodes health check on existing LBs unless users explicitly trigger it.
**Release note**:
```release-note
GCE Cloud Provider: New created LoadBalancer type Service now have health checks for nodes by default.
An existing LoadBalancer will have health check attached to it when:
- Change Service.Spec.Type from LoadBalancer to others and flip it back.
- Any effective change on Service.Spec.ExternalTrafficPolicy.
```
Update the "interesting" controller-manager metrics to match the
current names for the garbage collector, and add namespace controller
metrics to the list.
Automatic merge from submit-queue (batch tested with PRs 45949, 46009, 46320, 46423, 46437)
Unregister some metrics
delete some registered metrics since they are not observed
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366)
Subresources are not included in apiserver prometheus metrics
Subresources are very often completely different code paths and errors
generated on those code paths are important to distinguish.
@kubernetes/sig-api-machinery-pr-reviews
```release-note
The Prometheus metrics for the kube-apiserver for tracking incoming API requests and latencies now return the `subresource` label for correctly attributing the type of API call.
```
Automatic merge from submit-queue (batch tested with PRs 38990, 45781, 46225, 44899, 43663)
Support parallel scaling on StatefulSets
Fixes#41255
```release-note
StatefulSets now include an alpha scaling feature accessible by setting the `spec.podManagementPolicy` field to `Parallel`. The controller will not wait for pods to be ready before adding the other pods, and will replace deleted pods as needed. Since parallel scaling creates pods out of order, you cannot depend on predictable membership changes within your set.
```
Module remotecommand originally part of kubernetes/pkg/client/unversioned was moved
to client-go/tools, and will be used as authoritative in kubectl, e2e and other places.
Module remotecommand relies on util/exec module which will be copied to client-go/pkg/util
Automatic merge from submit-queue
Reorganize kubelet tree so apis can be independently versioned
@yujuhong @lavalamp @thockin @bgrant0607
This is an example of how we might reorganize `pkg/kubelet` so the apis it exposes can be independently versioned. This would also provide a logical place to put the `KubeletConfiguration` type, which currently lives in `pkg/apis/componentconfig`; it could live in e.g. `pkg/kubelet/apis/config` instead.
Take a look when you have a chance and let me know what you think. The most significant change in this PR is reorganizing `pkg/kubelet/api` to `pkg/kubelet/apis`, the rest is pretty much updating import paths and `BUILD` files.
Automatic merge from submit-queue (batch tested with PRs 45623, 45241, 45460, 41162)
Promotes Source IP preservation for Virtual IPs from Beta to GA
Fixes#33625. Feature issue: kubernetes/features#27.
Bullet points:
- Declare 2 fields (ExternalTraffic and HealthCheckNodePort) that mirror the ESIPP annotations.
- ESIPP alpha annotations will be ignored.
- Existing ESIPP beta annotations will still be fully supported.
- Allow promoting beta annotations to first class fields or reversely.
- Disallow setting invalid ExternalTraffic and HealthCheckNodePort on services. Default ExternalTraffic field for nodePort or loadBalancer type service to "Global" if not set.
**Release note**:
```release-note
Promotes Source IP preservation for Virtual IPs to GA.
Two api fields are defined correspondingly:
- Service.Spec.ExternalTrafficPolicy <- 'service.beta.kubernetes.io/external-traffic' annotation.
- Service.Spec.HealthCheckNodePort <- 'service.beta.kubernetes.io/healthcheck-nodeport' annotation.
```
Automatic merge from submit-queue (batch tested with PRs 45685, 45572, 45624, 45723, 45733)
Remove a test utility function that is redundant and kinda broken
Framework.WaitForAnEndpoint() has no timeout, so if something goes wrong and the endpoint doesn't get created, the test will hang forever. (This is happening for some reason sometimes in OpenShift right now, and when the CI system eventually times out and kills the VM, it loses the logs that would explain what failed.)
There's already another nearly-identical WaitForEndpoint() method that *does* take a timeout, so people can just use that instead.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45653, 45719, 45729, 45730, 44250)
Print pod startup latency metric as perfdata
Follows #45657
This should print pod startup latency in same format as api calls latencies.
cc @wojtek-t @gmarek