Automatic merge from submit-queue
Block on master-creation step for large clusters (>50 nodes) in kube-up
I recently noticed a failure in our 5000-node scale test where the master failed to initialize in time. kube-up nevertheless went on and created all 5000 nodes, because it does not block on master creation. It turned out the master VM wasn't even created:
```
W0808 10:00:49.340] ERROR: (gcloud.compute.instances.create) Could not fetch resource:
... Try a different zone, or try again later.
```
Even some of our 100-node tests are flaking occasionally during cluster startup (with master validation step timing out) and I think the reason is the same (issue - https://github.com/kubernetes/kubernetes/issues/49453)
We should block on that step for large clusters.
cc @kubernetes/sig-scalability-misc @gmarek
Currently, in federated end-to-end tests, services are created with a
randomly selected NodePort. This causes e2e test flakes when the
creation of a federated service fails because the port is not
available.
Now the util.CreateService(...) function retries service creation on a
different NodePort when an error occurs. It retries until it succeeds
or until 10 creation retries with other random NodePorts have failed.
If the service was not created properly on one of the federated
clusters, a cleanup of the service shards is executed before the
federated service creation is retried.
fixes #44018
Automatic merge from submit-queue
Add a simple cloud provider for e2e tests on kubemark
**What this PR does / why we need it**:
Adds a simplified cloud provider for kubemark. This enables us to add and
remove nodes and operate on nodegroups while running tests on kubemark.
This is needed to run scalability tests for cluster autoscaler on kubemark.
See https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/kubemark_integration.md
**Release note**:
```
NONE
```
Currently getPortByIp() looks up the port of an instance based only on IP.
If two instances are in different networks and their subnets have the
same CIDR, getPortByIp() can return a conflicting port.
This PR looks up the port based on both the IP and the name of the instance.
Automatic merge from submit-queue (batch tested with PRs 48068, 49587)
DNS name error message improvement
**What this PR does / why we need it**:
Small error message fix. The old message misled me slightly; a correct message would have saved me time.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
n/a
**Special notes for your reviewer**:
n/a
**Release note**:
n/a
Automatic merge from submit-queue
Add e2e test for cronjob chained removal
This is a test proving that https://github.com/kubernetes/kubernetes/pull/44058 works with cronjobs. It will fail until the aforementioned PR merges.
@caesarxuchao ptal
Automatic merge from submit-queue
complete and correct code comment
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 50254, 50174, 50179)
Moved node/testutil to upper dir.
**What this PR does / why we need it**:
Moved node/testutil to upper directory (`pkg/controller`); it's used by several controllers, and we need to test some helper func in `pkg/controller`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #49522
**Release note**:
```release-note
N/A
```
Automatic merge from submit-queue (batch tested with PRs 50254, 50174, 50179)
kubeadm: Add back labels for the Static Pod control plane
**What this PR does / why we need it**:
This Labels section was removed for a short period during the v1.8 dev cycle, but I found a good use case for it: filtering Mirror Pods by the `component=kube-*` label when waiting for the self-hosted control plane to come up after an upgrade. It's not _really_ necessary, but nice to have.
Also noticed the lack of coverage for this func, so added a small unit test.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Dependency for: https://github.com/kubernetes/kubernetes/pull/48899
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-cluster-lifecycle-pr-reviews @dmmcquay @timothysc @mattmoyer
Automatic merge from submit-queue (batch tested with PRs 50254, 50174, 50179)
Revert "Merge pull request #47353 from apelisse/http-cache"
Some issues were discovered with the caching merged in #47353:
* uses a disk-based cache that is not safe between processes (does not use atomic fs operations)
* writes get/list responses to disk that should not be cached (like `kubectl get secrets`)
* is vulnerable to partially written cache responses being used as responses to future requests
* breaks uses of the client transport that make use of websockets
* defaults to enabling the cache for any client builder using RecommendedConfigOverrideFlags or DefaultClientConfig which affects more components than just kubectl
This reverts commit fc89743dca, reversing changes made to 29ab38e898.
Automatic merge from submit-queue (batch tested with PRs 45993, 50293)
[Federation] HPA controller
This PR implements the design listed in https://github.com/kubernetes/community/pull/593.
This is still a work in progress, and needs more unit tests to be added.
I will add the integration tests and e2e tests in separate PR(s).
@kubernetes/sig-federation-pr-reviews
**Release note**:
```
Horizontal Pod Autoscaling is now available as an alpha feature in federation.
It can be used to distribute and scale workload across clusters joined in a federation.
In its current form, it works only on CPU utilization; support for other metrics is yet to be built.
```
Automatic merge from submit-queue (batch tested with PRs 50208, 50259, 49702, 50267, 48986)
Relax restrictions on environment variable names.
Fixes #2707
The POSIX standard restricts environment variable names to uppercase letters, digits, and the underscore character in shell contexts only. For generic application usage, it is stated that all other characters shall be tolerated. (Reference [here](http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html), my prose reasoning [here](https://github.com/kubernetes/kubernetes/issues/2707#issuecomment-285309156).)
This change relaxes the rules to some degree. Namely, we stop requiring environment variable names to be strict `C_IDENTIFIERS` and start permitting lowercase, dot, and dash characters.
Public container images using environment variable names beyond the shell-only context can benefit from this relaxation. Elasticsearch is one popular example.
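The difference between the strict and relaxed rules can be sketched with two regular expressions. The patterns below are an assumption written from the description above (C identifier versus additionally allowing lowercase, dot, and dash); the names `strictName` and `relaxedName` are hypothetical, and the real validation code may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// strictName: a C identifier (letters, digits, underscore; no leading digit).
	strictName = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)
	// relaxedName: additionally permits dot and dash (assumed pattern).
	relaxedName = regexp.MustCompile(`^[-._A-Za-z][-._A-Za-z0-9]*$`)
)

func main() {
	names := []string{"ES_JAVA_OPTS", "cluster.name", "my-var", "1abc", "a b"}
	for _, name := range names {
		fmt.Printf("%-14s strict=%-5t relaxed=%t\n",
			name, strictName.MatchString(name), relaxedName.MatchString(name))
	}
}
```

A dotted name such as `cluster.name` is rejected by the strict pattern but accepted by the relaxed one, while names starting with a digit or containing spaces are rejected by both.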
Automatic merge from submit-queue (batch tested with PRs 50208, 50259, 49702, 50267, 48986)
Move ownership of proxy test to sig-network directory
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 50208, 50259, 49702, 50267, 48986)
fix the typo of intializing
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50208, 50259, 49702, 50267, 48986)
provide the failing health as part of the controller error
When the controllers fail to start because the master is unhealthy, the healthz message is a useful starting point for debugging. This provides it in the error returned.
Automatic merge from submit-queue
Ignore the available volume when calling DetachDisk
Fix #50207
If a user detaches the volume through nova in an OpenStack environment, the
volume becomes available. If the nova instance has been deleted, nova detaches
the volume automatically and it becomes available. So the "available" status is
fine, since it means the volume is already detached from the instance.
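A minimal sketch of this check follows, assuming a simplified `volume` type and a `detach` callback; both are hypothetical and do not mirror the real OpenStack provider code. The status strings `available` and `in-use` are the ones Cinder reports.

```go
package main

import (
	"fmt"
)

// volume is a hypothetical, simplified view of a Cinder volume.
type volume struct {
	ID               string
	Status           string
	AttachedServerID string
}

// detachDisk treats an "available" volume as already detached and returns
// without error instead of failing.
func detachDisk(vol volume, instanceID string, detach func(instanceID, volumeID string) error) error {
	if vol.Status == "available" {
		return nil // nothing to do: already detached
	}
	if vol.Status != "in-use" {
		return fmt.Errorf("cannot detach volume %s in status %q", vol.ID, vol.Status)
	}
	if vol.AttachedServerID != instanceID {
		return fmt.Errorf("volume %s is attached to %s, not %s", vol.ID, vol.AttachedServerID, instanceID)
	}
	return detach(instanceID, vol.ID)
}

func main() {
	called := false
	detach := func(instanceID, volumeID string) error { called = true; return nil }
	err := detachDisk(volume{ID: "vol-1", Status: "available"}, "server-1", detach)
	fmt.Println("error:", err, "detach called:", called)
}
```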
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Removed unused InodePressure condition.
**What this PR does / why we need it**:
Removed the unused InodePressure condition; kubelet no longer reports it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #49103
**Release note**:
```release-note
The node condition 'NodeInodePressure' was removed, as kubelet did not report it.
```