Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix computing number of nodes to break in CA e2e unhealthy cluster test.
**What this PR does / why we need it**:
Fixes computing number of nodes to break in the e2e for Cluster Autoscaler acting on an unhealthy cluster.
**Release note**:
```
NONE
```
@MaciekPytel
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add pod UID
**What this PR does / why we need it**:
Add pod UID. The test passes but we'll get error info when run with `GOFLAGS=-v`:
```
E0723 09:18:18.393249 45452 node_info.go:477] Cannot get pod key, err: Cannot get cache key for pod with empty UID
E0723 09:18:18.393440 45452 node_info.go:490] Cannot get pod key, err: Cannot get cache key for pod with empty UID
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix acr could not be listed in sp issue
**What this PR does / why we need it**:
after granting sp access to azure ACR , pull image from ACR would fail, and after wait about 15-30min(or restart kubelet directly), pull image would succeed. Root cause is that `servicePrincipalToken` needs to be refreshed when doing `registryClient.List`, otherwise it will always return empty registry list. Pull image error would be like following:
```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 8m (x3 over 8m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
Normal Scheduled 8m default-scheduler Successfully assigned nginx-server-776564f79c-zhtjk to aks-nodepool1-20881069-0
Normal SuccessfulMountVolume 8m kubelet, aks-nodepool1-20881069-0 MountVolume.SetUp succeeded for volume "default-token-4t7tk"
Normal SuccessfulMountVolume 8m kubelet, aks-nodepool1-20881069-0 MountVolume.SetUp succeeded for volume "pvc-5c1f0521-739f-11e8-9b69-0a58ac1f09c2"
Warning Failed 8m (x5 over 8m) kubelet, aks-nodepool1-20881069-0 Error: ImagePullBackOff
Normal BackOff 8m (x5 over 8m) kubelet, aks-nodepool1-20881069-0 Back-off pulling image "andyacr.azurecr.io/nginx-server:1.0.0"
Warning Failed 8m (x2 over 8m) kubelet, aks-nodepool1-20881069-0 Error: ErrImagePull
Warning Failed 8m (x2 over 8m) kubelet, aks-nodepool1-20881069-0 Failed to pull image "andyacr.azurecr.io/nginx-server:1.0.0": rpc error: code = Unknown desc = Error response from daemon: Get https://andyacr.azurecr.io/v2/nginx-server/manifests/1.0.0: unauthorized: authentication required
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65225
**Special notes for your reviewer**:
After discuss with dong, `registryClient.List` won't be necessary, instead we return `{"*.azurecr.io", "*.azurecr.cn", "*.azurecr.de", "*.azurecr.us"}` like aws, gce code logic, it will do the url matching.
I will cherry pick this PR to all supported version, every version has this issue.
**Release note**:
```
fix acr could not be listed in sp issue
```
/sig azure
/assign @feiskyer @khenidak @brendandburns @karataliu
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg/scheduler/core)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```
Automatic merge from submit-queue (batch tested with PRs 66410, 66398, 66061, 66397, 65558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dry-run: Create feature-gate flag
Creates a feature gate flag for dry-run. Currently, dry-run query parameter is completely blocking all requests, once the feature is implemented, the flag will allow the parameter to pass if enabled.
cc @jennybuckley @deads2k @liggitt @lavalamp
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66410, 66398, 66061, 66397, 65558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix volume limit for EBS on m5 and c5 instances
This is a fix for lower volume limits on m5 and c5 instance types while we wait for https://github.com/kubernetes/features/issues/554 to land GA.
This problem became urgent because many of our users are trying to migrate to those instance types in light of spectre/meltdown vulnerability but lower volume limit on those instance types often causes cluster instability. Yes they can workaround by configuring the scheduler with lower limit but often this becomes somewhat difficult to do when cluster is mixed.
The newer default limits were picked from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html
Text about spectre/meltdown is available on - https://community.bitnami.com/t/spectre-variant-2/54961/5
/sig storage
/sig scheduling
```release-note
Fix volume limit for EBS on m5 and c5 instance types
```
Automatic merge from submit-queue (batch tested with PRs 66410, 66398, 66061, 66397, 65558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Passing `KUBE_TEST_ARGS` variable to make through process environment instead of command line flags
**What this PR does / why we need it**:
Passing `KUBE_TEST_ARGS` variable to make through process environment instead of command line flags.
`$` character has special meaning in `make`, if `KUBE_TEST_ARGS` contains `$`, it cannot be passed to `make test`. Actually, we can simply pass variables to make through process environment. This makes following scenario to work:
```
export KUBE_TEST_ARGS='-run ^TestVolumeBinding$'
make test-integration WHAT=./test/integration/scheduler
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66410, 66398, 66061, 66397, 65558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix logs command to be generic for all resources again
--all-containers should not have been allowed as it was because it only worked for pods. This approach does not make sense for a polymorphic command. Rather than roll it back, I'll take the time to make it generic. Because of this and other pods-only options, we now have inconsistencies with the command that should be addressed separately.
@CaoShuFeng
/assign @juanvallejo @soltysh
@kubernetes/sig-cli-maintainers
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66410, 66398, 66061, 66397, 65558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update Godeps after removing influx tests
When the influx tests were removed, it should have also removed the
influx deps from Godeps. This does that.
See also #66175
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adds a build example with KUBE_BUILD_PLATFORMS
**What this PR does / why we need it**:
I didn't manage to find any guides about how to build the specific binary for the specific platform in docs (in this repo and in the community repo). Also it was quite hard for me to follow build scripts. At the end, I found help on the kubernetes-dev channel in Slack:`KUBE_BUILD_PLATFORMS` seems to do what I want. I think, would be great to have it documented somehow: The "Key scripts" section of the build doc seems to be a good place to mention it.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move api rules list under api-approvers-owned package
Additions to this list should be rare and carefully reviewed
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
extend timeout to workaround slow arm64 math
**What this PR does / why we need it**:
The math/big functions are slow on arm64. There is improvement coming
with go1.11 but until such time as that version can be used to build releases,
if a server uses rsa certificates on arm64, the math load for the multitude
of watches over-taxes the ability of the processor and the TLS connections
time out. Retries will also not succeed and serve to exacerbate the problem.
By extending the timeout, the TLS connections will eventually be
successful and the load will drop.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64649
**Special notes for your reviewer**:
This was tested on a Raspberry Pi 3
**Release note**:
```release-note
Extend TLS timeouts to work around slow arm64 math/big
```
Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
indicate which scheme has conflicting data
An oversight when adding scheme origination to error messages. There are a couple state dependent panics which are useful to gain info on.
@sttts
@kubernetes/sig-api-machinery-misc
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump subpath test pod timeout to 5 minutes
Fixes: #66376
It seems that sometimes `Attach` on GCP side can take multiple minutes (when recently detached from another node) and therefore the old timeout of 1 minute is not long enough. 5 minutes is standard elsewhere and seems to be long enough for the long tail of GCE Attach time.
Also the ext4 test was really hard to debug because all the old PD tests call all the pods "pd-injector" and all the volume mounts "pd-volume." Made a change to make that more readable by adding a length 4 random string suffix.
/kind flake
/sig storage
/assign @msau42
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubeadm: stop setting UID in the kubelet ConfigMap
**What this PR does / why we need it**: kubeadm: stop setting UID in the kubelet ConfigMap
**Which issue(s) this PR fixes**:
Fixes https://github.com/kubernetes/kubeadm/issues/921#
**Release note**:
```release-note
kubeadm: stop setting UID in the kubelet ConfigMap
```
Automatic merge from submit-queue (batch tested with PRs 66152, 66406, 66218, 66278, 65660). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Refactor kubelet node status setters, add test coverage
This internal refactor moves the node status setters to a new package, explicitly injects dependencies to facilitate unit testing, and adds individual unit tests for the setters.
I gave each setter a distinct commit to facilitate review.
Non-goals:
- I intentionally excluded the class of setters that return a "modified" boolean, as I want to think more carefully about how to cleanly handle the behavior, and this PR is already rather large.
- I would like to clean up the status update control loops as well, but that belongs in a separate PR.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66152, 66406, 66218, 66278, 65660). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Handle errors
**What this PR does / why we need it**:
This is a followup PR for https://github.com/kubernetes/kubernetes/pull/64664 to handle errors returned from `.AddToScheme()` in places where they are not handled.
**Release note**:
```release-note
NONE
```
/kind cleanup
/sig api-machinery
/cc @sttts
Automatic merge from submit-queue (batch tested with PRs 66152, 66406, 66218, 66278, 65660). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix panic printing pods with nominatedNode names
Fixes#66379
```release-note
kubectl: fixes a panic displaying pods with nominatedNodeName set
```
Automatic merge from submit-queue (batch tested with PRs 66152, 66406, 66218, 66278, 65660). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update crictl to v1.11.1.
Update `crictl` to v1.11.1 to fix several bugs. Release note: https://github.com/kubernetes-incubator/cri-tools/releases/tag/v1.11.1
@kubernetes/sig-node-pr-reviews @kubernetes/sig-cluster-lifecycle-pr-reviews
@kubernetes/sig-gcp-pr-reviews
Signed-off-by: Lantao Liu <lantaol@google.com>
```release-note
Update crictl to v1.11.1.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not attempt to convert nil object during DELETE webhook admission
Fixes#66412
```release-note
fixes a panic when using a mutating webhook admission plugin with a DELETE operation
```
Automatic merge from submit-queue (batch tested with PRs 66098, 66389, 66400, 66413, 66378). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix a typo in csiPlugin comment
**What this PR does / why we need it**:
Fix a typo in csiPlugin comment.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66098, 66389, 66400, 66413, 66378). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Only build generators for building platform
**What this PR does / why we need it**:
The code generator binaries are built using the `hack/make-rules/build.sh` script. If you run a command like `make kubectl KUBE_BUILD_PLATFORMS=darwin/amd64` that passes a `KUBE_BUILD_PLATFORMS` that *doesn't* include the building platform, the code generator binaries will be built on the target platform, but NOT the building platform (which doesn't make sense). Also, if you specify `make cross` it will build the code generators for *all* platforms, which is also unnecessary.
This change ensures that `KUBE_BUILD_PLATFORMS` is an empty string when building the code generators, which ensures that they will only be built for the building platform.
This also fixes an issue when using the linux-based cross build container to target a specific platform other than linux/amd64.
Example command that is broken: `build/run.sh make kubectl KUBE_BUILD_PLATFORMS=darwin/amd64`
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66098, 66389, 66400, 66413, 66378). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
GCE: Return correct error type and HTTP Status code for operation errors
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#66399
**Special notes for your reviewer**:
/assign bowei, zihongz, rramkumar
/cc bowei
**Release note**:
```release-note
GCE: Fixes loadbalancer creation and deletion issues appearing in 1.10.5.
```
Automatic merge from submit-queue (batch tested with PRs 66098, 66389, 66400, 66413, 66378). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
verify-generated-docs: Use exit code rather than comparison to empty …
…string
Checking the exit code rather than the empty string has the advantage
that you can identify what was found (comparison consumes the string, so
it's not printed). It makes debugging much easier when something is wrong.
One of my pull-request is failing saying that I need to run update, but it keeps saying that even after I ran update. And since I can't see what's wrong, it's quite hard to debug.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add initial availability zones support for Azure nodes
**What this PR does / why we need it**:
The first part of [Azure Availability Zone feature](https://github.com/kubernetes/features/issues/586).
This PR adds initial availability zone (AZ) support for Azure nodes. With this PR, Azure nodes with AZ will have label `failure-domain.beta.kubernetes.io/zone=<region>-<zoneID>`, e.g. `southeastasia-1`.
It also updates instance metadata api-version to 2017-12-01, which is required for AZ.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
VirtualMachineScaleSetVM doesn't have AZ info yet. It will be supported later after new Azure Go SDK releases.
**Release note**:
```release-note
Azure nodes with availability zone now will have label `failure-domain.beta.kubernetes.io/zone=<region>-<zoneID>`.
```
/kind feature
/sig azure
/assign @brendandburns @khenidak @andyzhangx
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Re-design equivalence class cache to two level cache
**What this PR does / why we need it**:
The current ecache introduced a global lock across all the nodes, and this patch tried to assign ecache per node to eliminate that global lock. The improvement of scheduling performance and throughput are both significant.
**CPU Profile Result**
Machine: 32-core 60GB GCE VM
1k nodes 10k pods bench test (we've highlighted the critical function):
1. Current default scheduler with ecache enabled:
![equivlance class cache bench test 001](https://user-images.githubusercontent.com/1701782/42196992-51b0a32a-7eb3-11e8-89ee-f13383091a00.jpeg)
2. Current default scheduler with ecache disabled:
![equivlance class cache bench test 002](https://user-images.githubusercontent.com/1701782/42196993-51eb0c68-7eb3-11e8-9326-1a7762072863.jpeg)
3. Current default scheduler with this patch and ecache enabled:
![equivlance class cache bench test 003](https://user-images.githubusercontent.com/1701782/42196994-52280ed8-7eb3-11e8-8100-690e2af2cf2f.jpeg)
**Throughput Test Result**
1k nodes 3k pods `scheduler_perf` test:
Current default scheduler, ecache is disabled:
```bash
Minimal observed throughput for 3k pod test: 200
PASS
ok k8s.io/kubernetes/test/integration/scheduler_perf 30.091s
```
With this patch, ecache is enabled:
```bash
Minimal observed throughput for 3k pod test: 556
PASS
ok k8s.io/kubernetes/test/integration/scheduler_perf 11.119s
```
**Design and implementation:**
The idea is: we re-designed ecache into a "two level cache".
The first level cache holds the global lock across nodes and sync is needed only when node is added or deleted, which is of much lower frequency.
The second level cache is assigned per node and its lock is restricted to per node level, thus there's no need to bother the global lock during whole predicate process cycle. For more detail, please check [the original discussion](https://github.com/kubernetes/kubernetes/issues/63784#issuecomment-399848349).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63784
**Special notes for your reviewer**:
~~Tagged as WIP to make sure this does not break existing code and tests, we can start review after CI is happy.~~
**Release note**:
```release-note
Re-design equivalence class cache to two level cache
```
The math/big functions are slow on arm64. There is improvement coming
with go1.11 but in the mean time if a server uses rsa certificates on
arm64, the math load for the multitude of watches over taxes the ability
of the processor and the TLS connections time out. Retries will also not
succeed and serve to exacerbate the problem.
By extending the timeout, the TLS connections will eventually be
successful and the load will drop.
Fixes#64649
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fixes operation for "create on update"
**What this PR does / why we need it**:
Set operation to `admission.Create` for create-on-update requests.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65553
**Special notes for your reviewer**:
**Release note**:
```release-note
Checks CREATE admission for create-on-update requests instead of UPDATE admission
```