Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
iptables proxier_test typo
**What this PR does / why we need it**:
The definition of `makeTestService` is
```
func makeTestService(namespace, name string, svcFunc func(*api.Service)) api.Service {
...
}
```
but in function `TestClusterIPReject`, use
makeTestService(svcPortName.Namespace, svcPortName.`Namespace`, func(svc *api.Service)
should be
makeTestService(svcPortName.Namespace, svcPortName.`Name`, func(svc *api.Service)
I think it's a typo
/area kube-proxy
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make kubernetes json serializer case sensitive
This PR imported the latest jsoniterator library so that case sensitivity during unmarhsaling is optional. The PR also set Kubernetes json serializer to be case sensitive.
Kubernetes json serializer had been case sensitive for 1.1-1.7 as we were using ugorji. This PR restores the behavior.
Fix#64612.
```release-notes
Kubernetes json deserializer is now case-sensitive as it was before 1.8.
If your config files contains fields with wrong case, the config files will be now invalid.
```
Automatic merge from submit-queue (batch tested with PRs 64272, 64630). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
GCE: Fix operation polling and error handling
Cloud functions using the generated API are bursting operation GET calls because we don't wait a minimum amount of time.
Fixes#64712Fixes#64858
**Changes**
- `operationPollInterval` is now 1.5 seconds instead of 3 seconds.
- `operationPollRateLimiter` is now configured with 5 QPS / 5 burst instead of 10 QPS / 10 burst.
- `gceRateLimiter` is now configured with a `MinimumRateLimiter` to wait the above `operationPollInterval` duration _before_ waiting on the token rate limiter.
- Operations are now rate limited on the very first GET call.
- Operations are polled until `DONE` or context times out (even if operations.get fails continuously).
- Compute operations are checked for errors when they're recognized as `DONE`.
- All "wrapper" funcs now generate a context with an hour timeout.
`ingress-gce` will need to update its vendor and utilize the `MinimumRateLimiter` as well. Since ingress creates rate limiters based off flags, we'll need to check the resource type and operation while parsing the flags and wrap the appropriate one.
**Special notes for your reviewer**:
/assign bowei
/cc bowei
**Fix Example**
Creating an external load balancer
without fix: https://pastebin.com/raw/NNkeNWS3
with fix: https://pastebin.com/raw/x2iMLW5S (a difference of about 200 GET calls)
**Release note**:
```release-note
GCE: Fixes operation polling to adhere to the specified interval. Furthermore, operation errors are now returned instead of ignored.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Re-use private key after failed CSR
**What this PR does / why we need it**:
This fixes a regression introduced in 1.11.
If we create a new key on each CSR, if CSR fails the next attempt will
create a new one instead of reusing previous CSR.
If approver/signer don't handle CSRs as quickly as new nodes come up,
they can pile up and approver would keep handling old abandoned CSRs and
Nodes would keep timing out on startup.
**Release note**:
```release-note
NONE
```
If we create a new key on each CSR, if CSR fails the next attempt will
create a new one instead of reusing previous CSR.
If approver/signer don't handle CSRs as quickly as new nodes come up,
they can pile up and approver would keep handling old abandoned CSRs and
Nodes would keep timing out on startup.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add kms-plugin-container.manifest to release manifest tarball.
**What this PR does / why we need it**:
cluster/gce/manifests/kms-plugin-container.manifest needs to be included into the manifests' release tarball.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dockershim/network: add dcbw to OWNERS as an approver
I've been involved with the kubelet network code, including most
of this code, for a couple years and contributed a good number
of PRs for these directories. I've also been a SIG Network
co-lead for couple years.
I've also been on the CNI maintainers team for a couple years.
```release-note
NONE
```
@freehan @thockin @kubernetes/sig-network-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 64974, 65009, 65018). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Increase logexporter timeout and add debug logs
Ref - https://github.com/kubernetes/kubernetes/issues/63030#issuecomment-396335294
So it seems that logexporter isn't running on too many nodes on our 5k node cluster (~40% of nodes). As a result we fallback to ssh-based copying for so many nodes which is slow and hence the job times out. My feeling is it's because of slow scheduling of logexporter pods (and hence quite some nodes didn't even get the chance to run those pods before we delete the daemonset).
/cc @wojtek-t @krzyzacy
```release-note
NONE
```
/sig scalability
/kind bug
/priority important-soon
/milestone v1.11
/status approved-for-milestone
Automatic merge from submit-queue (batch tested with PRs 64974, 65009, 65018). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
daemon: add custom node indexer
**What this PR does / why we need it**:
<img width="863" alt="screen shot 2018-06-11 at 20 54 03" src="https://user-images.githubusercontent.com/44136/41279030-ad842020-6e2b-11e8-80d4-0a71ee415d30.png">
Based on this CPU profile, it looks like a lot of CPU cycles/cores are spend by retrieving a list of **all** pods in the cluster. On large clusters with multiple daemonset this might lead to locking the shared pod informer List() of every other controller that might need it or use it.
The indexer in the PR will index the pods based on nodeName assigned for these pods. That means the amount of pods returned from the ByIndex() function is fairly small and the call should be fast.
Additionally we can also use this index to check whether a node already run the pod without listing all pods in the cluster again.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubeadm: Fix small-ish bugs for v1.11
**What this PR does / why we need it**:
Fixes a bunch of bugs I noticed when I was reading the source code:
- `--cloud-provider` should also be propagated to the kubelet when converting configs from v1alpha1 to v1alpha2
- The validation for `.NodeRegistration.Name` is practically non-existent, just verifies the name isn't in upper case. Instead we currently do that validation in preflight checks, which is in the totally wrong place.
- Now that we pull images in preflight checks, the timeout for the kubelet to start the Static Pods should be kinda short, as it doesn't depend on internet connection
- I think the shorthand for `kubeadm reset --force` ought to be `-f`
- The common flags between `upgrade apply` and `upgrade plan` were registered as global flags for the `upgrade` command, although they make no sense for `upgrade diff` and/or `upgrade node config`. Hence, I moved them to be locally registered.
- Just because we vendor `glog` we have a lot of unnecessary/annoying flags registered in glog's `init()` function. Let's hide these properly.
- I saw that `kubeadm upgrade apply` doesn't write down the new kubelet config that should be used, now that is the case. Also, the CRISocket annotation information is now preserved properly on upgrade (and is configurable using the `--cri-socket` flag)
- If `kubeadm join` is run against a v1.10 cluster without the `kubelet-config-1.10` configmap, it shouldn't fail.
What I will still investigate:
- `kubeadm token create` should have a flag called `--ttl`, not `--token-ttl` as it is now (this snuck in in this dev cycle)
- That `--dry-run` works properly for `upgrade`, end to end.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-cluster-lifecycle-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 64862, 65020). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubeadm - fix local etcd grpc gateway
**What this PR does / why we need it**:
etcd 3.2 uses the server certificate as the client cert for the grpc
gateway, this updates the generation of the etcd server certificate to
add client usage to resolve the issue.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubeadm/issues/910
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert #64189: Fix Windows CNI for the sandbox case
**What this PR does / why we need it**:
This reverts PR #64189, which breaks DNS for Windows containers.
Refer https://github.com/kubernetes/kubernetes/pull/64189#issuecomment-395248704
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64861
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
cc @madhanrm @PatrickLang @alinbalutoiu @dineshgovindasamy
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding scale error retries
**What this PR does / why we need it**:
ScaleWithRetries will retry all retryable errors, not only conflict error.
ref #63030
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
- etcd 3.2 uses the server certificate as the client cert for the grpc
gateway, this updates the generation of the etcd server certificate to
add client usage to resolve the issue.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubeadm - set peer urls for default etcd instance
**What this PR does / why we need it**:
Override the default peer URLs for the default etcd instance. Previously we left the defaults, which meant the peer URL was unsecured previously.
**Release note**:
```release-note
kubeadm - Ensure the peer port is secured by explicitly setting the peer URLs for the default etcd instance.
kubeadm - Ensure that the etcd certificates are generated using a proper CN
kubeadm - Update generated etcd peer certificate to include localhost addresses for the default configuration.
kubeadm - Increase the manifest update timeout to make upgrades a bit more reliable.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Extended e2e/aggr test wait on sample-apiserver
**What this PR does / why we need it**:
The e2e/aggegator test usually passes. When it does it seems
to take almost 30 seconds for the sample-apiserver to start returning
2xx rather than 4xx to flunder requests. On the failing tests I looked
at it was taking almost 45 seconds for the sample-apiserver to become
healthy. I bumped the wait/timeout in the test for this case to 60
seconds. I also added a log statement to make it easier to track how
long it was taking for the sample-apiserver to come up.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63622
**Special notes for your reviewer**:
Once we have a bit more history I will log a bug for the long start up time.
We should also determine how far back we want to check compatibility.
The current test is checking 1.7.
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
volume: decrease memory allocations for debugging messages
**What this PR does / why we need it**:
<img width="1769" alt="screen shot 2018-06-11 at 13 15 31" src="https://user-images.githubusercontent.com/44136/41230128-ebf7233c-6d7e-11e8-899d-6251a5fde236.png">
On large clusters, where the glog is not running on V(5) using the format as: `glog.V(5).Infof(fmt.Sprintf(....))` will cause the code inside `Infof()` to be ran and generate a tons of memory allocations even if the output of those messages are not returned to the console...
This patch should reduce those calls and also the string allocations done by message generation.
**Release note**:
```release-note
NONE
```
- Set peer urls for default etcd instance to avoid leaving peer port unsecured
- Update generated etcd peer certificate SANs to include localhost
- Use a proper CN for etcd server and peer certificates
- Increase the manifest update timeout
Automatic merge from submit-queue (batch tested with PRs 64503, 64903, 64643, 64987). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update cadvisor godeps to v0.30.1 to revert cadvisor#1916
/sig node
/priority critical-urgent
/kind bug
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64933
**Release note**:
```release-note
Kubernetes depends on v0.30.1 of cAdvisor
```
Automatic merge from submit-queue (batch tested with PRs 64503, 64903, 64643, 64987). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use unix.EpollWait to determine when memcg events are available to be Read
**What this PR does / why we need it**:
This fixes a file descriptor leak introduced in https://github.com/kubernetes/kubernetes/pull/60531 when the `--experimental-kernel-memcg-notification` kubelet flag is enabled. The root of the issue is that `unix.Read` blocks indefinitely when reading from an event file descriptor and there is nothing to read. Since we refresh the memcg notifications, these reads accumulate until the memcg threshold is crossed, at which time all reads complete. However, if the node never comes under memory pressure, the node can run out of file descriptors.
This PR changes the eviction manager to use `unix.EpollWait` to wait, with a 10 second timeout, for events to be available on the eventfd. We only read from the eventfd when there is an event available to be read, preventing an accumulation of `unix.Read` threads, and allowing the event file descriptors to be reclaimed by the kernel.
This PR also breaks the creation, and updating of the memcg threshold into separate portions, and performs creation before starting the periodic synchronize calls. It also moves the logic of configuring memory thresholds into memory_threshold_notifier into a separate file.
This also reverts https://github.com/kubernetes/kubernetes/pull/64582, as the underlying leak that caused us to disable it for testing is fixed here.
Fixes#62808
**Release note**:
```release-note
NONE
```
/sig node
/kind bug
/priority critical-urgent
Automatic merge from submit-queue (batch tested with PRs 64503, 64903, 64643, 64987). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix a bug of wrong parameters which could cause token projection failure
**What this PR does / why we need it**:
a parameter wrong order bug
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64503, 64903, 64643, 64987). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Create system:cluster-autoscaler account & role and introduce it to C…
**What this PR does / why we need it**:
This PR adds cluster-autoscaler ClusterRole & binding, to be used by the Cluster Autoscaler (kubernetes/autoscaler repository).
It also updates GCE scripts to make CA use the cluster-autoscaler user account.
User account instead of Service account is chosen to be more in line with kube-scheduler.
**Which issue(s) this PR fixes**:
Fixes [issue 383](https://github.com/kubernetes/autoscaler/issues/383) from kubernetes/autoscaler.
**Special notes for your reviewer**:
This PR might be treated as a security fix since prior to it CA on GCE was using system:cluster-admin account, assumed due to default handling of unsecured & unauthenticated traffic over plain HTTP.
**Release note**:
```release-note
A cluster-autoscaler ClusterRole is added to cover only the functionality required by Cluster Autoscaler and avoid abusing system:cluster-admin role.
action required: Cloud providers other than GCE might want to update their deployments or sample yaml files to reuse the role created via add-on.
```
Investigated issue 63622. The test usually passes. When it does it seems
to take almost 30 seconds for the sample-apiserver to start returning
2xx rather than 4xx to flunder requests. On the failing tests I looked
at it was taking almost 45 seconds for the sample-apiserver to become
healthy. I bumped the wait/timeout in the test for this case to 60
seconds. I also added a log statement to make it easier to track how
long it was taking for the sample-apiserver to come up. Once we have a
bit more history I will log a bug for the long start up time.
Fixed go format error.
Automatic merge from submit-queue (batch tested with PRs 64945, 64977). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix kubeadm init does not inform the users it is pulling images
**What this PR does / why we need it**:
Kubeadm should print a message informing the user that it is pre-pulling images before starting the init sequence, making users aware of the action in progress, which can take some time in environments with poor internet connection.
**Which issue(s) this PR fixes**:
fixes https://github.com/kubernetes/kubeadm/issues/908
**Release note**:
```release-note
NONE
```