Thanks to some great sleuthing by ikruglov!
kube-controller-manager defaults --leader-elect to true. We should
do the same for kube-scheduler. kube-scheduler used to have this
set to true, but it got lost during refactoring in:
efb2bb71cd
This PR makes two changes. One is to introduce a parameter
for the HTTP/2 setting that an api-server sends to its clients
telling them how many streams they may have concurrently open in
an HTTP/2 connection. If left at its default value of zero,
this means to use the default in golang's HTTP/2 code (which
is currently 250).
The other change is to make the recommended options for an aggregated
api-server set this limit to 1000. The limit of 250 is annoyingly low
for the use case of many controllers watching objects of Kinds served
by an aggregated api-server reached through the main api-server (in
its mode as a proxy for the aggregated api-server, in which it uses a
single HTTP/2 connection for all calls proxied to that aggregated
api-server).
Fixes#60042
Seperate loop and plugin control in the kube-controller-manager.
Adding an "--external-plugin" flag to specify a plugin to load when
cloud-provider is set to "external". Flag has no effect currently
when the cloud-provider is not set to external. The expectation is
that the cloud provider and external plugin flags would go away once
all cloud providers are on stage 2 cloud-controller-manager solutions.
Managing the control loops more directly based on start up flags.
Addressing issue brought up by @wlan0
Switched to using the main node controller in CCM.
Changes to enable full NodeController to start in CCM.
Fix related tests.
Unifying some common code between KCM and CCM.
Fix related tests and comments.
Folded in feedback from @jhorwit2 and @wlan0
ClusterCIDR and ServiceCIDR are settings that are only used if at least
AllocateNodeCIDRs is set. The route controller requires in addition to
it for ConfigureCloudRoutes to be true as well. Since
AllocateNodeCIDRs is by default false, if guard the parsing of these
settings in order to not unnecessarily spam logs. Amend the
documentation of kube-controller-manager for the 2 settings to point
out the requirement of AllocateNodeCIDRs to be true as well
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make HPA tolerance a flag
**What this PR does / why we need it**:
Make HPA tolerance configurable as a flag. This change allows us to use
different tolerance values in production/testing.
**Which issue this PR fixes**:
Fixes#18155
**Release note:**
```release-note
Control HPA tolerance through the `horizontal-pod-autoscaler-tolerance` flag.
```
Signed-off-by: mattjmcnaughton <mattjmcnaughton@gmail.com>
Fix#18155
Make HPA tolerance configurable as a flag. This change allows us to use
different tolerance values in production/testing.
Signed-off-by: mattjmcnaughton <mattjmcnaughton@gmail.com>
Automatic merge from submit-queue (batch tested with PRs 52751, 52898, 52633, 52611, 52609). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add default value for RouteReconciliationPeriod in cloud controller manager
**What this PR does / why we need it**:
Add default sync period value config for RouteReconciliationPeriod. For now the default value is 0, which means zero cooldown time.
The value is taken from kube-controller-manager:
b2b079b95a/cmd/kube-controller-manager/app/options/options.go (L73)
**Which issue this PR fixes**
**Special notes for your reviewer**:
**Release note**:
Automatic merge from submit-queue (batch tested with PRs 47232, 48625, 48613, 48567, 39173)
Include leaderelection in client-go;
Fix#39117
Fix https://github.com/kubernetes/client-go/issues/28
This PR:
* includes the leaderelection to the staging client-go
* to avoid conflict with golang's testing package, renames package /testing to /testutil, and renames cache/testing to cache/testframework
```release-note
client-go now includes the leaderelection package
```
Automatic merge from submit-queue
controller-manager: fix horizontal-pod-autoscaler-use-rest-clients fl…
…ag help info
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Allow the list of resources the garbage collector controller should
ignore to be customizable, so downstream integrators can add their own
resources to the list, if necessary.
Automatic merge from submit-queue (batch tested with PRs 44862, 42241, 42101, 43181, 44147)
Feature/hpa upscale downscale delay configurable
**What this PR does / why we need it**:
Makes "upscale forbidden window" and "downscale forbidden window" duration configurable in arguments of kube-controller-manager. Those are options of horizontal pod autoscaler.
**Special notes for your reviewer**:
Please have a look @DirectXMan12 , the PR as discussed in Slack.
**Release note**:
```
Make "upscale forbidden window" and "downscale forbidden window" duration configurable in arguments of kube-controller-manager. Those are options of horizontal pod autoscaler. Right now are hardcoded 3 minutes for upscale, and 5 minutes to downscale. But sometimes cluster administrator might want to change this for his own needs.
```
If CIDRAllocatorType is set to `CloudCIDRAllocator`, then allocation
of CIDR allocation instead is done by the external cloud provider and
the node controller is only responsible for reflecting the allocation
into the node spec.
- Splits off the rangeAllocator from the cidr_allocator.go file.
- Adds cloudCIDRAllocator, which is used when the cloud provider allocates
the CIDR ranges externally. (GCE support only)
- Updates RBAC permission for node controller to include PATCH
Automatic merge from submit-queue (batch tested with PRs 42692, 42169, 42173)
DaemonSet: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings DaemonSet into full compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
This ensures that DaemonSet does not fight with other controllers over control of Pods.
**Special notes for your reviewer**:
**Release note**:
```release-note
DaemonSet now respects ControllerRef to avoid fighting over Pods.
```
cc @erictune @kubernetes/sig-apps-pr-reviews
This commit switches over the HPA controller to use the custom metrics
API. It also converts the HPA controller to use the generated client
in k8s.io/metrics for the resource metrics API.
In order to enable support, you must enable
`--horizontal-pod-autoscaler-use-rest-clients` on the
controller-manager, which will switch the HPA controller's MetricsClient
implementation over to use the standard rest clients for both custom
metrics and resource metrics. This requires that at the least resource
metrics API is registered with kube-aggregator, and that the controller
manager is pointed at kube-aggregator. For this to work, Heapster
must be serving the new-style API server (`--api-server=true`).
When default reconciler sync period is set to 5 second, we often see
rateLimit issue for a large cluster. This PR is change the period to 1
minute to mitigate this problem.
Make this period longer means that there might be some period of time
that the cached information in master's attach_detach_controller is out
of date. The node might use this information to mount to the wrong
device. For GCE PD, since device path is uniquely associated with volume
id, so mount operation will just fail because of this outdated
information. For AWS, before kubelet might mount to the wrong volume
because device path could be reused immediately once it is available.
But after PR #38818, device path will only be reused after all device
paths have been explored. That means it is very unlikely that kubelet will
mount to a wrong volume that is using the old device path that had been
assigned to the same node.
Automatic merge from submit-queue
Enable the garbage collector by default
Turning GC on by default.
Memory usage of GC is back to normal after #30943. The CPU usage is a little higher than the cap in scalability test (1.11 core vs. 1 core). This PR adjusted the default GC worker to 20 to see if that helps CPU usage.
@kubernetes/sig-api-machinery @wojtek-t @lavalamp
This allows kube-controller-manager to allocate CIDRs to nodes (with
allocate-node-cidrs=true), but will not try to configure them on the
cloud provider, even if the cloud provider supports Routes.
The default is configure-cloud-routes=true, and it will only try to
configure routes if allocate-node-cidrs is also configured, so the
default behaviour is unchanged.
This is useful because on AWS the cloud provider configures routes by
setting up VPC routing table entries, but there is a limit of 50
entries. So setting configure-cloud-routes on AWS would allow us to
continue to allocate node CIDRs as today, but replace the VPC
route-table mechanism with something not limited to 50 nodes.
We can't just turn off the cloud-provider entirely because it also
controls other things - node discovery, load balancer creation etc.
Fix#25602
Split controller cache into actual and desired state of world.
Controller will only operate on volumes scheduled to nodes that
have the "volumes.kubernetes.io/controller-managed-attach" annotation.