Automatic merge from submit-queue
Make big clusters work again after introduction of subnets
This PR does two things:
- make IP aliases automatically pick the Node IP range based on the number of Nodes,
- fix the logic for starting clusters with more than 4095 Nodes, which was broken by the introduction of subnets.
cc @wojtek-t @shyamjvs
```release-note
Setting env var ENABLE_BIG_CLUSTER_SUBNETS=true will allow kube-up.sh to start clusters bigger than 4095 Nodes on GCE.
```
Ref https://github.com/kubernetes/kubernetes/issues/47344
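A minimal sketch of how a Node IP range could be sized from the Node count. This is not the kube-up.sh logic; the function name `node_range_prefix` and the default of 4 reserved addresses per range are assumptions made for illustration.

```python
def node_range_prefix(num_nodes: int, reserved: int = 4) -> int:
    """Return the longest IPv4 prefix whose range still fits the nodes.

    `reserved` is an assumed count of addresses the range cannot hand
    out to Nodes (hypothetical default, not taken from kube-up.sh).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    needed = num_nodes + reserved
    # Smallest number of host bits with 2**host_bits >= needed.
    host_bits = (needed - 1).bit_length()
    if host_bits > 32:
        raise ValueError("too many nodes for an IPv4 range")
    return 32 - host_bits
```

Under these assumptions a 100 Node cluster fits in a /25, while 5000 Nodes need a /19.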
Automatic merge from submit-queue
Insert Cynerva and Kjackal to approvers list
**What this PR does / why we need it**:
Per the membership reviews, we're looking to promote Konstantinos and
George to approvers to help distribute the review/bug load for the `cluster/juju` code
tree.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
cc @marcoceppi and @tvansteenburgh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47038, 47105)
extending DefaultExternalHost for any registered cloud provider
**What this PR does / why we need it**: this PR enables DefaultExternalHost to work with any registered cloud provider.
**Which issue this PR fixes** : fixes #46567
**Special notes for your reviewer**:
**Release note**:
```release-note
When determining the default external host of the kube apiserver, any configured cloud provider is now consulted
```
Automatic merge from submit-queue
Use endpoints informer for the endpoint controller
This substantially reduces the number of API calls made by the endpoint
controller. Currently the controller makes an API call per endpoint for
each service that is synced. When the 30s resync is triggered, this
results in an API call for every single endpoint in the cluster. This
quickly exceeds the default qps/burst limit of 20/30 even in small
clusters, leading to delays in endpoint updates.
This change modifies the controller to use the endpoint informer cache
for all endpoint GETs. This means we only make API calls for changes in
endpoints. As a result, qps only depends on the pod activity in the
cluster, rather than the number of services.
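The difference in call volume can be sketched with a toy model. This is not the controller code: `FakeAPI`, `resync_direct`, and `resync_cached` are hypothetical names, and the cache here is updated by hand, whereas a real informer keeps it current from watch events.

```python
class FakeAPI:
    """Stand-in for the API server that counts calls (hypothetical)."""

    def __init__(self, endpoints):
        self.endpoints = dict(endpoints)
        self.get_calls = 0
        self.update_calls = 0

    def get(self, name):
        self.get_calls += 1
        return self.endpoints.get(name)

    def update(self, name, addrs):
        self.update_calls += 1
        self.endpoints[name] = addrs


def resync_direct(api, desired):
    """Old behavior: one GET per service on every resync."""
    for name, addrs in desired.items():
        if api.get(name) != addrs:
            api.update(name, addrs)


def resync_cached(api, cache, desired):
    """New behavior: read from a local cache, call the API only on change."""
    for name, addrs in desired.items():
        if cache.get(name) != addrs:
            api.update(name, addrs)
            cache[name] = addrs  # a real informer would learn this via watch
```

With 50 unchanged services, `resync_direct` issues 50 GETs per resync while `resync_cached` issues no API calls at all.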
**What this PR does / why we need it**:
Address endpoint update delays as described in https://github.com/kubernetes/kubernetes/issues/47597.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/47597
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
kubeadm: Make kube-proxy RollingUpgradeable
**What this PR does / why we need it**:
Sets the right updateStrategy for kube-proxy.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes: https://github.com/kubernetes/kubeadm/issues/319
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@pipejakob @timothysc @kubernetes/sig-cluster-lifecycle-pr-reviews
Automatic merge from submit-queue
godoc update for scheduler predicates.
**What this PR does / why we need it**:
This is a follow up PR for https://github.com/kubernetes/kubernetes/pull/46621
/cc @timothysc
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 47484, 47904, 48034)
prioritize messages for long steps
This pull prioritizes the trace messages, so steps that are unusually large come out at the info level and all details come out at the v(4) level.
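A rough sketch of the idea, assuming a fixed per-step threshold decides what counts as "unusually large". The function name `select_trace_lines` and the threshold parameter are hypothetical, not the trace utility's actual API.

```python
def select_trace_lines(steps, step_threshold, verbosity):
    """Pick which trace steps to print.

    steps: list of (message, seconds) tuples.
    Long steps are always printed; short ones only at verbosity >= 4.
    """
    lines = []
    for msg, secs in steps:
        if secs >= step_threshold or verbosity >= 4:
            lines.append(f"{msg} ({secs:.3f}s)")
    return lines
```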
Automatic merge from submit-queue (batch tested with PRs 47484, 47904, 48034)
fix nits in kubelet server
Signed-off-by: allencloud <allen.sun@daocloud.io>
**What this PR does / why we need it**:
fix nits in kubelet server
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48092, 47894, 47983)
fix systemd service file for custom args.
`KUBE_SCHEDULER_ARGS` and `KUBELET_ARGS` let users pass custom args to the scheduler and the kubelet.
But if `KUBELET_ARGS` holds more than one parameter, for example `KUBELET_ARGS="--cgroups-per-qos=false --enforce-node-allocatable="`, the kubelet treats `false --enforce-node-allocatable=` as the value of `cgroups-per-qos`, because `${KUBELET_ARGS}` in kubelet.service expands the variable into a single word. With `$KUBELET_ARGS` instead, the kubelet works perfectly.
For more info, see [EnvironmentFiles and support for /etc/sysconfig files](http://fedoraproject.org/wiki/Packaging:Systemd#EnvironmentFiles_and_support_for_.2Fetc.2Fsysconfig_files). This bug was reported by @huanxingyouyoutoo, and I made this PR to fix it for her.
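The two expansion forms can be modeled in a few lines. This is a simplified sketch of the rule described above, not systemd's parser; `expand_exec_args` and the `/usr/bin/kubelet` path are hypothetical.

```python
import re

_NAME = r"[A-Za-z_][A-Za-z0-9_]*"


def expand_exec_args(args, env):
    """Expand variables in an ExecStart-style argument list.

    A bare `$VAR` argument is split on whitespace into several words;
    `${VAR}` is substituted in place and stays a single word.
    """
    out = []
    for arg in args:
        bare = re.fullmatch(r"\$(" + _NAME + r")", arg)
        if bare:
            out.extend(env.get(bare.group(1), "").split())
        else:
            out.append(
                re.sub(
                    r"\$\{(" + _NAME + r")\}",
                    lambda m: env.get(m.group(1), ""),
                    arg,
                )
            )
    return out
```

With the `KUBELET_ARGS` value from the example, the `${KUBELET_ARGS}` form yields one argument containing a space, and the `$KUBELET_ARGS` form yields two separate flags.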
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48092, 47894, 47983)
Skip Deployment upgrade test on 1.5 and earlier.
The test relies on implementation details and would need a rewrite to work for older clusters.
xref #47685
Automatic merge from submit-queue (batch tested with PRs 48012, 47443, 47702, 47178)
Fix setting juju worker labels during deployment
**What this PR does / why we need it**: Allows for setting the labels of juju workers during deployment (eg inside a bundle)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #47176
**Special notes for your reviewer**:
**Release note**:
```
Fix bug in setting Juju kubernetes-worker labels in bundle.yaml files.
```
Automatic merge from submit-queue (batch tested with PRs 48012, 47443, 47702, 47178)
Don't bother with a mutable transformer for identity
The default value transformer can safely be the identity transformer - mutability is not required if the caller doesn't need transformation.
Automatic merge from submit-queue (batch tested with PRs 48012, 47443, 47702, 47178)
incluster config will be used when creating external shared informers.
**What this PR does / why we need it**:
Previously the loopback configuration was used to talk to the server.
As a consequence a custom API server was unable to talk to the root API server.
This PR changes the above by using incluster configuration to create shared informers.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48012, 47443, 47702, 47178)
Extending timeout waiting for delete node to become ready before the test ends.
**What this PR does / why we need it**: It seems to take longer than 5 minutes for the node to recover. Changing the timeout to 10 minutes.
This is an extension of PR #46746
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #48008
/release-note-none
Automatic merge from submit-queue
Update custom-resources example in client-go
- Update client-go examples `README.md` to point to the CustomResources example instead of the deprecated TPR one.
- Delete `staging/src/k8s.io/client-go/examples/custom-resources`.
Fixing #47743.
**Release note**:
```release-note
NONE
```
/cc @ahmetb @sttts
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)
don't pass CRI error through to waiting state reason
Raw gRPC errors are getting into the `Reason` field of the container status `State`, causing it to be output inline on a `kubectl get pod`
xref https://bugzilla.redhat.com/show_bug.cgi?id=1449820
Basically the issue is that the err and msg are reversed in `startContainer()`. The msg is short and the err is long. It should be the other way around.
This PR changes `startContainer()` to return a short error that becomes the Reason and the extracted gRPC error description that becomes the Message.
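A minimal sketch of that split, assuming the raw error text carries its description after a `desc = ` marker. The helper `split_start_error` and the sample reason string are illustrative only and do not mirror the kubelet code.

```python
def split_start_error(short_reason, raw_err):
    """Return (reason, message) for a container waiting state.

    The short reason is passed through unchanged; the message is the
    part of the raw error after "desc = ", or the whole error if the
    marker is absent.
    """
    marker = "desc = "
    idx = raw_err.find(marker)
    message = raw_err[idx + len(marker):] if idx != -1 else raw_err
    return short_reason, message
```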
@derekwaynecarr @smarterclayton @eparis
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)
Fix error in local-cluster-up
When $GO_OUT is not set, line 152 outputs an error.
```
./hack/local-up-cluster.sh: line 152: [: ==: unary operator expected
```
This occurs because the if condition expands as `if [ == "" ]`. This results in an error because == is a binary operator and expects something on the LHS.
**Release note**:
```release-note
NONE
```
cc @sttts
FYI @Gouthamve
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)
Retry finding RBAC version if not found in discovery cache
Alternate to https://github.com/kubernetes/kubernetes/pull/47995
xref #47977
The caching discovery client can indicate whether it used fresh discovery data. `kubefed init` should invalidate and recheck if it doesn't find an RBAC API group
```release-note
`kubefed init` correctly checks for RBAC API enablement.
```
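The retry described in this PR can be sketched as follows. `CachedDiscovery` is a toy stand-in written for this example (its `fresh` and `invalidate` methods only imitate the idea of a caching discovery client), and `has_rbac` is a hypothetical helper.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"


class CachedDiscovery:
    """Toy caching discovery client (hypothetical, for illustration)."""

    def __init__(self, fetch_groups, cached_groups):
        self._fetch = fetch_groups        # callable that asks the server
        self._cache = list(cached_groups)  # possibly stale data
        self._fresh = False

    def fresh(self):
        return self._fresh

    def invalidate(self):
        self._cache = None

    def groups(self):
        if self._cache is None:
            self._cache = list(self._fetch())
            self._fresh = True
        return self._cache


def has_rbac(disco):
    """Check for the RBAC group, rechecking once if the cache was stale."""
    if RBAC_GROUP in disco.groups():
        return True
    if not disco.fresh():
        disco.invalidate()
        return RBAC_GROUP in disco.groups()
    return False
```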
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)
Move iptables logging in kubeproxy from Errorf to V(2).Infof
Fixes https://github.com/kubernetes/kubernetes/issues/48052
This will stop fluentd from OOM'ing in reasonably large clusters with services due to kube-proxy. You'll still get iptables printed on setups which run at >= v2, but we can at least opt out.
@bowei Does this look reasonable?
cc @kubernetes/sig-network-misc
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)
Make background garbage collection cascading
Fix #44046, fix #47843 where user reported that the garbage collector didn't delete pods when a deployment was deleted with PropagationPolicy=Background.
The cause is that when propagating a background garbage collection request, the garbage collector deletes dependents with DeleteOptions.PropagationPolicy=nil, which means the default GC policy of a resource (defined by its REST strategy) and the existing GC-related finalizers will decide how the delete request is propagated further. Unfortunately, the default GC policy for RS is orphaning, so the pods are left behind when a deployment is deleted.
This PR changes the garbage collector to delete dependents with DeleteOptions.PropagationPolicy=Background when the owner is deleted in background. This means the dependent's existing GC finalizers will be overridden, making orphaning less flexible (see this made-up [case](https://github.com/kubernetes/kubeadm/issues/149#issuecomment-278942012)). I think sacrificing the flexibility of orphaning is worthwhile, because making the behavior of background garbage collection match users' expectations is more important.
cc @lavalamp @kargakis @krmayankk @enisoc
```release-note
The garbage collector now cascades deletion properly when deleting an object with propagationPolicy="background". This resolves issue [#44046](https://github.com/kubernetes/kubernetes/issues/44046), so that when a deployment is deleted with propagationPolicy="background", the garbage collector ensures dependent pods are deleted as well.
```
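A toy model of the behavior change, for illustration only. The object store, the `delete` function, and the `DEFAULT_POLICY` table are hypothetical; only the ReplicaSet default of orphaning comes from the description above, and finalizers are not modeled.

```python
# Assumed per-kind defaults; only the ReplicaSet entry is from the PR text.
DEFAULT_POLICY = {
    "Deployment": "Background",
    "ReplicaSet": "Orphan",
    "Pod": "Background",
}


def delete(store, name, policy, cascade_background):
    """Delete `name` and propagate the deletion to its dependents.

    cascade_background=False models the old behavior (dependents are
    deleted with policy None, so their kind's default applies);
    True models the new behavior (Background is passed down).
    """
    obj = store.pop(name)
    effective = policy if policy is not None else DEFAULT_POLICY[obj["kind"]]
    if effective == "Orphan":
        return  # dependents are left in place
    dependents = [n for n, o in store.items() if o["owner"] == name]
    child_policy = "Background" if cascade_background else None
    for dep in dependents:
        delete(store, dep, child_policy, cascade_background)


def make_store():
    """Deployment -> ReplicaSet -> two Pods."""
    return {
        "deploy": {"kind": "Deployment", "owner": None},
        "rs": {"kind": "ReplicaSet", "owner": "deploy"},
        "pod-a": {"kind": "Pod", "owner": "rs"},
        "pod-b": {"kind": "Pod", "owner": "rs"},
    }
```

In this model, deleting `deploy` with the old behavior leaves both pods in the store, while the new behavior removes them.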
Automatic merge from submit-queue (batch tested with PRs 47860, 47170)
Fix restart action on juju kubernetes-master
**What this PR does / why we need it**: Restart action of kubernetes-master of Juju is not functioning.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/299
**Special notes for your reviewer**:
**Release note**:
```
Fix: Restart action of juju's kubernetes-master restarts the respective snap based services
```
Automatic merge from submit-queue (batch tested with PRs 47860, 47170)
Make fluentd log to stdio instead of a dedicated file
Lower verbosity also, to reduce volume of system logs exported to the backend.
Fix https://github.com/kubernetes/kubernetes/issues/43772
/cc @piosz
Automatic merge from submit-queue (batch tested with PRs 48036, 48022)
apiextensions-apiserver: fix build
Can't build CRD due to this bug. This PR will fix it.