Automatic merge from submit-queue (batch tested with PRs 52376, 52439, 52382, 52358, 52372)
Add new api groups to the GCE advanced audit policy
Fixes https://github.com/kubernetes/kubernetes/issues/52265
It introduces the missing api groups, that were introduced in 1.8 release.
@piosz there's also the 'metrics' api group, should we audit it?
Automatic merge from submit-queue (batch tested with PRs 51601, 52153, 52364, 52362, 52342)
Make advanced audit policy on GCP configurable
Related to https://github.com/kubernetes/kubernetes/issues/52265
Make GCP audit policy configurable
/cc @tallclair
Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301)
Switch default audit policy to beta and omit RequestReceived stage
Related to https://github.com/kubernetes/kubernetes/issues/52265
```release-note
By default, clusters on GCE no longer sends RequestReceived audit event, if advanced audit is configured.
```
Automatic merge from submit-queue
[GCE kube-up] Add a warning for kube-proxy DaemonSet option
**What this PR does / why we need it**:
Add a warning for kube-proxy DaemonSet option for GCE kube-up so that user will be aware of the risks.
Ref: https://github.com/kubernetes/kubernetes/issues/23225
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
/assign @bowei
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52227, 52120)
Use COS for nodes in testing clusters by default, and bump COS.
Addresses part of issue #51487. May assist with #51961 and #50695.
CVM is being deprecated, and falls out of support on 2017/10/01. We shouldn't run test jobs on it. So start using COS for all test jobs.
The default value of `KUBE_NODE_OS_DISTRIBUTION` for clusters created for testing will now be gci. Testjobs that do not specify this value will now run on clusters using COS (aka GCI) as the node OS, instead of CVM, the previous default.
This change only affects testing; non-testing clusters already use COS by default.
In addition, bump the version of COS from `cos-stable-60-9592-84-0` to `cos-stable-60-9592-90-0`.
```release-note
NONE
```
/cc @yujuhong, @mtaufen, @fejta, @krzyzacy
Automatic merge from submit-queue
Add cluster up configuration for certificate signing duration.
```release-note
Add CLUSTER_SIGNING_DURATION environment variable to cluster configuration scripts
to allow configuration of signing duration of certificates issued via the Certificate
Signing Request API.
```
Addresses part of issue #51487.
This is a big change for testing; any testjobs that do not
set an explicit KUBE_NODE_OS_DISTRIBUTION will have been running
on CVM, but after this PR will start running COS.
CVM is being deprecated, and falls out of support on 2018/10/01.
In addition, bump the patch version of COS from
cos-stable-60-9592-84-0 to cos-stable-60-9592-90-0.
Automatic merge from submit-queue (batch tested with PRs 51921, 51829, 51968, 51988, 51986)
COS/GCE: bump the max pids for the docker service
**What this PR does / why we need it**:
TasksMax limits how many threads/processes docker can create. Insufficient limit affects container starts.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#51977
**Special notes for your reviewer**:
**Release note**:
```release-note
Ensure TasksMax is sufficient for docker
```
Automatic merge from submit-queue (batch tested with PRs 51921, 51829, 51968, 51988, 51986)
Fix unbound variable in configure-helper.sh
This isn't plumbed yet on GKE, so results in an unbound variable.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Set up DNS server in containerized mounter path
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
**Release note**:
```release-note
Allow DNS resolution of service name for COS using containerized mounter. It fixed the issue with DNS resolution of NFS and Gluster services.
```
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
Automatic merge from submit-queue (batch tested with PRs 49727, 51792)
Introducing metrics-server
ref https://github.com/kubernetes/features/issues/271
There is still some work blocked on problems with repo synchronization:
- migrate to `v1beta1` introduced in #51653
- bump deps to HEAD
Will do it in a follow up PRs once the issue is resolved.
```release-note
Introduced Metrics Server
```
Automatic merge from submit-queue
Add RBAC, healthchecks, autoscalers and update Calico to v2.5.1
**What this PR does / why we need it**:
- Updates Calico to `v2.5`
- Calico/node to `v2.5.1`
- Calico CNI to `v1.10.0`
- Typha to `v0.4.1`
- Enable health check endpoints
- Add Readiness probe for calico-node and Typha
- Add Liveness probe for calico-node and Typha
- Add RBAC manifest
- With calico ClusterRole, ServiceAccount and ClusterRoleBinding
- Add Calico CRDs in the Calico manifest (only works for k8s v1.7+)
- Add vertical autoscaler for calico-node and Typha
- Add horizontal autoscaler for Typha
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51553, 51538, 51663, 51069, 51737)
Allow enable pod priority feature gate for GCE and configure priority for kube-proxy
**What this PR does / why we need it**:
From #23225, this PR adds an option for user to enable pod priority feature gate using GCE startup scripts, and configure pod priority for kube-proxy when enabled.
The setup `priorityClassName: system` derives from: ce1485c626/staging/src/k8s.io/api/core/v1/types.go (L2536-L2542)
The plan is to configure pod priority for kube-proxy daemonset (https://github.com/kubernetes/kubernetes/pull/50705) in the same way.
**Special notes for your reviewer**:
cc @bsalamat @davidopp @thockin
**Release note**:
```release-note
When using kube-up.sh on GCE, user could set env `ENABLE_POD_PRIORITY=true` to enable pod priority feature gate.
```
Automatic merge from submit-queue (batch tested with PRs 51583, 51283, 51374, 51690, 51716)
Create a secondary range for the services instead of a subnetwork
GCE now supports >1 secondary ranges / subnetwork.
Fixes#51774
```release-note
When using IP aliases, use a secondary range rather than subnetwork to reserve cluster IPs.
```
Automatic merge from submit-queue (batch tested with PRs 51590, 48217, 51209, 51575, 48627)
FlexVolume setup script for COS instance using mounting utility image in GCR.
**What this PR does / why we need it**: This scripts automates FlexVolume installation for a single COS instance. Users need to pre-pack their drivers and mount utilities in a Docker image and upload it to GCR.
For each FlexVolume plugin, the script places a driver wrapper in a writable and executable location. The wrapper calls commands from the actual driver but in a chroot environment, so that mount utilities from the image can be used.
I'm working on a script that automatically executes this on all instances. Will be in a separate PR.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48626
```release-note
NONE
```
/cc @gmarek @chakri-nelluri
/assign @saad-ali @msau42
/sig storage
/release-note-none
Automatic merge from submit-queue
Adding Flexvolume plugin dir piping for controller manager on COS
**What this PR does / why we need it**: Sets the default Flexvolume plugin directory correctly for controller manager running on COS images.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51563
```release-note
NONE
```
/release-note-none
/sig storage
/assign @msau42
/cc @wongma7
Automatic merge from submit-queue (batch tested with PRs 51480, 49616, 50123, 50846, 50404)
Add KUBE_APISERVER_REQUEST_TIMEOUT_SEC env var.
Cluster startup support for the flag added by #51415. I won't merge until that PR merges.
Bug: #51355
cc @jpbetz
Automatic merge from submit-queue
Retry master instance creation in case of retriable error (with sleep)
To help with our 5k-node CI tests failing to startup the cluster.
And also towards the greater goal - https://github.com/kubernetes/kubernetes/issues/43140
cc @kubernetes/sig-scalability-misc @kubernetes/sig-cluster-lifecycle-misc
Automatic merge from submit-queue (batch tested with PRs 47054, 50398, 51541, 51535, 51545)
Switch away from gcloud deprecated flags in compute resource listings
**What is fixed**
Remove deprecated `gcloud compute` flags, see linked issue.
**Which issue this PR fixes**:
fixes#49673
**Special notes for your reviewer**:
The change in `gcloudComputeResourceList` in `test/e2e/framework/ingress_utils.go` isn't strictly needed as currently no affected resources are called on within that file, however the function has the _potential_ to access affected resources so I covered it as well. Happy to change if deemed unnecessary.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add Google cloud KMS service for envelope encryption transformer
This adds the required pieces which will allow addition of KMS based encryption providers (envelope transformer).
For now, we will be implementing it using Google Cloud KMS, but the code should make it easy to add support for any other such provider which can expose Decrypt and Encrypt calls.
Writing tests for Google Cloud KMS Service may cause a significant overhead to the testing framework. It has been tested locally and on GKE though.
Upcoming after this PR:
* Complete implementation of the envelope transformer, which uses LRU cache to maintain decrypted DEKs in memory.
* Track key version to assist in data re-encryption after a KEK rotation.
Development branch containing the changes described above: https://github.com/sakshamsharma/kubernetes/pull/4
Envelope transformer used by this PR was merged in #49350
Concerns #48522
Planned configuration:
```
kind: EncryptionConfig
apiVersion: v1
resources:
- resources:
- secrets
providers:
- kms:
cachesize: 100
configfile: gcp-cloudkms.conf
name: gcp-cloudkms
- identity: {}
```
gcp-cloudkms.conf:
```
[GoogleCloudKMS]
kms-location: global
kms-keyring: google-container-engine
kms-cryptokey: example-key
```
Automatic merge from submit-queue (batch tested with PRs 51471, 50561, 50435, 51473, 51436)
Fix `gcloud compute instance-groups managed list` call
**What this PR does / why we need it**: gcloud 168.0.0 makes the `gcloud compute instance-groups managed list --format='value(instanceGroup)'` call return a URL instead of just the name, which is causing `list-instances` to fail. Switching to `--format='value(name)'` seems to restore the old behavior.
x-ref #49673
**Release note**:
```release-note
NONE
```
/cc @wojtek-t @mwielgus @shyamjvs @jiayingz @mindprince
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
Allow running kube-proxy as a DaemonSet when using kube-up.sh on GCE
**What this PR does / why we need it**:
From #23225, this PR adds an option for user to run kube-proxy as a DaemonSet instead of static pods using GCE startup scripts. By default, kube-proxy will run as static pods.
This is the first step for moving kube-proxy into a DaemonSet in GCE, remaining tasks will be tracked on #23225.
**Special notes for your reviewer**:
The last commit are purely for testing out kube-proxy as daemonset via CIs.
cc @kubernetes/sig-network-misc @kubernetes/sig-cluster-lifecycle-misc
**Release note**:
```release-note
When using kube-up.sh on GCE, user could set env `KUBE_PROXY_DAEMONSET=true` to run kube-proxy as a DaemonSet. kube-proxy is run as static pods by default.
```
Automatic merge from submit-queue (batch tested with PRs 51038, 50063, 51257, 47171, 51143)
update related manifest files to use hostpath type
**What this PR does / why we need it**:
Per [discussion in #46597](https://github.com/kubernetes/kubernetes/pull/46597#pullrequestreview-53568947)
Dependes on #46597
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes: https://github.com/kubernetes/kubeadm/issues/298
**Special notes for your reviewer**:
/cc @euank @thockin @tallclair @Random-Liu
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 50033, 49988, 51132, 49674, 51207)
Update cos image to cos-stable-60-9592-84-0
cos-m60 has been stable for a long time. This image contains a docker upgrade, which has been validated in https://github.com/kubernetes/kubernetes/issues/42926.
**Release note**:
```
None
```
/assign @yujuhong
/cc @dchen1107
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200)
Include $USER in network name to not clash for different users' cl…
Automatic merge from submit-queue (batch tested with PRs 51108, 51035, 50539, 51160, 50947)
Auto-calculate CLUSTER_IP_RANGE based on cluster size
In preparation for eliminating CLUSTER_IP_RANGE env var from job configs, making it less error prone while folks try to start their own large cluster tests (https://github.com/kubernetes/kubernetes/issues/50907).
/cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 51108, 51035, 50539, 51160, 50947)
Set GCE_ALPHA_FEATURES environment variable in gce.conf
This allows us to gate alpha features in the pkg/cloudprovider/providers/gce.
Automatic merge from submit-queue
Add Priority admission controller
**What this PR does / why we need it**: Add Priority admission controller. This admission controller checks creation and update of PriorityClasses. It also resolves a PriorityClass name of a pod to its integer value.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Add Priority admission controller for monitoring and resolving PriorityClasses.
```
ref/ #47604
ref/ #48646
Automatic merge from submit-queue (batch tested with PRs 50386, 50374, 50444, 50382)
Add explicit API kind and version to the audit policy file on GCE
Adds an explicit API version and kind to the audit policy file in GCE configuration scripts. It's a prerequisite for https://github.com/kubernetes/kubernetes/pull/49115
/cc @tallclair @piosz
Automatic merge from submit-queue (batch tested with PRs 50300, 50328, 50368, 50370, 50372)
Bugfix: set resources only for fluentd-gcp container.
There is more than one container in fluentd-gcp deployment. Previous
implementation was setting resources for all containers, not just
the fluent-gcp one.
**What this PR does / why we need it**:
Bugfix; https://github.com/kubernetes/kubernetes/pull/49009 without this is eating more resources.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50366
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
There is more than one container in fluentd-gcp deployment. Previous
implementation was setting resources for multiple containers, not just
the fluent-gcp one.
Automatic merge from submit-queue
Ensure that pricing expander is used by default in Cluster Autoscaler
Pricing expander was set as the default one for GCP, however on some occasion it was possible that AUTOSCALER_EXPANDER_CONFIG variable was not set resulting in using the the random expander.
Automatic merge from submit-queue (batch tested with PRs 48487, 49009, 49862, 49843, 49700)
Enable overriding fluentd resources in GCP
**What this PR does / why we need it**: This enables overriding fluentd resources in GCP, when there is a need for custom ones.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)
Add basic install and mount flexvolumes e2e tests
fixes https://github.com/kubernetes/kubernetes/issues/47010
These two tests install a skeleton "dummy" flex driver, attachable and non-attachable respectively, then test that a pod can successfully use the flex driver. They are labeled disruptive because kubelet and controller-manager get restarted as part of the flex install. IMO it's important to keep this install procedure as part of the test to isolate any bugs with the startup plugin probe code.
There is a bit of an ugly dependency on cluster/gce/config-test.sh because --flex-volume-plugin-dir must be set to a dir that's readable from controller-manager container and writable by the flex e2e test. The default path is not writable on GCE masters with read-only root so I picked a location that looks okay.
In the "dummy" drivers I trick kubelet into thinking there is a mount point by doing "mount -t tmpfs none ${MNTPATH} >/dev/null 2>&1", hope that is okay.
I have only tested on GCE and theoretically they may work on AWS but I don't think there is a need to test on multiple cloudproviders.
-->
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49989, 49806, 49649, 49412, 49512)
Use existing k8s binaries and images on disk when they are preloaded to gce cos image.
**What this PR does / why we need it**:
This change is to accelerate K8S startup time on gce when k8s tarballs and images are already preloaded in VM image, by skipping the downloading, extracting and file transfer steps.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add parallelism to GCE cluster upgrade
Fixes https://github.com/kubernetes/kubernetes/issues/48373
Should allow upgrading 500-node cluster (1.6->1.7) in < 1 hr. It currently takes ~1.5 day.
Though it is the duty of the upgrader to choose the right parallelism in order to avoid disrupting too many pods.
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @kubernetes/sig-scalability-misc @mikedanese @gmarek
Automatic merge from submit-queue (batch tested with PRs 49898, 49897, 49919, 48860, 49491)
gce: make append_or_replace.. atomic
Before this change,
* the final echo is not atomically written to the target file
* two concurrent callers will use the same tempfile
Helps with https://github.com/kubernetes/kubernetes/issues/49895
cc @miekg
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)
Use presence of kubeconfig file to toggle standalone mode
Fixes#40049
```release-note
The deprecated --api-servers flag has been removed. Use --kubeconfig to provide API server connection information instead. The --require-kubeconfig flag is now deprecated. The default kubeconfig path is also deprecated. Both --require-kubeconfig and the default kubeconfig path will be removed in Kubernetes v1.10.0.
```
/cc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-node-misc
Automatic merge from submit-queue
Remove flags low-diskspace-threshold-mb and outofdisk-transition-frequency
issue: #48843
This removes two flags replaced by the eviction manager. These have been depreciated for two releases, which I believe correctly follows the kubernetes depreciation guidelines.
```release-note
Remove depreciated flags: --low-diskspace-threshold-mb and --outofdisk-transition-frequency, which are replaced by --eviction-hard
```
cc @mtaufen since I am changing kubelet flags
cc @vishh @derekwaynecarr
/sig node
Replaces use of --api-servers with --kubeconfig in Kubelet args across
the turnup scripts. In many cases this involves generating a kubeconfig
file for the Kubelet and placing it in the correct location on the node.
Automatic merge from submit-queue
Auto-calculate master disk and root disk sizes in GCE
@gmarek PR https://github.com/kubernetes/kubernetes/pull/49282 didn't fix the issue because MASTER_DISK_SIZE was defaulting to 20GB in config-test.sh before being calculated inside get-master-disk-size() where you use pre-existing value if any.
It should be fixed by this now.
Automatic merge from submit-queue (batch tested with PRs 48565, 49172)
On GCE check whether NODE_LOCAL_SSDS=0 and handle this case appropriately
**What this PR does / why we need it**: Presently if you are using a mac and GCE and specify NODE_LOCAL_SSDS=0, or use the default, you end up with 2 local SSDs.
**Which issue this PR fixes** : fixes https://github.com/kubernetes/kubernetes/issues/49171
**Special notes for your reviewer**:
I've discovered that this issue is due to b353792f9c/cluster/gce/util.sh (L579)
If NODE_LOCAL_SSDS=0, this evaluates to $(seq 0)
```
$ for i in $(seq 0); do echo $i; done
1
0
```
From man seq on mac osx
```
The seq utility prints a sequence of numbers, one per line (default), from first (default 1),
to near last as possible, in increments of incr (default 1).When first is larger than last the
default incr is -1.
```
This was run on mac with the seq manpage indicating it comes from BSD Feb 19 2010.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49222, 49333, 48708, 49337)
Fix issue in installing containerized mounter
Fix PR #49335
PR #49157 causes failure when installing containerized mounter. This
PR is a fix for it
Automatic merge from submit-queue (batch tested with PRs 49120, 46755, 49157, 49165, 48950)
gce: don't print every file in mounter to stdout
This is printing ~3000 lines.
Automatic merge from submit-queue
Use Container-optimzed OS images for nodes by default
Part of the deprecation of the debian-based ContainerVM images.
```release-note
kube-up and kubemark will default to using cos (GCI) images for nodes.
The previous default was container-vm (CVM, "debian"), which is deprecated.
If you need to explicitly use container-vm for some reason, you should set
KUBE_NODE_OS_DISTRIBUTION=debian
```
Automatic merge from submit-queue
Pass cluster name to Heapster with Stackdriver sink.
**What this PR does / why we need it**:
Passes cluster name as argument to Heapster when it's used with Stackdriver sink to allow setting resource label 'cluster_name' in exported metrics.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47918, 47964, 48151, 47881, 48299)
Add ApiEndpoint support to GCE config.
**What this PR does / why we need it**:
Add the ability to change ApiEndpoint for GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 43558, 48261, 42376, 46803, 47058)
Add bind mount /etc/resolv.conf from host to containerized mounter
Currently, in containerized mounter rootfs, there is no DNS setup. If client
try to set up volume with host name instead of IP address, it will fail to resolve
the host name.
By bind mount the host's /etc/resolv.conf to mounter rootfs, VM hosts name
could be resolved when using host name during mount.
```release-note
Fixes issue where you could not mount NFS or glusterFS volumes using hostnames on GCI/GKE with COS images.
```
Automatic merge from submit-queue (batch tested with PRs 48004, 48205, 48130, 48207)
Do not set CNI in cases where there is a private master and network policy provider is set.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
In GCE and in a "private master" setup, do not set the network-plugin provider to CNI by default if a network policy provider is given.
```
Automatic merge from submit-queue
Make big clusters work again after introduction of subnets
This PR does two things:
- make IP aliases automatically pick Node IP Range based on number of Nodes,
- fix logic for starting clusters >4095 Nodes that was broken by introduction of subnets,
cc @wojtek-t @shyamjvs
```release-note
Setting env var ENABLE_BIG_CLUSTER_SUBNETS=true will allow kube-up.sh to start clusters bigger that 4095 Nodes on GCE.
```
Ref https://github.com/kubernetes/kubernetes/issues/47344
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)
Bump up npd version to v0.4.1
```
Bump up npd version to v0.4.1
```
Fixes#47219
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)
Use a different env var to enable the ip-masq-agent addon.
We shouldn't mix setting the non-masq-cidr with enabling the addon.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
https://github.com/kubernetes/kubernetes/issues/47865
Automatic merge from submit-queue (batch tested with PRs 47403, 46646, 46906, 46527, 46792)
Avoid redundant copying of tars during kube-up for gce if the same file already exists
**What this PR does / why we need it**:
Whenever I execute cluster/kube-up.sh it copies my tar files to google cloud, even if the files haven't changed. This PR checks to see whether the files already exist, and avoids uploading them again. These files are large and can take a long time to upload.
**Which issue this PR fixes**: fixes#46791
**Special notes for your reviewer**:
Here is the new output:
cluster/kube-up.sh
... Starting cluster in us-central1-b using provider gce
... calling verify-prereqs
... calling verify-kube-binaries
... calling kube-up
Project: PROJECT
Zone: us-central1-b
+++ Staging server tars to Google Storage: gs://kubernetes-staging-PROJECT/kubernetes-devel
+++ kubernetes-server-linux-amd64.tar.gz uploaded earlier, cloud and local file md5 match (md5 = 3a095kcf27267a71fe58f91f89fab1bc)
**Release note**:
```cluster/kube-up.sh on gce now avoids redundant copying of kubernetes tars if the local and cloud files' md5 hash match```
Automatic merge from submit-queue
Remove limits from ip-masq-agent for now and disable ip-masq-agent in GCE
ip-masq-agent when issuing an iptables-save will read any configured iptables on the node. This means that the ip-masq-agent's memory requirements would grow with the number of iptables (i.e. services) on the node.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
#47865
**Special notes for your reviewer**:
**Release note**:
```release-note
```
ip-masq-agent when issuing an iptables-save will read
any configured iptables on the node. This means that
the ip-masq-agent's memory requirements would grow
with the number of iptables (i.e. services) on the node.
Disable ip-masq-agent in GCE
Automatic merge from submit-queue
Add ip-masq-agent readiness label by default.
Since we are setting the non-masq-cidr in the kubelet to 0.0.0.0/0 we
need to ensure the ip-masq-agent runs.
pr/#46473 made the NON_MASQUERADE_CIDR default to 0.0.0.0/0 which means we need to have this label set now.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes#47752
**Special notes for your reviewer**:
**Release note**:
```release-note
ip-masq-agent is now the default for GCE
```
Automatic merge from submit-queue (batch tested with PRs 45268, 47573, 47632, 47818)
NODE_TAINTS in gce startup scripts
Currently there is now way to pass a list of taints that should be added on node registration (at least not in gce or other saltbased deployment). This PR adds necessary plumbing to pass the taints from user or instance group template to kubelet startup flags.
```release-note
Taints support in gce/salt startup scripts.
```
The PR was manually tested.
```
NODE_TAINTS: 'dedicated=ml:NoSchedule'
```
in kube-env results in
```
spec:
[...]
taints:
- effect: NoSchedule
key: dedicated
timeAdded: null
value: ml
```
cc: @davidopp @gmarek @dchen1107 @MaciekPytel
setting the non-masq-cidr in the kubelet to 0.0.0.0/0 we
need to ensure the ip-masq-agent runs.
Add node label pre-req back to ip-masq-agent.
Make gce test consistent with gce default scripts.
Automatic merge from submit-queue (batch tested with PRs 46604, 47634)
Set price expander in Cluster Autoscaler for GCE
With CA 0.6 we will make price-preferred node expander the default one for GCE. For other cloud providers we will stick to the default one (random) until the community implement the required interfaces in CA repo.
https://github.com/kubernetes/autoscaler/issues/82
cc: @MaciekPytel @aleksandra-malinowska
Automatic merge from submit-queue (batch tested with PRs 46327, 47166)
mark --network-plugin-dir deprecated for kubelet
**What this PR does / why we need it**:
**Which issue this PR fixes** : fixes#43967
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47530, 47679)
Use cos-stable-59-9460-64-0 instead of cos-beta-59-9460-20-0.
Remove dead code that has now moved to another repo as part of #47467
**Release note**:
```release-note
NONE
```
/sig node
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)
The KUBE-METADATA-SERVER firewall must be applied before the universa…
…l tcp ACCEPT
**What this PR does / why we need it**: the metadata firewall rule was broken by being appended after the universal tcp accept.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 38751, 44282, 46382, 47603, 47606)
Working on fixing #43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Added check so that if there is no Aggregator CA contents we won't start
the aggregator with the relevant flags.
**What this PR does / why we need it**:
This PR creates a request header CA. It also creates a proxy client cert/key pair.
It causes these files to end up on kube-apiserver and set the CLI flags so they are properly loaded.
Without it the customer either has to set them up themselves or re-use the master CA which is a security vulnerability.
Currently this creates everything on GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43716
**Special notes for your reviewer**:
This is a reapply of pull/47094 with the GKE issue resolved.
**Release note**: None
- It contains a fix for ipaliasing.
- It contains a fix which decouples GPU driver installation from kernel
version.
Remove dead code that has now moved to another repo as part of #47467
Automatic merge from submit-queue
Don't start any Typha instances if not using Calico
**What this PR does / why we need it**:
Don't start any Typha instances if Calico isn't being used. A recent change now includes all add-ons on the master, but we don't always want a Typha replica.
**Which issue this PR fixes**
Fixes https://github.com/kubernetes/kubernetes/issues/47622
**Release note**:
```release-note
NONE
```
cc @dnardo
Automatic merge from submit-queue (batch tested with PRs 47562, 47605)
Adding option in node start script to add "volume-plugin-dir" flag to kubelet.
**What this PR does / why we need it**: Adds a variable to allow specifying FlexVolume driver directory through cluster/kube-up.sh. Without this, the process of setting up FlexVolume in a non-default directory is very manual.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47561
Automatic merge from submit-queue
Add encryption provider support via environment variables
These changes are needed to allow cloud providers to use the encryption providers as an alpha feature. The version checks can be done in the respective cloud providers'.
Context: #46460 and #46916
@destijl @jcbsmpsn @smarterclayton
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Added check so that if there is no Aggregator CA contents we won't start
the aggregator with the relevant flags.
Automatic merge from submit-queue
Fix dangling reference to gcloud alpha API for GCI (should be beta)
This reference to the alpha API was missed (fixed in GCE, but not GCI)
Fixes#47494
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 47510, 47516, 47482, 47521, 47537)
Allow autoscaler min at 0 in GCE
Allow scaling migs to zero in GCE startup scripts. This only makes sense when there is more than 1 mig. The main use case (for now) will be to test scaling to to zero in e2e tests.
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)
Set up proxy certs for Aggregator.
Working on fixing https://github.com/kubernetes/kubernetes/issues/43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication --namespace=kube-system -o yaml
**What this PR does / why we need it**:
This PR creates a request header CA. It also creates a proxy client cert/key pair.
It causes these files to end up on kube-apiserver and set the CLI flags so they are properly loaded.
Without it the customer either has to set them up themselves or re-use the master CA which is a security vulnerability.
Currently this creates everything on GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43716
**Special notes for your reviewer**:
Working on fixing https://github.com/kubernetes/kubernetes/issues/43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Automatic merge from submit-queue
Remove e2e-rbac-bindings.
Replace todo-grabbag binding w/ more specific heapster roles/bindings.
Move kubelet binding.
**What this PR does / why we need it**:
The "e2e-rbac-bindings" held 2 leftovers from the 1.6 RBAC rollout process:
- One is the "kubelet-binding" which grants the "system:node" role to kubelet. This is needed until we enable the node authorizer. I moved this to the folder w/ some other kubelet related bindings.
- The other is the "todo-remove-grabbag-cluster-admin" binding, which grants the cluster-admin role to the default service account in the kube-system namespace. This appears to only be required for heapster. Heapster will instead use a "heapster" service account, bound to a "system:heapster" role on the cluster (no write perms), and a "system:pod-nanny" role in the kube-system namespace.
**Which issue this PR fixes**: Addresses part of #39990
**Release Note**:
```release-note
New and upgraded 1.7 GCE/GKE clusters no longer have an RBAC ClusterRoleBinding that grants the `cluster-admin` ClusterRole to the `default` service account in the `kube-system` namespace.
If this permission is still desired, run the following command to explicitly grant it, either before or after upgrading to 1.7:
kubectl create clusterrolebinding kube-system-default --serviceaccount=kube-system:default --clusterrole=cluster-admin
```
Automatic merge from submit-queue
Audit webhook config for GCE
Add a `ADVANCED_AUDIT_BACKEND` (comma delimited list) environment variable to the GCE cluster config to select the audit backend, and add configuration for the webhook backend.
~~Based on the first commit from https://github.com/kubernetes/kubernetes/pull/46557~~
For kubernetes/features#22
Since this is GCE-only configuration plumbing, I think this should be exempt from code-freeze.
Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)
Write audit policy file for GCE/GKE configuration
Setup the audit policy configuration for GCE & GKE. Here is the high level summary of the policy:
- Default logging everything at `Metadata`
- Known write APIs default to `RequestResponse`
- Known read-only APIs default to `Request`
- Except secrets & configmaps are logged at `Metadata`
- Don't log events
- Don't log `/version`, swagger or healthchecks
In addition to the above, I spent time analyzing the noisiest lines in the audit log from a cluster that soaked for 24 hours (and ran a batch of e2e tests). Of those top requests, those that were identified as low-risk (all read-only, except update kube-system endpoints by controllers) are dropped.
I suspect we'll want to tweak this a bit more once we've had a time to soak it on some real clusters.
For kubernetes/features#22
/cc @sttts @ericchiang
Automatic merge from submit-queue (batch tested with PRs 46972, 42829, 46799, 46802, 46844)
promote tls-bootstrap to beta
last commit of this PR.
Towards https://github.com/kubernetes/kubernetes/issues/46999
```release-note
Promote kubelet tls bootstrap to beta. Add a non-experimental flag to use it and deprecate the old flag.
```
Automatic merge from submit-queue
Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test.
This is still a WIP... needs to be squashed at least, and I don't think it's currently passing until I increase the scale of the RC, but please have a look at the general outline. Thanks!
Fixes#38336
@kow3ns @bdbauer @krousey @erictune @maisem @davidopp
```
On GCE, node upgrades will now respect PodDisruptionBudgets, if present.
```
Automatic merge from submit-queue
Adding a metadata proxy addon
**What this PR does / why we need it**: adds a metadata server proxy daemonset to hide kubelet secrets.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: this partially addresses #8867
**Special notes for your reviewer**:
**Release note**: the gce metadata server can be hidden behind a proxy, hiding the kubelet's token.
```release-note
The gce metadata server can be hidden behind a proxy, hiding the kubelet's token.
```
Automatic merge from submit-queue
Add initializer support to admission and uninitialized filtering to rest storage
Initializers are the opposite of finalizers - they allow API clients to react to object creation and populate fields prior to other clients seeing them.
High level description:
1. Add `metadata.initializers` field to all objects
2. By default, filter objects with > 0 initializers from LIST and WATCH to preserve legacy client behavior (known as partially-initialized objects)
3. Add an admission controller that populates .initializer values per type, and denies mutation of initializers except by certain privilege levels (you must have the `initialize` verb on a resource)
4. Allow partially-initialized objects to be viewed via LIST and WATCH for initializer types
5. When creating objects, the object is "held" by the server until the initializers list is empty
6. Allow some creators to bypass initialization (set initializers to `[]`), or to have the result returned immediately when the object is created.
The code here should be backwards compatible for all clients because they do not see partially initialized objects unless they GET the resource directly. The watch cache makes checking for partially initialized objects cheap. Some reflectors may need to change to ask for partially-initialized objects.
```release-note
Kubernetes resources, when the `Initializers` admission controller is enabled, can be initialized (defaulting or other additive functions) by other agents in the system prior to those resources being visible to other clients. An initialized resource is not visible to clients unless they request (for get, list, or watch) to see uninitialized resources with the `?includeUninitialized=true` query parameter. Once the initializers have completed the resource is then visible. Clients must have the the ability to perform the `initialize` action on a resource in order to modify it prior to initialization being completed.
```
Automatic merge from submit-queue (batch tested with PRs 46239, 46627, 46346, 46388, 46524)
Configure NPD version through env variable
This lets user specify NPD version to be installed with kubernetes.
Respect PDBs during node upgrades and add test coverage to the
ServiceTest upgrade test. Modified that test so that we include pod
anti-affinity constraints and a PDB.
Automatic merge from submit-queue
gcloud command syntax changed between alpha and beta versions
syntax for secondary-ranges changed from:
name=NAME,range=RANGE
to
NAME=RANGE
proxy_handler now uses the endpoint router to map the cluster IP to
appropriate endpoint (Pod) IP for the given resource.
Added code to allow aggregator routing to be optional.
Updated bazel build.
Fixes to cover JLiggit comments.
Added util ResourceLocation method based on Listers.
Fixed issues from verification steps.
Updated to add an interface to obfuscate some of the routing logic.
Collapsed cluster IP resolution in to the aggregator routing
implementation.
Added 2 simple unit tests for ResolveEndpoint
Setting this will trigger
cluster/addons/ip-masq-agent/ip-masq-agent.yaml to be installed as an
addon, which disable configure IP masquerade for all of RFC1918, rather
than just 10.0/8.
Automatic merge from submit-queue (batch tested with PRs 44774, 46266, 46248, 46403, 46430)
kube-proxy: ratelimit runs of iptables by sync-period flags
This bounds how frequently iptables can be synced. It will be no more often than every 10 seconds and no less often than every 1 minute, by default.
@timothysc FYI
@dcbw @freehan FYI
Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366)
GCE - Retrieve subnetwork name/url from gce.conf
**What this PR does / why we need it**:
Features like ILB require specifying the subnetwork if the network is type manual.
**Notes:**
The network URL can be [constructed](68e7e18698/pkg/cloudprovider/providers/gce/gce.go (L211-L217)) by fetching instance metadata; however, the subnetwork is not provided through this feature. Users must specify the subnetwork name/url through the gce.conf.
Although multiple subnets can exist in the same region for a network, the cloud provider will only use one subnet url for creating LBs.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)
Create a subnet for reserving the service cluster IP range
This will be done if IP aliases is enabled on GCP.
```release-note
NONE
```
Automatic merge from submit-queue
Allow the /logs handler on the apiserver to be toggled.
Adds a flag to kube-apiserver, and plumbs through en environment variable in configure-helper.sh
Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue
Update Calico add-on
**What this PR does / why we need it:**
Updates Calico to the latest version using self-hosted install as a DaemonSet, removes Calico's dependency on etcd.
- [x] Remove [last bits of Calico salt](175fe62720/cluster/saltbase/salt/calico/master.sls (L3))
- [x] Failing on the master since no kube-proxy to access API.
- [x] Fix outgoing NAT
- [x] Tweak to work on both debian / GCI (not just GCI)
- [x] Add the portmap plugin for host port support
Maybe:
- [ ] Add integration test
**Which issue this PR fixes:**
https://github.com/kubernetes/kubernetes/issues/32625
**Try it out**
Clone the PR, then:
```
make quick-release
export NETWORK_POLICY_PROVIDER=calico
export NODE_OS_DISTRIBUTION=gci
export MASTER_SIZE=n1-standard-4
./cluster/kube-up.sh
```
**Release note:**
```release-note
The Calico version included in kube-up for GCE has been updated to v2.2.
```
Automatic merge from submit-queue (batch tested with PRs 44606, 46038)
Add ip-masq-agent addon to the addons folder.
This also ensures that under gce we add this DaemonSet if the non-masq-cidr
is set to 0/0.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Add ip-masq-agent addon to the addons folder which is used in GCE if --non-masquerade-cidr is set to 0/0
```
Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)
Enable kernel memcg notification for node and cluster GCI/COS testing.
Sets --experimental-kernel-memcg-notification=true when running on the GCI/COS image. It sets this for master and nodes for cluster e2e tests, and for the node in node e2e tests.
Issue #42676
cc @dchen1107 @Random-Liu
IP aliases are an alpha feature, and node accelerators are a beta
feature. $gcloud determines which is appropriate.
Before, this would try to run "gcloud alpha beta", which is incoherent.
Automatic merge from submit-queue (batch tested with PRs 44830, 45130)
Adding support for Accelerators to GCE clusters.
```release-note
Create clusters with GPUs in GKE by specifying "type=<gpu-type>,count=<gpu-count>" to NODE_ACCELERATORS env var.
List of available GPUs - https://cloud.google.com/compute/docs/gpus/#introduction
```
Automatic merge from submit-queue (batch tested with PRs 44590, 44969, 45325, 45208, 44714)
Enable basic auth username rotation for GCI
When changing basic auth creds, just delete the whole file, in order to be able to rotate username in addition to password.
Automatic merge from submit-queue (batch tested with PRs 45285, 45162)
mounter.go: format return err.
**What this PR does / why we need it**:
when an error returned is nil, it's preferred to explicitly return nil.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44591, 44549)
Update repo-infra bazel dependency and use new gcs_upload rule
This PR provides similar functionality to push-build.sh entirely within Bazel rules (though it relies on gsutil).
It's an alternative to #44306.
Depends on https://github.com/kubernetes/repo-infra/pull/13.
**Release note**:
```release-note
NONE
```
Using Ubuntu on GCE to run cluster e2e tests requires slightly different
node.yaml and master.yaml files than GCI, because Ubuntu uses systemd as
PID 1, wheras GCI uses upstart with a systemd delegate. Therefore the
e2e tests fail using those files since the kubernetes services are not
brought back up after a node/master reboot.
Automatic merge from submit-queue
Use auto mode networks instead of legacy networks in GCP
Use of the --range flag creates legacy networks in GCP.
Legacy networks will not support new GCP features.
```release-note
NONE
```
Automatic merge from submit-queue
Add support for IP aliases for pod IPs (GCP alpha feature)
```release-note
Adds support for allocation of pod IPs via IP aliases.
# Adds KUBE_GCE_ENABLE_IP_ALIASES flag to the cluster up scripts (`kube-{up,down}.sh`).
KUBE_GCE_ENABLE_IP_ALIASES=true will enable allocation of PodCIDR ips
using the ip alias mechanism rather than using routes. This feature is currently
only available on GCE.
## Usage
$ CLUSTER_IP_RANGE=10.100.0.0/16 KUBE_GCE_ENABLE_IP_ALIASES=true bash -x cluster/kube-up.sh
# Adds CloudAllocator to the node CIDR allocator (kubernetes-controller manager).
If CIDRAllocatorType is set to `CloudCIDRAllocator`, then allocation
of CIDR allocation instead is done by the external cloud provider and
the node controller is only responsible for reflecting the allocation
into the node spec.
- Splits off the rangeAllocator from the cidr_allocator.go file.
- Adds cloudCIDRAllocator, which is used when the cloud provider allocates
the CIDR ranges externally. (GCE support only)
- Updates RBAC permission for node controller to include PATCH
```
KUBE_GCE_ENABLE_IP_ALIASES=true will enable allocation of PodCIDR ips
using the ip alias mechanism rather than using routes.
NODE_IP_RANGE will control the node instance IP cidr
KUBE_GCE_IP_ALIAS_SIZE controls the size of each podCIDR
IP_ALIAS_SUBNETWORK controls the name of the subnet created for the cluster
Automatic merge from submit-queue (batch tested with PRs 43866, 42748)
hack/cluster: download cfssl if not present
hack/local-up-cluster.sh uses cfssl to generate certificates and
will exit it cfssl is not already installed. But other cluster-up
mechanisms (GCE) that generate certs just download cfssl if not
present. Make local-up-cluster.sh do that too so users don't have
to bother installing it from somewhere.
Automatic merge from submit-queue (batch tested with PRs 42025, 44169, 43940)
if we have a dedicated serviceaccount keypair, use it to verify serviceaccounts
Automatic merge from submit-queue
[Federation] Remove FEDERATIONS_DOMAIN_MAP references
Remove all references to FEDERATIONS_DOMAIN_MAP as this method is no longer is used and is replaced by adding federation domain map to kube-dns configmap.
cc @madhusudancs @kubernetes/sig-federation-pr-reviews
**Release note**:
```
[Federation] Mechanism of adding `federation domain maps` to kube-dns deployment via `--federations` flag is superseded by adding/updating `federations` key in `kube-system/kube-dns` configmap. If user is using kubefed tool to join cluster federation, adding federation domain maps to kube-dns is already taken care by `kubefed join` and does not need further action.
```
Automatic merge from submit-queue
Use shared informers for proxy endpoints and service configs
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
Follow-up to #43295 cc @wojtek-t
Will race with #43937 for conflicting changes 😄 cc @thockin
cc @smarterclayton @sttts @liggitt @deads2k @derekwaynecarr @eparis @kubernetes/rh-cluster-infra
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
hack/local-up-cluster.sh uses cfssl to generate certificates and
will exit it cfssl is not already installed. But other cluster-up
mechanisms (GCE) that generate certs just download cfssl if not
present. Make local-up-cluster.sh do that too.
Per Clayton's suggestion, move stuff from cluster/lib/util.sh to
hack/lib/util.sh. Also consolidate ensure-temp-dir and use the
hack/lib/util.sh implementation rather than cluster/common.sh.
Automatic merge from submit-queue
Centos provider: generate SSL certificates for etcd cluster.
**What this PR does / why we need it**:
Support secure etcd cluster for centos provider, generate SSL certificates for etcd in default. Running it w/o SSL is exposing cluster data to everyone and is not recommended. [#39462](https://github.com/kubernetes/kubernetes/pull/39462#issuecomment-271601547)
/cc @jszczepkowski @zmerlynn
**Release note**:
```release-note
Support secure etcd cluster for centos provider.
```
Automatic merge from submit-queue
added prompt warning if etcd3 media type isn't set during upgrade
**What this PR does / why we need it**:
This adds a prompt confirming the upgrade when `STORAGE_MEDIA_TYPE` is not explicitly set. This is to prevent users from accidentally upgrading to protobuf.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Alongs with docs, addresses #43669
**Special notes for your reviewer**:
Should be cherrypicked onto `release-1.6`
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43546, 43544)
Default to enabling legacy ABAC policy in non-test kube-up.sh environments
Fixes https://github.com/kubernetes/kubernetes/issues/43541
In 1.5, we unconditionally stomped the abac policy file if KUBE_USER was set, and unconditionally used ABAC mode pointing to that file.
In 1.6, unless the user opts out (via `ENABLE_LEGACY_ABAC=false`), we want the same legacy policy included as a fallback to RBAC.
This PR:
* defaults legacy ABAC **on** in normal deployments
* defaults legacy ABAC **on** in upgrade E2Es (ensures combination of ABAC and RBAC works properly for upgraded clusters)
* defaults legacy ABAC **off** in non-upgrade E2Es (ensures e2e tests 1.6+ run with tightened permissions, and that default RBAC roles cover the required core components)
GKE changes to drive the `ENABLE_LEGACY_ABAC` envvar were made by @cjcullen out of band
```release-note
`kube-up.sh` using the `gce` provider enables both RBAC authorization and the permissive legacy ABAC policy that makes all service accounts superusers. To opt out of the permissive ABAC policy, export the environment variable `ENABLE_LEGACY_ABAC=false` before running `cluster/kube-up.sh`.
```
Automatic merge from submit-queue
Bump CNI consumers to v0.5.1
**What this PR does / why we need it**:
- vendored CNI plugins properly handle `DEL` on missing resources
- update CNI version refs
**Which issue this PR fixes**
fixes#43488
**Release note**:
`bumps CNI to version v0.5.1 where plugins properly handle DEL on non existent resources`
Automatic merge from submit-queue
Add an env KUBE_ENABLE_MASTER_NOSCHEDULE_TAINT and disable it by default
This PR changed master `NoSchedule` taint to opt-in.
As is discussed with @bgrant0607 @janetkuo, `NoSchedule` master taint breaks existing user workload, we should not enable it by default.
Previously, NPD required the taint because it can only support one OS distro with a specific configuration. If master and node are using different OS distros, NPD will not work either on master or node. However, we've already fixed this in https://github.com/kubernetes/kubernetes/pull/40206, so for NPD it's fine to disable the taint.
This should work, but I'll still try it in my cluster to confirm.
@kubernetes/sig-scheduling-misc @dchen1107 @mikedanese
Automatic merge from submit-queue (batch tested with PRs 43254, 43255, 43184, 42509)
Symlink cluster/gce/cos to cluster/gce/gci
Fixes: #43139
As I just unfortunately found out after spending an hour getting to the point where I could test this, upgrade.sh does not support upgrading nodes to local binaries. So someone will have to cut a release to test whether this change actually works.