Automatic merge from submit-queue (batch tested with PRs 48578, 48895, 48958)
use port configuration
**What this PR does / why we need it**: Uses the `port` config option in the kubeapi-load-balancer charm.
**Release note**:
```release-note
Uses the port config option in the kubeapi-load-balancer charm.
```
Automatic merge from submit-queue
remove some people from OWNERS so they don't get reviews anymore
These are googlers who don't work on the project anymore but are still
getting reviews assigned to them:
- @bprashanth
- @rjnagal
- @vmarmol
Automatic merge from submit-queue (batch tested with PRs 48812, 48276)
Change fluentd-gcp monitoring to use metrics exposed by SD plugin
Following https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/pull/135, make fluentd-gcp expose metrics in Prometheus registry and use them instead of counting records in the pipeline.
/cc @piosz @igorpeshansky
```release-note
Fluentd-gcp DaemonSet exposes different set of metrics.
```
Automatic merge from submit-queue (batch tested with PRs 48864, 48651, 47703)
Enable logexporter mechanism to dump logs from k8s nodes to GCS directly
Ref https://github.com/kubernetes/kubernetes/issues/48513
This adds support for logexporter from k8s side. Next I'll send a PR adding support from test-infra side.
/cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @fejta @wojtek-t @gmarek
Automatic merge from submit-queue
Fixed cluster validation for multizonal clusters.
Fixed cluster validation for multizonal clusters.
This should fix HA master e2e tests.
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46748, 48826)
Added `CriticalAddonsOnly` toleration for npd.
**What this PR does / why we need it**:
We should add `CriticalAddonsOnly` toleration to make sure the daemonset can be scheduled on the node even if already planned to run critical pod.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47015
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue
Properly nest code blocks
**What this PR does / why we need it**:
Markdown code blocks are adjusted to better display on GitHub. See [rendered](c3fbec7663/cluster/addons/cluster-loadbalancing/glbc/README.md) version.
**Release note**:
```release-note
Adjust markdown code block in README for Google Load Balancer addon.
```
Automatic merge from submit-queue
Update docs for user-guide
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48781, 48817, 48830, 48829, 48053)
Fix yaml-quote typo
Caught this looking through CI logs.
/assign wojtek-t
Automatic merge from submit-queue (batch tested with PRs 48279, 48566, 48319, 48794, 47952)
Add prometheus plugin on fluentd image.
**What this PR does / why we need it**:
This PR adds the prometheus plugin on Fluentd.
**Special notes for your reviewer**:
The plugin used was: https://github.com/kazegusuri/fluent-plugin-prometheus, on the latest stable version.
All configs used are default.
**Release note**:
```release-note
Fluentd-es addon now exposes a /metrics endpoint for monitoring on port 24231.
```
Automatic merge from submit-queue
Use Container-optimzed OS images for nodes by default
Part of the deprecation of the debian-based ContainerVM images.
```release-note
kube-up and kubemark will default to using cos (GCI) images for nodes.
The previous default was container-vm (CVM, "debian"), which is deprecated.
If you need to explicitly use container-vm for some reason, you should set
KUBE_NODE_OS_DISTRIBUTION=debian
```
Automatic merge from submit-queue
Pass cluster name to Heapster with Stackdriver sink.
**What this PR does / why we need it**:
Passes cluster name as argument to Heapster when it's used with Stackdriver sink to allow setting resource label 'cluster_name' in exported metrics.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48405, 48742, 48748, 48571, 48482)
Setting default FlexVolume driver directory on COS images.
**What this PR does / why we need it**: The original default FlexVolume driver directory is not writable on COS. A new location is necessary to make FlexVolume work.
This directory doesn't exist by default. FlexVolume users need to create this directory, bind mount it, and remount with the executable permission. The other candidate is /home/kubernetes/bin, but the directory is already getting cluttered. I will submit a different PR for a script that automates this step.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48570
Automatic merge from submit-queue (batch tested with PRs 48698, 48712, 48516, 48734, 48735)
GCE: Allow empty NETWORK_PROJECT_ID env var
Changes:
1. Adds `GCE_API_ENDPOINT` logic to container-linux as it was added to GCI in #47881.
1. Apply `NETWORK_PROJECT_ID` value to gce.conf only if the env var is set.
/sig network
/area platform/gce
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Launch kubemark with an existing Kubemark master
In order to expand the use of kubemark, allow developers to use kubemark with a pre-existing Kubernetes cluster.
Ref issue #44393
Automatic merge from submit-queue (batch tested with PRs 48399, 48450, 48144)
Skip errors when unregistering juju kubernetes-workers
**What this PR does / why we need it**: When removing a kubernetes node from using Juju and for some reason kubernetes master fails we should not error the node, instead we should proceed with the removal of the node and the master will recognise that node as unavailable because it will fail heartbeats.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/300
**Special notes for your reviewer**:
**Release note**:
```
Clean decommission of Juju kubernetes worker units
```
Automatic merge from submit-queue (batch tested with PRs 48399, 48450, 48144)
configure kube-proxy to run with unset conntrack param when in lxc
**What this PR does / why we need it**: Configures the Juju Charm code to run kube-proxy with `conntrack-max-per-core` set to `0` when in an lxc as a workaround for issues when mounting `/sys/module/nf_conntrack/parameters/hashsize`
**Release note**:
```release-note
Configures the Juju Charm code to run kube-proxy with conntrack-max-per-core set to 0 when in an lxc as a workaround for issues when mounting /sys/module/nf_conntrack/parameters/hashsize
```
Automatic merge from submit-queue (batch tested with PRs 47043, 48448, 47515, 48446)
Fix charms leaving services running after remove-unit
**What this PR does / why we need it**:
This fixes a case where removed charm units can sometimes leave behind running services that interfere with the rest of the cluster.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix charms leaving services running after remove-unit
```
Automatic merge from submit-queue (batch tested with PRs 48439, 48440, 48394)
Fix kubernetes charms not restarting services after snap upgrades
**What this PR does / why we need it**:
This fixes a problem where the Kubernetes charms don't restart services after upgrading snaps. This can cause certain fixes not to be picked up (for example https://github.com/juju-solutions/release/pull/10)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixed kubernetes charms not restarting services after snap upgrades
```
Automatic merge from submit-queue (batch tested with PRs 48439, 48440, 48394)
Fix: namespace-create have kubectl in path
**What this PR does / why we need it**: In juju deployed clusters namespace-create action is failing
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/326
**Special notes for your reviewer**:
**Release note**:
```Fix: namespace-create action on Juju deployed clusters
```
Automatic merge from submit-queue
Add configuration for swift container name
**What this PR does / why we need it:**
This review updates the OpenStack Heat provider to allow for configuring the name of the Swift object store.
**Which issue this PR fixes:**
fixes#47966
**Special notes for your reviewer**:
Note that the terminology for OpenStack Swift conflicts with K8S terminology. In this instance, container is referring to the organization structure of Swift storage objects.
**Release note**:
```release-note
Adds configuration option for Swift object store container name to OpenStack Heat provider.
```
Automatic merge from submit-queue (batch tested with PRs 48317, 48313, 48351, 48357, 48115)
Ensure get_password is accessing a file that exists.
**What this PR does / why we need it**: get_password will throw an exception instead of returning None in case the basic_auth.csv file is missing but /root/cdk/ is there in a juju deployment.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/324
**Special notes for your reviewer**:
**Release note**:
```
Fix race condition where /root/cdk is not yet initialised in kubernetes-master setup by Juju
```
Automatic merge from submit-queue (batch tested with PRs 47918, 47964, 48151, 47881, 48299)
Add ApiEndpoint support to GCE config.
**What this PR does / why we need it**:
Add the ability to change ApiEndpoint for GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 43558, 48261, 42376, 46803, 47058)
Add bind mount /etc/resolv.conf from host to containerized mounter
Currently, in containerized mounter rootfs, there is no DNS setup. If client
try to set up volume with host name instead of IP address, it will fail to resolve
the host name.
By bind mount the host's /etc/resolv.conf to mounter rootfs, VM hosts name
could be resolved when using host name during mount.
```release-note
Fixes issue where you could not mount NFS or glusterFS volumes using hostnames on GCI/GKE with COS images.
```
Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)
Securing the cluster created by Juju
**What this PR does / why we need it**: This PR secures the deployments done with Juju master. Works around certain security issues inherent to kubernetes (see for example dashboard access)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
Securing Juju kubernetes dashboard
```
Automatic merge from submit-queue (batch tested with PRs 46850, 47984)
Update addon-resizer version
Update addon-resizer version and remove the flags that have been deprecated in the new version.
**What this PR does / why we need it**:
ref kubernetes/contrib#2623
**Special notes for your reviewer**:
Need to wait for merging kubernetes/contrib#2623 first.
**Release note**:
```release-note
addon-resizer flapping behavior was removed.
```
Automatic merge from submit-queue
Allow log-dumping only N randomly-chosen nodes in the cluster
This should let us save "lots" (~3-4 hours) of time in our 5000-node cluster scale tests as we copy logs from all the nodes to jenkins worker and then upload all of them to gcs (while we don't need too many).
This will also prevent the jenkins container facing "No space left on device" error while dumping logs, that we saw in runs 12-13 of gce-enormous-cluster.
The longterm fix will be to enable [logexporter](https://github.com/kubernetes/test-infra/tree/master/logexporter) for our tests.
cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @gmarek @fejta
Automatic merge from submit-queue (batch tested with PRs 48004, 48205, 48130, 48207)
Bumped Heapster to v1.4.0
``` release-note
Bumped Heapster to v1.4.0.
More details about the release https://github.com/kubernetes/heapster/releases/tag/v1.4.0
```
follow up #47961
The release candidate `v1.4.0-beta.0` turned out to be stable.
Automatic merge from submit-queue (batch tested with PRs 48004, 48205, 48130, 48207)
Do not set CNI in cases where there is a private master and network policy provider is set.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
In GCE and in a "private master" setup, do not set the network-plugin provider to CNI by default if a network policy provider is given.
```
Automatic merge from submit-queue (batch tested with PRs 48192, 48182)
Add generic NoSchedule toleration to fluentd in gcp config as a quick…
…-fix for #44445
Automatic merge from submit-queue (batch tested with PRs 48139, 48042, 47645, 48054, 48003)
Add a failsafe for etcd not returning a connection string
**What this PR does / why we need it**: Removing a kubernetes-master will fail as described on this issue: https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/311
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/311
**Special notes for your reviewer**: This is a two liner defensive code. I am not totally sold on this patch. I might not be the right place to address the above issue. However, solving the problem on the etcd side and updating the interface scope to be unit (as suggested) seems much more involving.
**Release note**:
```
Fix error when removing juju kubernetes-master unit
```
Automatic merge from submit-queue
Make big clusters work again after introduction of subnets
This PR does two things:
- make IP aliases automatically pick Node IP Range based on number of Nodes,
- fix logic for starting clusters >4095 Nodes that was broken by introduction of subnets,
cc @wojtek-t @shyamjvs
```release-note
Setting env var ENABLE_BIG_CLUSTER_SUBNETS=true will allow kube-up.sh to start clusters bigger that 4095 Nodes on GCE.
```
Ref https://github.com/kubernetes/kubernetes/issues/47344
Automatic merge from submit-queue
Insert Cynerva and Kjackal to approvers list
**What this PR does / why we need it**:
Per the membership reviews, we're looking to promote Konstantinos and
George to approvers to help distribute the review/bug load for the `cluster/juju` code
tree.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
cc @marcoceppi and @tvansteenburgh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48092, 47894, 47983)
fix systemd service file for custom args.
`KUBE_SCHEDULER_ARGS` and `KUBELET_ARGS` are used to custom args for scheduler or kubelet by users.
But if there are more than one params in `KUBELET_ARGS`, for example, if I set KUBELET_ARGS="--cgroups-per-qos=false --enforce-node-allocatable=", the kubelet will judge the `false --enforce-node-allocatable=` as the value of `cgroups-per-qos`. Because `${KUBELET_ARGS}` in kubelet.service will expands the variable into one word. And if I take `$KUBELET_ARGS` instead, kubelet will worker perfectly.
For more info, please click [EnvironmentFiles and support for /etc/sysconfig files](http://fedoraproject.org/wiki/Packaging:Systemd#EnvironmentFiles_and_support_for_.2Fetc.2Fsysconfig_files). This bug is reported by @huanxingyouyoutoo. And I make this PR for her to fix it.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48012, 47443, 47702, 47178)
Fix setting juju worker labels during deployment
**What this PR does / why we need it**: Allows for setting the labels of juju workers during deployment (eg inside a bundle)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47176
**Special notes for your reviewer**:
**Release note**:
```
Fix bug in setting Juju kubernetes-worker labels in bundle.yaml files.
```
Automatic merge from submit-queue (batch tested with PRs 47860, 47170)
Fix restart action on juju kubernetes-master
**What this PR does / why we need it**: Restart action of kubernetes-master of Juju is not functioning.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/299
**Special notes for your reviewer**:
**Release note**:
```
Fix: Restart action of juju's kubernetes-master restarts the respective snap based services
```
Automatic merge from submit-queue (batch tested with PRs 47860, 47170)
Make fluentd log to stdio instead of a dedicated file
Lower verbosity also, to reduce volume of system logs exported to the backend.
Fix https://github.com/kubernetes/kubernetes/issues/43772
/cc @piosz
Automatic merge from submit-queue (batch tested with PRs 47961, 46276)
Bumped Heapster to v1.4.0-beta.0
Heapster release candidate for Kubernetes 1.7
cc @dchen1107 @caesarxuchao
Automatic merge from submit-queue
Parameterize the binary path and host arch for the hyperkube image
As the [cluster/images/hyperkube/README.md](https://github.com/kubernetes/kubernetes/tree/master/cluster/images/hyperkube) shows, I run the command: `make build VERSION=test ARCH=ppc64le`, but got the below errors, so this PR will fix it.
```
ARCH=ppc64le
cp -r ./* /tmp/hyperkubeTFbYrI
mkdir -p /tmp/hyperkubeTFbYrI/cni-bin
cp ../../../_output/dockerized/bin/linux/ppc64le/hyperkube /tmp/hyperkubeTFbYrI
cp: cannot stat '../../../_output/dockerized/bin/linux/ppc64le/hyperkube': No such file or directory
Makefile:62: recipe for target 'build' failed
make: *** [build] Error 1
```
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)
Bump up npd version to v0.4.1
```
Bump up npd version to v0.4.1
```
Fixes#47219
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)
Use a different env var to enable the ip-masq-agent addon.
We shouldn't mix setting the non-masq-cidr with enabling the addon.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
https://github.com/kubernetes/kubernetes/issues/47865
Automatic merge from submit-queue (batch tested with PRs 47883, 47179, 46966, 47982, 47945)
Strip versions from known api groups in audit policy
Props to @CaoShuFeng for catching this.
Issue: kubernetes/features#22
/cc @ericchiang
Automatic merge from submit-queue (batch tested with PRs 46151, 47602, 47507, 46203, 47471)
Add RBAC support to fluentd-elasticsearch cluster addon
**What this PR does / why we need it**:
Adds rbac support to the fluentd-elasticsearch addon
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46023
**Special notes for your reviewer**:
**Release note**:
```release-note
Add RBAC support to fluentd-elasticsearch cluster addon
```
Automatic merge from submit-queue (batch tested with PRs 46151, 47602, 47507, 46203, 47471)
es discovery support args apiserver-host and kubeconfig
Now discovery elasticsearch through kubernetes client,but now does not support specifying the apiserver-host or kubeconfig create client.
Automatic merge from submit-queue (batch tested with PRs 47403, 46646, 46906, 46527, 46792)
Avoid redundant copying of tars during kube-up for gce if the same file already exists
**What this PR does / why we need it**:
Whenever I execute cluster/kube-up.sh it copies my tar files to google cloud, even if the files haven't changed. This PR checks to see whether the files already exist, and avoids uploading them again. These files are large and can take a long time to upload.
**Which issue this PR fixes**: fixes#46791
**Special notes for your reviewer**:
Here is the new output:
cluster/kube-up.sh
... Starting cluster in us-central1-b using provider gce
... calling verify-prereqs
... calling verify-kube-binaries
... calling kube-up
Project: PROJECT
Zone: us-central1-b
+++ Staging server tars to Google Storage: gs://kubernetes-staging-PROJECT/kubernetes-devel
+++ kubernetes-server-linux-amd64.tar.gz uploaded earlier, cloud and local file md5 match (md5 = 3a095kcf27267a71fe58f91f89fab1bc)
**Release note**:
```cluster/kube-up.sh on gce now avoids redundant copying of kubernetes tars if the local and cloud files' md5 hash match```
Automatic merge from submit-queue
Remove limits from ip-masq-agent for now and disable ip-masq-agent in GCE
ip-masq-agent when issuing an iptables-save will read any configured iptables on the node. This means that the ip-masq-agent's memory requirements would grow with the number of iptables (i.e. services) on the node.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
#47865
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Don't audit log tokens in TokenReviews
We don't want to leak auth tokens in the audit logs, so only log TokenReview requests at the metadata level.
Issue: kubernetes/features#22
ip-masq-agent when issuing an iptables-save will read
any configured iptables on the node. This means that
the ip-masq-agent's memory requirements would grow
with the number of iptables (i.e. services) on the node.
Disable ip-masq-agent in GCE
Automatic merge from submit-queue
Fix broken command in registry addon document
**What this PR does / why we need it**:
Fix a command example in registry addon document so it matches the example yaml above.
Automatic merge from submit-queue (batch tested with PRs 47906, 47910)
Reduce scheduler CPU request to 75m
On a 1 cpu master we are over budget with CPU requests. Components like npd or cluster autoscaler don't have *any* space to run. We need to reduce some requests.
cc: @gmarek @mikedanese @roberthbailey @davidopp @dchen1107
Automatic merge from submit-queue
Add aleksandra-malinowska to cluster-autoscaler salt definition owners
@aleksandra-malinowska is working on Cluster Autoscaler so she should be added to reviewers and approvers.
Automatic merge from submit-queue
Reduce Cluster Autoscaler cpu request to 10m
We are super tight on 1 cpu master node. With the recent changes we cannot fit to the master if request is bigger than 10m.
cc: @gmarek @MaciekPytel @aleksandra-malinowska
Automatic merge from submit-queue
Update addons with upstream CVE fixes
**What this PR does / why we need it**: refreshes the kube-dns, metadata-proxy, and fluentd-gcp, event-exporter, prometheus-to-sd, and ip-masq-agent addons with new base images containing fixes for the following vulnerabilities:
* CVE-2016-4448
* CVE-2016-9841
* CVE-2016-9843
* CVE-2017-1000366
* CVE-2017-2616
* CVE-2017-9526
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47386 (yay!)
**Special notes for your reviewer**:
**Release note**:
```release-note
Update kube-dns, metadata-proxy, and fluentd-gcp, event-exporter, prometheus-to-sd, and ip-masq-agent addons with new base images containing fixes for CVE-2016-4448, CVE-2016-9841, CVE-2016-9843, CVE-2017-1000366, CVE-2017-2616, and CVE-2017-9526.
```
/assign @bowei @MrHohn @Q-Lee @crassirostris @dnardo
/cc @dchen1107 @timstclair
Automatic merge from submit-queue (batch tested with PRs 42252, 42251, 42249, 47512, 47887)
Bump the memory request/limit for ip-masq-daemon.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
issue #47865
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Per the membership reviews we're looking to promote Konstantinos and
George to approvers to help distribute the review/bug load for the juju code
tree.
Automatic merge from submit-queue
Add ip-masq-agent readiness label by default.
Since we are setting the non-masq-cidr in the kubelet to 0.0.0.0/0 we
need to ensure the ip-masq-agent runs.
pr/#46473 made the NON_MASQUERADE_CIDR default to 0.0.0.0/0 which means we need to have this label set now.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes#47752
**Special notes for your reviewer**:
**Release note**:
```release-note
ip-masq-agent is now the default for GCE
```
Automatic merge from submit-queue
fix validate-cluster.sh
attempt to fix#47379.
Without this fix, the validate-cluster.sh never retries if `kubectl-retry get cs` fails.
cc @dchen1107
Automatic merge from submit-queue (batch tested with PRs 45268, 47573, 47632, 47818)
NODE_TAINTS in gce startup scripts
Currently there is now way to pass a list of taints that should be added on node registration (at least not in gce or other saltbased deployment). This PR adds necessary plumbing to pass the taints from user or instance group template to kubelet startup flags.
```release-note
Taints support in gce/salt startup scripts.
```
The PR was manually tested.
```
NODE_TAINTS: 'dedicated=ml:NoSchedule'
```
in kube-env results in
```
spec:
[...]
taints:
- effect: NoSchedule
key: dedicated
timeAdded: null
value: ml
```
cc: @davidopp @gmarek @dchen1107 @MaciekPytel
setting the non-masq-cidr in the kubelet to 0.0.0.0/0 we
need to ensure the ip-masq-agent runs.
Add node label pre-req back to ip-masq-agent.
Make gce test consistent with gce default scripts.
Automatic merge from submit-queue (batch tested with PRs 46604, 47634)
Set price expander in Cluster Autoscaler for GCE
With CA 0.6 we will make price-preferred node expander the default one for GCE. For other cloud providers we will stick to the default one (random) until the community implement the required interfaces in CA repo.
https://github.com/kubernetes/autoscaler/issues/82
cc: @MaciekPytel @aleksandra-malinowska
Automatic merge from submit-queue (batch tested with PRs 47669, 40284, 47356, 47458, 47701)
Standardize on home/kubernetes/bin for CNI
**What this PR does / why we need it**:
Standardizes where CNI plugins get installed on GCE.
**Which issue this PR fixes**
Fixes: https://github.com/kubernetes/kubernetes/issues/47453
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47669, 40284, 47356, 47458, 47701)
Mark Static pods on the Master as critical
fixes#47277.
A known issue with static pods is that they do not interact well with evictions. If a static pod is evicted or oom killed, then it will never be recreated. To mitigate this, we do not evict static pods that are critical. In addition, non-critical pods are candidates for preemption if a critical pod is scheduled to the node. If there are not enough allocatable resources on the node, this causes the static pod to be preempted.
This PR marks all static pods in the kube-system namspace as critical.
cc @vishh @dchen1107
Automatic merge from submit-queue (batch tested with PRs 46327, 47166)
mark --network-plugin-dir deprecated for kubelet
**What this PR does / why we need it**:
**Which issue this PR fixes** : fixes#43967
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47530, 47679)
Use cos-stable-59-9460-64-0 instead of cos-beta-59-9460-20-0.
Remove dead code that has now moved to another repo as part of #47467
**Release note**:
```release-note
NONE
```
/sig node
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)
Use echoserver:1.6 for better debugging and XSS prevention.
**What this PR does / why we need it**: This updates our test code to use a newer echoserver with XSS preventions.
**Which issue this PR fixes**: fixes#47682
**Special notes for your reviewer**: Marking as 1.7 since it's a fix to test code.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)
Fix Juju kubernetes-master idle status never being set
**What this PR does / why we need it**:
This fixes a problem with the kubernetes-master charm where the "Kubernetes master running." status message never gets set.
This happens because the `kube-api-endpoint.connected` state that it's waiting for doesn't exist. The state we need is `kube-api-endpoint.available` as seen [here](https://github.com/juju-solutions/interface-http/blob/master/provides.py#L12).
Additionally, we need to add the relation arguments to idle_status so it doesn't break when called.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47676
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix Juju kubernetes-master idle status never being set
```
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)
The KUBE-METADATA-SERVER firewall must be applied before the universa…
…l tcp ACCEPT
**What this PR does / why we need it**: the metadata firewall rule was broken by being appended after the universal tcp accept.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 38751, 44282, 46382, 47603, 47606)
Working on fixing #43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Added check so that if there is no Aggregator CA contents we won't start
the aggregator with the relevant flags.
**What this PR does / why we need it**:
This PR creates a request header CA. It also creates a proxy client cert/key pair.
It causes these files to end up on kube-apiserver and set the CLI flags so they are properly loaded.
Without it the customer either has to set them up themselves or re-use the master CA which is a security vulnerability.
Currently this creates everything on GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43716
**Special notes for your reviewer**:
This is a reapply of pull/47094 with the GKE issue resolved.
**Release note**: None
- It contains a fix for ipaliasing.
- It contains a fix which decouples GPU driver installation from kernel
version.
Remove dead code that has now moved to another repo as part of #47467
Automatic merge from submit-queue
Don't start any Typha instances if not using Calico
**What this PR does / why we need it**:
Don't start any Typha instances if Calico isn't being used. A recent change now includes all add-ons on the master, but we don't always want a Typha replica.
**Which issue this PR fixes**
Fixes https://github.com/kubernetes/kubernetes/issues/47622
**Release note**:
```release-note
NONE
```
cc @dnardo
Automatic merge from submit-queue (batch tested with PRs 47562, 47605)
Adding option in node start script to add "volume-plugin-dir" flag to kubelet.
**What this PR does / why we need it**: Adds a variable to allow specifying FlexVolume driver directory through cluster/kube-up.sh. Without this, the process of setting up FlexVolume in a non-default directory is very manual.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47561
Automatic merge from submit-queue
Add encryption provider support via environment variables
These changes are needed to allow cloud providers to use the encryption providers as an alpha feature. The version checks can be done in the respective cloud providers'.
Context: #46460 and #46916
@destijl @jcbsmpsn @smarterclayton
Automatic merge from submit-queue
[GCE] Bump GLBC version to 0.9.5
Fixes#47559
```release-note
Bump GLBC version to 0.9.5 - fixes [loss of manually modified GCLB health check settings](https://github.com/kubernetes/kubernetes/issues/47559) upon upgrade from pre-1.6.4 to either 1.6.4 or 1.6.5, or from pre-1.5.8 to 1.5.8.
```
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Added check so that if there is no Aggregator CA contents we won't start
the aggregator with the relevant flags.
Automatic merge from submit-queue (batch tested with PRs 47492, 47542, 46800, 47545, 45764)
Update addons with upstream CVE fixes
**What this PR does / why we need it**: refreshes the cluster-proportional-autoscaler, metadata-proxy, and fluentd-gcp addons with new base images with fixes for the following vulnerabilities:
* CVE-2016-4448
* CVE-2016-8859
* CVE-2016-9841
* CVE-2016-9843
* CVE-2017-9526
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: x-ref #47386, though there are still a few images left to update
**Release note**:
```release-note
Update cluster-proportional-autoscaler, metadata-proxy, and fluentd-gcp addons with fixes for CVE-2016-4448, CVE-2016-8859, CVE-2016-9841, CVE-2016-9843, and CVE-2017-9526.
```
/cc @timstclair @MrHohn @Q-Lee @crassirostris
Automatic merge from submit-queue
Fix dangling reference to gcloud alpha API for GCI (should be beta)
This reference to the alpha API was missed (fixed in GCE, but not GCI)
Fixes#47494
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 47510, 47516, 47482, 47521, 47537)
Allow autoscaler min at 0 in GCE
Allow scaling migs to zero in GCE startup scripts. This only makes sense when there is more than 1 mig. The main use case (for now) will be to test scaling to to zero in e2e tests.
Automatic merge from submit-queue (batch tested with PRs 47302, 47389, 47402, 47468, 47459)
Change port on which fluentd exposes its metrics
Fix https://github.com/kubernetes/kubernetes/issues/47397
/cc @Q-Lee @nicksardo
```release-note
Stackdriver Logging deployment exposes metrics on node port 31337 when enabled.
```
Automatic merge from submit-queue (batch tested with PRs 47302, 47389, 47402, 47468, 47459)
Update to kube-addon-manager:v6.4-beta.2: kubectl v1.6.4 and refreshed base images
**What this PR does / why we need it**: refreshes base images for kube-addon-manager with fixes for CVE-2016-9841 and CVE-2016-9843.
x-ref https://github.com/kubernetes/kubernetes/issues/47386
**Special notes for your reviewer**: the updated images are not yet pushed, so tests will fail until that's done.
**Release note**:
```release-note
```
/assign @MrHohn
Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276)
Enable Node authorizer and NodeRestriction admission in kubemark
xref https://github.com/kubernetes/features/issues/279
We want to ensure scale testing covers use of the authorizer/admission pair that partitions nodes. This includes enabling the authorizer, which populates a graph of existing nodes and pods.
Kubemark is still running all nodes with a single credential, so a follow-up step is to generate unique credentials per node (or enable TLS bootstrapping) and remove the temporary rolebinding added in this PR so the node authorizer is the one authorizing each call by a hollow node.
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)
Set up proxy certs for Aggregator.
Working on fixing https://github.com/kubernetes/kubernetes/issues/43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication --namespace=kube-system -o yaml
**What this PR does / why we need it**:
This PR creates a request header CA. It also creates a proxy client cert/key pair.
It causes these files to end up on kube-apiserver and set the CLI flags so they are properly loaded.
Without it the customer either has to set them up themselves or re-use the master CA which is a security vulnerability.
Currently this creates everything on GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43716
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)
Add Calico typha agent
**What this PR does / why we need it**:
- Adds the Calico typha agent with autoscaling to the GCE scripts.
- Adds logic to adjust Calico resource requests based on cluster size.
Fixes https://github.com/kubernetes/kubernetes/issues/47269
**Special notes for your reviewer**:
CC @dnardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Make fluentd-gcp run with host network
Fluentd-gcp should have access to instance's platform-dependent service account in order to work.
/cc @piosz
Working on fixing https://github.com/kubernetes/kubernetes/issues/43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Automatic merge from submit-queue
Remove e2e-rbac-bindings.
Replace todo-grabbag binding w/ more specific heapster roles/bindings.
Move kubelet binding.
**What this PR does / why we need it**:
The "e2e-rbac-bindings" held 2 leftovers from the 1.6 RBAC rollout process:
- One is the "kubelet-binding" which grants the "system:node" role to kubelet. This is needed until we enable the node authorizer. I moved this to the folder w/ some other kubelet related bindings.
- The other is the "todo-remove-grabbag-cluster-admin" binding, which grants the cluster-admin role to the default service account in the kube-system namespace. This appears to only be required for heapster. Heapster will instead use a "heapster" service account, bound to a "system:heapster" role on the cluster (no write perms), and a "system:pod-nanny" role in the kube-system namespace.
**Which issue this PR fixes**: Addresses part of #39990
**Release Note**:
```release-note
New and upgraded 1.7 GCE/GKE clusters no longer have an RBAC ClusterRoleBinding that grants the `cluster-admin` ClusterRole to the `default` service account in the `kube-system` namespace.
If this permission is still desired, run the following command to explicitly grant it, either before or after upgrading to 1.7:
kubectl create clusterrolebinding kube-system-default --serviceaccount=kube-system:default --clusterrole=cluster-admin
```
Automatic merge from submit-queue
Audit webhook config for GCE
Add a `ADVANCED_AUDIT_BACKEND` (comma delimited list) environment variable to the GCE cluster config to select the audit backend, and add configuration for the webhook backend.
~~Based on the first commit from https://github.com/kubernetes/kubernetes/pull/46557~~
For kubernetes/features#22
Since this is GCE-only configuration plumbing, I think this should be exempt from code-freeze.
Automatic merge from submit-queue
Remove Initializers from admission-control in kubernetes-master charm for pre-1.7
**What this PR does / why we need it**:
This fixes a problem with the kubernetes-master charm where kube-apiserver never comes up:
```
failed to initialize admission: Unknown admission plugin: Initializers
```
The Initializers plugin does not exist before Kubernetes 1.7. The charm needs to support 1.6 as well.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47062
**Special notes for your reviewer**:
This fixes a problem introduced by https://github.com/kubernetes/kubernetes/pull/36721
**Release note**:
```release-note
Remove Initializers from admission-control in kubernetes-master charm for pre-1.7
```
Automatic merge from submit-queue
Fixes 47182
**What this PR does / why we need it**: Adds some state guards to the idle_status message to speed up the deployment
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47182
**Special notes for your reviewer**:
This adds additional state guards of the idle_status method, which will
prevent it from being run until a worker has joined the relationship.
Previous invocations may have some messaging inconsistencies but will reach
eventual consistency once a worker has joined.
This prevents the polling loop from executing too soon, bloating the
installation time by bare-minimum an additional 10 minutes.
**Release note**:
```release-note
Added state guards to the idle_status messaging in the kubernetes-master charm to make deployment faster on initial deployment.
```
Automatic merge from submit-queue
Bump up npd version to v0.4.0
Fixes#47070.
Bump up npd version to [v0.4.0](https://github.com/kubernetes/node-problem-detector/releases/tag/v0.4.0).
```release-note
Bump up Node Problem Detector version to v0.4.0, which added support of parsing log from /dev/kmsg and ABRT.
```
/cc @dchen1107 @ajitak
This adds additional state guardsof the idle_status method, which will
prevent it from being run until a worker has joined the relationship.
Previous invocations may have some message artifacting, but will reach
eventual consistency once a worker has joined.
This prevents the polling loop from executing too soon, bloating the
installation time by bare-minimum an additional 10 minutes.
Automatic merge from submit-queue (batch tested with PRs 46718, 46828, 46988)
Update docs/ links to point to main site
**What this PR does / why we need it**:
This updates various links to either point to kubernetes.io or to the kubernetes/community repo instead of the legacy docs/ tree in k/k
Pre-requisite for #46813
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-docs-maintainers @chenopis @ahmetb @thockin
Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)
Write audit policy file for GCE/GKE configuration
Setup the audit policy configuration for GCE & GKE. Here is the high level summary of the policy:
- Default logging everything at `Metadata`
- Known write APIs default to `RequestResponse`
- Known read-only APIs default to `Request`
- Except secrets & configmaps are logged at `Metadata`
- Don't log events
- Don't log `/version`, swagger or healthchecks
In addition to the above, I spent time analyzing the noisiest lines in the audit log from a cluster that soaked for 24 hours (and ran a batch of e2e tests). Of those top requests, those that were identified as low-risk (all read-only, except update kube-system endpoints by controllers) are dropped.
I suspect we'll want to tweak this a bit more once we've had a time to soak it on some real clusters.
For kubernetes/features#22
/cc @sttts @ericchiang
Automatic merge from submit-queue
Add event exporter deployment to the fluentd-gcp addon
Introduce event exporter deployment to the fluentd-gcp addon so that by default if logging to Stackdriver is enabled, events will be available there also.
In this release, event exporter is a non-critical pod in BestEffort QoS class to avoid preempting actual workload in tightly loaded clusters. It will become critical in one of the future releases.
```release-note
Stackdriver cluster logging now deploys a new component to export Kubernetes events.
```
Automatic merge from submit-queue (batch tested with PRs 46972, 42829, 46799, 46802, 46844)
promote tls-bootstrap to beta
last commit of this PR.
Towards https://github.com/kubernetes/kubernetes/issues/46999
```release-note
Promote kubelet tls bootstrap to beta. Add a non-experimental flag to use it and deprecate the old flag.
```
Automatic merge from submit-queue (batch tested with PRs 46734, 46810, 46759, 46259, 46771)
Add iptables lock-file mount to kube-proxy manifest
**What this PR does / why we need it**: kube-proxy is broken in make bazel-release. The new iptables binary uses a lockfile in "/run", but the directory doesn't exist. This causes iptables-restore to fail. We need to share the same lock-file amongst all containers, so mount the host /run dir.
This is similar to #46132 but expediency matters, since builds are broken.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46103
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test.
This is still a WIP... needs to be squashed at least, and I don't think it's currently passing until I increase the scale of the RC, but please have a look at the general outline. Thanks!
Fixes#38336
@kow3ns @bdbauer @krousey @erictune @maisem @davidopp
```
On GCE, node upgrades will now respect PodDisruptionBudgets, if present.
```