Automatic merge from submit-queue (batch tested with PRs 42058, 41160, 42065, 42076, 39338)
Bump up dns-horizontal-autoscaler to 1.1.1
cluster-proportional-autoscaler 1.1.1 is releasing by kubernetes-incubator/cluster-proportional-autoscaler#26, also bump it up for dns-horizontal-autoscaler to introduce below features:
- Add PreventSinglePointFailure option in linear mode.
- Use protobufs for communication with apiserver.
- Support switching control mode on-the-fly.
Note:
The new entry `"preventSinglePointFailure":true` ensures kube-dns to have at least 2 replicas when there is more than one node. Mitigate the issue mentioned in #40063.
@bowei @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Cleanup fluentd-gcp image, rebase on debian-base
**Why we need this PR**:
There are several problems with our current fluentd-gcp image:
- It pulls in lots of unused packages, which expose unnecessary risk and create noise in CVE scans (and scare customers). The most notable example is the fluent-ui, which pulls in rails.
- `curl | sh ` is not a good practice for a Dockerfile. First, the script is not checked in the same source control branch, so builds are not reproducible. Second, the actions it is taking are opaque. Third, in this case, using non-standard packages means they're harder to manage with CVE scans & upstream fixes.
**What is changed by this PR?**
- Rather than relying on td-agent (which includes fluent-ui), use standard upstream packages. This is largely based off the [official fluentd debian-based image](https://github.com/fluent/fluentd-docker-image/blob/master/v0.12/debian/Dockerfile).
- Rebases the image on debian-base (depends on https://github.com/kubernetes/kubernetes/pull/41915). We would like to move towards a single full-distro base image we can maintain. This change should be relatively minor.
As a result of these changes, the image size is reduced from 360.6 MB to 185.8 MB (nearly half). Many packages were removed, and the full diff (focus on the unversioned files) is listed here: 3fb704f977
**Which issue this PR fixes** https://github.com/kubernetes/kubernetes/issues/40248
**Special notes for your reviewer**:
This change both addresses security concerns, and is expected to greatly reduce the maintenance burden of the fluentd-gcp image. I'd *really* like to get this into 1.6, so please prioritize this review if possible.
I tested this by running the default e2e suite on a private e2e cluster using the new image. If there are other tests you'd like me to run, please let me know ASAP.
**Release note**:
```release-note
Cleanup fluentd-gcp image: rebase on debian-base, switch to upstream packages, remove fluent-ui & rails
```
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)
Default storage class for vSphere Fixes#40070
**What this PR does / why we need it**:
Create default storage class for vSphere. This is part of the storage class GA effort https://github.com/kubernetes/features/issues/36
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes#40070
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Base etcd-empty-dir-cleanup on busybox, run as nobody, and update to etcdctl 3.0.14
**What this PR does / why we need it**: since the `etcd-empty-dir-cleanup` image just uses a simple shell script and `etcdctl`, we can base it on busybox, which is a smaller target than alpine.
I've also updated this to use an `etcdctl` from etcd 3.0.14, which matches the version of etcd we're running in 1.6 clusters (I believe), and changed the tag to match the `etcdctl` version.
Tested in my own e2e cluster, where it seems to work.
I haven't pushed the image yet, so e2e tests *may* fail. Tagging `do-not-merge`; if you think this looks good, I'll push the image and retest.
**Release note**:
```release-note
```
cc @timstclair @mml @wojtek-t
Automatic merge from submit-queue
move kube-dns to a separate service account
Switches the kubedns addon to run as a separate service account so that we can subdivide RBAC permission for it. The RBAC permissions will need a little more refinement which I'm expecting to find in https://github.com/kubernetes/kubernetes/pull/38626 .
@cjcullen @kubernetes/sig-auth since this is directly related to enabling RBAC with subdivided permissions
@thockin @kubernetes/sig-network since this directly affects now kubedns is added.
```release-note
`kube-dns` now runs using a separate `system:serviceaccount:kube-system:kube-dns` service account which is automatically bound to the correct RBAC permissions.
```
Automatic merge from submit-queue (batch tested with PRs 39855, 41433, 41567, 41887, 41652)
Add fluentd monitoring to fluentd-gcp image
Right now we are not able to monitor the state of fluentd in cluster, which may result in logging subsystem quietly failing. This PR tries to address that problem by introducing the fluentd container monitoring:
* fluentd internal metrics, like number of buffers and number of data in buffers
* `logging_line_count`, number of lines, read by fluentd from application containers' logs
* Has `tag` label, corresponding to the fluentd tag of the entry
* `logging_entry_count`, number of entries, emitted to the output plugin
* With label `component` set to `container`, generated by application containers
* With label `component` set to `system`, generated by system components like kubelet, docker, scheduler, etc.
* Has `tag` label, corresponding to the fluentd tag of the entry
CC @fabxc @igorpeshansky @edsiper
Automatic merge from submit-queue (batch tested with PRs 41797, 41793, 41795, 41807, 41781)
Turn fluentd supervisor off for fluentd-gcp
By default, turn fluentd supervisor off so that when fluentd process fails, for example due to OOM, container fails completely and it would be easy to detect.
CC @igorpeshansky @qingling128
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Update kubectl in addon-manager to use HPA in autoscaling/v1
Addon-manager is broken since HPA objects were removed from extensions api group.
Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log):
```
INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. ==
WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 ==
```
And notice this commit (f66679a4e9) came in two weeks ago, which removed HorizontalPodAutoscaler from extensions/v1beta1.
Addon-manager is now partially functioning that it could successfully create and update addons, but will fail to prune objects, which means upgrade tests may mostly fail.
Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([release 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) for fixing, including below images:
- gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2
@mikedanese
cc @wojtek-t @shyamjvs
Automatic merge from submit-queue
Bump fluentd-gcp google_cloud plugin version
Bump the version of `fluent-plugin-google-cloud` in fluentd-gcp image, because it's broken for version `0.5.2`.
Recently, gem `google-api-client` was updated to version `0.10.0`. The new version broke `fluent-plugin-google-cloud` which doesn't specify the upper version of `google-api-client` gem. I'm bumping the version used in our image to allow future changes in this release to be run and tested.
This PR doesn't bump the version, since no effective changes has happened, leaving this for the next PR to do.
CC @igorpeshansky
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)
Add toleration to fluentd daemonset to make it run on master
Because of https://github.com/kubernetes/kubernetes/pull/41172 fluentd pods stopped being allocated on master node.
This PR introduces toleration for master taint for fluentd.
CC @davidopp @janetkuo @kubernetes/sig-scheduling-bugs
Unfortunately, we don't have e2e tests to ensure that master logs are being ingested. This problem is a great signal to work on https://github.com/kubernetes/kubernetes/issues/41411
Automatic merge from submit-queue
fluentd-gcp: Add kube-apiserver-audit.log.
**What this PR does / why we need it**:
Add `kube-apiserver-audit.log` from https://github.com/kubernetes/kubernetes/pull/41211 to fluentd config, so the audit log gets sent to the same place as `kube-apiserver.log`.
**Which issue this PR fixes**:
**Special notes for your reviewer**:
We would like to backport this to release-1.5 also.
**Release note**:
```release-note
The apiserver audit log (`/var/log/kube-apiserver-audit.log`) will be sent through fluentd if enabled.
```
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
Switch RBAC subject apiVersion to apiGroup in v1beta1
Referencing a subject from an RBAC role binding, the API group and kind of the subject is needed to fully-qualify the reference.
The version is not, and adds complexity around re-writing the reference when returning the binding from different versions of the API, and when reconciling subjects.
This PR:
* v1beta1: change the subject `apiVersion` field to `apiGroup` (to match roleRef)
* v1alpha1: convert apiVersion to apiGroup for backwards compatibility
* all versions: add defaulting for the three allowed subject kinds
* all versions: add validation to the field so we can count on the data in etcd being good until we decide to relax the apiGroup restriction
```release-note
RBAC `v1beta1` RoleBinding/ClusterRoleBinding subjects changed `apiVersion` to `apiGroup` to fully-qualify a subject. ServiceAccount subjects default to an apiGroup of `""`, User and Group subjects default to an apiGroup of `"rbac.authorization.k8s.io"`.
```
@deads2k @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41182, 41290)
Add a default storage class for Azure Disk
Part of https://github.com/kubernetes/kubernetes/issues/40071
@jsafrane @colemickens @codablock @rootfs