Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)
Add ability to enable cache mutation detector in GCE
Add the ability to enable the cache mutation detector in GCE. The current default behavior (disabled) is retained.
When paired with https://github.com/kubernetes/test-infra/pull/1901, we'll be able to detect shared informer cache mutations in gce e2e PR jobs.
Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576)
[Kubemark] Add option to log hollow-node logs
Ref https://github.com/kubernetes/kubernetes/issues/41613
Added an option to log kubemark hollow-node logs which includes kubelet, kubeproxy and npd logs for each hollow-node.
Setting the env var `ENABLE_HOLLOW_NODE_LOGS=true` should now enable logging for tests.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @yujuhong @Random-Liu
Automatic merge from submit-queue
Add standalone npd on GCI.
This PR added standalone NPD in GCE GCI cluster. I already verified the PR, and it should work.
/cc @dchen1107 @fabioy @andyxning @kubernetes/sig-node-misc
Automatic merge from submit-queue
Fix the output of health-mointor.sh
The script show prints the errors/response of the health check, but not
show the progress of `curl`.
Automatic merge from submit-queue
Added a basic monitor for providing etcd version related info
Fixes#41071
This tool scrapes metrics partly from etcd's /version and /metrics endpoints and partly using etcdctl and exposes them as prometheus metrics at `http://localhost:9101/metrics` endpoint on the master. Here is a summary of the metrics it exposes (self-explanatory from the code):
- etcdVersionFetchCount = prometheus.NewCounterVec(
prometheus.CounterOpts{
Namespace: "etcd",
Name: "version_info_fetch_count",
Help: "Number of times etcd's version info was fetched, labeled by etcd's server binary and cluster version",
},
[]string{"serverversion", "clusterversion"})
- etcdGRPCRequestsTotal = prometheus.NewCounterVec(
prometheus.CounterOpts{
Namespace: namespace,
Name: "grpc_requests_total",
Help: "Counter of received grpc requests, labeled by grpc method and grpc service names",
},
[]string{"grpc_method", "grpc_service"})
For further info on how to run this as a binary/docker-container/kubernetes-pod and checking the metrics, have a look at the README.md file.
cc @fgrzadkowski @wojtek-t @piosz
Allow cache mutation detector enablement by PRs in an attempt to find
mutations before they're merged in to the code base. It's just for the
apiserver and controller-manager for now. If/when the other components
start using a SharedInformerFactory, we should set them up just like
this as well.
Automatic merge from submit-queue
Reduce default value of kubemark's NUM_NODES to 10
Changing the default value of kubemark's NUM_NODES from 100 to 10, as it would then be possible to start kubemark on gce clusters that have been started using kube-up that uses the default config of three n1-standard-2 nodes. I've already been asked by a couple of people about why kubemark is not starting on their cluster because of this. More people shouldn't be facing this issue in future.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Bump fluentd-gcp google_cloud plugin version
Bump the version of `fluent-plugin-google-cloud` in fluentd-gcp image, because it's broken for version `0.5.2`.
Recently, gem `google-api-client` was updated to version `0.10.0`. The new version broke `fluent-plugin-google-cloud` which doesn't specify the upper version of `google-api-client` gem. I'm bumping the version used in our image to allow future changes in this release to be run and tested.
This PR doesn't bump the version, since no effective changes has happened, leaving this for the next PR to do.
CC @igorpeshansky
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)
Add toleration to fluentd daemonset to make it run on master
Because of https://github.com/kubernetes/kubernetes/pull/41172 fluentd pods stopped being allocated on master node.
This PR introduces toleration for master taint for fluentd.
CC @davidopp @janetkuo @kubernetes/sig-scheduling-bugs
Unfortunately, we don't have e2e tests to ensure that master logs are being ingested. This problem is a great signal to work on https://github.com/kubernetes/kubernetes/issues/41411
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)
Make fluentd use default dns instead of cluster dns to make it work o…
Fix https://github.com/kubernetes/kubernetes/issues/41415
Fluentd for Stackdriver requires external urls (e.g. `logging.googleapis.com`) to be available in order to work. If fluentd runs on master, it cannot access the service endpoint of cluster DNS. This change makes fluentd use default dns to fix this problem.
CC @thockin @bowei
Automatic merge from submit-queue (batch tested with PRs 41104, 41245, 40722, 41439, 41502)
openstack-heat: do not daemonize salt-minion
_openstack-heat_ does currently not setup a _salt-master_, so it is not necessary to daemonize it.
**What this PR does / why we need it**:
as stated in #40721:
> The _openstack-heat_ provider only installs _salt-minions_, no _salt-master_. The configuration does not take this into account which causes the following issues:
>
> - the _salt minion_ is not able to DNS resolve `salt` (see fist part of error log below)
> - the _salt-minion_ is daemonized and fails finding the master (second part of error log below). From my understanding is not required when there is no salt-master, as the setup uses `salt-call`
> anyway (see [gce provider](https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/configure-vm.sh#L328-L339) as reference).
>
> ```
> Jan 31 03:00:04 kube-stack-master salt-minion[9795]: [ERROR ] DNS lookup of 'salt' failed.
> Jan 31 03:00:04 kube-stack-master salt-minion[9795]: [ERROR ] Master hostname: 'salt' not found. Retrying in 30 seconds
> ...
> Jan 31 02:35:30 kube-stack-master salt-minion[9690]: [ERROR ] Error while bringing up minion for multi-master. Is master at salt responding?
> ```
>
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#40721
**Release note**:
```release-note
Do not daemonize `salt-minion` for the openstack-heat provider.
```
Automatic merge from submit-queue (batch tested with PRs 41104, 41245, 40722, 41439, 41502)
Change the etcd rollback tool to do rollback to 2.2.1 version.
I did some tests of it and for my 3-node cluster with 1 deployment it worked fine.
But before merging this, we should probably do way more testing (we should rerun tests that @mml was doing for the previous script).
@lavalamp @xiang90
Automatic merge from submit-queue
Added configurable etcd initial-cluster-state to kube-up script.
Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.
```release-note
Added configurable etcd initial-cluster-state to kube-up script.
```
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
Refactored kubemark code into provider-specific and provider-independent parts [Part-3]
Fixes#38967
Applying final part of the changes in PR #39033 (which refactored kubemark code completely). The changes included in this PR are:
- Removed `test/kubemark/common.sh` and moved relevant parts of its code to the right places in start-kubemark/stop-kubemark scripts.
- Added DOCKER_REGISTRY, PROJECT, KUBEMARK_IMAGE_MAKE_TARGET variables to `/test/kubemark/cloud-provider-config.sh` to make the kubemark image push location variable wrt provider.
- Removed get-real-pod-for-hollow-node.sh as it doesn't seem to do anything useful.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
fluentd-gcp: Add kube-apiserver-audit.log.
**What this PR does / why we need it**:
Add `kube-apiserver-audit.log` from https://github.com/kubernetes/kubernetes/pull/41211 to fluentd config, so the audit log gets sent to the same place as `kube-apiserver.log`.
**Which issue this PR fixes**:
**Special notes for your reviewer**:
We would like to backport this to release-1.5 also.
**Release note**:
```release-note
The apiserver audit log (`/var/log/kube-apiserver-audit.log`) will be sent through fluentd if enabled.
```
Automatic merge from submit-queue (batch tested with PRs 41196, 41252, 41300, 39179, 41449)
Bump GCE ContainerVM to container-vm-v20170214
`container-vm-v20170214` is a re-build of the `docker-runc` in `container-vm-v20170201`, and should clear the GCE slow tests.
c.f. #40828
```release-note
Bump GCE ContainerVM to container-vm-v20170214 to address CVE-2016-9962.
```
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)
cluster/gce: Add env var to enable apiserver basic audit log.
For now, this is focused on a fixed set of flags that makes the audit
log show up under /var/log/kube-apiserver-audit.log and behave similarly
to /var/log/kube-apiserver.log. Allowing other customization would
require significantly more complex changes.
Audit log rotation is handled the same as for `kube-apiserver.log`.
**What this PR does / why we need it**:
Add a knob to enable [basic audit logging](https://kubernetes.io/docs/admin/audit/) in GCE.
**Which issue this PR fixes**:
**Special notes for your reviewer**:
We would like to cherrypick/port this to release-1.5 also.
**Release note**:
```release-note
The kube-apiserver [basic audit log](https://kubernetes.io/docs/admin/audit/) can be enabled in GCE by exporting the environment variable `ENABLE_APISERVER_BASIC_AUDIT=true` before running `cluster/kube-up.sh`. This will log to `/var/log/kube-apiserver-audit.log` and use the same `logrotate` settings as `/var/log/kube-apiserver.log`.
```
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)
Secure kube-scheduler
This PR:
* Adds a bootstrap `system:kube-scheduler` clusterrole
* Adds a bootstrap clusterrolebinding to the `system:kube-scheduler` user
* Sets up a kubeconfig for kube-scheduler on GCE (following the controller-manager pattern)
* Switches kube-scheduler to running with kubeconfig against secured port (salt changes, beware)
* Removes superuser permissions from kube-scheduler in local-up-cluster.sh
* Adds detailed RBAC deny logging
```release-note
On kube-up.sh clusters on GCE, kube-scheduler now contacts the API on the secured port.
```
For now, this is focused on a fixed set of flags that makes the audit
log show up under /var/log/kube-apiserver-audit.log and behave similarly
to /var/log/kube-apiserver.log. Allowing other customization would
require significantly more complex changes.
Audit log rotation is handled externally by the wildcard /var/log/*.log
already configured in configure-helper.sh.
Automatic merge from submit-queue (batch tested with PRs 41299, 41325, 41386, 41329, 41418)
Migrate etcd data using correct etcd version in case of previous crash
Fix#41324Fix#41323
@mml
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
Switch RBAC subject apiVersion to apiGroup in v1beta1
Referencing a subject from an RBAC role binding, the API group and kind of the subject is needed to fully-qualify the reference.
The version is not, and adds complexity around re-writing the reference when returning the binding from different versions of the API, and when reconciling subjects.
This PR:
* v1beta1: change the subject `apiVersion` field to `apiGroup` (to match roleRef)
* v1alpha1: convert apiVersion to apiGroup for backwards compatibility
* all versions: add defaulting for the three allowed subject kinds
* all versions: add validation to the field so we can count on the data in etcd being good until we decide to relax the apiGroup restriction
```release-note
RBAC `v1beta1` RoleBinding/ClusterRoleBinding subjects changed `apiVersion` to `apiGroup` to fully-qualify a subject. ServiceAccount subjects default to an apiGroup of `""`, User and Group subjects default to an apiGroup of `"rbac.authorization.k8s.io"`.
```
@deads2k @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-pr-reviews
Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.
Automatic merge from submit-queue (batch tested with PRs 41182, 41290)
Add a default storage class for Azure Disk
Part of https://github.com/kubernetes/kubernetes/issues/40071
@jsafrane @colemickens @codablock @rootfs