Commit Graph

5250 Commits (7a06e41f938639d1a13fb4a5f0e969882c9c33dd)

Author SHA1 Message Date
Kubernetes Submit Queue 409d7d0a91 Merge pull request #41326 from ncdc/ci-cache-mutation
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)

Add ability to enable cache mutation detector in GCE

Add the ability to enable the cache mutation detector in GCE. The current default behavior (disabled) is retained.

When paired with https://github.com/kubernetes/test-infra/pull/1901, we'll be able to detect shared informer cache mutations in gce e2e PR jobs.
2017-02-21 07:45:42 -08:00
Kubernetes Submit Queue 70c9eebd21 Merge pull request #41739 from shyamjvs/hollow-node-logs
Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576)

[Kubemark] Add option to log hollow-node logs

Ref https://github.com/kubernetes/kubernetes/issues/41613

Added an option to log kubemark hollow-node logs which includes kubelet, kubeproxy and npd logs for each hollow-node.
Setting the env var `ENABLE_HOLLOW_NODE_LOGS=true` should now enable logging for tests.

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @yujuhong @Random-Liu
2017-02-21 02:24:43 -08:00
Shyam Jeedigunta ed0ab3cd8e [Kubemark] Add option to log hollow-node logs 2017-02-20 11:52:49 +01:00
Kubernetes Submit Queue ff12e5688c Merge pull request #40206 from Random-Liu/add-standalone-npd
Automatic merge from submit-queue

Add standalone npd on GCI.

This PR added standalone NPD in GCE GCI cluster. I already verified the PR, and it should work.

/cc @dchen1107 @fabioy @andyxning @kubernetes/sig-node-misc
2017-02-18 02:00:20 -08:00
Kubernetes Submit Queue 4b3a097ecd Merge pull request #41525 from yujuhong/fix_output
Automatic merge from submit-queue

Fix the output of health-mointor.sh

The script show prints the errors/response of the health check, but not
show the progress of `curl`.
2017-02-17 16:57:29 -08:00
Random-Liu d40c0a7099 Add standalone npd on GCI. 2017-02-17 16:18:08 -08:00
Kubernetes Submit Queue 6d5b2ef49e Merge pull request #41080 from shyamjvs/etcd-version-monitor
Automatic merge from submit-queue

Added a basic monitor for providing etcd version related info

Fixes #41071 

This tool scrapes metrics partly from etcd's /version and /metrics endpoints and partly using etcdctl and exposes them as prometheus metrics at `http://localhost:9101/metrics` endpoint on the master. Here is a summary of the metrics it exposes (self-explanatory from the code):
-        etcdVersionFetchCount   = prometheus.NewCounterVec(
                prometheus.CounterOpts{
                        Namespace: "etcd",
                        Name: "version_info_fetch_count",
                        Help: "Number of times etcd's version info was fetched, labeled by etcd's server binary and cluster version",
                },
                []string{"serverversion", "clusterversion"})
-         etcdGRPCRequestsTotal   = prometheus.NewCounterVec(
                prometheus.CounterOpts{
                        Namespace: namespace,
                        Name: "grpc_requests_total",
                        Help: "Counter of received grpc requests, labeled by grpc method and grpc service names",
                },
                []string{"grpc_method", "grpc_service"})

For further info on how to run this as a binary/docker-container/kubernetes-pod and checking the metrics, have a look at the README.md file.

cc @fgrzadkowski @wojtek-t @piosz
2017-02-17 10:18:48 -08:00
Kubernetes Submit Queue 46cd8ec91b Merge pull request #41637 from wojtek-t/expose_storage_format_as_env
Automatic merge from submit-queue

Expose storage media type as env variable

Ref #40636

@mml
2017-02-17 08:15:27 -08:00
Andy Goldstein 688c19ec71 Allow cache mutation detector enablement by PRs
Allow cache mutation detector enablement by PRs in an attempt to find
mutations before they're merged in to the code base. It's just for the
apiserver and controller-manager for now. If/when the other components
start using a SharedInformerFactory, we should set them up just like
this as well.
2017-02-17 10:03:13 -05:00
Kubernetes Submit Queue 3b14667afe Merge pull request #41604 from shyamjvs/kubemark-num-nodes
Automatic merge from submit-queue

Reduce default value of kubemark's NUM_NODES to 10

Changing the default value of kubemark's NUM_NODES from 100 to 10, as it would then be possible to start kubemark on gce clusters that have been started using kube-up that uses the default config of three n1-standard-2 nodes. I've already been asked by a couple of people about why kubemark is not starting on their cluster because of this. More people shouldn't be facing this issue in future.

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-02-17 06:49:21 -08:00
Wojciech Tyczynski 3695e85b34 Expose storage media type as env variable 2017-02-17 14:16:55 +01:00
Shyam Jeedigunta 7e6b8ac26b Added a basic monitor for watching etcd version and size related info 2017-02-17 12:52:54 +01:00
Shyam Jeedigunta 94d2ed5e34 Reduce default value of kubemark's NUM_NODES to 10 2017-02-16 23:35:39 +01:00
Kubernetes Submit Queue 30e8953fad Merge pull request #41564 from Crassirostris/fluentd-gcp-plugin-version-bump
Automatic merge from submit-queue

Bump fluentd-gcp google_cloud plugin version

Bump the version of `fluent-plugin-google-cloud` in fluentd-gcp image, because it's broken for version `0.5.2`.

Recently, gem `google-api-client` was updated to version `0.10.0`. The new version broke `fluent-plugin-google-cloud` which doesn't specify the upper version of `google-api-client` gem. I'm bumping the version used in our image to allow future changes in this release to be run and tested.

This PR doesn't bump the version, since no effective changes has happened, leaving this for the next PR to do.

CC @igorpeshansky
2017-02-16 09:20:12 -08:00
Mik Vyatskov e8de31623f Bump fluentd-gcp google_cloud plugin version 2017-02-16 16:49:16 +01:00
Kubernetes Submit Queue 627c6ce2b8 Merge pull request #41489 from Crassirostris/fluentd-add-toleration
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)

Add toleration to fluentd daemonset to make it run on master

Because of https://github.com/kubernetes/kubernetes/pull/41172 fluentd pods stopped being allocated on master node.

This PR introduces toleration for master taint for fluentd.

CC @davidopp @janetkuo @kubernetes/sig-scheduling-bugs

Unfortunately, we don't have e2e tests to ensure that master logs are being ingested. This problem is a great signal to work on https://github.com/kubernetes/kubernetes/issues/41411
2017-02-16 01:52:08 -08:00
Kubernetes Submit Queue 5ff9a72ea0 Merge pull request #41508 from Crassirostris/fluentd-dns-problem-fix
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)

Make fluentd use default dns instead of cluster dns to make it work o…

Fix https://github.com/kubernetes/kubernetes/issues/41415

Fluentd for Stackdriver requires external urls (e.g. `logging.googleapis.com`) to be available in order to work. If fluentd runs on master, it cannot access the service endpoint of cluster DNS. This change makes fluentd use default dns to fix this problem.

CC @thockin @bowei
2017-02-16 01:52:06 -08:00
Yu-Ju Hong d3e24e1085 Fix the output of health-mointor.sh
The script show prints the errors/response of the health check, but not
show the progress of `curl`.
2017-02-15 18:08:27 -08:00
Kubernetes Submit Queue 01393e34d6 Merge pull request #40722 from micmro/40721
Automatic merge from submit-queue (batch tested with PRs 41104, 41245, 40722, 41439, 41502)

openstack-heat: do not daemonize salt-minion

_openstack-heat_ does currently not setup a _salt-master_, so it is not necessary to  daemonize it.

**What this PR does / why we need it**:
as stated in #40721:

> The _openstack-heat_ provider only installs _salt-minions_, no _salt-master_. The configuration does not take this into account which causes the following issues:
> 
> - the _salt minion_ is not able to DNS resolve `salt` (see fist part of error log below)
> - the _salt-minion_ is daemonized and fails finding the master (second part of error log below). From my understanding is not required when there is no salt-master, as the setup uses `salt-call` 
> anyway (see [gce provider](https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/configure-vm.sh#L328-L339) as reference).
> 
> ```
> Jan 31 03:00:04 kube-stack-master salt-minion[9795]: [ERROR   ] DNS lookup of 'salt' failed.
> Jan 31 03:00:04 kube-stack-master salt-minion[9795]: [ERROR   ] Master hostname: 'salt' not found. Retrying in 30 seconds
> ...
> Jan 31 02:35:30 kube-stack-master salt-minion[9690]: [ERROR   ] Error while bringing up minion for multi-master. Is master at salt responding?
> ```
> 

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #40721

**Release note**:
```release-note
Do not daemonize `salt-minion` for the openstack-heat provider.
```
2017-02-15 16:28:09 -08:00
Kubernetes Submit Queue e62866444f Merge pull request #41245 from wojtek-t/rollback_2_2_1
Automatic merge from submit-queue (batch tested with PRs 41104, 41245, 40722, 41439, 41502)

Change the etcd rollback tool to do rollback to 2.2.1 version.

I did some tests of it and for my 3-node cluster with 1 deployment it worked fine.

But before merging this, we should probably do way more testing (we should rerun tests that @mml was doing for the previous script).

@lavalamp @xiang90
2017-02-15 16:28:08 -08:00
Kubernetes Submit Queue 1fc1e5efb5 Merge pull request #41395 from gmarek/owners
Automatic merge from submit-queue

Add gmarek and jszczepkowski to cluster/gce owners

cc @mikedanese @zmerlynn @roberthbailey
2017-02-15 12:45:39 -08:00
Mik Vyatskov f6730bd334 Make fluentd use default dns instead of cluster dns to make it work on master 2017-02-15 20:53:32 +01:00
Kubernetes Submit Queue 33aedca59d Merge pull request #41332 from jszczepkowski/etcd-cluster-state-16
Automatic merge from submit-queue

Added configurable etcd initial-cluster-state to kube-up script.

Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.

```release-note
Added configurable etcd initial-cluster-state to kube-up script.
```
2017-02-15 10:04:31 -08:00
Mik Vyatskov cbba60cc7d Add toleration to fluentd daemonset to make it run on master 2017-02-15 18:11:45 +01:00
Kubernetes Submit Queue 5cc2f73bc9 Merge pull request #41134 from shyamjvs/refactor-final-blow
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)

Refactored kubemark code into provider-specific and provider-independent parts [Part-3]

Fixes #38967
Applying final part of the changes in PR #39033 (which refactored kubemark code completely). The changes included in this PR are:

- Removed `test/kubemark/common.sh` and moved relevant parts of its code to the right places in start-kubemark/stop-kubemark scripts.
- Added DOCKER_REGISTRY, PROJECT, KUBEMARK_IMAGE_MAKE_TARGET variables to `/test/kubemark/cloud-provider-config.sh` to make the kubemark image push location variable wrt provider.
- Removed get-real-pod-for-hollow-node.sh as it doesn't seem to do anything useful.

@kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-02-15 05:58:15 -08:00
Kubernetes Submit Queue 80be6a259f Merge pull request #41430 from mikedanese/preserve-key
Automatic merge from submit-queue (batch tested with PRs 41360, 41423, 41430, 40647, 41352)

preserve kube-master-cert metadata over upgrades
2017-02-15 05:06:10 -08:00
Kubernetes Submit Queue 2fde8f8efe Merge pull request #41360 from enisoc/fluentd-audit-log
Automatic merge from submit-queue

fluentd-gcp: Add kube-apiserver-audit.log.

**What this PR does / why we need it**:

Add `kube-apiserver-audit.log` from https://github.com/kubernetes/kubernetes/pull/41211 to fluentd config, so the audit log gets sent to the same place as `kube-apiserver.log`.

**Which issue this PR fixes**:

**Special notes for your reviewer**:

We would like to backport this to release-1.5 also.

**Release note**:
```release-note
The apiserver audit log (`/var/log/kube-apiserver-audit.log`) will be sent through fluentd if enabled.
```
2017-02-15 05:01:54 -08:00
Kubernetes Submit Queue 0e86d98f99 Merge pull request #41449 from zmerlynn/container-vm-v20170214
Automatic merge from submit-queue (batch tested with PRs 41196, 41252, 41300, 39179, 41449)

Bump GCE ContainerVM to container-vm-v20170214

`container-vm-v20170214` is a re-build of the `docker-runc` in `container-vm-v20170201`, and should clear the GCE slow tests.

c.f. #40828

```release-note
Bump GCE ContainerVM to container-vm-v20170214 to address CVE-2016-9962.
```
2017-02-15 04:14:17 -08:00
Kubernetes Submit Queue 4c02f29196 Merge pull request #41211 from enisoc/configure-audit-log
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)

cluster/gce: Add env var to enable apiserver basic audit log.

For now, this is focused on a fixed set of flags that makes the audit
log show up under /var/log/kube-apiserver-audit.log and behave similarly
to /var/log/kube-apiserver.log. Allowing other customization would
require significantly more complex changes.

Audit log rotation is handled the same as for `kube-apiserver.log`.

**What this PR does / why we need it**:

Add a knob to enable [basic audit logging](https://kubernetes.io/docs/admin/audit/) in GCE.

**Which issue this PR fixes**:

**Special notes for your reviewer**:

We would like to cherrypick/port this to release-1.5 also.

**Release note**:
```release-note
The kube-apiserver [basic audit log](https://kubernetes.io/docs/admin/audit/) can be enabled in GCE by exporting the environment variable `ENABLE_APISERVER_BASIC_AUDIT=true` before running `cluster/kube-up.sh`. This will log to `/var/log/kube-apiserver-audit.log` and use the same `logrotate` settings as `/var/log/kube-apiserver.log`.
```
2017-02-15 03:25:12 -08:00
Kubernetes Submit Queue e4a4fe4a89 Merge pull request #41285 from liggitt/kube-scheduler-role
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)

Secure kube-scheduler

This PR:
* Adds a bootstrap `system:kube-scheduler` clusterrole
* Adds a bootstrap clusterrolebinding to the `system:kube-scheduler` user
* Sets up a kubeconfig for kube-scheduler on GCE (following the controller-manager pattern)
* Switches kube-scheduler to running with kubeconfig against secured port (salt changes, beware)
* Removes superuser permissions from kube-scheduler in local-up-cluster.sh
* Adds detailed RBAC deny logging

```release-note
On kube-up.sh clusters on GCE, kube-scheduler now contacts the API on the secured port.
```
2017-02-15 03:25:10 -08:00
Michael Mrowetz 11ed1a9565 #40721 openstack-heat: salt-minion not daemonize
openstack-heat does currently not setup a salt-master, so it is not necessary to  daemonize it.
2017-02-15 17:51:07 +09:00
Kubernetes Submit Queue 0a56830520 Merge pull request #41383 from liggitt/v1beta1-cleanup
Automatic merge from submit-queue

Update rbac data to v1beta1

Update RBAC fixtures to v1beta1
2017-02-14 22:35:05 -08:00
Jordan Liggitt cc11d7367a
Switch kube-scheduler to secure API access 2017-02-15 01:05:42 -05:00
Zach Loafman b7229ed565 Bump GCE ContainerVM to container-vm-v20170214
container-vm-v20170214 is a re-build of the docker-runc in
container-vm-v20170201, and should clear the GCE slow tests.

c.f. #40828
2017-02-14 16:36:02 -08:00
Anthony Yeh 7500746e7f cluster/gce: Add env var to enable apiserver basic audit log.
For now, this is focused on a fixed set of flags that makes the audit
log show up under /var/log/kube-apiserver-audit.log and behave similarly
to /var/log/kube-apiserver.log. Allowing other customization would
require significantly more complex changes.

Audit log rotation is handled externally by the wildcard /var/log/*.log
already configured in configure-helper.sh.
2017-02-14 15:18:10 -08:00
Anthony Yeh 257a8745e3 fluentd-gcp: Add kube-apiserver-audit.log. 2017-02-14 14:23:36 -08:00
Kubernetes Submit Queue a48284862c Merge pull request #41407 from Crassirostris/fluentd-gcp-sysmted-fix
Automatic merge from submit-queue (batch tested with PRs 41382, 41407, 41409, 41296, 39636)

Fix copying systemd libraries upon fluentd-gcp startup

Fix https://github.com/kubernetes/kubernetes/issues/40936
Revert https://github.com/kubernetes/kubernetes/pull/40017
2017-02-14 13:04:21 -08:00
Kubernetes Submit Queue 90e1977a1c Merge pull request #41325 from wojtek-t/fix_etcd_migrate
Automatic merge from submit-queue (batch tested with PRs 41299, 41325, 41386, 41329, 41418)

Migrate etcd data using correct etcd version in case of previous crash

Fix #41324
Fix #41323

@mml
2017-02-14 11:42:35 -08:00
Mike Danese e17e4e110e preserve kube-master-cert metadata over upgrades 2017-02-14 11:02:11 -08:00
gmarek e6e1d3066e Add gmarek and jszczepkowski to cluster/gce owners 2017-02-14 17:53:39 +01:00
Fabian Deutsch f6ee79b2ec addonManager: Add note about labeling
The cluster manager is only picking up addons if they are labeled correctly.
2017-02-14 15:43:47 +01:00
Mik Vyatskov a1ec542d7c Fix copying systemd libraries upon fluentd-gcp startup 2017-02-14 15:41:15 +01:00
Wojciech Tyczynski 1ce544db9e Migrate etcd data using correct etcd version in case of previous crash 2017-02-14 11:30:00 +01:00
Jordan Liggitt 9e6a3496b4
Update rbac data to v1beta1 2017-02-14 00:50:31 -05:00
Kubernetes Submit Queue 1f4e2efc5b Merge pull request #41184 from liggitt/subject-apigroup
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)

Switch RBAC subject apiVersion to apiGroup in v1beta1

Referencing a subject from an RBAC role binding, the API group and kind of the subject is needed to fully-qualify the reference.

The version is not, and adds complexity around re-writing the reference when returning the binding from different versions of the API, and when reconciling subjects.

This PR:
* v1beta1: change the subject `apiVersion` field to `apiGroup` (to match roleRef)
* v1alpha1: convert apiVersion to apiGroup for backwards compatibility
* all versions: add defaulting for the three allowed subject kinds
* all versions: add validation to the field so we can count on the data in etcd being good until we decide to relax the apiGroup restriction

```release-note
RBAC `v1beta1` RoleBinding/ClusterRoleBinding subjects changed `apiVersion` to `apiGroup` to fully-qualify a subject. ServiceAccount subjects default to an apiGroup of `""`, User and Group subjects default to an apiGroup of `"rbac.authorization.k8s.io"`.
```

@deads2k @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-pr-reviews
2017-02-13 21:07:10 -08:00
Bowei Du da291a7beb Send only cluster domain queries to kube-dns
Note: all PTR request must still traverse kube-dns. We can restrict
this to just the clusterCIDR in the future to reduce the amount of
PTR traffic.
2017-02-13 13:27:09 -08:00
Jordan Liggitt 2a76fa1c8f
Switch RBAC subject apiVersion to apiGroup in v1beta1 2017-02-13 15:33:09 -05:00
Jerzy Szczepkowski 80e57b7016 Added configurable etcd initial-cluster-state to kube-up script.
Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.
2017-02-13 16:10:47 +01:00
Kubernetes Submit Queue e80afed777 Merge pull request #41035 from vishh/fluentd-critical
Automatic merge from submit-queue

Make fluentd a critical pod

For #40573
Based on https://github.com/kubernetes/kubernetes/pull/40655#issuecomment-277790544

```release-note
If `experimentalCriticalPodAnnotation` feature gate is set to true, fluentd pods will not be evicted by the kubelet.
```
2017-02-13 05:10:19 -08:00
Kubernetes Submit Queue 19ddde6b4f Merge pull request #41182 from brendandburns/storage
Automatic merge from submit-queue (batch tested with PRs 41182, 41290)

Add a default storage class for Azure Disk

Part of https://github.com/kubernetes/kubernetes/issues/40071

@jsafrane @colemickens @codablock @rootfs
2017-02-11 23:19:36 -08:00