Commit Graph

1165 Commits (9a1f0574a4ad5813410b445330d7240cf266b322)

Author SHA1 Message Date
Kubernetes Submit Queue 197bd532a2 Merge pull request #41700 from vishh/kube-proxy-oom-score
Automatic merge from submit-queue

Protect kubeproxy deployed via kube-up from system OOMs

This change is necessary until it can be moved to Guaranteed QoS Class.

For #40573
2017-02-25 07:07:01 -08:00
Kubernetes Submit Queue dbf5a40965 Merge pull request #41911 from ixdy/bump-rescheduler
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)

Bump gcr.io/google-containers/rescheduler to v0.2.2

**What this PR does / why we need it**: updates the rescheduler image to one based on busybox instead of ubuntu-slim. Changes for the image were in https://github.com/kubernetes/contrib/pull/2390.

Do you think this merits a release note? I'm leaning towards no.

**Release note**:

```release-note
Update gcr.io/google-containers/rescheduler to v0.2.2, which uses busybox as a base image instead of ubuntu.
```

cc @timstclair
2017-02-25 05:02:58 -08:00
Zihong Zheng 64ba52ae71 Bumps addon-manager to v6.4-alpha.3 and updates template files 2017-02-24 16:52:31 -08:00
Kubernetes Submit Queue e70d23db2a Merge pull request #41667 from mikedanese/certs
Automatic merge from submit-queue (batch tested with PRs 41667, 41820, 40910, 41645, 41361)

refactor certs in GCE to break up usages

TODO: debian
2017-02-23 20:57:27 -08:00
Mike Danese 192392bddd refactor certs in GCE 2017-02-23 10:12:31 -08:00
Wojciech Tyczynski b70e392161 Update clusters to use 3.0.17 etcd 2017-02-23 10:08:50 +01:00
Kubernetes Submit Queue e64835683b Merge pull request #41795 from Crassirostris/fluentd-gcp-turn-supervisor-off
Automatic merge from submit-queue (batch tested with PRs 41797, 41793, 41795, 41807, 41781)

Turn fluentd supervisor off for fluentd-gcp

By default, turn fluentd supervisor off so that when fluentd process fails, for example due to OOM, container fails completely and it would be easy to detect.

CC @igorpeshansky @qingling128
2017-02-22 22:06:33 -08:00
Jeff Grafton 1e7b589977 Bump gcr.io/google-containers/rescheduler to v0.2.2 2017-02-22 10:42:16 -08:00
Kubernetes Submit Queue fe34705f8a Merge pull request #41587 from MrHohn/addon-manager-fix-hpa
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)

Update kubectl in addon-manager to use HPA in autoscaling/v1

Addon-manager is broken since HPA objects were removed from extensions api group.

Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log):
```
INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. ==
WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 ==
```

And notice this commit (f66679a4e9) came in two weeks ago, which removed HorizontalPodAutoscaler from extensions/v1beta1.

Addon-manager is now partially functioning that it could successfully create and update addons, but will fail to prune objects, which means upgrade tests may mostly fail.

Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([release 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) for fixing, including below images:
- gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2

@mikedanese 

cc @wojtek-t @shyamjvs
2017-02-22 08:12:46 -08:00
Kubernetes Submit Queue 409d7d0a91 Merge pull request #41326 from ncdc/ci-cache-mutation
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)

Add ability to enable cache mutation detector in GCE

Add the ability to enable the cache mutation detector in GCE. The current default behavior (disabled) is retained.

When paired with https://github.com/kubernetes/test-infra/pull/1901, we'll be able to detect shared informer cache mutations in gce e2e PR jobs.
2017-02-21 07:45:42 -08:00
Mik Vyatskov 5d59d4d27b Turn fluentd supervisor off for fluentd-gcp 2017-02-21 13:50:47 +01:00
Zihong Zheng 2c8e89820a Update kubectl in addon-manager to use HPA in autoscaling/v1 instead of extensions/v1beta1 2017-02-20 10:49:10 -08:00
Vishnu kannan 6438efeeda protect kubeproxy from system OOMs until it can be moved to Guaranteed QoS Class
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-18 18:46:50 -08:00
Random-Liu d40c0a7099 Add standalone npd on GCI. 2017-02-17 16:18:08 -08:00
Andy Goldstein 688c19ec71 Allow cache mutation detector enablement by PRs
Allow cache mutation detector enablement by PRs in an attempt to find
mutations before they're merged in to the code base. It's just for the
apiserver and controller-manager for now. If/when the other components
start using a SharedInformerFactory, we should set them up just like
this as well.
2017-02-17 10:03:13 -05:00
Wojciech Tyczynski 3695e85b34 Expose storage media type as env variable 2017-02-17 14:16:55 +01:00
Kubernetes Submit Queue 33aedca59d Merge pull request #41332 from jszczepkowski/etcd-cluster-state-16
Automatic merge from submit-queue

Added configurable etcd initial-cluster-state to kube-up script.

Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.

```release-note
Added configurable etcd initial-cluster-state to kube-up script.
```
2017-02-15 10:04:31 -08:00
Kubernetes Submit Queue 2fde8f8efe Merge pull request #41360 from enisoc/fluentd-audit-log
Automatic merge from submit-queue

fluentd-gcp: Add kube-apiserver-audit.log.

**What this PR does / why we need it**:

Add `kube-apiserver-audit.log` from https://github.com/kubernetes/kubernetes/pull/41211 to fluentd config, so the audit log gets sent to the same place as `kube-apiserver.log`.

**Which issue this PR fixes**:

**Special notes for your reviewer**:

We would like to backport this to release-1.5 also.

**Release note**:
```release-note
The apiserver audit log (`/var/log/kube-apiserver-audit.log`) will be sent through fluentd if enabled.
```
2017-02-15 05:01:54 -08:00
Kubernetes Submit Queue 4c02f29196 Merge pull request #41211 from enisoc/configure-audit-log
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)

cluster/gce: Add env var to enable apiserver basic audit log.

For now, this is focused on a fixed set of flags that makes the audit
log show up under /var/log/kube-apiserver-audit.log and behave similarly
to /var/log/kube-apiserver.log. Allowing other customization would
require significantly more complex changes.

Audit log rotation is handled the same as for `kube-apiserver.log`.

**What this PR does / why we need it**:

Add a knob to enable [basic audit logging](https://kubernetes.io/docs/admin/audit/) in GCE.

**Which issue this PR fixes**:

**Special notes for your reviewer**:

We would like to cherrypick/port this to release-1.5 also.

**Release note**:
```release-note
The kube-apiserver [basic audit log](https://kubernetes.io/docs/admin/audit/) can be enabled in GCE by exporting the environment variable `ENABLE_APISERVER_BASIC_AUDIT=true` before running `cluster/kube-up.sh`. This will log to `/var/log/kube-apiserver-audit.log` and use the same `logrotate` settings as `/var/log/kube-apiserver.log`.
```
2017-02-15 03:25:12 -08:00
Jordan Liggitt cc11d7367a
Switch kube-scheduler to secure API access 2017-02-15 01:05:42 -05:00
Anthony Yeh 7500746e7f cluster/gce: Add env var to enable apiserver basic audit log.
For now, this is focused on a fixed set of flags that makes the audit
log show up under /var/log/kube-apiserver-audit.log and behave similarly
to /var/log/kube-apiserver.log. Allowing other customization would
require significantly more complex changes.

Audit log rotation is handled externally by the wildcard /var/log/*.log
already configured in configure-helper.sh.
2017-02-14 15:18:10 -08:00
Anthony Yeh 257a8745e3 fluentd-gcp: Add kube-apiserver-audit.log. 2017-02-14 14:23:36 -08:00
Mik Vyatskov a1ec542d7c Fix copying systemd libraries upon fluentd-gcp startup 2017-02-14 15:41:15 +01:00
Jerzy Szczepkowski 80e57b7016 Added configurable etcd initial-cluster-state to kube-up script.
Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.
2017-02-13 16:10:47 +01:00
Kubernetes Submit Queue e80afed777 Merge pull request #41035 from vishh/fluentd-critical
Automatic merge from submit-queue

Make fluentd a critical pod

For #40573
Based on https://github.com/kubernetes/kubernetes/pull/40655#issuecomment-277790544

```release-note
If `experimentalCriticalPodAnnotation` feature gate is set to true, fluentd pods will not be evicted by the kubelet.
```
2017-02-13 05:10:19 -08:00
Kubernetes Submit Queue 3b7440ca9a Merge pull request #41207 from mikedanese/rerevert
Automatic merge from submit-queue (batch tested with PRs 41223, 40892, 41220, 41207, 41242)

reenable kubelet auth

revert #41132 

This reverts commit fd56078298, reversing
changes made to d953402cdf.
2017-02-10 13:35:45 -08:00
Kubernetes Submit Queue 3f25bbcd17 Merge pull request #41037 from bprashanth/glbc_version
Automatic merge from submit-queue (batch tested with PRs 41037, 40118, 40959, 41084, 41092)

Bump up GLBC version from 0.9.0-beta to 0.9.1

Tests have been green, moving the beta to a release.
2017-02-09 16:44:38 -08:00
Mike Danese c8ce55fef4 Revert "Merge pull request #41132 from kubernetes/revert-40893-kubelet-auth"
This reverts commit fd56078298, reversing
changes made to d953402cdf.
2017-02-09 15:55:12 -08:00
Kubernetes Submit Queue b7772e4f89 Merge pull request #40048 from mtaufen/remove-deprecated-flags
Automatic merge from submit-queue (batch tested with PRs 41121, 40048, 40502, 41136, 40759)

Remove deprecated kubelet flags that look safe to remove

Removes:
```
--config
--auth-path
--resource-container
--system-container
```
which have all been marked deprecated since at least 1.4 and look safe to remove.

```release-note
The deprecated flags --config, --auth-path, --resource-container, and --system-container were removed.
```
2017-02-09 14:27:45 -08:00
bprashanth 906b16d8d6 Bump up GLBC version from 0.9.0-beta to 0.9.1 2017-02-09 11:33:45 -08:00
Wojciech Tyczynski 63531e56c5 Default TARGET_STORAGE to etcd3 in etcd manifest 2017-02-08 10:40:24 +01:00
Michael Taufen 982df56c52 Replace uses of --config with --pod-manifest-path 2017-02-07 14:32:37 -08:00
Kubernetes Submit Queue 5034d96bfb Merge pull request #40861 from lucab/to-k8s/bump-test-images
Automatic merge from submit-queue (batch tested with PRs 40345, 38183, 40236, 40861, 40900)

test: bump mounttest and mounttest-users images

This PR bumps two test images to latest versions:
 * mounttest to 0.8
 * mounttest-user to 0.5

It is a followup to https://github.com/kubernetes/kubernetes/pull/40613 and https://github.com/kubernetes/kubernetes/pull/40821.
2017-02-07 11:33:44 -08:00
Vishnu kannan 10e7902a12 make fluentd a critical pod
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-06 12:16:32 -08:00
Luca Bruno 85b1def175
test: update to use mounttest:0.8 and mounttest-user:0.5 2017-02-02 20:41:18 +00:00
Vishnu Kannan f85bbcb78d update kube proxy critical pod annotation comments to reflect reality
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-02-02 10:41:24 -08:00
Kubernetes Submit Queue 0477100f98 Merge pull request #33684 from fraenkel/port_forward_ws
Automatic merge from submit-queue

Add websocket support for port forwarding

#32880

**Release note**:
```release-note
Port forwarding can forward over websockets or SPDY.
```
2017-02-01 23:19:02 -08:00
Zihong Zheng c91d605124 Bumps addon-manager to v6.4-alpha.1 for supporting optional ConfigMap 2017-02-01 09:22:43 -08:00
Michael Fraenkel beb53fb71a Port forward over websockets
- split out port forwarding into its own package

Allow multiple port forwarding ports
- Make it easy to determine which port is tied to which channel
- odd channels are for data
- even channels are for errors

- allow comma separated ports to specify multiple ports

Add  portfowardtester 1.2 to whitelist
2017-02-01 06:32:04 -07:00
Kubernetes Submit Queue e80da46b94 Merge pull request #40565 from bprashanth/glbc-version
Automatic merge from submit-queue (batch tested with PRs 40126, 40565, 38777, 40564, 40572)

Bump up glbc version to 0.9.0-beta.1

I plan to bump up the version to 0.9.0 proper in time for the next 1.5.x release, and cherry-pick both this and the future pr.

Previously we were just using a single version, but the "-beta/alpha" is consistent with how we release kube and gives us a convenient revert target. It also forces us to remove the "beta" tag before code freeze, and track the kubernetes release cycle.
2017-01-27 01:34:17 -08:00
bprashanth b1e0bd0fa4 Bump up glbc version to beta.1 2017-01-26 14:53:50 -08:00
Mik Vyatskov 7b194d496f Fix fluentd-gcp configuration to handle different timezones on the node 2017-01-24 11:53:15 +01:00
Antoine Pelisse 62af7dd33d OWNERS: Update latest OWNERS files
These files have been created lately, so we don't have much information
about them anyway, so let's just:
- Remove assignees and make them approvers
- Copy approves as reviewers
2017-01-23 10:05:48 -08:00
Kubernetes Submit Queue 1430597f7e Merge pull request #39966 from liggitt/cert-users
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)

Include system:masters group in the bootstrap admin client certificate

Sets up the bootstrap admin client certificate for new clusters to be in the system:masters group

Removes the need for an explicit grant to the kubecfg user in e2e-bindings

```release-note
The default client certificate generated by kube-up now contains the superuser `system:masters` group
```
2017-01-20 08:28:51 -08:00
thomasschickinger 42fbf93fb0 Add rule for detecting exceptions to fluentd config for GKE logging.
Bump version of gcp-fluentd container to 1.34
2017-01-19 15:51:47 +01:00
Kubernetes Submit Queue 5e4625cad7 Merge pull request #40017 from Crassirostris/fluentd-gcp-image-fix
Automatic merge from submit-queue (batch tested with PRs 40003, 40017)

Remove library copying from fluentd image

It seems that fluentd can no longer copy systemd libraries from host to be able to read journals.
2017-01-19 05:54:08 -08:00
Kubernetes Submit Queue b29d9cdbcf Merge pull request #39898 from ixdy/bazel-release-tars
Automatic merge from submit-queue

Build release tars using bazel

**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.

For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```

**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.

Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. the nodes then use the `$binary.docker_tag` file to find the right image.

With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.

My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)

Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.

Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.

**Release note**:

```release-note
NONE
```
2017-01-18 14:24:48 -08:00
Kubernetes Submit Queue 6dfe5c49f6 Merge pull request #38865 from vwfs/ext4_no_lazy_init
Automatic merge from submit-queue

Enable lazy initialization of ext3/ext4 filesystems

**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #30752, fixes #30240

**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```

**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.

These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount

Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```

The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.

When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system. 

As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory. 

I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.

As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.
2017-01-18 09:09:52 -08:00
Mik Vyatskov 83df5b8495 Remove library copying from fluentd image 2017-01-17 15:00:48 +01:00
Mik Vyatskov 5b96233423 Sync fluentd daemonset liveness probe with static pod liveness probe 2017-01-17 13:29:54 +01:00