Commit Graph

5093 Commits (093ceb95288498830943e6a9655acb02d1375f47)

Author SHA1 Message Date
Kubernetes Submit Queue 093ceb9528 Merge pull request #39033 from shyamjvs/provider-independent-kubemark
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)

Refactored kubemark into cloud-provider independent code and GCE specific code

Ref issue #38967 

The following are the major changes as part of this refactoring:
- Moved cluster-kubemark/config-default.sh -> cluster-kubemark/gce/config-default.sh (as the config is gce-specific)
- Changed kubernetes/cluster/kubemark/util.sh to source the right scripts based on the cloud-provider
- Added test/kubemark/skeleton/util.sh which defines a well-commented interface that any cloud-provider should implement to run kubemark. (We have this interface defined only for gce currently)
  This includes functions like creating the master machine instance along with its resources, executing a given command on the master (like ssh), scp, deleting the master instance and its resources.
  All these functions have to be overrided by each cloud provider inside the file /test/kubemark/$CLOUD_PROVIDER/util.sh
- Added the file test/kubemark/cloud-provider-config.sh which sets the variable CLOUD_PROVIDER that is later picked up by various scripts (start-kubemark.sh, stop-kubemark.sh, run-e2e-tests.sh)
- Removed test/kubemark/common.sh and moved whatever provider-independent code it had into start-kubemark.sh (the only place where the scipt is called) and moved the little gce-specific code
  into test/kubemark/gce/util.sh.
- Finally, removed useless code and restructured start-kubemark.sh and stop-kubemark.sh scripts.

@kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-01-20 09:18:54 -08:00
Kubernetes Submit Queue 22a405055d Merge pull request #40170 from deads2k/client-10-restclient
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)

make client-go authoritative for pkg/client/restclient

Moves client/restclient to client-go and a util/certs, util/testing as transitives.
2017-01-20 09:18:52 -08:00
Kubernetes Submit Queue 1430597f7e Merge pull request #39966 from liggitt/cert-users
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)

Include system:masters group in the bootstrap admin client certificate

Sets up the bootstrap admin client certificate for new clusters to be in the system:masters group

Removes the need for an explicit grant to the kubecfg user in e2e-bindings

```release-note
The default client certificate generated by kube-up now contains the superuser `system:masters` group
```
2017-01-20 08:28:51 -08:00
deads2k ee6752ef20 find and replace 2017-01-20 08:04:53 -05:00
Kubernetes Submit Queue 0610a23986 Merge pull request #40164 from apelisse/update-root-approvers-files
Automatic merge from submit-queue

Update root approvers files

Replaces #40040 

Update top level OWNERS files mostly to set assignees to approvers. Also remove @bgrant0607 from everywhere but the very top level OWNERS file.
2017-01-19 17:02:18 -08:00
Kubernetes Submit Queue cc2250cb98 Merge pull request #40147 from rthallisey/common-ensure-temp-dir
Automatic merge from submit-queue

Use ensure-temp-dir in the common.sh script

Ref issue #38967

Instead of having an ensure-temp-dir function in multiple
places, add it to the common.sh script which is sourced by
all the providers.
2017-01-19 11:57:35 -08:00
Garrett Rodrigues ad1e5e98c2 Updated top level owners file to match new format 2017-01-19 11:29:16 -08:00
thomasschickinger 42fbf93fb0 Add rule for detecting exceptions to fluentd config for GKE logging.
Bump version of gcp-fluentd container to 1.34
2017-01-19 15:51:47 +01:00
Shyam Jeedigunta d2fadbe30f Refactored kubemark code into provider-specific and provider-independent parts 2017-01-19 15:34:13 +01:00
Ryan Hallisey dbb92f9836 Use ensure-temp-dir in the common.sh script
Instead of having an ensure-temp-dir function in multiple
places, add it to the common.sh script which is sourced by
all the providers.
2017-01-19 09:30:50 -05:00
Kubernetes Submit Queue 5e4625cad7 Merge pull request #40017 from Crassirostris/fluentd-gcp-image-fix
Automatic merge from submit-queue (batch tested with PRs 40003, 40017)

Remove library copying from fluentd image

It seems that fluentd can no longer copy systemd libraries from host to be able to read journals.
2017-01-19 05:54:08 -08:00
Kubernetes Submit Queue 29e2d8be09 Merge pull request #40113 from maisem/cos
Automatic merge from submit-queue

Adding cos as an alias for gci.

**What this PR does / why we need it**: Adding COS as an alias for GCI.

cc: @adityakali @wonderfly
2017-01-18 18:40:43 -08:00
Kubernetes Submit Queue 0c61553cbc Merge pull request #40105 from sc68cal/bugs/40102
Automatic merge from submit-queue (batch tested with PRs 40105, 40095)

[OpenStack-Heat] Fix regex used to get object-store URL

**Release note**:

```release-note

Fixes a bug in the OpenStack-Heat kubernetes provider, in the handling of differences between the Identity v2 and Identity v3 APIs

```
2017-01-18 15:54:08 -08:00
Maisem Ali 52b6c9bb41 Adding cos as an alias for gci. 2017-01-18 15:14:25 -08:00
Kubernetes Submit Queue b29d9cdbcf Merge pull request #39898 from ixdy/bazel-release-tars
Automatic merge from submit-queue

Build release tars using bazel

**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.

For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```

**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.

Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. the nodes then use the `$binary.docker_tag` file to find the right image.

With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.

My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)

Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.

Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.

**Release note**:

```release-note
NONE
```
2017-01-18 14:24:48 -08:00
Kubernetes Submit Queue 76d023ca90 Merge pull request #40094 from zmerlynn/cvm-v20170117
Automatic merge from submit-queue (batch tested with PRs 36467, 36528, 39568, 40094, 39042)

Bump GCE to container-vm-v20170117

Base image update only, no kubelet or Docker updates.

```release-note
Update GCE ContainerVM deployment to container-vm-v20170117 to pick up CVE fixes in base image.
```
2017-01-18 13:37:12 -08:00
Sean M. Collins 8ad7e1613a [OpenStack-Heat] Fix regex used to get object-store URL
"publicURL" is used for endpoints in the Identity v2 API, while in the
Identity v3 API it has been changed to just "public"

Fixes #40102
2017-01-18 16:29:41 -05:00
Zach Loafman a0b8fd618f Bump GCE to container-vm-v20170117
Base image update only, no kubelet or Docker updates.
2017-01-18 10:50:17 -08:00
Kubernetes Submit Queue 6dfe5c49f6 Merge pull request #38865 from vwfs/ext4_no_lazy_init
Automatic merge from submit-queue

Enable lazy initialization of ext3/ext4 filesystems

**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #30752, fixes #30240

**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```

**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.

These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount

Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```

The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.

When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system. 

As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory. 

I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.

As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.
2017-01-18 09:09:52 -08:00
Kubernetes Submit Queue 16f45aee85 Merge pull request #39925 from appscode/kube-dns-1.11.0
Automatic merge from submit-queue

Use kube-dns:1.11.0

Use [kube-dns:1.11.0](https://github.com/kubernetes/dns/releases/tag/1.11.0)

With: kubernetes/dns#25
Fixes kubernetes/kubernetes#26752
Fixes kubernetes/kubernetes#33470

@bowei @thockin
2017-01-17 10:08:48 -08:00
Kubernetes Submit Queue 685e421b89 Merge pull request #40020 from wojtek-t/really_enable_etcd3
Automatic merge from submit-queue (batch tested with PRs 34763, 38706, 39939, 40020)

Really enable etcd3

Ref #39589

@timothysc @hongchaodeng
2017-01-17 09:14:52 -08:00
sadlil e075e2e633 Use kube-dns:1.11.0 2017-01-17 08:37:24 -08:00
Wojciech Tyczynski 61f2201304 Really enable etcd3 2017-01-17 15:57:43 +01:00
Kubernetes Submit Queue 936a94f0a8 Merge pull request #40012 from Crassirostris/fluentd-liveness-probe-sync
Automatic merge from submit-queue (batch tested with PRs 39911, 40002, 39969, 40012, 40009)

Sync fluentd daemonset liveness probe with static pod liveness probe

Syncing change from https://github.com/kubernetes/kubernetes/pull/39949

Should also be cherry-picked
2017-01-17 06:46:58 -08:00
Mik Vyatskov 83df5b8495 Remove library copying from fluentd image 2017-01-17 15:00:48 +01:00
Kubernetes Submit Queue 002cdfa1ae Merge pull request #39861 from Traum-Ferienwohnungen/hostname_as_nodename
Automatic merge from submit-queue

Use $HOSTNAME as node.name by default

**What this PR does / why we need it**:
Allows to identify elasticsearch instances more easily.
As $HOSTNAME of a pod is unique, this should be no problem.
2017-01-17 04:57:09 -08:00
Mik Vyatskov 5b96233423 Sync fluentd daemonset liveness probe with static pod liveness probe 2017-01-17 13:29:54 +01:00
Janis Meybohm 6b3284acd2 Use $HOSTNAME as node.name by default
Allows to identify elasticsearch instances more easily.
As $HOSTNAME of a pod is unique, this should be no problem.
2017-01-17 08:38:53 +01:00
Jordan Liggitt 264dbf0daf
Remove direct kubecfg RBAC grant 2017-01-16 14:12:15 -05:00
Jordan Liggitt 7e98e06e48
Include system:masters group in the bootstrap admin client certificate 2017-01-16 14:01:24 -05:00
Kubernetes Submit Queue 06c610e276 Merge pull request #39949 from Crassirostris/fluentd-liveness-probe-fix
Automatic merge from submit-queue (batch tested with PRs 38592, 39949, 39946, 39882)

Remove fluentd buffers if fluentd is stuck

Fluentd now stores its buffers on disk for the resiliency. However, if buffer is corrupted, fluentd will be restarting forever.

Following change will make fluentd liveness probe delete buffers if fluentd is stuck for more than X minutes (15 by default).
2017-01-16 10:37:40 -08:00
Mik Vyatskov edf1ffc074 Remove fluentd buffers if fluentd is stuck 2017-01-16 13:47:23 +01:00
Jeff Grafton b9e060a630 Update scripts to look for binary artifacts in bazel-bin/ 2017-01-13 16:17:48 -08:00
Jeff Grafton bc4b6ac397 Build release tarballs in bazel and add `make bazel-release` rule 2017-01-13 16:17:44 -08:00
Jordan Liggitt d94bb26776
Conditionally write token file entries 2017-01-13 17:59:46 -05:00
Kubernetes Submit Queue 31483bf546 Merge pull request #39770 from ixdy/ubuntu-slim-base-image
Automatic merge from submit-queue

Update images that use ubuntu-slim base image to :0.6

**What this PR does / why we need it**: `ubuntu-slim:0.4` is somewhat old, being based on Ubuntu 16.04, whereas `ubuntu-slim:0.6` is based on Ubuntu 16.04.1.

**Special notes for your reviewer**: I haven't pushed any of these images yet, so I expect all of the e2e builds to fail. If we're happy with the changes, I can push the images and then re-trigger tests.

**Release note**:

```release-note
NONE
```

cc @aledbf as FYI
2017-01-12 20:39:13 -08:00
Kubernetes Submit Queue ae04755d71 Merge pull request #39827 from MrHohn/addon-manager-v6.2
Automatic merge from submit-queue

Update kubectl to stable version for Addon Manager

Bumps up Addon Manager to v6.2, below images are pushed:
- gcr.io/google-containers/kube-addon-manager:v6.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.2

@mikedanese 

cc @ixdy
2017-01-12 15:54:24 -08:00
Kubernetes Submit Queue d50c027d0c Merge pull request #39537 from liggitt/legacy-policy
Automatic merge from submit-queue (batch tested with PRs 39803, 39698, 39537, 39478)

include bootstrap admin in super-user group, ensure tokens file is correct on upgrades

Fixes https://github.com/kubernetes/kubernetes/issues/39532

Possible issues with cluster bring-up scripts:

- [x] known_tokens.csv and basic_auth.csv is not rewritten if the file already exists
  * new users (like the controller manager) are not available on upgrade
  * changed users (like the kubelet username change) are not reflected
  * group additions (like the addition of admin to the superuser group) don't take effect on upgrade
  * this PR updates the token and basicauth files line-by-line to preserve user additions, but also ensure new data is persisted
- [x] existing 1.5 clusters may depend on more permissive ABAC permissions (or customized ABAC policies). This PR adds an option to enable existing ABAC policy files for clusters that are upgrading

Follow-ups:
- [ ] both scripts are loading e2e role-bindings, which only be loaded in e2e tests, not in normal kube-up scenarios
- [ ] when upgrading, set the option to use existing ABAC policy files
- [ ] update bootstrap superuser client certs to add superuser group? ("We also have a certificate that "used to be" a super-user. On GCE, it has CN "kubecfg", on GKE it's "client"")
- [ ] define (but do not load by default) a relaxed set of RBAC roles/rolebindings matching legacy ABAC, and document how to load that for new clusters that do not want to isolate user permissions
2017-01-12 15:06:31 -08:00
Zihong Zheng f62be637c8 Update kubectl to stable version for Addon Manager 2017-01-12 13:49:13 -08:00
Aleksandra Malinowska 043e809b8f update heapster version to 1.3.0-beta.0 2017-01-12 13:42:31 +01:00
Jeff Grafton 1c2ea28080 Update images that use ubuntu-slim base image to :0.6 2017-01-11 15:07:04 -08:00
Jordan Liggitt 968b0b30cf
Update token users if needed 2017-01-11 17:21:12 -05:00
Jordan Liggitt 21b422fccc
Allow enabling ABAC authz 2017-01-11 17:20:51 -05:00
Jordan Liggitt 1fe517e96a
Include admin in super-user group 2017-01-11 17:20:42 -05:00
Kubernetes Submit Queue 12e8271cd3 Merge pull request #33584 from marketlogicsoftware/kayrus/enable_elk_k8s_metadata
Automatic merge from submit-queue

Enable kubernetes_metadata by default for ELK stack

Looks like it was accidentally removed and was not restored back in this PR https://github.com/kubernetes/kubernetes/pull/29883
Because actually this plugin still exists in the image, but new ELK deployment don't allow you to index namespaces, pod names, etc.
2017-01-11 12:19:42 -08:00
Kubernetes Submit Queue 04326905b8 Merge pull request #39721 from euank/rkt-api-restart
Automatic merge from submit-queue (batch tested with PRs 39731, 39662, 39721)

container-linux: restart rkt-api on failure

This works around a flake I saw which had the same root cause as
https://github.com/coreos/rkt/issues/3513.

This will potentially help reduce the impact of such future problems as
well.

```release-note
NONE
```
2017-01-11 11:00:52 -08:00
Kubernetes Submit Queue 9814369ea1 Merge pull request #39662 from rf232/dashboard-v1.5.1
Automatic merge from submit-queue (batch tested with PRs 39731, 39662, 39721)

Update dashboard version to v1.5.1

**What this PR does / why we need it**:
Latest Dashboard developments, including a CSRF issue in the dashboard POST handlers

**Release note**:
```
Set Dashboard UI version to v1.5.1
```
2017-01-11 11:00:50 -08:00
kayrus 8435d19982 Enable kubernetes_metadata by default for ELK stack 2017-01-11 14:08:01 +01:00
Euan Kemp eeef293ee2 container-linux: restart rkt-api on failure
This works around a flake I saw which had the same root cause as
https://github.com/coreos/rkt/issues/3513.

This will potentially help reduce the impact of such future problems as
well.
2017-01-11 00:25:14 -08:00
Kubernetes Submit Queue ebc8e40694 Merge pull request #39691 from yujuhong/bump_timeout
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)

Bump container-linux and gci timeout for docker health check

The command `docker ps` can take longer time to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.

This addresses #38588
2017-01-10 21:25:16 -08:00