1476 Commits (319989854190e10364e83f051b6fcd7118cb5750)

Author SHA1 Message Date
Kubernetes Submit Queue 7fbb458f6d Merge pull request #40213 from jszczepkowski/ha-e2e-tests
Automatic merge from submit-queue (batch tested with PRs 39260, 40216, 40213, 40325, 40333)

Fixed propagation of kube master certs during master replication.

Fixed propagation of kube-master-certs during master replication.
2017-01-24 16:26:02 -08:00
Kubernetes Submit Queue 054c84e22f Merge pull request #40299 from lucab/to-k8s/rkt-1.23.0
Automatic merge from submit-queue (batch tested with PRs 40299, 40311)

cluster: update default rkt version to 1.23.0

This updates cluster configurations to current stable rkt version.
2017-01-24 08:59:57 -08:00
Antoine Pelisse 62af7dd33d OWNERS: Update latest OWNERS files
These files have been created lately, so we don't have much information
about them anyway, so let's just:
- Remove assignees and make them approvers
- Copy approvers as reviewers
2017-01-23 10:05:48 -08:00
Luca Bruno b4bc44b9ff cluster: update default rkt version to 1.23.0 2017-01-23 15:22:33 +00:00
Mike Danese 513994a9f8 pass CA key to signer in GCE 2017-01-20 11:10:19 -08:00
Jerzy Szczepkowski d1a73fa5cd Fixed propagation of kube master certs during master replication.
2017-01-20 13:24:09 +01:00
Ryan Hallisey dbb92f9836 Use ensure-temp-dir in the common.sh script
Instead of having an ensure-temp-dir function in multiple
places, add it to the common.sh script which is sourced by
all the providers.
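
A minimal sketch of what such a shared helper might look like. The function is spelled `ensure_temp_dir` here for portability, and the `KUBE_TEMP` variable and the cleanup trap are assumptions for illustration, not taken from this commit:

```shell
# Hypothetical shared helper, standing in for ensure-temp-dir in common.sh.
# KUBE_TEMP and the EXIT trap are illustrative assumptions.
ensure_temp_dir() {
  # Create the directory only once so repeated calls are idempotent.
  if [ -z "${KUBE_TEMP:-}" ]; then
    KUBE_TEMP="$(mktemp -d "${TMPDIR:-/tmp}/kubernetes.XXXXXX")"
    export KUBE_TEMP
    # Remove the directory when the calling script exits.
    trap 'rm -rf "${KUBE_TEMP}"' EXIT
  fi
}

ensure_temp_dir
first="${KUBE_TEMP}"
ensure_temp_dir   # second call reuses the same directory
echo "temp dir: ${KUBE_TEMP}"
```

Each provider script would then source the common file and call the helper, rather than carrying its own copy.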
2017-01-19 09:30:50 -05:00
Maisem Ali 52b6c9bb41 Adding cos as an alias for gci. 2017-01-18 15:14:25 -08:00
Kubernetes Submit Queue b29d9cdbcf Merge pull request #39898 from ixdy/bazel-release-tars
Automatic merge from submit-queue

Build release tars using bazel

**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.

For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```

**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.

Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. the nodes then use the `$binary.docker_tag` file to find the right image.
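
The flow in the list above can be sketched roughly as follows. This is an illustration only: the placeholder binary and temp directory are assumptions so the sketch runs without a real build, and the Docker commands are printed rather than executed.

```shell
# Sketch of the md5sum-based tagging flow; not the actual build scripts.
workdir="$(mktemp -d)"
binary="kube-proxy"
# Placeholder contents stand in for a real compiled binary.
printf 'placeholder binary contents\n' > "${workdir}/${binary}"

# 1. Hash the binary and record the hash in $binary.docker_tag.
docker_tag="$(md5sum "${workdir}/${binary}" | cut -d' ' -f1)"
printf '%s\n' "${docker_tag}" > "${workdir}/${binary}.docker_tag"

# 2. The image name embeds that hash as its tag.
image="gcr.io/google_containers/${binary}:${docker_tag}"

# 3. Build time and cluster startup (printed only; no Docker daemon assumed).
echo "docker build -t ${image} ."
echo "docker save -o ${binary}.tar ${image}"
echo "docker load -i ${binary}.tar"

# 4. A node recovers the image name from the .docker_tag file.
node_image="gcr.io/google_containers/${binary}:$(cat "${workdir}/${binary}.docker_tag")"
echo "node will run ${node_image}"
```

The node-side lookup in step 4 only works if the tag stored in the tarball matches the one in the `.docker_tag` file, which is the property the bazel rule is missing.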

With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.

My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)

Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.

Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.

**Release note**:

```release-note
NONE
```
2017-01-18 14:24:48 -08:00
Kubernetes Submit Queue 76d023ca90 Merge pull request #40094 from zmerlynn/cvm-v20170117
Automatic merge from submit-queue (batch tested with PRs 36467, 36528, 39568, 40094, 39042)

Bump GCE to container-vm-v20170117

Base image update only, no kubelet or Docker updates.

```release-note
Update GCE ContainerVM deployment to container-vm-v20170117 to pick up CVE fixes in base image.
```
2017-01-18 13:37:12 -08:00
Zach Loafman a0b8fd618f Bump GCE to container-vm-v20170117
Base image update only, no kubelet or Docker updates.
2017-01-18 10:50:17 -08:00
Kubernetes Submit Queue 6dfe5c49f6 Merge pull request #38865 from vwfs/ext4_no_lazy_init
Automatic merge from submit-queue

Enable lazy initialization of ext3/ext4 filesystems

**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #30752, fixes #30240

**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```

**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.
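
As a rough illustration of the difference, the sketch below prints the two command lines instead of running them. `lazy_itable_init` and `lazy_journal_init` are standard `mke2fs -E` options; the helper name and the exact argument order used before this PR are assumptions.

```shell
# Hypothetical helper: print the mkfs command line with or without
# the extended options that disable lazy initialization.
mkfs_command() {
  fstype="$1"; device="$2"; disable_lazy_init="$3"
  if [ "${disable_lazy_init}" = "true" ]; then
    # Assumed old behavior: force eager init of inode tables and journal.
    echo "mkfs.${fstype} -E lazy_itable_init=0,lazy_journal_init=0 -F ${device}"
  else
    # New behavior: rely on the mkfs defaults (lazy initialization enabled).
    echo "mkfs.${fstype} -F ${device}"
  fi
}

mkfs_command ext4 /dev/sdb true
mkfs_command ext4 /dev/sdb false
```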

These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to Go so that the dependency on the script could be removed. After some searching, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount

Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```

The problem is that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so it may be different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.

When mkfs.ext4 is told not to lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance of this fallback depends heavily on free memory, and it uses up all of that free memory for write caching, reducing the performance of everything else in the system.

According to https://github.com/kubernetes/kubernetes/issues/30752, something inside kubelet also degrades the performance of all this. It is not known exactly what it is, but I'd assume it has something to do with cgroups throttling IO or memory.

I checked the kernel code for lazy inode table initialization. The nice thing is that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in just a few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is that free memory is not required and not touched, so performance is maxed and the system does not suffer.

As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest removing these flags completely and relying on the kernel to do it in the best way.
2017-01-18 09:09:52 -08:00
Jeff Grafton bc4b6ac397 Build release tarballs in bazel and add `make bazel-release` rule 2017-01-13 16:17:44 -08:00
Jordan Liggitt d94bb26776 Conditionally write token file entries 2017-01-13 17:59:46 -05:00
Kubernetes Submit Queue d50c027d0c Merge pull request #39537 from liggitt/legacy-policy
Automatic merge from submit-queue (batch tested with PRs 39803, 39698, 39537, 39478)

include bootstrap admin in super-user group, ensure tokens file is correct on upgrades

Fixes https://github.com/kubernetes/kubernetes/issues/39532

Possible issues with cluster bring-up scripts:

- [x] known_tokens.csv and basic_auth.csv are not rewritten if the files already exist
  * new users (like the controller manager) are not available on upgrade
  * changed users (like the kubelet username change) are not reflected
  * group additions (like the addition of admin to the superuser group) don't take effect on upgrade
  * this PR updates the token and basicauth files line-by-line to preserve user additions, but also ensure new data is persisted
- [x] existing 1.5 clusters may depend on more permissive ABAC permissions (or customized ABAC policies). This PR adds an option to enable existing ABAC policy files for clusters that are upgrading
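
One way to picture the line-by-line update from the first item is the sketch below. It is not the PR's actual code: the helper name is hypothetical, and it assumes the user name is the second comma-separated field, as in the Kubernetes static token and basic auth file formats.

```shell
# Hypothetical helper: ensure the file holds exactly one, current line for a user.
# Assumed line format: token-or-password,user,uid[,groups]
ensure_token_entry() {
  file="$1"; user="$2"; line="$3"
  touch "${file}"
  tmp="$(mktemp)"
  # Drop any stale entry for this user (second field); keep all other lines.
  awk -F, -v u="${user}" '$2 != u' "${file}" > "${tmp}"
  # Append the up-to-date entry.
  printf '%s\n' "${line}" >> "${tmp}"
  mv "${tmp}" "${file}"
}

tokens="$(mktemp)"
printf 'abc,custom-user,custom-user\nold,kubelet,kubelet\n' > "${tokens}"
ensure_token_entry "${tokens}" kubelet 'new,kubelet,kubelet'          # changed user
ensure_token_entry "${tokens}" admin 'xyz,admin,admin,system:masters' # new group
cat "${tokens}"
```

Entries added by an operator, such as `custom-user` here, survive the update, while changed or new entries are written on every run.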

Follow-ups:
- [ ] both scripts are loading e2e role-bindings, which should only be loaded in e2e tests, not in normal kube-up scenarios
- [ ] when upgrading, set the option to use existing ABAC policy files
- [ ] update bootstrap superuser client certs to add superuser group? ("We also have a certificate that "used to be" a super-user. On GCE, it has CN "kubecfg", on GKE it's "client"")
- [ ] define (but do not load by default) a relaxed set of RBAC roles/rolebindings matching legacy ABAC, and document how to load that for new clusters that do not want to isolate user permissions
2017-01-12 15:06:31 -08:00
Jordan Liggitt 968b0b30cf Update token users if needed 2017-01-11 17:21:12 -05:00
Jordan Liggitt 21b422fccc Allow enabling ABAC authz 2017-01-11 17:20:51 -05:00
Jordan Liggitt 1fe517e96a Include admin in super-user group 2017-01-11 17:20:42 -05:00
Euan Kemp eeef293ee2 container-linux: restart rkt-api on failure
This works around a flake I saw which had the same root cause as
https://github.com/coreos/rkt/issues/3513.

This will potentially help reduce the impact of such future problems as
well.
2017-01-11 00:25:14 -08:00
Kubernetes Submit Queue ebc8e40694 Merge pull request #39691 from yujuhong/bump_timeout
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)

Bump container-linux and gci timeout for docker health check

The command `docker ps` can take longer to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.
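
A health probe with a time limit can be sketched as below. The helper name is hypothetical and the real health-monitor scripts may differ; `timeout` is the GNU coreutils command, and `sleep` stands in for `docker ps` so the sketch runs without Docker.

```shell
# Hypothetical probe: succeed only if the command answers within the limit.
check_with_timeout() {
  limit="$1"; shift
  timeout "${limit}" "$@" > /dev/null 2>&1
}

# On a real node the probed command would be `docker ps` with a 60s limit.
if check_with_timeout 60 sleep 0; then
  echo "healthy"
fi
if ! check_with_timeout 1 sleep 3; then
  echo "unhealthy: no response within 1s"
fi
```

A longer limit trades slower detection of a truly hung daemon for fewer restarts of a daemon that is merely slow.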

This addresses #38588
2017-01-10 21:25:16 -08:00
Kubernetes Submit Queue addc6cae4a Merge pull request #38212 from mikedanese/kubeletauth
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)

Generate a kubelet CA and kube-apiserver cert-pair for kubelet auth.

cc @cjcullen
2017-01-10 19:48:09 -08:00
Yu-Ju Hong 4e87973a9b Bump container-linux and gci timeout for docker health check
The command `docker ps` can take longer to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.
2017-01-10 13:07:21 -08:00
Kubernetes Submit Queue 8ef6902516 Merge pull request #39451 from euank/remove-abac
Automatic merge from submit-queue

cluster/cl: move abac to rbac

See #39092

We based this on GCI during the brief time when it was using ABAC.

fixes #39395

cc @yifan-gu 

**Release note**:
```release-note
NONE
```
2017-01-05 12:31:17 -08:00
Euan Kemp c1afc4a3d8 cluster/cl: move abac to rbac
See #39092

We based this on GCI during the brief time when it was using ABAC.
2017-01-04 16:10:59 -08:00
Mike Danese 3ab0e37cc6 implement upgrades 2017-01-04 11:45:57 -08:00
CJ Cullen d0997a3d1f Generate a kubelet CA and kube-apiserver cert-pair for kubelet auth.
Plumb through to kubelet/kube-apiserver on gci & cvm.
2017-01-03 14:30:45 -08:00
Zach Loafman a3b363000d Fix AWS break injected by kubernetes/kubernetes#39020 2017-01-03 13:52:02 -08:00
deads2k ecd23a0217 remove abac authorizer from e2e 2017-01-03 07:53:03 -05:00
Yifan Gu dd59aa1c3b cluster/gce: Rename coreos to container-linux. 2016-12-30 15:32:02 -08:00
Kubernetes Submit Queue 274a9f0f70 Merge pull request #38927 from luxas/remove_maintainer
Automatic merge from submit-queue

Remove all MAINTAINER statements in the codebase as they are deprecated

**What this PR does / why we need it**:
ref: https://github.com/docker/docker/pull/25466

**Release note**:

```release-note
Remove all MAINTAINER statements in Dockerfiles in the codebase as they are deprecated by docker
```
@ixdy @thockin (who else should be notified?)
2016-12-29 16:41:24 -08:00
deads2k 19391164b9 add additional e2e rbac bindings to match existing users 2016-12-21 16:24:45 -05:00
deads2k 2e2a2e4b94 update gce for RBAC, controllers, proxy, kubelet (p1) 2016-12-21 13:51:49 -05:00
deads2k 8360bc1a9f create kubelet client cert with correct group 2016-12-20 14:18:17 -05:00
Alexander Block 13a2bc8afb Enable lazy initialization of ext3/ext4 filesystems 2016-12-18 11:08:51 +01:00
Euan Kemp 028a0140d0 cluster/coreos: delete mounter
We don't use this bit of gci currently.
2016-12-17 21:36:32 -08:00
Euan Kemp 13afe18ab4 cluster/coreos: update to gci based implementation
This update includes significant refactoring. It moves almost all of the
logic into bash scripts, modeled after the `gci` cluster scripts.

The primary differences between the two are the following:
1. Use of the `/opt/kubernetes` directory over `/home/kubernetes`
2. Support for rkt as a runtime
3. No use of logrotate
4. No use of `/etc/default/`
5. No logic related to noexec mounts or gci-specific firewall-stuff
2016-12-17 21:36:31 -08:00
Euan Kemp e2644bb442 cluster/gce: copy gci -> coreos
This is for reviewing ease as the following commits introduce changes
to make the coreos kube-up deployment share significant code with the
gci code.
2016-12-17 21:36:30 -08:00
Lucas Käldström 3c5b5f5963 Remove all MAINTAINER statements in the codebase as they aren't very useful and are now deprecated 2016-12-17 20:34:10 +02:00
Kubernetes Submit Queue a4577e70ab Merge pull request #38808 from du2016/change-heapster-version
Automatic merge from submit-queue (batch tested with PRs 38906, 38808)

change the version in the yaml file

change the version in heapster-controller.yaml with image version
2016-12-17 00:41:24 -08:00
Euan Kemp 9a8c6ac41e cluster/gce/coreos: add OWNERS 2016-12-16 14:08:54 -08:00
Piotr Szczesniak a52637f09f Migrated fluentd to daemon set 2016-12-15 13:48:32 +01:00
du2016 90e2c31fa7 change the version in the yaml file 2016-12-15 07:14:19 -05:00
Jeff Grafton 27d096d27d Rename build-tools/ back to build/ 2016-12-14 13:42:15 -08:00
Kubernetes Submit Queue 911d10654c Merge pull request #38638 from madhusudancs/fed-bootstrap-e2e-logs-firewall
Automatic merge from submit-queue

Use the cluster name in the names of the firewall rules that allow cluster-internal traffic to disambiguate the rules belonging to different clusters.

Also dropping the network name from these firewall rule names.

Network name was used to disambiguate firewall rules in a given network.
However, since two clusters cannot share a name in a GCE project, this
sufficiently disambiguates the firewall rule names. A potential confusion
arises when someone tries to create a firewall rule with the same name
in a different network, but that's also an indication that they shouldn't
be doing that.
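
The naming scheme can be illustrated with a small helper. The function name and the `-default-internal-` infix are assumptions for illustration; the point is only that the cluster name, not the network name, makes the rule name unique.

```shell
# Hypothetical naming helper; the "-default-internal-" infix is an assumption.
internal_firewall_rule_name() {
  cluster_name="$1"
  role="$2"   # "master" or "node"
  echo "${cluster_name}-default-internal-${role}"
}

internal_firewall_rule_name e2e-cluster-a master
internal_firewall_rule_name e2e-cluster-b master
```

Two clusters in the same project and network then get distinct rule names without any reference to the network.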


@jszczepkowski due to PR #33094
@ixdy for test-infra

cc @kubernetes/sig-federation @nikhiljindal
2016-12-13 22:07:04 -08:00
Amey Deshpande 5ec42e6a25 Ensure the GCI metadata files do not have whitespace at the end
Fixes #36708
2016-12-13 13:41:54 -08:00
Madhusudan.C.S 174856509e Dropping the network name from the internal master and node firewall rules.
Network name was used to disambiguate firewall rules in a given network.
However, since two clusters cannot share a name in a GCE project, this
sufficiently disambiguates the firewall rule names. A potential confusion
arises when someone tries to create a firewall rule with the same name
in a different network, but that's also an indication that they shouldn't
be doing that.
2016-12-13 11:21:14 -08:00
Kubernetes Submit Queue 18d05c7d56 Merge pull request #38640 from mtaufen/gci-version-env
Automatic merge from submit-queue

Allow GCI_VERSION to come from env

This is to facilitate GCI tip vs. K8s tip testing; we need to
dynamically set the version of GCI to stay current with their
latest canary (latest of the "gci-base" prefixed images).
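
The usual shell idiom for this kind of override is a parameter expansion with a default, sketched below. The `KUBE_GCI_VERSION` variable name and the fallback image name are placeholders, not necessarily what the scripts use.

```shell
# Hypothetical resolver: prefer the caller's environment, else a pinned default.
# KUBE_GCI_VERSION and the fallback value are placeholders.
resolve_gci_version() {
  echo "${KUBE_GCI_VERSION:-gci-pinned-default}"
}

GCI_VERSION="$(resolve_gci_version)"
echo "using image ${GCI_VERSION}"
```

A test job can then export the variable with the latest canary image name, while ordinary runs keep the pinned default.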
2016-12-13 09:54:45 -08:00
Michael Taufen fe4552057e Allow GCI_VERSION to come from env
This is to facilitate GCI tip vs. K8s tip testing; we need to
dynamically set the version of GCI to stay current with their
latest canary (latest of the "gci-base" prefixed images).
2016-12-12 11:19:56 -08:00
Madhusudan.C.S d92cf4df5e Use the cluster name in the names of the firewall rules that allow cluster-internal traffic to disambiguate the rules belonging to different clusters. 2016-12-12 10:58:53 -08:00
Jerzy Szczepkowski b01e3c1e17 Fixed detection of master during creation of multizone nodes.
2016-12-12 15:46:39 +01:00