Automatic merge from submit-queue
Nodefs becomes imagefs on GCI
Kubelet cannot identify rootfs correctly
For #33444
```release-note
Enforce Disk based pod eviction with GCI base image in Kubelet
```
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Changelog:
* Built-in kubernetes updated to v1.4.0
* Enabled VXLAN and IP_SET config options in kernel to support some networking tools
* OpenSSL CVE fixes
Automatic merge from submit-queue
Speed up dockerized builds
This PR speeds up dockerized builds. First, we make sure that we are as incremental as possible. The bigger change is that now we use rsync to move sources into the container and get data back out.
To do yet:
* [x] Add a random password to rsync. This is 128bit MD4, but it is better than nothing.
* [x] Lock down rsync to only come from the host.
* [x] Deal with remote docker engines -- this should be necessary for docker-machine on the mac.
* [x] Allow users to specify the port for the rsync daemon. Perhaps randomize this or let docker pick an ephemeral port and detect the port?
* [x] Copy back generated files so that users can check them in. This is done for `zz_generated.*` files generated by `make generated_files`
* [x] This should include generated proto files so that we can remove the hack-o-rama that is `hack/hack/update-*-dockerized.sh`
* [x] Start "versioning" the build container and the data container so that the CI system doesn't have to be manually kicked.
* [x] Get some benchmarks to qualify how much faster.
This replaces #28518 and is related to #30600.
cc @thockin @spxtr @david-mcmahon @MHBauer
Benchmarks by running `make clean ; sync ; time bash -xc 'time build/make-build-image.sh ; time sync ; time build/run.sh make ; time sync; time build/run.sh make'` on a GCE n1-standard-8 with PD-SSD.
| setup | build image | sync | first build | sync | second build | total |
|-------|-------------|----- |----------|------|--------------|------|
| baseline | 0m11.420s | 0m0.812s | 7m2.353s | 0m42.380s | 7m8.381s | 15m5.348s |
| this pr | 0m10.977s | 0m15.168s | 7m31.096s | 1m55.692s | 0m16.514s | 10m9.449s |
Automatic merge from submit-queue
Add support for vSphere cloud provider in kubeup
**What this PR does / why we need it**:
vSphere cloud provider added in 1.3 was not configured when deploying via kubeup
**Release note**:
```release-note
Add support for vSphere Cloud Provider when deploying via kubeup on vSphere.
```
When deploying on vSphere using kube up add configuration
for vSphere cloud provider.
Automatic merge from submit-queue
Bump glbc version to 0.8.0
Picks up k8s.io godeps for v1.4, thereby fixing an int overflow bug in the upstream delayed-workqueue pkg. Without this, the controller spams logs with retries in the "soft error" case, which is easy to hit when users, e.g., create ingresses that point to nonexistent services.
Should go into 1.4.1, because 1.4.0 is pretty much out at this point.
https://github.com/kubernetes/kubernetes/issues/33279
Automatic merge from submit-queue
Bump up addon kube-dns to v20 for graceful termination
Below images are built and pushed:
- gcr.io/google_containers/kubedns-amd64:1.8
- gcr.io/google_containers/kubedns-arm:1.8
- gcr.io/google_containers/kubedns-arm64:1.8
- gcr.io/google_containers/kubedns-ppc64le:1.8
Both kubedns and dnsmasq are bumped up in the manifest files.
@thockin @bprashanth
Automatic merge from submit-queue
cluster/gci: Minor spacing tweak
Two shall be the number thou shalt indent, and the level of the indent
shall be two. Three shalt thou not indent, neither indent thou once,
excepting that thou then proceed to two. Five is right out.
/cc @andyzheng0831 @jlowdermilk
This bug was inadvertently introduced in #32406.
The longer term plan (shouldn't be too much longer) is to remove this
file entirely and rely on the `gci-trusty` version of it, but to stop
some bleeding and allow our jenkins using kube-up + coreos to work, we
should merge this fix until we have the more complete solution.
Automatic merge from submit-queue
Allow building experimental etcd images
Ref #20504
Once this PR is in, I would like to build and push: "etcd:3.0.10-experimental" image to:
- start testing it
- to make it possible to build a different "3.0.10" image in the future (we will most probably build some loging into it).
@lavalamp - FYI
Automatic merge from submit-queue
Tune down initialDelaySeconds for readinessProbe.
Fixed #33053.
Tuned down `initialDelaySeconds` (originally 30s) for the readiness probe to 3 seconds and `periodSeconds` (default 10s) to 5 seconds, to shorten the initial delay before a DNS server pod is exposed. This configuration passed the DNS e2e tests and did not hit any readiness failure (for kube-dns) on a 4-node GCE cluster during the experiments.
When scaling out kube-dns servers, it took less than 10s for servers to be exposed after they appeared as running, which is much faster than the original 30+s.
`failureThreshold` is left at its default (3). This does not lead to restarts, because the readiness probe status only affects whether the endpoints are exposed in the service (from the DNS service's point of view). According to the implementation of [prober](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/prober/worker.go), the number of retries for a readiness probe is unbounded. Hence there is no obvious effect if the readiness probe fails several times in the beginning.
The state machine of the prober can be illustrated with the figure below:
![drawing](https://cloud.githubusercontent.com/assets/8681801/18693503/fb4466dc-7f56-11e6-8671-0a14c4835d24.jpeg)
I want to see the e2e result of this PR for further evaluation.
@thockin @bprashanth
Automatic merge from submit-queue
Print a more helpful error message when failing to start rolling-updates
Hopefully this will help us track down where the 1.3 -> 1.4 upgrades are breaking down. We'll need to cherry-pick this into release-1.4 to have any effect, though.
Automatic merge from submit-queue
Split dns healthcheck into two different urls
Attempt to fix #30633.
<s>This new kube-dns pod template creates two exechealthz processes listening on two different ports, for kubedns and dnsmasq respectively.</s>
@thockin @girishkalele
Automatic merge from submit-queue
Alpha JWS Discovery API for locating an apiserver securely
This PR contains an early alpha prototype of the JWS discovery API outlined in proposal #30707.
CA certificate, API endpoints, and the token to be used to authenticate to this discovery API are currently passed in as secrets. If the caller provides a valid token ID, a JWS signed blob of ClusterInfo containing the API endpoints and the CA cert to use will be returned to the caller. This is used by the alpha kubeadm to allow seamless, very quick cluster setup with simple commands well suited for copy paste.
Current TODO list:
- [x] Allow the use of arbitrary strings as token ID/token, we're currently treating them as raw keys.
- [x] Integrate the building of the pod container, move to cluster/images/kube-discovery.
- [x] Build for: amd64, arm, arm64 and ppc64le. (just replace GOARCH=)
- [x] Rename to gcr.io/google_containers/kube-discovery-ARCH:1.0
- [x] Cleanup rogue files in discovery sub-dir.
- [x] Move pkg/discovery/ to cmd/discovery/app.
There is additional pending work to return a kubeconfig rather than ClusterInfo, however I believe this is slated for post-alpha.
Automatic merge from submit-queue
Reset core_pattern on GCI
The default core_pattern pipes the core dumps to /sbin/crash_reporter
which is more restrictive in saving crash dumps. So for
now, set a generic core_pattern that users can work with.
@dchen1107 @aulanov can you please review?
cc/ @kubernetes/goog-image
Automatic merge from submit-queue
Update the containervm image to the latest one (container-v1-3-v20160…
Node e2e is running with an old containervm image that only has Docker 1.9.1. This PR fixes that.
Automatic merge from submit-queue
(GCI) Configure logrotate to rotate all .log files in /var/log.
Fixes logrotate configuration in GCI to rotate all "*.log" files in /var/log.
Fixes issue #33223.
Automatic merge from submit-queue
Setting the default image for GKE tests to Container_VM.
@vishh @spxtr @pwittrock
The purpose is to keep the current state of tests as is even if GKE changes the base image.
Automatic merge from submit-queue
Bump up GCI version.
```release-note
Upgrading Container-VM base image for k8s on GCE. Brief changelog as follows:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
```
Fixes #32596
This needs a cherrypick into v1.4 release branch because it is fixing v1.4 release blocking issues. This patch is easy and safe to rollback in case of emergencies.
@vishh can you please review?
Fixes #32596 and many other issues.
cc/ @kubernetes/goog-image FYI
Brief changelog compared to gci-dev-54-8743-3-0:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
- Updated built-in kubelet version to 1.3.7
- Added ethtool and ebtables binaries expected by kubelet
Fixes #32596
Automatic merge from submit-queue
Enable hostpath provisioner for vagrant environment
This flag is required to run e2e tests for certain features (petset), and for manual tests and debugging.
related: https://github.com/kubernetes/kubernetes/issues/32119
Automatic merge from submit-queue
Implemented KUBE_DELETE_NODES flag in kube-down.
Implemented KUBE_DELETE_NODES flag in kube-down script.
It prevents removal of nodes when shutting down a HA master replica.
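A minimal sketch of how such a flag could gate node removal. The helper name `should_delete_nodes` and the default of `true` when the variable is unset are assumptions for illustration, not taken from the kube-down script itself.

```shell
# Hypothetical helper: succeeds when kube-down should also remove nodes.
# Assumption: an unset KUBE_DELETE_NODES is treated as "true".
should_delete_nodes() {
  [ "${KUBE_DELETE_NODES:-true}" = "true" ]
}

# Example of how a teardown step might consult the helper.
describe_teardown() {
  if should_delete_nodes; then
    echo "removing master replica and nodes"
  else
    echo "removing master replica only"
  fi
}
```

With this shape, `KUBE_DELETE_NODES=false` would leave the nodes in place while a single HA master replica is shut down.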
Automatic merge from submit-queue
Use a patched golang version for building linux/arm
Fixes: #29904
Right now, linux/arm is broken because of an internal limitation in Go.
I've filed an issue for it here: https://github.com/golang/go/issues/17028
The affected binaries of this limitation are hyperkube and kube-apiserver, which are the largest binaries.
Now that we have a patched go 1.7.1 version for building "unsupported" but important architectures (ref: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/multi-platform.md), we should also include the patch for ppc64le and start building ppc64le again.
As soon as @laboger has the patch I need up on GitHub, I'll add ppc64le to this PR and we'll merge it.
TODO:
- [ ] ~~Update the PR with patches for ppc64le at the same time @luxas~~
- [x] Push the new kube-cross image @ixdy
- [x] Run a full `make release` before to verify nothing breaks @luxas + @ixdy
- [ ] Cherrypick into the 1.4 branch @luxas + (who?)
@lavalamp @smarterclayton @ixdy @rsc @davecheney @wojtek-t @jfrazelle @bradfitz @david-mcmahon @pwittrock
Tell systemd to keep trying to restart kubelet without limit. Without
this change at some stage systemd will stop trying to restart kubelet
and mark it failed.
These are the settings we're using elsewhere (e.g. Docker)
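For illustration only, a drop-in along these lines asks systemd to keep restarting a unit with no rate limit. `Restart=`, `RestartSec=` and `StartLimitInterval=` are standard systemd directives; the drop-in file name, the 10 second delay, and the helper name are assumptions and may differ from what the commit actually sets.

```shell
# Hypothetical helper: write a systemd drop-in into the given directory.
# The values below are illustrative assumptions, not copied from the commit.
write_kubelet_restart_dropin() {
  dir="$1"
  mkdir -p "$dir"
  cat > "${dir}/10-restart.conf" <<'EOF'
[Service]
# Restart the service whenever it exits.
Restart=always
# Wait 10 seconds between attempts (assumed value).
RestartSec=10
# 0 disables start rate limiting, so systemd never gives up.
StartLimitInterval=0
EOF
}
```

On a real node the directory would typically be `/etc/systemd/system/kubelet.service.d`, followed by `systemctl daemon-reload`.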
Automatic merge from submit-queue
Added --log-facility flag to enhance dnsmasq logging
Fix #31010.
Dnsmasq in the kube-dns pod logs with its default settings, which makes the logs hard to locate. Add the `--log-facility=-` flag to redirect logs to stderr.
@girishkalele
Automatic merge from submit-queue
Add glusterfs-client in hyperkube image.
When we run Kubernetes in a Docker container, glusterfs volumes don't work.
This PR adds the glusterfs-client package to the hyperkube image to fix the bug.
It is required to run automated tests for certain features (petset),
and for manual tests and debugging.
Change-Id: I9203aab6d67c8ff0cc4574473e8d0af888fe1804
Automatic merge from submit-queue
etcd: data rollback tool of v3 -> v2
ref: https://github.com/kubernetes/features/issues/44
ref #20504
What?
This provides a rollback tool for users who need to roll back etcd data from v3 to v2.
Automatic merge from submit-queue
Add flag to set CNI bin dir, and use it on gci nodes
**What this PR does / why we need it**:
When using `kube-up` on GCE, following #31023 which moved the workers from debian to gci, CNI just isn't working. The root cause is basically as discussed in #28563: one flag (`--network-plugin-dir`) means two different things, and the `configure-helper` script uses it for the wrong purpose.
This PR adds a new flag `--cni-bin-dir`, then uses it to configure CNI as desired.
As discussed at #28563, I have also added a flag `--cni-conf-dir` so users can be explicit.
**Which issue this PR fixes**: fixes #28563
**Special notes for your reviewer**:
I left the old flag largely alone for backwards-compatibility, with the exception that I stop setting the default when CNI is in use. The value of `"/usr/libexec/kubernetes/kubelet-plugins/net/exec/"` is unlikely to be what is wanted there.
**Release note**:
```release-note
Added new kubelet flags `--cni-bin-dir` and `--cni-conf-dir` to specify where CNI files are located.
Fixed CNI configuration on GCI platform when using CNI.
```
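A rough sketch of how a node setup script might assemble the two new flags. The helper name and the default directories (`/opt/cni/bin`, `/etc/cni/net.d`) are assumptions for illustration; the paths that `configure-helper` actually uses on GCI may differ.

```shell
# Hypothetical helper: print kubelet flags for CNI.
# $1: directory holding CNI plugin binaries (assumed default below)
# $2: directory holding CNI network configs (assumed default below)
cni_kubelet_flags() {
  bin_dir="${1:-/opt/cni/bin}"
  conf_dir="${2:-/etc/cni/net.d}"
  printf '%s\n' "--network-plugin=cni --cni-bin-dir=${bin_dir} --cni-conf-dir=${conf_dir}"
}
```

Keeping the binary and config locations in separate flags avoids overloading `--network-plugin-dir` with two meanings.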
Automatic merge from submit-queue
e2e/log-dump: Collect kernel log with journald
Related to #31928
The kern.log file does not exist on journald distros typically.
cc @vishh @Random-Liu
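A small sketch of the fallback idea, not the actual log-dump change: use `kern.log` when it exists, otherwise ask journald for kernel messages with `journalctl -k`. The helper name and the log directory argument are assumptions.

```shell
# Hypothetical helper: print the command that yields the kernel log.
# $1: log directory to inspect (defaults to /var/log).
kernel_log_command() {
  log_dir="${1:-/var/log}"
  if [ -f "${log_dir}/kern.log" ]; then
    # Syslog-style distro: the kernel log is a plain file.
    echo "cat ${log_dir}/kern.log"
  else
    # journald distro: -k restricts output to kernel messages.
    echo "journalctl --no-pager -k"
  fi
}
```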
Automatic merge from submit-queue
Update container image version for downward api volume tests
Some tests were using 0.7, and some were using 0.6, so updating all to 0.7.
@kubernetes/rh-cluster-infra
Automatic merge from submit-queue
cluster/gce: Update master root disk size
As part of #29213, the hyperkube image will be deployed alongside
existing dependencies.
This ends up just running over the root disk size of 10 during
extraction.
cc @yifan-gu @aaronlevy
Automatic merge from submit-queue
Add detect-master to local provider to get e2e working
Make it possible to run some e2e tests using the local provider (./hack/local-up-cluster.sh)
This will now work for tests that don't need more than one node:
export KUBERNETES_PROVIDER=local
go run hack/e2e.go -v -test --check_node_count=false --check_version_skew=false --test_args="--ginkgo.focus=Cadvisor"
Note: without this commit, the port and ip address are wrong and require the --host option (which is inconsistent with the other providers).
Automatic merge from submit-queue
Teach create-kubeconfig() to deal with multi path KUBECONFIG
When KUBECONFIG is in the form "A:B:C" make sure each file is
created.
fixes #17778
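The idea can be sketched as follows. This is an illustration under assumptions, not the body of `create-kubeconfig()`: the helper name is hypothetical, and it only creates empty files for each entry of a colon-separated list.

```shell
# Hypothetical helper: make sure every file in an "A:B:C" style list exists.
# Runs in a subshell so the IFS and globbing changes do not leak out.
ensure_kubeconfig_files() (
  IFS=":"
  set -f  # disable globbing while splitting the path list
  for f in $1; do
    # Skip empty entries such as the one in "A::B".
    [ -n "$f" ] || continue
    mkdir -p "$(dirname "$f")"
    # Create the file only if it is missing; never truncate an existing one.
    [ -f "$f" ] || : > "$f"
  done
)
```

A caller could run `ensure_kubeconfig_files "${KUBECONFIG:-$HOME/.kube/config}"` before writing cluster entries.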
Automatic merge from submit-queue
Use a Deployment for kube-dns
Attempt to fix #31554
Switching kube-dns from using Replication Controller to Deployment.
The outdated kube-dns YAML files in the coreos and juju dirs are also updated. Most of the specific memory limits in the files remain unchanged because it seems people were modifying them explicitly (c8d82fc2a9). Only the memory limit for healthz is increased, due to this pending investigation (#29688).
YAML files stay in the *-rc.yaml format, since a lot of scripts in the cluster and hack dirs use this format. It may be fine to change them all, though.
@bprashanth @girishkalele
Automatic merge from submit-queue
Fix/centos docker download
**What this PR does / why we need it**: The CentOS cluster provider attempts to download docker from a location that 404's.
**Which issue this PR fixes**: addresses https://github.com/kubernetes/kubernetes/issues/27572#issuecomment-226690177
**Special notes for your reviewer**: I don't know how Kubernetes decides docker compatibility, but it was previously pulling `latest` so I chose the most recent release. Is there any mechanism for keeping things like this up to date?
What is the status of kubernetes rpm's? As far as I could tell there aren't any 1.3 rpm's published. Are those officially supported or a community project?
**Release note**:
```release-note
CentOS Cluster Provider: fix docker download location & use docker 1.12.0
```
Automatic merge from submit-queue
Fix etcd2 cross-build in the Makefile
fixes https://github.com/kubernetes/kubernetes/issues/32328
Make it possible to compile both etcd2 and etcd3 in the Makefile and compile attachlease for multiple arches as well.
@lavalamp The etcd build-from-source semantics changed between etcd2 and etcd3.
I updated it to etcd3 in my last PR, and didn't think we were going to build etcd2 anymore.
However, I've now fixed it to build for both versions.
Thanks!
Automatic merge from submit-queue
Fix glbc name to match image version
Risk is low, we should get it into 1.4 to avoid confusion. Image is 0.7.1 (bumped in 1.3.6) so name and label should match.
Automatic merge from submit-queue
AWS: Change default networking for kube-up to kubenet
**What this PR does / why we need it**: Fixes AWS bring-up. Again.
There's a kubelet bug that prevents NETWORK_PROVIDER=none from working right now, and we should migrate AWS to `kubenet` anyways.
Working on reproing the `none` issue on GCE, then I'll file a bug on the main issue. But this fixes AWS, so quick tactical fix.
Automatic merge from submit-queue
Use etcd 2.3.7
This will switch to etcd 2.3.7 for release 1.4, to resolve issues rolling back from 1.4 to 1.3 (while preventing those same issues rolling back to 1.4.0 from a release including etcd 3.0.x).
Fixes #32253.
See #32253 (comment) for etcd roadmap.
Automatic merge from submit-queue
Fix 127.0.01 typo
**What this PR does / why we need it**:
Fixes a small typo, though the typo seems inconsequential.
**Release note**:
none
Automatic merge from submit-queue
Enable kubelet eviction whenever inodes free is < 5% on GCE
This is a pre-req for enabling inodes based evictions in GKE.
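To make the threshold concrete, here is a sketch of the comparison involved. It is not kubelet code: the helper name is hypothetical, and the kubelet flag form `--eviction-hard=nodefs.inodesFree<5%` is an assumption about how the setting is expressed.

```shell
# Hypothetical helper: succeeds when free inodes fall below PERCENT of total.
# Usage: inodes_below_threshold FREE TOTAL PERCENT
inodes_below_threshold() {
  free="$1"
  total="$2"
  pct="$3"
  # free/total < pct/100 is rewritten as free*100 < pct*total
  # so the check stays in integer arithmetic.
  [ $(( free * 100 )) -lt $(( pct * total )) ]
}
```

For example, 4 free inodes out of 100 is below a 5% threshold, while exactly 5 out of 100 is not.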
Automatic merge from submit-queue
Make image-puller work on GCI nodes.
Currently image-puller works only on Debian nodes. This will make our tests flakier once we switch to GCI by default. This PR ports the image-puller to GCI-based nodes.
cc @vishh @wonderfly @dchen1107
Automatic merge from submit-queue
rkt: Update kube-up rkt version to v1.14.0
cc @kubernetes/sig-rktnetes
This should have been included in #31286 (whoops).
This is a bugfix that I propose for v1.4 inclusion.
Automatic merge from submit-queue
move '(master)' to end of message for uniformity
**What this PR does / why we need it**: This is a small polish operation on the kubernetes charm wrt juju status output.
**Release note**:
```release-note
NONE
```
This changes the status output from:
```
kubernetes/0 active idle 3 172.27.24.54 8088/tcp
Kubernetes running.
kubernetes/1 active idle 4 172.27.24.55 6443/tcp
(master) Kubernetes services started
```
to this:
```
kubernetes/0 active idle 3 172.27.24.54 8088/tcp
Kubernetes running.
kubernetes/1 active idle 4 172.27.24.55 6443/tcp
Kubernetes services started (master)
```