In order to make NFSv3 work, mounter needs to start rpcbind daemon. This
change modify mounter's Dockerfile and mounter script to start the
rpcbind daemon if it is not running on the host.
After this change, need to make push the image and update the sha number in Changelog.
Automatic merge from submit-queue
Eviction Thresholds Update
Sets the defaults for the eviction-hard threshold for GCE based on what we were using during testing: "memory.available<250Mi,nodefs.available<10%,nodefs.inodesFree<5%".
Sets flags for e2e tests to use eviction-minimum-reclaim: "nodefs.available<5%,nodefs.inodesFree<5%"
this fixes#32537
- Adds command line flags --config-map, --config-map-ns.
- Fixes 36194 (https://github.com/kubernetes/kubernetes/issues/36194)
- Update kube-dns yamls
- Update bazel (hack/update-bazel.sh)
- Update known command line flags
- Temporarily reference new kube-dns image (this will be fixed with
a separate commit when the DNS image is created)
There is a concern that some GCE users may be running automation that
(a) turns up ephemeral clusters and (b) always uses the latest K8s
release. If any of these workloads fall outside the set supported on
GCI, cutting the release will break the automation. We are therefore
delaying this change until we have provided sufficient warning.
Automatic merge from submit-queue
Change master to advertise external IP in kubernetes service.
Change master to advertise external IP in kubernetes service.
In effect, in HA mode in case of multiple masters, IP of external load
balancer will be advertise in kubernetes service.
Automatic merge from submit-queue
Migrates addons from RCs to Deployments
Fixes#33698.
Below addons are being migrated:
- kube-dns
- GLBC default backend
- Dashboard UI
- Kibana
For the new deployments, the version suffixes are removed from their names. Version related labels are also removed because they are confusing and not needed any more with regard to how Deployment and the new Addon Manager works.
The `replica` field in `kube-dns` Deployment manifest is removed for the incoming DNS horizontal autoscaling feature #33239.
The `replica` field in `Dashboard` Deployment manifest is also removed because the rescheduler e2e test is manually scaling it.
Some resource limit related fields in `heapster-controller.yaml` are removed, as they will be set up by the `addon resizer` containers. Detailed reasons in #34513.
Three e2e tests are modified:
- `rescheduler.go`: Changed to resize Dashboard UI Deployment instead of ReplicationController.
- `addon_update.go`: Some namespace related changes in order to make it compatible with the new Addon Manager.
- `dns_autoscaling.go`: Changed to examine kube-dns Deployment instead of ReplicationController.
Both of above two tests passed on my own cluster. The upgrade process --- from old Addons with RCs to new Addons with Deployments --- was also tested and worked as expected.
The last commit upgrades Addon Manager to v6.0. It is still a work in process and currently waiting for #35220 to be finished. (The Addon Manager image in used comes from a non-official registry but it mostly works except some corner cases.)
@piosz @gmarek could you please review the heapster part and the rescheduler test?
@mikedanese @thockin
cc @kubernetes/sig-cluster-lifecycle
---
Notes:
- Kube-dns manifest still uses *-rc.yaml for the new Deployment. The stale file names are preserved here for receiving faster review. May send out PR to re-organize kube-dns's file names after this.
- Heapster Deployment's name remains in the old fashion(with `-v1.2.0` suffix) for avoiding describe this upgrade transition explicitly. In this way we don't need to attach fake apply labels to the old Deployments.
Automatic merge from submit-queue
Print osImage and kubeletVersion for nodes before and after GCE upgrade
This will print, e.g.:
```
== Pre-Upgrade Node OS and Kubelet Versions ==
name: "e2e-test-mtaufen-master", osImage: "Google Container-VM Image", kubeletVersion: "v1.4.5-beta.0.45+90d209221ec8dc-dirty"
name: "e2e-test-mtaufen-minion-group-jo79", osImage: "Debian GNU/Linux 7 (wheezy)", kubeletVersion: "v1.4.5-beta.0.45+90d209221ec8dc-dirty"
name: "e2e-test-mtaufen-minion-group-ox5l", osImage: "Debian GNU/Linux 7 (wheezy)", kubeletVersion: "v1.4.5-beta.0.45+90d209221ec8dc-dirty"
name: "e2e-test-mtaufen-minion-group-qvbq", osImage: "Debian GNU/Linux 7 (wheezy)", kubeletVersion: "v1.4.5-beta.0.45+90d209221ec8dc-dirty"
```
Let me know what output format you prefer and I'll see if I can make it work, I have the extent of flexibility allowed by jsonpath.
Change master to advertise external IP in kubernetes service.
In effect, in HA mode in case of multiple masters, IP of external load
balancer will be advertise in kubernetes service.
Automatic merge from submit-queue
kubelet bootstrap: start hostNetwork pods before we have PodCIDR
Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried. Move the check to the pod start phase.
Issue #35409
Issue #35521
Automatic merge from submit-queue
Remove several red herring error messages in GCE cluster scripts
This fixes things like
```
I1018 15:57:53.524] Bringing down cluster
W1018 15:57:53.524] NODE_NAMES=
W1018 15:57:55.995] ERROR: (gcloud.compute.ssh) could not parse resource: []
W1018 15:57:56.392] ERROR: (gcloud.compute.ssh) could not parse resource: []
```
and
```
I1018 16:32:34.947] property "clusters.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
I1018 16:32:35.079] property "users.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
I1018 16:32:35.195] property "users.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0-basic-auth" unset.
I1018 16:32:35.307] property "contexts.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
W1018 16:32:35.420] failed to get client config: Error in configuration: context was not found for specified context: kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0
```
It seems like the `kubectl` behavior was introduced in #29236: if `current-context` is set to something invalid, it now complains.
Automatic merge from submit-queue
[Kubelet] Use the custom mounter script for Nfs and Glusterfs only
This patch reduces the scope for the containerized mounter to NFS and GlusterFS on GCE + GCI clusters
This patch also enabled the containerized mounter on GCI nodes
Shepherding multiple PRs through the submit queue is painful. Hence I combined them into this PR. Please review each commit individually.
cc @jingxu97 @saad-ali
https://github.com/kubernetes/kubernetes/pull/35652 has also been reverted as part of this PR
Automatic merge from submit-queue
On GCI, cleanup kubelet startup
-->
```release-note
* Avoid overriding system and kubelet cgroups on GCI
* Make the kubectl from k8s release the default on GCI
```
cc @kubernetes/sig-node @mtaufen
Automatic merge from submit-queue
Update rkt version on GCI nodes to v1.18.0
v1.18.0 avoids outputting debug information by default which happens to
pollute events and kubelet logs.
Automatic merge from submit-queue
Enable containerized storage plugins mounter on GCI
```release-note
On GCI, kubelet uses an external mounter script (typically a special container running in a chroot) to perform mount operations
```
Automatic merge from submit-queue
fixed problem with non masquerade cidr in kube-up gce/gci
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
fixed typo in script which made setting custom cidr in gce using kube-up impossible
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
fixed typo in script which made setting custom cidr in gce using kube-up impossible
```
Automatic merge from submit-queue
e2e node plumbing and bundling for GCI mounter
**Note:** The code in this PR only bundles the mounter and modifies `--mounter-path` if it can find `cluster/gce/gci/mounter` in the K8s source dir when building the test bundle.
This bundles the mounter script for GCI with the node e2e tests and allows the `--mounter-path` to be passed to the Kubelet via the node test framework. The node test runner will detect when we are running on a remote GCI node and add the appropriate `--mounter-path` to the `testArgs`.
It also includes a simple node test that mounts a tmpfs volume. This will exercise the Kubelet's mounter code path.
**ITEM OF NOTE:** To get the k8s root dir (in order to copy the mount script into the tarball), I changed `getK8sRootDir` -> `GetK8sRootDir` in `test/e2e_node/build/build.go`. Based on the comment above that function (and the fact that it was private to begin with), I'm not sure this is the best way to do things:
```
// TODO: Dedup / merge this with comparable utilities in e2e/util.go
```
On the other hand, the `e2e/util.go` file mentioned in that comment doesn't exist anymore. This should be resolved before this PR is merged.
Automatic merge from submit-queue
Update grafana in kubernetes to version 3.1.1
Fix#33775
```release-note
Update grafana version used by default in kubernetes to 3.1.1
```
@piosz
Automatic merge from submit-queue
Update elasticsearch and kibana usage
```release-note
Updated default Elasticsearch and Kibana used for elasticsearch logging destination to versions 2.4.1 and 4.6.1 respectively.
```
Updated controllers for elasticsearch and kibana to use newer versions of images. Fixed e2e test because of elasticsearch backward incompatible API changes.
Fixed out of sync elasticsearch controller for coreos.
@piosz
Automatic merge from submit-queue
Delete all firewall rules (and optionally network) on GCE/GKE cluster teardown
Not entirely ready for review yet; I want to see what Jenkins thinks of this.
Automatic merge from submit-queue
Add gcl cluster logging test
This PR changes default logging destination for tests to gcp and adds test for cluster logging using google cloud logging
Fix#20760
The original fix (#33147) sourced the correct `node-helper.sh` but set
`node_os_distribution` instead of `NODE_OS_DISTRIBUTION`. The
`set-node-image` function is imported indirectly via `source
"${KUBE_ROOT}/cluster/kube-util.sh"`, which in turn (in the GCE case)
sources `cluster/gce/util.sh`. Since the `set-node-image` function
relies on the `NODE_OS_DISTRIBUTION` variable, the original fix
did not have the entire intended effect.
I have confirmed that cherry-picking #33147 into the `release-1.4`
branch and layering this commit on top of it make for a successful
upgrade from a GCI based K8s 1.3 cluster to a GCI based K8s 1.4 cluster.
Automatic merge from submit-queue
Update GCI_VERSION to gci-dev-55-8866-0-0
Update GCI base image:
Change log:
* Built-in kubernetes updated to v1.4.0
* Enabled VXLAN and IP_SET config options in kernel to support some networking tools
* OpenSSL CVE fixes
```release-note
Update GCI base image:
* Enabled VXLAN and IP_SET config options in kernel to support some networking tools (ebtools)
* OpenSSL CVE fixes
```
cc/ @kubernetes/goog-image cc/ @dchen1107
Automatic merge from submit-queue
Nodefs becomes imagefs on GCI
Kubelet cannot identify rootfs correctly
For #33444
```release-note
Enforce Disk based pod eviction with GCI base image in Kubelet
```
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Changelog:
* Built-in kubernetes updated to v1.4.0
* Enabled VXLAN and IP_SET config options in kernel to support some networking tools
* OpenSSL CVE fixes
Automatic merge from submit-queue
Bump up addon kube-dns to v20 for graceful termination
Below images are built and pushed:
- gcr.io/google_containers/kubedns-amd64:1.8
- gcr.io/google_containers/kubedns-arm:1.8
- gcr.io/google_containers/kubedns-arm64:1.8
- gcr.io/google_containers/kubedns-ppc64le:1.8
Both kubedns and dnsmasq are bumped up in the manifest files.
@thockin @bprashanth
Automatic merge from submit-queue
cluster/gci: Minor spacing tweak
Two shall be the number thou shalt indent, and the level of the indent
shall be two. Three shalt thou not indent, neither indent thou once,
excepting that thou then proceed to two. Five is right out.
/cc @andyzheng0831 @jlowdermilk
Two shall be the number thou shalt indent, and the level of the indent
shall be two. Three shalt thou not indent, neither indent thou once,
excepting that thou then proceed to two. Five is right out.
This bug was inadvertently introduced in #32406.
The longer term plan (shouldn't be too much longer) is to remove this
file entirely and rely on the `gci-trusty` version of it, but to stop
some bleeding and allow our jenkins using kube-up + coreos to work, we
should merge this fix until we have the more complete solution.
Automatic merge from submit-queue
Tune down initialDelaySeconds for readinessProbe.
Fixed#33053.
Tuned down the `initialDelaySeconds`(original 30s) for readiness probe to 3 seconds and `periodSeconds`(default 10s) to 5 seconds to shorten the initial time before a dns server pod being exposed. This configuration passed DNS e2e tests and did not even hit any readiness failure(for kube-dns) with a GCE cluster with 4 nodes during the experiments.
For scaling out kube-dns servers, it took less than 10s for servers being exposed after they appeared as running, which is much faster than 30+s(the original cost).
`failureThreshold` is left as default(3) and it would not lead to restart because the status of readiness probe would only affect whether endpoints being exposed in service or not(in the dns service point of view). According to the implementation of [prober](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/prober/worker.go), the number of retries for readiness probe is unbounded. Hence there is no obvious effect if the readiness probe fail several times in the beginning.
The state machine of prober could be illustrated with below figure:
![drawing](https://cloud.githubusercontent.com/assets/8681801/18693503/fb4466dc-7f56-11e6-8671-0a14c4835d24.jpeg)
I want to see the e2e result of this PR for further evaluation.
@thockin @bprashanth
Automatic merge from submit-queue
Print a more helpful error message when failing to start rolling-updates
Hopefully this will help us track down where the 1.3 -> 1.4 upgrades are breaking down. We'll need to cherry-pick this into release-1.4 to have any effect, though.
Automatic merge from submit-queue
Split dns healthcheck into two different urls
Attempt to fix#30633.
<s>This new kube-dns pod template creates two exechealthz processes listen on two different ports for kubedns and dnsmasq correspondingly.
@thockin @girishkalele
Automatic merge from submit-queue
Reset core_patern on GCI
The default core_pattern pipes the core dumps to /sbin/crash_reporter
which is more restrictive in saving crash dumps. So for
now, set a generic core_pattern that users can work with.
@dchen1107 @aulanov can you please review?
cc/ @kubernetes/goog-image
Automatic merge from submit-queue
Update the containervm image to the latest one (container-v1-3-v20160…
Node e2e is running with old containervm image which only has docker 1.9.1. This pr fixed such issue.
The default core_pattern pipes the core dumps to /sbin/crash_reporter
which is more restrictive in saving crash dumps. So for
now, set a generic core_pattern that users can work with.
Automatic merge from submit-queue
Bump up GCI version.
```release-note
Upgrading Container-VM base image for k8s on GCE. Brief changelog as follows:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
```
Fixes#32596
This needs a cherrypick into v1.4 release branch because it is fixing v1.4 release blocking issues. This patch is easy and safe to rollback in case of emergencies.
@vishh can you please review?
Fixes#32596 and many other issues.
cc/ @kubernetes/goog-image FYI
Brief changelog compared to gci-dev-54-8743-3-0:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
- Updated built-in kubelet version to 1.3.7
- add ethtool and ebtables binaries expected by kubelet
Fixes#32596
Automatic merge from submit-queue
Implemented KUBE_DELETE_NODES flag in kube-down.
Implemented KUBE_DELETE_NODES flag in kube-down script.
It prevents removal of nodes when shutting down a HA master replica.
Automatic merge from submit-queue
Added --log-facility flag to enhance dnsmasq logging
Fix#31010.
Dnsmasq in kube-dns pod is logging in default setting, which is somehow hard to locate. Add --log-facility=- flag to redirect logs to std.
@girishkalele
Automatic merge from submit-queue
Add flag to set CNI bin dir, and use it on gci nodes
**What this PR does / why we need it**:
When using `kube-up` on GCE, following #31023 which moved the workers from debian to gci, CNI just isn't working. The root cause is basically as discussed in #28563: one flag (`--network-plugin-dir`) means two different things, and the `configure-helper` script uses it for the wrong purpose.
This PR adds a new flag `--cni-bin-dir`, then uses it to configure CNI as desired.
As discussed at #28563, I have also added a flag `--cni-conf-dir` so users can be explicit
**Which issue this PR fixes** : fixes#28563
**Special notes for your reviewer**:
I left the old flag largely alone for backwards-compatibility, with the exception that I stop setting the default when CNI is in use. The value of `"/usr/libexec/kubernetes/kubelet-plugins/net/exec/"` is unlikely to be what is wanted there.
**Release note**:
```release-note
Added new kubelet flags `--cni-bin-dir` and `--cni-conf-dir` to specify where CNI files are located.
Fixed CNI configuration on GCI platform when using CNI.
```
Automatic merge from submit-queue
cluster/gce: Update master root disk size
As part of #29213, the hyperkube image will be deployed alongside
existing dependencies.
This ends up just running over the root disk size of 10 during
extraction.
cc @yifan-gu @aaronlevy
Automatic merge from submit-queue
Use a Deployment for kube-dns
Attempt to fix#31554
Switching kube-dns from using Replication Controller to Deployment.
The outdated kube-dns YAML file in coreos and juju dir is also updated. Most of the specific memory limit in the files remain unchanged because it seems like people were modifying it explicitly(c8d82fc2a9). Only the memory limit for healthz is increased due to this pending investigation(#29688).
YAML files stay in *-rc.yaml format considering there are a lots of scripts in cluster and hack dirs are using this format. But it may be fine to changed them all.
@bprashanth @girishkalele
Automatic merge from submit-queue
Enable kubelet eviction whenever inodes free is < 5% on GCE
This is a pre-req for enabling inodes based evictions in GKE.
Automatic merge from submit-queue
rkt: Update kube-up rkt version to v1.14.0
cc @kubernetes/sig-rktnetes
This should have been included in #31286 (whoops).
This is a bugfix that I propose for v1.4 inclusion.
As part of #29213, the hyperkube image will be deployed alongside
existing dependencies.
This ends up just running over the root disk size of 10 during
extraction.
Automatic merge from submit-queue
Enable Rescheduler by default
Rescheduler is stable - e2e test is passing constantly for >1week.
ref #29023
```release-note
Rescheduler which ensures that critical pods are always scheduled enabled by default in GCE.
```
Automatic merge from submit-queue
Pick a specific GCI version by default on GCE.
Prior to this change, a K8s branch (master as well as release) was
pinned to a GCI milestone. It would pick up the latest GCI release on
that milestone at the time of cluster creation. The rationale was the
K8s users would automatically get the bug fixes in newer versions of
GCI. However in practice, it makes the runtime environment
non-deterministic, and lack of continuous e2e tests mean we would run
into breakages sooner or later.
With this change, each K8s release will pick a specific version
of GCI by default (similar to how the Debian-based container-vm gets used).
Users can override the default version through KUBE_GCE_MASTER_IMAGE and
KUBE_GCE_NODE_IMAGE environment variables.
We expect the default GCI version will be updated relatively frequently stay
updated with newer GCI releases. We can also automate the process to
automatically bump the hard-coded GCI version in future.
@vishh @adityakali can you please review?
cc @kubernetes/goog-image FYI
Prior to this change, a K8s branch (master as well as release) was
pinned to a GCI milestone. It would pick up the latest GCI release on
that milestone at the time of cluster creation. The rationale was the
K8s users would automatically get the bug fixes in newer versions of
GCI. However in practice, it makes the runtime environment
non-deterministic, and lack of continuous e2e tests mean we would run
into breakages sooner or later.
With this change, each K8s release will pick a specific version
of GCI by default (similar to how the Debian-based container-vm gets used).
Users can override the default version through KUBE_GCE_MASTER_IMAGE and
KUBE_GCE_NODE_IMAGE environment variables.
We expect the default GCI version will be updated relatively frequently stay
updated with newer GCI releases. We can also automate the process to
automatically bump the hard-coded GCI version in future.
Automatic merge from submit-queue
keep docker0 with private cidr range
fixes: #31465
Keep docker0 when using kubenet on GCI. Assign 169.254.123.1/24 to docker0 to avoid cidr conflict.
Automatic merge from submit-queue
fix feature_gates salt plumbing
Fix salt plumbing for `--feature-gate` from `FEATURE_GATES kube env.
Was generating grains.conf and kube-env for master only. Verified it works now for gci and debian master/nodes.
cc @thockin @timstclair
Automatic merge from submit-queue
Build and push kube-dns for 1.4 release.
Fix#31355.
Following docker images had been uploaded:
gcr.io/google_containers/kubedns-amd64:1.7
gcr.io/google_containers/kubedns-arm:1.7
gcr.io/google_containers/kubedns-arm64:1.7
Build for ppc64le is disabled by default, and it failed to be built using:
`KUBE_BUILD_PPC64LE=y make release`
I'm still working on making the ppc64le build. Updates will be added following this thread.
@girishkalele @thockin
Automatic merge from submit-queue
gci: decouple from the built-in kubelet version
Prior to this change, configure.sh would:
(1) compare versions of built-in kubelet and downloaded kubelet, and
(2) bind-mount downloaded kubelet at /usr/bin/kubelet in case of
version mismatch
With this change, configure.sh:
(1) compares the two versions only on test clusters, and
(2) uses the actual file paths to start kubelet w/o any bind-mounting
To allow (2), this change also provides its own version of kubelet
systemd service file.
Effectively with this change we will always use the downloaded kubelet
binary along with its own systemd service file on non-test clusters. The
main advantage is this change does not rely on the kubelet being built in to
the OS image.
@dchen1107 @wonderfly can you please review
cc/ @kubernetes/goog-image FYI
Prior to this change, configure.sh would:
(1) compare versions of built-in kubelet and downloaded kubelet, and
(2) bind-mount downloaded kubelet at /usr/bin/kubelet in case of
version mismatch
With this change, configure.sh:
(1) compares the two versions only on test clusters, and
(2) uses the actual file paths to start kubelet w/o any bind-mounting
To allow (2), this change also provides its own version of kubelet
systemd service file.
Effectively with this change we will always use the downloaded kubelet
binary along with its own systemd service file on non-test clusters. The
main advantage is this change does not rely on the kubelet being built in to
the OS image.
Automatic merge from submit-queue
Add admission controller for default storage class.
The admission controller adds a default class to PVCs that do not require any
specific class. This way, users (=PVC authors) do not need to care about
storage classes, administrator can configure a default one and all these PVCs
that do not care about class will get the default one.
The marker of default class is annotation "volume.beta.kubernetes.io/storage-class", which must be set to "true" to work. All other values (or missing annotation) makes the class non-default.
Based on @thockin's code, added tests and made it not to reject a PVC when no class is marked as default.
.
@kubernetes/sig-storage
Automatic merge from submit-queue
Support for creation/removal of master replicas.
HA master: initial support for creation/removal of masters replicas by
kube-up/kube-down scripts for GCE on gci (other distributions, including debian, are not supported yet).
The admission controller adds a default class to PVCs that do not require any
specific class. This way, users (=PVC authors) do not need to care about
storage classes, administrator can configure a default one and all these PVCs
that do not care about class will get the default one.
Automatic merge from submit-queue
Use --regions instead of --region for gcloud list [resource]
gcloud has started complaining:
```
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
```
We'll probably need to cherry-pick this, as otherwise the list-resources script will start failing at some point in the future.
Automatic merge from submit-queue
Update core etcd references to use 3.0.4
This updates the core references to use 3.0.4.
There are still legacy references in the code base that should be cleaned, or just removed but I'm reluctant to purge.
/cc @kubernetes/sig-scalability
Automatic merge from submit-queue
Avoid unnecessary copies on GCI initialization.
The issue I faced was that when starting a cluster I was getting:
```
Aug 12 11:12:46 e2e-test-wojtekt-master configure.sh[1079]: cp: error writing '/home/kubernetes/kubernetes-src.tar.gz': No space left on device
```
This PR reduces amount of space that is needed on startup, as well as this speeds up starting cluster.
@lavalamp @dchen1107
Automatic merge from submit-queue
Add support for kube-up.sh to deploy Calico network policy to GCI masters
Also remove requirement for calicoctl from Debian / salt installed nodes and clean it up a little by deploying calico-node with a manifest rather than calicoctl. This also makes it more reliable by retrying properly.
How to use:
```
make quick-release
NETWORK_POLICY_PROVIDER=calico cluster/kube-up.sh
```
One place where I was uncertain:
- CPU allocations (on the master particularly, where there's very little spare capacity). I took some from etcd, but if there's a better way to decide this, I'm happy to change it.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29037)
<!-- Reviewable:end -->
It can run tests against multiple existing images that match a regex.
GCI images will be using a regex.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue
AWS/GCE: Rework use of master name
* Add a pillar for `hostname` (because even if there's a good Salt function for it, I don't trust it to return the short hostname)
* Move `INITIAL_ETCD_CLUSTER` to just the GCE turn-up
* Remove `master_name`, which isn't needed
* Add a pillar for hostname (because even if there's a good Salt
function for it, I don't trust it to return the short hostname)
* Move INITIAL_ETCD_CLUSTER to just the GCE turn-up
* Remove the master_name, which isn't needed as a pillar
Automatic merge from submit-queue
In cluster scripts correct gcloud list arg from '--zone' to '--zones'
I started getting these messages when doing `kube-up` and similar operations:
WARNING: Abbreviated flag [--zone] will be disabled in release 132.0.0, use the full name [--zones].
This PR corrects the flag where used.
Note there are many uses of `--zone` on commands like `gcloud instances describe` which are still correct - those commands do not accept multiple zones.
Automatic merge from submit-queue
[Garbage Collector] add e2e tests again
#27151 is reverted because gke didn't start correctly after it's merged (https://github.com/kubernetes/kubernetes/pull/27151#issuecomment-233030686).
The possible problem is the `unbound variable`, which is fixed in the second commit of this PR. However, I cannot verify if the PR will fail the gke suite since I don't have the environment to run that suite.
@wojtek-t @lavalamp
Automatic merge from submit-queue
kube-up: increase download timeout for kubernetes.tar.gz
Particularly on smaller instances on AWS, we were hitting the 80 second
timeout now that our image is well over the 1GB mark.
Increase the timeout from 80 seconds to 300 seconds.
Fix#29418
Particularly on smaller instances on AWS, we were hitting the 80 second
timeout now that our image is well over the 1GB mark.
Increase the timeout from 80 seconds to 300 seconds.
Fix#29418
Automatic merge from submit-queue
fix logrotate config (again)
we need to add the dateformat option so that the logrotate
can create unique logfiles for each rotation. Without this,
logrotation is skipped with message like (generated in
verbose mode of logrotate):
rotating log /var/log/rotate-test.log, log->rotateCount is 5
dateext suffix '-20160718'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
destination /var/log/rotate-test2.log-20160718.gz already exists, skipping rotation
Tested as follows:
# config in '/etc/logrotate.d/rotate-test':
/var/log/rotate-test.log {
rotate 5
copytruncate
missingok
notifempty
compress
maxsize 100M
daily
dateext
dateformat -%Y%m%d-%s
create 0644 root root
}
# create 150Mb of /var/log/rotate-test.log
$ dd if=/dev/zero of=/var/log/rotate-test.log bs=1048576 count=150 conv=notrunc oflag=append
# run logrotate
$ /usr/sbin/logrotate -v /etc/logrotate.conf
...
rotating pattern: /var/log/rotate-test.log after 1 days (5 rotations)
empty log files are not rotated, log files >= 104857600 are rotated earlier, old logs are removed
considering log /var/log/rotate-test.log
log needs rotating
rotating log /var/log/rotate-test.log, log->rotateCount is 5
Converted ' -%Y%m%d-%s' -> '-%Y%m%d-%s'
dateext suffix '-20160718-1468875268'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
copying /var/log/rotate-test.log to /var/log/rotate-test.log-20160718-1468875268
truncating /var/log/rotate-test.log
compressing log with: /bin/gzip
Repeating 'dd' and 'logrotate' commands now generate logfiles correctly.
#27754
@bprashanth can you please review?
we need to add the dateformat option so that the logrotate
can create unique logfiles for each rotation. Without this,
we logrotation is skipped with message like (generated in
verbose mode of logrotate):
rotating log /var/log/rotate-test.log, log->rotateCount is 5
dateext suffix '-20160718'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
destination /var/log/rotate-test2.log-20160718.gz already exists, skipping rotation
Tested as follows:
# config in '/etc/logrotate.d/rotate-test':
/var/log/rotate-test.log {
rotate 5
copytruncate
missingok
notifempty
compress
maxsize 100M
daily
dateext
dateformat -%Y%m%d-%s
create 0644 root root
}
# create 150Mb of /var/log/rotate-test.log
$ dd if=/dev/zero of=/var/log/rotate-test.log bs=1048576 count=150 conv=notrunc oflag=append
# run logrotate
$ /usr/sbin/logrotate -v /etc/logrotate.conf
...
rotating pattern: /var/log/rotate-test.log after 1 days (5 rotations)
empty log files are not rotated, log files >= 104857600 are rotated earlier, old logs are removed
considering log /var/log/rotate-test.log
log needs rotating
rotating log /var/log/rotate-test.log, log->rotateCount is 5
Converted ' -%Y%m%d-%s' -> '-%Y%m%d-%s'
dateext suffix '-20160718-1468875268'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
copying /var/log/rotate-test.log to /var/log/rotate-test.log-20160718-1468875268
truncating /var/log/rotate-test.log
compressing log with: /bin/gzip
Repeating 'dd' and 'logrotate' commands now generate logfiles correctly.
Automatic merge from submit-queue
[garbage collector] add e2e test
This PR also includes some changes to plumb controller-manager's `--enable_garbage_collector` from the environment variable.
The e2e test will not be run by the core suite because it's marked `[Feature:GarbageCollector]`.
The corresponding jenkins job configuration PR is https://github.com/kubernetes/test-infra/pull/132.
Automatic merge from submit-queue
Substitute federation_domain_map parameter with its value in node bootstrap scripts.
This PR also removes the substitution code we added to the build scripts.
**Release Note**
```release-note
If you use one of the kube-dns replication controller manifest in `cluster/saltbase/salt/kube-dns`, i.e. `cluster/saltbase/salt/kube-dns/{skydns-rc.yaml.base,skydns-rc.yaml.in}`, either substitute one of `__PILLAR__FEDERATIONS__DOMAIN__MAP__` or `{{ pillar['federations_domain_map'] }}` with the corresponding federation name to domain name value or remove them if you do not support cluster federation at this time. If you plan to substitute the parameter with its value, here is an example for `{{ pillar['federations_domain_map'] }`
pillar['federations_domain_map'] = "- --federations=myfederation=federation.test"
where `myfederation` is the name of the federation and `federation.test` is the domain name registered for the federation.
```
cc @erictune @kubernetes/sig-cluster-federation @MikeSpreitzer @luxas
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Automatic merge from submit-queue
Add Calico as policy provider in GCE
Adds Calico as policy provider to GCE, enforcing the extensions/v1beta1 NetworkPolicy API.
Still to do:
- [x] Enable NetworkPolicy API when POLICY_PROVIDER is provided.
- [x] Fix CNI plugin, policy controller versions.
CC @thockin - does this general approach look good?
Automatic merge from submit-queue
federation: Updating KubeDNS to try finding a local service first for federation query
Ref https://github.com/kubernetes/kubernetes/issues/26762
Updating KubeDNS to try to find a local service first for federation query.
Without this change, KubeDNS always returns the DNS hostname, even if a local service exists.
Have updated the code to first remove federation name from path if it exists, so that the default search for local service happens. If we dont find a local service, then we try to find the DNS hostname.
Will appreciate a strong review since this is my first change to KubeDNS.
https://github.com/kubernetes/kubernetes/pull/25727 was the original PR that added federation support to KubeDNS.
cc @kubernetes/sig-cluster-federation @quinton-hoole @madhusudancs @bprashanth @mml
Automatic merge from submit-queue
Use new fluentd-gcp container with journal support
This makes use of the systemd-journal support added in PR #27981
and Fixes#27446.
cc/ @a-robinson @andyzheng0831
Following from #27830, this copies the source onto the instance and
displays the location of it prominently (keeping the download link for
anyone that just wants to curl it).
Example output (this tag doesn't exist yet):
---
Welcome to Kubernetes v1.4.0!
You can find documentation for Kubernetes at:
http://docs.kubernetes.io/
The source for this release can be found at:
/usr/local/share/doc/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
https://storage.googleapis.com/kubernetes-release/release/v1.4.0/kubernetes-src.tar.gz
It is based on the Kubernetes source at:
https://github.com/kubernetes/kubernetes/tree/v1.4.0
For Kubernetes copyright and licensing information, see:
/usr/local/share/doc/kubernetes/LICENSES
---
Following from #27830, this copies the source onto the instance and
displays the location of it prominently (keeping the download link for
anyone that just wants to curl it).
Example output (this tag doesn't exist yet):
---
Welcome to Kubernetes v1.4.0!
You can find documentation for Kubernetes at:
http://docs.kubernetes.io/
The source for this release can be found at:
/usr/local/share/doc/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
https://storage.googleapis.com/kubernetes-release/release/v1.4.0/kubernetes-src.tar.gz
It is based on the Kubernetes source at:
https://github.com/kubernetes/kubernetes/tree/v1.4.0
For Kubernetes copyright and licensing information, see:
/usr/local/share/doc/kubernetes/LICENSES
---
Automatic merge from submit-queue
GCE provider: Limit Filter calls to regexps rather than insane blobs
Filters can't exceed 4k, and GET requests against the GCE API are also limited, so these break down in different ways at different cluster counts. Fix it by introducing an advisory `node-instance-prefix` configuration in the GCE provider that can hint the `EnsureLoadBalancer`/`UpdateLoadBalancer code` (and the firewall creation/update code). If it's not there, or wrong (a hostname that's registered violates it), just ignore it and grab the whole project.
Fixes#27731
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Filters can't exceed 4k, and GET requests against the GCE API are also
limited, so these break down in different ways at different cluster
counts. Fix it by introducing an advisory node-instance-prefix
configuration in the GCE provider that can hint the
EnsureLoadBalancer/UpdateLoadBalancer code (and the firewall
creation/update code). If it's not there, or wrong (a hostname that's
registered violates it), just ignore it and grab the whole project.
Automatic merge from submit-queue
federation: Creating kubeconfig files to be used for creating secrets for clusters on aws and gke
Extension of https://github.com/kubernetes/kubernetes/pull/26914 which created the kubeconfig files for gce clusters.
This PR extends it to AWS, vagrant and GKE.
The change for AWS and vagrant is exactly same as GCE.
For GKE, since `gcloud create clusters` creates kubeconfig, we are just copying the generated kubeconfig to the desired location
cc @kubernetes/sig-cluster-federation @colhom
@roberthbailey for GKE
Automatic merge from submit-queue
rkt: Map kubelet's `--stage1-image` flag to rkt's `--stage1-name` flag.
This enables rkt to use cached stage1 image instead of unpacking the stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order to enable rkt to find the stage1 image with the name specified by this flag.
Also, the cloud config is modified to pre-load the stage1 images.
cc @kubernetes/sig-rktnetes @kubernetes/sig-node
Automatic merge from submit-queue
add logrotate service and configuration for GCI
This change mirrors the configuration in cluster/saltbase/salt/logrotate for GCI.
On GCI we use systemd timers (https://www.freedesktop.org/software/systemd/man/systemd.timer.html) and install an hourly timer - kube-logrotate.timer. This will invoke kube-logrotate.service (which calls /usr/sbin/logrotate) once every hour to perform log rotation as per the rotation rules installed under /etc/logrotate.d/.
@kubernetes/goog-image @zmerlynn @dchen1107 @andyzheng0831
This enables rkt to use cached stage1 image instead of unpacking the
stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order to
enable rkt to find the stage1 image with the name specified by this flag.
Automatic merge from submit-queue
make GCI image detection robust
This change makes sure that in case we roll back a released GCI image, the image detection logic picks a correct active image.
@kubernetes/goog-image @Amey-D @wonderfly @dchen1107
Automatic merge from submit-queue
Prep for continuous Docker validation test
```release-note
Add a test config variable to specify desired Docker version to run on GCI.
```
We want to continuously validate Docker releases (#25215), on GCI. This change
adds a new test config variable, `KUBE_GCI_DOCKER_VERSION`, through which we can
specify which version of Docker we want to run on the master and nodes. This
change also patches the Jenkins e2e-runner with the ability to fetch the latest
Docker (pre)release, and sets the aforementioned variable accordingly.
Tested on my local Jenkins instance that was able to start a cluster with the latest Docker version (different from installed version) running on both master and nodes.
@dchen1107 Can you review?
cc/ @andyzheng0831 for changes in `cluster/gce/gci/helper.sh`, and @ixdy @spxtr for changes to the Jenkins e2e-runner
cc/ @kubernetes/goog-image
Automatic merge from submit-queue
Revert "Revert "GCI: add support for network plugin""
PR #27027 added the network plugin support in GCI config, but later a bug in the network plugin broke e2e tests (see issue #27118). The bug was fixed by #27141 and we have been repeatedly run the serial e2e tests more than 10 times to verify the fix. Now it should be safe to put the GCI network plugin support back.
We will first merge in the master branch and monitor the Jenkins serial tests for a while and then cherry-pick it into release-1.3 branch.
Automatic merge from submit-queue
version bump for gci to milestone 53
Fixes#26455
GCI release 53 includes kubernetes v1.3.0-alpha.5 with docker-1.11.2.
@dchen1107 @kubernetes/goog-image @andyzheng0831
Automatic merge from submit-queue
support for mounting local-ssds on GCI
This change adds support for mounting local ssds on GCI.
It updates the previous container-vm behavior as well to
match that for GCI nodes by mounting the local-ssds under
the same path (/mnt/disks/ssdN).
@vulpecula @roberthbailey @andyzheng0831 @kubernetes/goog-image
Automatic merge from submit-queue
Trusty: fix the 'ping' issue and fluentd-gcp issue #26379
This PR is mainly for being picking up the fix in #27016 and #27102 in trusty code, so that we can fix the issues in the release-1.2 branch for GCI. It contains two parts:
(1) Adding iptables rules to accept ICMP traffic, otherwise 'ping' from a pod does not work;
(2) Revising the code for cleaning up docker0 stuff including the bridge and iptables rules. I slightly refactor the code of starting kubelet and removing docker0 stuff before starting kubelet. The old code did it after starting kubelet but before restarting docker. I think doing it before starting kubelet is safter.
cc/ @roberthbailey @fabioy @dchen1107 @a-robinson @kubernetes/goog-image
Automatic merge from submit-queue
cluster/gce/coreos: Update heapster apiVersion
This fixes an inadvertant search-replace error in #26617.
The error was missed then because the search-replace issue wasn't
present in the standalone controllers, but was in all the others.
I verified that with this change heapster comes up under the default influxdb monitoring and without this change addon manager spits out validation failure errors for the heapster yaml.
cc @yifan-gu
This change adds support for mounting local ssds on GCI.
It updates the previous container-vm behavior as well to
match that for GCI nodes by mounting the local-ssds under
the same path (/mnt/disks/ssdN).
This fixes an inadvertant search-replace error in #26617.
The error was missed then because the search-replace issue wasn't
present in the standalone controllers, but was in all the others.
Automatic merge from submit-queue
GCI: fix the issue #26379
This PR deletes docker0 explicitly to fix the issue. In some cases, coexistence of docker0 and cbr0 make troubles in GCI-based cluster instances.
I verified it in GKE. With the fix, fluentd-gcp pod shows no error. "curl google.com" can work inside a pod. Mark it as P0 to match the issue priority.
@a-robinson @roberthbailey @freehan @kubernetes/goog-image
Automatic merge from submit-queue
Enable support for memory eviction configuration via salt
Added evictions based on memory by default whenever the available memory is < 100Mi.
Updated GCE and GCI.
Automatic merge from submit-queue
Bump cluster autoscaler version and enable scale down by default
Follow up of https://github.com/kubernetes/contrib/pull/1148.
cc: @piosz @fgrzadkowski @jszczepkowski
Automatic merge from submit-queue
Re-enable node problem detector by default
Re-enable node problem detector started in gce cluster by default.
For now, in the master node, the node problem detector will be started and do nothing (see https://github.com/kubernetes/node-problem-detector/pull/13).
But in fact, in my test cluster, the master has no extra cpu to run the node problem detector, so node problem detector is started on all nodes except master, which is what we want but not expected...
@dchen1107
/cc @kubernetes/sig-node
/cc @andyzheng0831 for the gci script change.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Automatic merge from submit-queue
Don't run fluentd-es on GCI masters
It isn't run on containervm masters. It can't do anything on the master because the master doesn't have kube-proxy running to enable fluentd to talk to the elasticsearch service.
@andyzheng0831
We want to continuously validate Docker releases (#25215), on GCI. This change
adds a new test config variable, `KUBE_GCI_DOCKER_VERSION`, through which we can
specify which version of Docker we want to run on the master and nodes. This
change also patches the Jenkins e2e-runner with the ability to fetch the latest
Docker (pre)release, and sets the aforementioned variable accordingly.
Automatic merge from submit-queue
GCI/Trusty: support the Docker registry mirror
@roberthbailey @zmerlynn please review it.
cc/ @fabioy @dchen1107 @kubernetes/goog-image FYI.
cc/ @ojarjur it is very straightforward to add support for GCI, which is pretty much like the change to ContainerVM's configure-vm.sh in your original PR #25841.
Automatic merge from submit-queue
GCI: correct the fix in #26363
This PR is mainly for correcting the fix to 'find' command in #26363. I added "-maxdepth 1" in an earlier change, and #26363 tried to fix it by changing the search path. This is potentially incorrect, when yaml files are in more than one layer deep. The real fix should be removing the "-maxdepth 1" flag from 'find' command. This PR also updates two minor places in the file configure-helper.sh introduced by two previous PR #26413 and #26048.
@roberthbailey @wonderfly
cc/ @dchen1107 @fabioy @kubernetes/goog-image
Automatic merge from submit-queue
pin GCI version to milestone 52
This is mainly for pinning the 1.2 branch to GCI milestone 52
which contains correct docker and kubelet built in.
Doing this allows us to upgrade docker to v1.11 (issue #26455)
in GCI 53 without breaking the 1.2 release branch.
@kubernetes/goog-image @dchen1107 @roberthbailey @andyzheng0831
This is mainly for pinning the 1.2 branch to GCI milestone 52
which contains correct docker and kubelet built in.
Doing this allows us to upgrade docker to v1.11 (issue #26455)
in GCI 53 without breaking the 1.2 release branch.
Automatic merge from submit-queue
Move the defaults setting of GCI to util.sh
fixes#26291
This change recovers some of the side effects of
https://github.com/kubernetes/kubernetes/pull/26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
@euank @roberthbailey Can you review?
Automatic merge from submit-queue
cluster/coreos: Update heapster addon to beta2
fixes#26616
As noted there, heapster was updated but not for gce/coreos which breaks anything that depends on heapster's new metrics API (i.e. autoscaling)
This change recovers some of the side effects of
https://github.com/kubernetes/kubernetes/pull/26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
Automatic merge from submit-queue
Support for cluster autoscaler in GCE Trusty and GCI images
Fixes: #26346
Ref: #26197
cc: @fgrzadkowski @vulpecula @piosz @jszczepkowski
Automatic merge from submit-queue
Prepull images in e2e
Quick and dirty image puller because the SQ stalled multiple times just *today* on image pull flake (https://github.com/kubernetes/kubernetes/issues/25277).
@kubernetes/sig-node @kubernetes/sig-testing wdyt?
Automatic merge from submit-queue
Make node-instance-group base names unique to prevent collisions
We create multiple IGMs for >1000 Node clusters. When we have a conflict on base name IGMs will fight over ownership of the VM that happen to have the name belonging to multiple IGMs.
This change will increase reliability of starting big clusters.
cc @wojtek-t @alex-mohr @roberthbailey @mikedanese