Automatic merge from submit-queue
Allow the /logs handler on the apiserver to be toggled.
Adds a flag to kube-apiserver, and plumbs through en environment variable in configure-helper.sh
Automatic merge from submit-queue
Enable "kick the tires" support for Nvidia GPUs in COS
This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User space libraries and debug utilities from the Nvidia driver installation are made available on the host in a special directory on the host -
* `/home/kubernetes/bin/nvidia/lib` for libraries
* `/home/kubernetes/bin/nvidia/bin` for debug utilities
Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.
Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.
This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons:
1. Driver installation requires disabling a kernel feature in COS.
2. The kernel API for disabling this interface changed across COS versions
3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR.
4. Instead of having to post `3` PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and additional noise that will be created when GPU tests break.
**Try out this PR**
1. Get Quota for GPUs in any region
2. `export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.
**Another option is to run a e2e manually to try out this PR**
1. Get Quota for GPUs in any region
2. export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up`
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.
TODO:
- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Setup a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
Automatic merge from submit-queue (batch tested with PRs 46201, 45952, 45427, 46247, 46062)
remove the elasticsearch template
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Loading file-based index template has been disabled since 2.0.0-beta1 version of Elasticsearch. https://www.elastic.co/guide/en/elasticsearch/reference/2.0/breaking_20_index_api_changes.html#_file_based_index_templates
So the `template-k8s-logstash.json` is not longer useful.
On the other hand, as https://github.com/kubernetes/kubernetes/issues/25127 indicated, we might better curl the elasticsearch API to load this template.
Automatic merge from submit-queue
Add version for fluentd-gcp config
Fluentd-gcp config should be versioned, because otherwise during the update race can happen and the new pod can mount the old config
Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue
Update Calico add-on
**What this PR does / why we need it:**
Updates Calico to the latest version using self-hosted install as a DaemonSet, removes Calico's dependency on etcd.
- [x] Remove [last bits of Calico salt](175fe62720/cluster/saltbase/salt/calico/master.sls (L3))
- [x] Failing on the master since no kube-proxy to access API.
- [x] Fix outgoing NAT
- [x] Tweak to work on both debian / GCI (not just GCI)
- [x] Add the portmap plugin for host port support
Maybe:
- [ ] Add integration test
**Which issue this PR fixes:**
https://github.com/kubernetes/kubernetes/issues/32625
**Try it out**
Clone the PR, then:
```
make quick-release
export NETWORK_POLICY_PROVIDER=calico
export NODE_OS_DISTRIBUTION=gci
export MASTER_SIZE=n1-standard-4
./cluster/kube-up.sh
```
**Release note:**
```release-note
The Calico version included in kube-up for GCE has been updated to v2.2.
```
Automatic merge from submit-queue (batch tested with PRs 44606, 46038)
Add ip-masq-agent addon to the addons folder.
This also ensures that under gce we add this DaemonSet if the non-masq-cidr
is set to 0/0.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Add ip-masq-agent addon to the addons folder which is used in GCE if --non-masquerade-cidr is set to 0/0
```
Automatic merge from submit-queue
Make real proxier in hollow-proxy optional (default=true)
Ref https://github.com/kubernetes/kubernetes/pull/45622
This allows using real proxier for hollow proxy, but we use the fake one by default.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Update cluster startup scripts to use gcloud beta for alias IP support
The feature has gone from alpha to beta.
```release-note
NONE
```
Automatic merge from submit-queue
Update dashboard-controller.yaml
**What this PR does / why we need it**: Updates Dashboard addon to newest version. Changelog can be found at https://github.com/kubernetes/dashboard/releases/tag/v1.6.1.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Update Dashboard version to 1.6.1
```
Automatic merge from submit-queue
fix: required openstack heat version for conditions is 2016-10-14 / newton
This fix sets the required heat version to 2016-10-14.
In OpenStack heat the conditions statement was introduced in version 2016-10-14 | newton, accourding to:
https://docs.openstack.org/releasenotes/heat/newton.html
and more specific:
https://docs.openstack.org/developer/heat/template_guide/hot_spec.html
The conditions are used to make the assignment of public ips / floating ips optional (added in commit 4eef540876). However this template is not compatible with OpenStack heat releases prior newton and produces the following error:
```
ERROR: Failed to validate: : resources.kube_minions: : "condition" is not a valid keyword inside a output definition
```
PR without a release note:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45860, 45119, 44525, 45625, 44403)
Support running StatefulSetBasic e2e tests with local-up-cluster
**What this PR does / why we need it**:
Currently StatefulSet(s) fail when you use local-up-cluster without
setting a cloud provider. In this PR, we use set the
kubernetes.io/host-path provisioner as the default provisioner when
there CLOUD_PROVIDER is not specified. This enables e2e test(s)
(specifically StatefulSetBasic) to work.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 44337, 45775, 45832, 45574, 45758)
Fix lint failures on kubernetes-e2e charm
**What this PR does / why we need it**:
This fixes a test failure on the kubernetes-e2e charm relating to tox and flake8:
```DEBUG🏃/bin/sh: 1: flake8: not found```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
This is a follow-up to https://github.com/kubernetes/kubernetes/pull/45494 where the same thing was done for kubernetes-master.
**Release note**:
```release-note
Fix lint failures on kubernetes-e2e charm
```
Automatic merge from submit-queue (batch tested with PRs 44337, 45775, 45832, 45574, 45758)
Refactor gcr.io/google_containers/elasticsearch to alpine
**What this PR does / why we need it**:
This reduces the image size of the gcr.io/google_containers/elasticsearch image.
Before:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/google_containers/elasticsearch v2.4.1-2 6941e43df81a 4 weeks ago 419MB
```
After:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/google_containers/elasticsearch v2.4.1-2 24ad40c21a52 About an hour ago 178MB
```
**Special notes for your reviewer**:
I used a workaround to make the elasticsearch_logging_discovery binary work with alpine. (See [stackoverflow](https://stackoverflow.com/questions/34729748/installed-go-binary-not-found-in-path-on-alpine-linux-docker/35613430#35613430)). Alternatively this can be solved by setting ```CGO_ENABLED=0```when compiling the binary. I didn't feel comfortable chaing the Makefile though, since I'm no golang expert. Feedback wanted!
Automatic merge from submit-queue (batch tested with PRs 45070, 45821, 45732, 45494, 45789)
Fix lint errors in juju kubernetes master and e2e charms
**What this PR does / why we need it**: Fixes style error in the Juju charms
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
Code style fixes in Juju charms
```
Automatic merge from submit-queue (batch tested with PRs 45653, 45719, 45729, 45730, 44250)
Remove kubemark.sh as we don't use pod IP from it anymore
This has been pending for sometime now. We no longer seem to actually depend on the downwarp api for the pod IP (hollow-proxy for example now gets it using api call).
cc @wojtek-t @gmarek
Automatic merge from submit-queue
Remove unused file cluster/images/kubemark/build-kubemark.sh
It's irrelevant and we don't seem to use/need it anymore.
cc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 45691, 45667, 45698, 45715)
Add general NoExecute Toleration to fluentd in gcp configuration
Ref #44445
Once merged I'll create a cherry-pick that will be picked up in GKE together with the next patch release.
cc @JorritSalverda @davidopp @aveshagarwal @nimeshksingh @piosz
```release-note
fluentd will tolerate all NoExecute Taints when run in gcp configuration.
```
Automatic merge from submit-queue
Update kube-dns version to 1.14.2
```release-note
Updates kube-dns to 1.14.2
- Support kube-master-url flag without kubeconfig
- Fix concurrent R/Ws in dns.go
- Fix confusing logging when initialize server
- Fix printf in cmd/kube-dns/app/server.go
- Fix version on startup and --version flag
- Support specifying port number for nameserver in stubDomains
```
Changes:
- Support kube-master-url flag without kubeconfig
- Fix concurrent R/Ws in dns.go
- Fix confusing logging when initialize server
- Fix printf in cmd/kube-dns/app/server.go
- Fix version on startup and --version flag
- Support specifying port number for nameserver in stubDomains
Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)
Don't append :443 to registry domain in the kubernetes-worker layer registry action
**What this PR does / why we need it**: Fixes#45547
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45547
**Special notes for your reviewer**:
**Release note**:
```
Fix#45547 - don't append :443 to juju created docker registry config
```
Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)
Enable kernel memcg notification for node and cluster GCI/COS testing.
Sets --experimental-kernel-memcg-notification=true when running on the GCI/COS image. It sets this for master and nodes for cluster e2e tests, and for the node in node e2e tests.
Issue #42676
cc @dchen1107 @Random-Liu
IP aliases are an alpha feature, and node accelerators are a beta
feature. $gcloud determines which is appropriate.
Before, this would try to run "gcloud alpha beta", which is incoherent.