k3s/cluster
Kubernetes Submit Queue 1e2105808b Merge pull request #45136 from vishh/cos-nvidia-driver-install
Automatic merge from submit-queue

Enable "kick the tires" support for Nvidia GPUs in COS

This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User-space libraries and debug utilities from the Nvidia driver installation are made available in a special directory on the host:
* `/home/kubernetes/bin/nvidia/lib` for libraries
* `/home/kubernetes/bin/nvidia/bin` for debug utilities

Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.
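As a sketch (not part of this PR), a CUDA pod might consume the host directory like this. The image name, GPU resource name, and container mount path below are illustrative assumptions for this era of Kubernetes, not values defined by this PR:

```sh
# Hypothetical CUDA pod consuming the host-installed libraries via a HostPath
# volume. Image, resource name, and mountPath are assumptions for illustration.
kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-app                 # hypothetical pod name
spec:
  restartPolicy: Never
  containers:
  - name: cuda-app
    image: gcr.io/google_containers/cuda-vector-add:v0.1   # example CUDA image
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1   # alpha GPU resource name at the time
    volumeMounts:
    - name: nvidia-lib
      mountPath: /usr/local/nvidia/lib64    # assumed in-container library path
  volumes:
  - name: nvidia-lib
    hostPath:
      path: /home/kubernetes/bin/nvidia/lib # host dir populated by the installer
EOF
```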

Note: This solution requires updating the Pod Spec across distros. This is a known issue that will be addressed in the future; until then, CUDA workloads will not be portable.

This PR also updates the COS base image version to m59. The image update is coupled with the driver-installer changes for the following reasons:
1. Driver installation requires disabling a kernel feature in COS.
2. The kernel API for disabling this feature changed across COS versions.
3. If the COS image update were not handled in this PR, a subsequent COS image update would break GPU integration and would require updating the installation scripts in this PR.
4. Instead of posting three PRs (one to add the basic installer, one to update COS to m59, and one to update the installer again), this PR combines all the changes to reduce review overhead and latency, and to avoid the noise of broken GPU tests in the interim.

**Try out this PR**
1. Get Quota for GPUs in any region
2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.
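Before running your app, one way to sanity-check the installation (assuming the installer daemonset has finished and the alpha GPU resource name of this era) is to confirm the node advertises a GPU:

```sh
# Check that the node now reports an Nvidia GPU in its capacity
# (alpha.kubernetes.io/nvidia-gpu is assumed as the resource name).
kubectl describe nodes | grep -i nvidia
```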

**Another option is to run an e2e test manually to try out this PR**
1. Get Quota for GPUs in any region
2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci`
3. `export NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up`
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.
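For convenience, the steps above can be combined into a single shell session; the zone below is a placeholder for one where you have GPU quota:

```sh
# Consolidated version of the manual e2e steps above.
export KUBE_GCE_ZONE=us-west1-b                           # placeholder zone
export KUBE_NODE_OS_DISTRIBUTION=gci
export NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"
go run hack/e2e.go -- --up
hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"
```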

TODO:
- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Set up a test project with the quota necessary to run GPU tests against HEAD, starting with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
Merged: 2017-05-23 10:46:10 -07:00
| Name | Last commit | Date |
|------|-------------|------|
| addons | Merge pull request #45952 from harryge00/update-es-image | 2017-05-22 20:58:01 -07:00 |
| aws | hack/cluster: consolidate cluster/ utils to hack/lib/util.sh | 2017-03-30 22:34:46 -05:00 |
| centos | Centos provider: generate SSL certificates for etcd cluster. | 2017-03-24 09:15:57 +08:00 |
| gce | update default project to cos-cloud in gce configs | 2017-05-20 21:21:23 -07:00 |
| gke | Merge pull request #44590 from ihmccreery/rotate-username | 2017-05-05 14:08:08 -07:00 |
| images | Merge pull request #45730 from shyamjvs/remove-kubemark-sh | 2017-05-12 12:12:48 -07:00 |
| juju | Fix lint failures on kubernetes-e2e charm | 2017-05-15 13:22:55 -05:00 |
| kubemark | Update COS version to m59 | 2017-05-20 21:17:19 -07:00 |
| lib | hack/cluster: consolidate cluster/ utils to hack/lib/util.sh | 2017-03-30 22:34:46 -05:00 |
| libvirt-coreos | remove --api-version | 2017-05-19 10:56:35 +08:00 |
| local | Merge pull request #28469 from asalkeld/local-e2e | 2016-09-11 05:44:47 -07:00 |
| openstack-heat | fix: required openstack heat version for conditions is 2016-10-14 / newton | 2017-05-13 17:12:45 +00:00 |
| photon-controller | Merge pull request #42748 from dcbw/cfssl-localup | 2017-04-10 14:27:11 -07:00 |
| saltbase | Merge pull request #38169 from caseydavenport/calico-daemonset | 2017-05-19 19:38:59 -07:00 |
| skeleton | | |
| vagrant | remove --api-version | 2017-05-19 10:56:35 +08:00 |
| vsphere | Update generated for 2017 | 2017-01-01 23:11:09 -08:00 |
| windows | Fixed the issue with log rotation | 2016-12-12 11:08:41 -05:00 |
| BUILD | Replace git_repository with http_archive and use ixdy's fork of bazel tools for pkg_tar | 2017-05-03 10:13:06 -07:00 |
| OWNERS | Updated top level owners file to match new format | 2017-01-19 11:29:16 -08:00 |
| README.md | Fix typos and linted_packages sorting | 2016-10-31 18:31:08 +01:00 |
| clientbin.sh | Refactor the common parts of cluster/kube{ctl,adm}.sh into a util script. | 2017-01-26 21:29:49 -08:00 |
| common.sh | Merge pull request #44062 from ixdy/semver-regexes | 2017-05-01 12:54:44 -07:00 |
| get-kube-binaries.sh | Make get-kube.sh work properly the "ci/latest" pointer | 2017-04-05 15:02:10 -07:00 |
| get-kube-local.sh | hack/cluster: map /run/xtables.lock into containerized kubelet filesystem | 2017-05-05 23:34:06 -05:00 |
| get-kube.sh | Merge pull request #44062 from ixdy/semver-regexes | 2017-05-01 12:54:44 -07:00 |
| kube-down.sh | Automatically download missing kube binaries in kube-up/kube-down. | 2016-12-13 14:59:13 -08:00 |
| kube-push.sh | Automatically download missing kube binaries in kube-up/kube-down. | 2016-12-13 14:59:13 -08:00 |
| kube-up.sh | Add KUBE_GCE_ENABLE_IP_ALIASES flag to the cluster turn up scripts. | 2017-04-11 14:07:50 -07:00 |
| kube-util.sh | Add KUBE_GCE_ENABLE_IP_ALIASES flag to the cluster turn up scripts. | 2017-04-11 14:07:50 -07:00 |
| kubeadm.sh | Refactor the common parts of cluster/kube{ctl,adm}.sh into a util script. | 2017-01-26 21:29:49 -08:00 |
| kubectl.sh | Fix failing kubectl skew tests | 2017-03-08 16:08:47 -03:00 |
| log-dump.sh | Allow disabling log dump for nodes (in preparation for using logexporter) | 2017-04-25 10:48:33 +02:00 |
| options.md | | |
| restore-from-backup.sh | Fix restore-from-backup.sh script | 2017-03-21 11:58:13 +01:00 |
| test-e2e.sh | | |
| test-network.sh | | |
| test-smoke.sh | | |
| update-storage-objects.sh | | |
| validate-cluster.sh | hack/cluster: consolidate cluster/ utils to hack/lib/util.sh | 2017-03-30 22:34:46 -05:00 |

README.md

Cluster Configuration

Deprecation Notice: This directory has entered maintenance mode and will not be accepting new providers. Please submit new automation deployments to kube-deploy. Deployments in this directory will continue to be maintained and supported at their current level of support.

The scripts and data in this directory automate creation and configuration of a Kubernetes cluster, including networking, DNS, nodes, and master components.

See the getting-started guides for examples of how to use the scripts.

`cloudprovider/config-default.sh` contains a set of tweakable definitions/parameters for the cluster.
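Parameters can also be overridden per invocation via environment variables rather than by editing the file; a minimal sketch for the GCE provider:

```sh
# Sketch: override config-default.sh parameters at cluster turn-up time.
# NUM_NODES and KUBE_GCE_ZONE are standard knobs for the GCE provider.
NUM_NODES=3 KUBE_GCE_ZONE=us-central1-b cluster/kube-up.sh
```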

The heavy lifting of configuring the VMs is done by SaltStack.
