k3s/images at 578d9fcf638f33a043af79e6a0f7c9444f81a456 - k3s

Kubernetes Submit Queue 1e2105808b Merge pull request #45136 from vishh/cos-nvidia-driver-install	2017-05-23 10:46:10 -07:00

Automatic merge from submit-queue

Enable "kick the tires" support for Nvidia GPUs in COS

This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS). User space libraries and debug utilities from the Nvidia driver installation are made available in special directories on the host:

* `/home/kubernetes/bin/nvidia/lib` for libraries
* `/home/kubernetes/bin/nvidia/bin` for debug utilities

Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.

Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.

This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons:

1. Driver installation requires disabling a kernel feature in COS.
2. The kernel API for disabling this interface changed across COS versions.
3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR.
4. Instead of having to post `3` PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and the additional noise that will be created when GPU tests break.

Try out this PR

1. Get Quota for GPUs in any region
2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.

Another option is to run an e2e manually to try out this PR

1. Get Quota for GPUs in any region
2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up`
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`

The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.

TODO:

- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Set up a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
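
The commit message says CUDA containers on COS should pick up the driver libraries and debug utilities from the host through `HostPath` volumes, but it does not show a Pod Spec. The following is a minimal sketch of what such a spec might look like, generated as JSON that `kubectl create -f` can read. Only the two host paths come from the commit message. The function name `cuda_pod_manifest`, the pod name, the image name, the in-container mount paths, and the use of `LD_LIBRARY_PATH` are all illustrative assumptions. The GPU resource request is left out on purpose, since the commit message does not name the resource.

```python
import json

# Host directories populated by the driver installer, as named in the commit message.
HOST_LIB_DIR = "/home/kubernetes/bin/nvidia/lib"
HOST_BIN_DIR = "/home/kubernetes/bin/nvidia/bin"


def cuda_pod_manifest(name, image,
                      lib_mount="/usr/local/nvidia/lib64",  # assumed mount path
                      bin_mount="/usr/local/nvidia/bin"):   # assumed mount path
    """Build a Pod manifest that mounts the host's Nvidia directories.

    Hypothetical helper: the name, image, and mount paths are placeholders,
    and no GPU resource request is included.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                # Assumption: the CUDA app locates the driver libraries
                # through the dynamic linker search path.
                "env": [{"name": "LD_LIBRARY_PATH", "value": lib_mount}],
                "volumeMounts": [
                    {"name": "nvidia-libraries", "mountPath": lib_mount},
                    {"name": "nvidia-debug-tools", "mountPath": bin_mount},
                ],
            }],
            "volumes": [
                {"name": "nvidia-libraries", "hostPath": {"path": HOST_LIB_DIR}},
                {"name": "nvidia-debug-tools", "hostPath": {"path": HOST_BIN_DIR}},
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder pod and image names; replace with a real CUDA image.
    manifest = cuda_pod_manifest("cuda-example", "example.com/my-cuda-app:latest")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and passing it to `kubectl create -f` would be one way to try it. Because the host paths are specific to COS, a spec like this is not portable across distros, which is the limitation the commit message notes.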
..
clusterapi-tester	autogenerated	2017-04-14 10:40:57 -07:00
dnsutils	chore (samples): Don't cache apks at all in Dockerfiles	2017-03-29 17:02:04 +02:00
entrypoint-tester	Always --pull in docker build to ensure recent base images	2017-01-10 16:21:05 -08:00
fakegitserver	Always --pull in docker build to ensure recent base images	2017-01-10 16:21:05 -08:00
goproxy	autogenerated	2017-04-14 10:40:57 -07:00
hostexec	Always --pull in docker build to ensure recent base images	2017-01-10 16:21:05 -08:00
iperf	Update images that use ubuntu-slim base image to :0.6	2017-01-11 15:07:04 -08:00
jessie-dnsutils	Always --pull in docker build to ensure recent base images	2017-01-10 16:21:05 -08:00
logs-generator	autogenerated	2017-04-14 10:40:57 -07:00
mount-tester	test/images/mount-tester: ensure exec binary is o+rx	2017-01-27 16:49:59 +00:00
mount-tester-user	test/images/mount-tester-user: bump base image to 0.8	2017-02-01 20:42:02 +00:00
n-way-http	Always --pull in docker build to ensure recent base images	2017-01-10 16:21:05 -08:00
net	Always --pull in docker build to ensure recent base images	2017-01-10 16:21:05 -08:00
netexec	Bump e2e netexec pod.xml image version to 1.7	2017-05-18 17:54:13 +08:00
network-tester	test/images/network-tester:bump rc/pod image verison to 1.9	2017-05-22 17:11:23 +08:00
nvidia-cuda	Adding an installer script that installs Nvidia drivers in Container Optimized OS	2017-05-20 21:17:19 -07:00
pets	Clean up petset	2017-05-06 11:24:34 +08:00
port-forward-tester	Always --pull in docker build to ensure recent base images	2017-01-10 16:21:05 -08:00
porter	Update gcr.io/google_containers/porter image to `4524579c0e`	2017-04-19 11:50:41 -07:00
redis	Make a smaller redis image for testing, based on Alpine.	2017-03-28 16:18:00 -07:00
resource-consumer	autogenerated	2017-04-14 10:40:57 -07:00
serve_hostname	Bump to go1.8.1 and remove the edge GOROOT	2017-04-25 23:45:47 +03:00
volumes-tester	Always --pull in docker build to ensure recent base images	2017-01-10 16:21:05 -08:00
BUILD	Enable auto-generating sources rules	2017-01-05 14:14:13 -08:00