Lightweight Kubernetes
 
 
 
 
Go to file
Kubernetes Submit Queue 1e2105808b Merge pull request #45136 from vishh/cos-nvidia-driver-install
Automatic merge from submit-queue

Enable "kick the tires" support for Nvidia GPUs in COS

This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User space libraries and debug utilities from the Nvidia driver installation are made available on the host in a special directory on the host -
* `/home/kubernetes/bin/nvidia/lib` for libraries
*  `/home/kubernetes/bin/nvidia/bin` for debug utilities

Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.

Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.

This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons:
1. Driver installation requires disabling a kernel feature in COS. 
2. The kernel API for disabling this interface changed across COS versions
3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR.
4. Instead of having to post `3` PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and additional noise that will be created when GPU tests break.

**Try out this PR**
1. Get Quota for GPUs in any region
2. `export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.

**Another option is to run a e2e manually to try out this PR**
1. Get Quota for GPUs in any region
2. export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up` 
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.

TODO:
- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Setup a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
2017-05-23 10:46:10 -07:00
.github Redirect kubeadm issues to kubeadm repo 2017-05-19 12:18:55 +02:00
Godeps PBM govmomi dependencies 2017-05-22 19:43:10 -07:00
api PDB MaxUnavailable: Generated 2017-05-23 07:42:24 -07:00
build Merge pull request #45855 from mikedanese/out-of-root 2017-05-18 00:56:47 -07:00
cluster Merge pull request #45136 from vishh/cos-nvidia-driver-install 2017-05-23 10:46:10 -07:00
cmd Merge pull request #45427 from ncdc/gc-shared-informers 2017-05-22 20:58:03 -07:00
docs PDB MaxUnavailable: Generated 2017-05-23 07:42:24 -07:00
examples SPBM policy ID support in vsphere cloud provider 2017-05-22 19:45:17 -07:00
federation Merge pull request #46176 from vmware/vSphereStoragePolicySupport 2017-05-22 23:41:10 -07:00
hack Merge pull request #45136 from vishh/cos-nvidia-driver-install 2017-05-23 10:46:10 -07:00
hooks Fix spelling in package naming linter error message 2016-12-20 15:48:14 -05:00
logo Updated top level owners file to match new format 2017-01-19 11:29:16 -08:00
pkg Merge pull request #46286 from zjj2wry/timstamps-timestamps 2017-05-23 10:29:58 -07:00
plugin Merge pull request #46223 from smarterclayton/scheduler_max 2017-05-23 07:42:00 -07:00
staging Merge pull request #45587 from foxish/pdb-maxunavailab 2017-05-23 10:29:56 -07:00
test Merge pull request #45136 from vishh/cos-nvidia-driver-install 2017-05-23 10:46:10 -07:00
third_party autogenerated 2017-04-14 10:40:57 -07:00
translations Extract a bunch more strings from kubectl 2017-04-06 20:12:50 -07:00
vendor PBM govmomi dependencies 2017-05-22 19:43:10 -07:00
.bazelrc move build related files out of the root directory 2017-05-15 15:53:54 -07:00
.gazelcfg.json Add go_genrule for zz_generated.openapi.go. 2017-04-25 17:51:36 -07:00
.generated_files Move .generated_docs to docs/ so docs OWNERS can review / approve 2017-02-16 10:11:57 -08:00
.gitattributes Add -diff attributes for generated files 2016-12-08 17:12:07 -08:00
.gitignore Remove verify_gen_openapi make rule. 2017-04-25 17:41:33 -07:00
BUILD.bazel move build related files out of the root directory 2017-05-15 15:53:54 -07:00
CHANGELOG.md Update CHANGELOG.md for v1.6.4. 2017-05-19 12:30:25 -07:00
CONTRIBUTING.md Close kubernetes/community#420 2017-03-08 09:59:30 -08:00
LICENSE LICENSE: revert modifications to Apache license 2016-11-22 11:44:46 -08:00
Makefile move build related files out of the root directory 2017-05-15 15:53:54 -07:00
Makefile.generated_files move build related files out of the root directory 2017-05-15 15:53:54 -07:00
OWNERS Add wojtec to global approvers 2017-01-25 11:57:00 -06:00
OWNERS_ALIASES Move PDB controller and type ownership to SIG-Apps 2017-05-22 12:55:28 -07:00
README.md Adjust the link to the right troubleshooting doc page 2017-04-13 08:20:39 +00:00
Vagrantfile Customizable vagrant rsync args and excludes 2016-11-14 11:18:44 +01:00
WORKSPACE move build related files out of the root directory 2017-05-15 15:53:54 -07:00
code-of-conduct.md Change code of conduct to call CNCF CoC by reference 2016-10-19 13:22:35 -04:00
labels.yaml Update labels.yaml with sig labels 2017-04-28 14:27:32 -07:00

README.md

Kubernetes

Submit Queue Widget GoDoc Widget


Kubernetes is an open source system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.

Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.

Kubernetes is hosted by the Cloud Native Computing Foundation (CNCF). If you are a company that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how Kubernetes plays a role, read the CNCF announcement.


To start using Kubernetes

See our documentation on kubernetes.io.

Try our interactive tutorial.

Take a free course on Scalable Microservices with Kubernetes.

To start developing Kubernetes

The community repository hosts all information about building Kubernetes from source, how to contribute code and documentation, who to contact about what, etc.

If you want to build Kubernetes right away there are two options:

You have a working Go environment.
$ go get -d k8s.io/kubernetes
$ cd $GOPATH/src/k8s.io/kubernetes
$ make
You have a working Docker environment.
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ make quick-release

If you are less impatient, head over to the developer's documentation.

Support

If you need support, start with the troubleshooting guide and work your way through the process that we've outlined.

That said, if you have questions, reach out to us one way or another.

Analytics