github/k3s - k3s - https://git.xinac.net

Commit Graph

Author	SHA1	Message	Date
Kubernetes Submit Queue	1e2105808b	Merge pull request #45136 from vishh/cos-nvidia-driver-install Automatic merge from submit-queue Enable "kick the tires" support for Nvidia GPUs in COS This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS). User space libraries and debug utilities from the Nvidia driver installation are made available in a special directory on the host - * `/home/kubernetes/bin/nvidia/lib` for libraries * `/home/kubernetes/bin/nvidia/bin` for debug utilities Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes. Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable. This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons: 1. Driver installation requires disabling a kernel feature in COS. 2. The kernel API for disabling this interface changed across COS versions 3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR. 4. Instead of having to post `3` PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and additional noise that will be created when GPU tests break. Try out this PR 1. Get Quota for GPUs in any region 2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci` 3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh` 4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml` 5. Run your CUDA app in a pod. Another option is to run an e2e manually to try out this PR 1. Get Quota for GPUs in any region 2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci` 3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"` 4. `go run hack/e2e.go -- --up` 5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"` The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration. TODO: - [x] Update COS image version to m59 release. - [x] Remove sleep from the install script and add it to the daemonset - [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters. - [x] Set up a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759 - [x] Update node e2e serial configs to install nvidia drivers on COS by default	2017-05-23 10:46:10 -07:00
Kubernetes Submit Queue	03ba1324cf	Merge pull request #46224 from gmarek/kubemark_heapster Automatic merge from submit-queue (batch tested with PRs 46133, 46211, 46224, 46205, 45910) Make CPU request for heapster in kubemark scale with the number of Nodes	2017-05-22 15:50:03 -07:00
gmarek	27fc7be396	Make CPU request for heapster in kubemark scale with the number of Nodes	2017-05-22 16:20:27 +02:00
Vishnu kannan	86b5edb79a	Update COS version to m59 Signed-off-by: Vishnu kannan <vishnuk@google.com>	2017-05-20 21:17:19 -07:00
Shyam Jeedigunta	360054a75f	Add script to dump kubemark master logs	2017-05-20 13:12:38 +02:00
Kubernetes Submit Queue	a1c2db2fec	Merge pull request #45950 from shyamjvs/revert-proxier Automatic merge from submit-queue Make real proxier in hollow-proxy optional (default=true) Ref https://github.com/kubernetes/kubernetes/pull/45622 This allows using real proxier for hollow proxy, but we use the fake one by default. cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek	2017-05-18 07:55:09 -07:00
Shyam Jeedigunta	804a4f558c	Make usage of real proxier in hollow-proxy optional (default=true)	2017-05-18 14:30:12 +02:00
Michael Taufen	2ee2ec5e21	Remove the deprecated --babysit-daemons kubelet flag	2017-05-17 09:08:57 -07:00
Shyam Jeedigunta	87cde074f8	Minor fix in run-gcloud-compute-with-retries output piping	2017-05-15 13:39:10 +02:00
Shyam Jeedigunta	0f1d5e6e36	Remove kubemark.sh as we don't use pod IP from it anymore	2017-05-12 13:47:13 +02:00
Kubernetes Submit Queue	e939019900	Merge pull request #45604 from shyamjvs/start-km-master-fix Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550) Minor bug fix in start-kubemark-master script cc @wojtek-t @gmarek	2017-05-10 21:34:41 -07:00
Shyam Jeedigunta	1078e9580c	Minor bug fix in start-kubemark-master script	2017-05-10 19:51:14 +02:00
Shyam Jeedigunta	1fc831e0ec	Fix bug in hollow-node deletion in stop-kubemark script	2017-05-10 12:57:43 +02:00
Shyam Jeedigunta	0759289dcf	Stream output of run-gcloud-compute-with-retries to stdout in realtime	2017-05-09 13:44:48 +02:00
Shyam Jeedigunta	2e800eef20	Fix add-metadata command for kubemark master	2017-05-08 20:44:20 +02:00
Shyam Jeedigunta	efc84378b8	Fix gcloud retries cmd to rightly capture return code	2017-05-08 19:34:26 +02:00
Shyam Jeedigunta	395d3bf3b4	Move hollow-node's initContainer from annotation to field	2017-05-04 11:41:33 +02:00
Dan Williams	b3705b6e35	hack/cluster: consolidate cluster/ utils to hack/lib/util.sh Per Clayton's suggestion, move stuff from cluster/lib/util.sh to hack/lib/util.sh. Also consolidate ensure-temp-dir and use the hack/lib/util.sh implementation rather than cluster/common.sh.	2017-03-30 22:34:46 -05:00
Kubernetes Submit Queue	4f606b9c8d	Merge pull request #42820 from MrHohn/addon-kubemark-v6.4-beta.1 Automatic merge from submit-queue (batch tested with PRs 42672, 42770, 42818, 42820, 40849) kubemark test: Bump addon-manager to v6.4-beta.1 Follow up PR of #42760. This PR bumps addon-manager to v6.4-beta.1 for kubemark test. Release note: ```release-note NONE ```	2017-03-25 14:27:27 -07:00
Piotr Szczesniak	69fd7aafd0	Bumped Heapster to v1.3.0	2017-03-17 15:45:52 +01:00
Random-Liu	c4b3fd4e63	Update npd to the official v0.3.0 release.	2017-03-15 14:26:12 -07:00
Zihong Zheng	34b8d008ec	kubemark test: Bump addon-manager to v6.4-beta.1	2017-03-09 10:13:07 -08:00
Kubernetes Submit Queue	c6d9d9c5ad	Merge pull request #42456 from Random-Liu/update-npd-in-kubemark Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370) Update npd in kubemark since #42201 is merged. Revert https://github.com/kubernetes/kubernetes/pull/41716. #42201 has been merged, and #41713 is fixed. Now we could retry update npd in kubemark. /cc @shyamjvs @wojtek-t @dchen1107	2017-03-04 00:17:40 -08:00
Random-Liu	3f30532b0f	Update npd in kubemark since #42201 is merged.	2017-03-02 16:29:24 -08:00
gmarek	30b9490d66	Add alsologtostderr flag to hollow node	2017-03-03 01:29:02 +01:00
Kubernetes Submit Queue	db5e85af5f	Merge pull request #41980 from shyamjvs/one-more-time Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048) Modified kubemark startup scripts to restore master on reboot Fixes #41735 As discussed in the issue, modified the scripts to satisfy the conditions of restoring master env, running non-idempotent operations only for the first time and persist important data like pki/auth files on a PD. Also attached `start-kubemark-master.sh` as startup-script metadata to master instance (on GCE) so that it is called automatically on each boot. cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek	2017-03-02 00:59:13 -08:00
Shyam JVS	ab78b20bc1	Make kubemark hollow node logging verbosity configurable	2017-03-01 20:24:30 +01:00
Kubernetes Submit Queue	32d59cbb2f	Merge pull request #42201 from shyamjvs/inotify-limit Automatic merge from submit-queue (batch tested with PRs 42316, 41618, 42201, 42113, 42191) [Kubemark] Add init container in hollow node for setting inotify limit of node to 200 Fixes #41713 Along with adding the init container, I also changed the manifest to a yaml as otherwise the entire init container annotation would have to be in a single line (with escaped characters), as json doesn't allow multi-line strings. cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @Random-Liu	2017-03-01 07:48:20 -08:00
Kubernetes Submit Queue	1a35155025	Merge pull request #41973 from wojtek-t/build_non_alpha_3_0_17_etcd_image Automatic merge from submit-queue (batch tested with PRs 42162, 41973, 42015, 42115, 41923) Release 3.0.17 etcd image	2017-02-28 22:05:59 -08:00
Shyam Jeedigunta	4574900634	Modified kubemark startup scripts to restore master on reboots	2017-02-28 19:51:00 +01:00
Kubernetes Submit Queue	dac0296f0b	Merge pull request #42093 from liggitt/avoid-fake-node-names Automatic merge from submit-queue (batch tested with PRs 40746, 41699, 42108, 42174, 42093) Avoid fake node names in user info Node usernames should follow the format `system:node:<node-name>`, but if we don't know the node name, it's worse to put a fake one in. In the future, we plan to have a dedicated node authorizer, which would start rejecting requests from a user with a bogus node name like this. The right approach is to either mint correct credentials per node, or use node bootstrapping so it requests a correct client certificate itself.	2017-02-28 07:51:33 -08:00
Shyam JVS	75e602ca28	Convert hollow-node manifest to yaml and add init container for setting inotify limit	2017-02-28 00:53:36 +01:00
Wojciech Tyczynski	74266e0dc0	Release 3.0.17 etcd image	2017-02-27 16:23:44 +01:00
Kubernetes Submit Queue	61a2bd64a2	Merge pull request #42054 from fejta/kubemark Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054) Update flag to --check-version-skew instead of --check_version_skew https://github.com/kubernetes/test-infra/issues/2012 Also add a `--` to send the flags to kubetest without triggering a warning.	2017-02-27 00:17:00 -08:00
Jordan Liggitt	34ac0dc302	Avoid fake node names in user info	2017-02-25 02:09:55 -05:00
Zihong Zheng	64ba52ae71	Bumps addon-manager to v6.4-alpha.3 and updates template files	2017-02-24 16:52:31 -08:00
Erick Fejta	db5a355336	Update flag to --check-version-skew instead of --check_version_skew	2017-02-24 07:49:55 -08:00
Kubernetes Submit Queue	ac293b857c	Merge pull request #41858 from shyamjvs/npd-logs Automatic merge from submit-queue (batch tested with PRs 38702, 41810, 41778, 41858, 41872) [Kubemark] Fixed hollow-npd container command to log to file Fixes #41802 cc @wojtek-t @gmarek @Random-Liu	2017-02-23 07:54:40 -08:00
Wojciech Tyczynski	b70e392161	Update clusters to use 3.0.17 etcd	2017-02-23 10:08:50 +01:00
Kubernetes Submit Queue	fe34705f8a	Merge pull request #41587 from MrHohn/addon-manager-fix-hpa Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657) Update kubectl in addon-manager to use HPA in autoscaling/v1 Addon-manager is broken since HPA objects were removed from extensions api group. Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log): ``` INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 == error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. == error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. == error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. == WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 == ``` And notice this commit (`f66679a4e9`) came in two weeks ago, which removed HorizontalPodAutoscaler from extensions/v1beta1. Addon-manager is now only partially functioning: it can successfully create and update addons, but will fail to prune objects, which means upgrade tests may mostly fail. Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([release 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) for fixing, including below images: - gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2 - gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2 - gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2 - gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2 - gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2 - gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2 @mikedanese cc @wojtek-t @shyamjvs	2017-02-22 08:12:46 -08:00
Wojciech Tyczynski	6d303d3329	Increase cpu for kubeproxy in kubemark in large clusters	2017-02-22 08:44:34 +01:00
Shyam Jeedigunta	f40b5eed5d	[Kubemark] Fixed hollow-npd container command to log to file	2017-02-22 02:38:38 +01:00
Kubernetes Submit Queue	70c9eebd21	Merge pull request #41739 from shyamjvs/hollow-node-logs Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576) [Kubemark] Add option to log hollow-node logs Ref https://github.com/kubernetes/kubernetes/issues/41613 Added an option to log kubemark hollow-node logs which includes kubelet, kubeproxy and npd logs for each hollow-node. Setting the env var `ENABLE_HOLLOW_NODE_LOGS=true` should now enable logging for tests. cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @yujuhong @Random-Liu	2017-02-21 02:24:43 -08:00
Zihong Zheng	2c8e89820a	Update kubectl in addon-manager to use HPA in autoscaling/v1 instead of extensions/v1beta1	2017-02-20 10:49:10 -08:00
Kubernetes Submit Queue	5fb6b91faf	Merge pull request #41751 from shyamjvs/fix-kubemark-default-suite Automatic merge from submit-queue Fix kubemark default e2e test suite's name Seems like the suite "[Feature:performance]" doesn't trigger tests anymore. Changed it to "[Feature:Performance]" in kubemark run-e2e-tests.sh. cc @wojtek-t @gmarek	2017-02-20 09:27:22 -08:00
Shyam Jeedigunta	7802c82671	Fix kubemark default e2e test suite's name	2017-02-20 16:08:28 +01:00
Shyam Jeedigunta	ed0ab3cd8e	[Kubemark] Add option to log hollow-node logs	2017-02-20 11:52:49 +01:00
Wojciech Tyczynski	4426156aa6	More resources for hollowproxy in large kubemarks	2017-02-20 09:26:17 +01:00
Random-Liu	47fc1d684d	Revert the npd change in kubemark.	2017-02-19 04:14:30 -08:00
Random-Liu	cd194bd9cc	Fix kubemark hollow-npd.	2017-02-18 21:01:56 -08:00


201 Commits (46e9b522fdbc14184539c57d3ba0380a7d7aa69f)