38 Commits (52f3380ef3a5d5de4d75ff92defd1c9af9d775bc)

Author SHA1 Message Date
SataQiu d3a902ff5b e2e refactor: cleanup Logf form framework/util 2019-05-24 16:39:46 +08:00
Richard Chen 2a70a0b424 Add an e2e test for running a gpu job interrupted by node recreation. 2019-05-16 11:41:01 -07:00
danielqsj ccecc67a5b fix golint error in test/e2e/scheduling 2019-05-14 14:18:52 +08:00
danielqsj 15a4342fe8 remove dot imports in e2e/scheduling 2019-05-14 14:17:20 +08:00
draveness da7507500f refactor: use e2elog.Logf instead of framework.Logf 2019-05-07 08:15:31 +08:00
Kubernetes Prow Robot 6cd85298c5 Merge pull request #75566 from jiayingz/gpu-test-update
Update test/e2e/scheduling/nvidia-gpus to also run cuda10 vector add.
2019-04-24 14:20:47 -07:00
Jiatong Wang 7814865b40 Move gpu_util.go to e2e/framework/gpu 2019-04-10 14:30:24 -07:00
Jiaying Zhang 54c2c2690c Update test/e2e/scheduling/nvidia-gpus to also run cuda10 vector add. 2019-03-21 16:29:47 -07:00
Chris O'Haver 9060fc6e6d add opt to track dns pods 2018-10-01 10:00:16 -04:00
Da K. Ma adbdbdec49 Got allocatable GPUs.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-09-25 12:33:42 +08:00
Francois Tur 5c20fff19d Revert "Add DNS pod resource monitoring option" 2018-09-19 14:54:29 -04:00
Chris O'Haver af0c1d2a4c Add dns pod monitoring option 2018-09-17 12:52:05 -04:00
linyouchong d7b7fdd0dc Make log more readable 2018-08-16 17:31:02 +08:00
Rohit Agarwal af3bc705b5 Remove COS requirement while running e2e nvidia gpu tests. 2018-06-26 12:12:06 -07:00
Maciej Szulik a2a3a98e1d DaemonSet internals are still in extensions 2018-05-28 17:59:54 +02:00
Jiaying Zhang 6e0badc0d1 Fix DsFromManifest() after we switch from extensions/v1beta1 to apps/v1
in cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml.
2018-05-25 16:05:35 -07:00
Kubernetes Submit Queue 043204b1e5 Merge pull request #61498 from mindprince/delete-in-tree-gpu
Automatic merge from submit-queue (batch tested with PRs 61498, 62030). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Delete in-tree support for NVIDIA GPUs.

This removes the alpha Accelerators feature gate which was deprecated in 1.10 (#57384).
The alternative feature DevicePlugins went beta in 1.10 (#60170).

Fixes #54012

```release-note
Support for "alpha.kubernetes.io/nvidia-gpu" resource which was deprecated in 1.10 is removed. Please use the resource exposed by DevicePlugins instead ("nvidia.com/gpu").
```
2018-04-03 02:02:04 -07:00
Rohit Agarwal 87dda3375b Delete in-tree support for NVIDIA GPUs.
This removes the alpha Accelerators feature gate which was deprecated in 1.10.
The alternative feature DevicePlugins went beta in 1.10.
2018-04-02 20:17:01 -07:00
Christoph Blecker 710c8563b4 Fix go vet errors 2018-04-02 17:57:44 -07:00
Jiaying Zhang 9a05af5502 Update gke nvidia-gpu-device-plugin to the latest version that supports
both v1alpha and v1beta1 device plugin versions.
Re-enables nvidia-gpus e2e test after verifying the test passes now.
2018-02-26 14:08:58 -08:00
vikaschoudhary16 e64517cd74 Migrate deviceplugin api from v1alpha to v1beta1 2018-02-21 01:26:20 -05:00
Rohit Agarwal d191c57cad Add e2e tests for GPU monitoring. 2018-01-26 15:30:55 -08:00
Rohit Agarwal a959ae636b Make it possible to override the driver installer daemonset url from test-infra. 2018-01-25 09:21:12 -08:00
Jiaying Zhang 4a1a205109 Changes nvidia-gpu device plugin addon config settings:
- Runs as system critical pod
- Makes resource limits to match its resource requets
- Modifies test/e2e/scheduling/nvidia-gpus.go to cope with the recent
change of running the device plugin as a system addon.
- The resource settings of the addon is based on the test results
from 8 nvidia-tesla-k80 gpus.
2017-11-20 17:32:53 -08:00
Kubernetes Submit Queue 87d45a54bd Merge pull request #55940 from shyamjvs/reduce-spam-from-resource-gatherer
Automatic merge from submit-queue (batch tested with PRs 55233, 55927, 55903, 54867, 55940). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Control logs verbosity in resource gatherer

PR https://github.com/kubernetes/kubernetes/pull/53541 added some logging in resource gatherer which is a bit too verbose for normal purposes.
As a result, we're seeing a lot of spam in our large cluster performance tests (e.g - https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-scalability/8046/build-log.txt)

This PR is making the verbosity of those logs controllable through an option. It's off by default, but turning it on for the gpu test to preserve behavior there.

/cc @jiayingz @mindprince
2017-11-18 12:26:18 -08:00
Shyam Jeedigunta fce28995e1 Control logs verbosity in resource gatherer 2017-11-17 13:03:32 +01:00
Rohit Agarwal 3ac94a57eb Update URLs for nvidia gpu device plugin and nvidia driver installer.
Device plugin is now an addon and its manifest is now in
kubernetes/kubernetes. The manifest on
GoogleCloudPlatform/container-engine-accelerators no longer contains
device plugin.
2017-11-14 15:31:22 -08:00
Jiaying Zhang ae36f8ee95 Extend test/e2e/scheduling/nvidia-gpus.go to track resource usage of
installer and device plugin containers.
To support this, exports certain functions and fields in
framework/resource_usage_gatherer.go so that it can be used in any
e2e test to track any specified pod resource usage with the specified
probe interval and duration.
2017-11-13 16:24:41 -08:00
supereagle b694d51842 use versiond group clients from client-go 2017-11-07 14:47:22 +08:00
Jiaying Zhang 6fecd04924 Fixes a regression introduced by PR 52290 that extended resource
capacity may temporarily drop to zero after kubelet restarts and
PODs restarted during that time window could fail to be scheduled.
2017-10-03 10:26:53 -07:00
Jiaying Zhang 65b76f361e Fixes a flakiness in GPUDevicePlugin e2e test.
Waits till nvidia gpu disappears from all nodes after deleting the
device plug DaemonSet to make sure its pods are deleted from all nodes.
2017-09-29 10:06:58 -07:00
Jiaying Zhang ba40bee5c1 Modified test/e2e_node/gpu-device-plugin.go to make sure it passes. 2017-09-22 20:21:26 +02:00
Renaud Gaubert 6993612cec Added device plugin e2e kubelet failure test
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
2017-09-22 01:24:01 +02:00
Jiaying Zhang 06b31849e1 Extends GPUDevicePlugin e2e test to exercise device plugin restarts. 2017-09-12 16:58:19 -07:00
Jiaying Zhang 01b49b4165 Extend test/e2e/scheduling/nvidia-gpus.go to include a device plugin based nvidia gpu e2e test. 2017-09-07 22:06:35 -07:00
Manjunath A Kumatagi ee4d54c70c Port e2e tests for multi architecture 2017-09-01 05:40:52 +05:30
Kevin f76ca1fb16 update clientset.Core() to clientset.CoreV1() in test 2017-08-14 16:53:55 +08:00
Rohit Agarwal cad0fe599c Move GPU e2e tests under owning SIG. 2017-07-18 10:31:48 -07:00