Commit Graph

903 Commits (6c38d009ceaa01488ea1ce2f53f864ff474f2664)

Author SHA1 Message Date
Rohit Agarwal 9c0bf19f80 Use cos-stable-59-9460-60-0 and newer installer for GPU node e2e tests. 2017-06-13 15:36:20 -07:00
Kubernetes Submit Queue f4d2c7b931 Merge pull request #46441 from dashpole/eviction_time
Automatic merge from submit-queue

Shorten eviction tests, and increase test suite timeout

After #43590, the eviction manager is less aggressive when evicting pods.  Because of that, many runs in the flaky suite time out.
To shorten the inode eviction test, I have lowered the eviction threshold.
To shorten the allocatable eviction test, I now set KubeReserved = NodeMemoryCapacity - 200Mb, so that any pod using 200Mb will be evicted.  This shortens this test from 40 minutes, to 10 minutes.
While this should be enough to not hit the flaky suite timeout anymore, it is better to keep lower individual test timeouts than a lower suite timeout, since hitting the suite timeout means that even successful test runs are not reported.

/assign @Random-Liu @mtaufen 

issue: #31362
2017-06-13 12:58:22 -07:00
David Ashpole 3365cca78a shorten eviction testst and lengthen flaky suite timeout 2017-06-12 12:56:45 -07:00
Rohit Agarwal f7a563435f Fix bad check in node e2e tests for GPUs.
When no nvidia device was attached, the -ne check had a syntax error:

    sh: -ne: argument expected

This resulted in 'Success' being echoed and the test passing incorrectly.
This was found while debugging issue #47216
2017-06-11 19:25:35 -07:00
Kubernetes Submit Queue 3040cba17d Merge pull request #47144 from jingxu97/May/emptyDir
Automatic merge from submit-queue

Fix local capacity isolation test
2017-06-09 12:17:19 -07:00
Kubernetes Submit Queue f75478875a Merge pull request #47113 from feiskyer/cri
Automatic merge from submit-queue

Kubelet: rename cri package name to pkg/kubelet/apis/cri/v1alpha1/runtime

**What this PR does / why we need it**:

We have moved CRI from api/v1alpha1/runtime to apis/cri/v1alpha1, which changed the package name of CRI. This would cause a significant problem: old-versioned runtime (based on CRI in v1.6) doesn't work with latest kubelet v1.7, and vice versa.

This PR renames cri package name to `pkg/kubelet/apis/cri/v1alpha1/runtime` for fixing the problem.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: 

fixes #47012

**Special notes for your reviewer**:

Should be included in v1.7.

**Release note**:

```release-note
CRI has been moved to package `pkg/kubelet/apis/cri/v1alpha1/runtime`.
```
2017-06-09 10:08:36 -07:00
Kubernetes Submit Queue 3a5df705fe Merge pull request #47190 from mindprince/faster-node-e2e-gci
Automatic merge from submit-queue

Move the nvidia installer to the beginning.

When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.

/sig node
/area node-e2e

```release-note
NONE
```
2017-06-09 09:19:16 -07:00
Pengfei Ni 22e99504d7 Update CRI references 2017-06-09 10:16:40 +08:00
Kubernetes Submit Queue 3a96c31de5 Merge pull request #46885 from kewu1992/test_gci_next_canary
Automatic merge from submit-queue (batch tested with PRs 46885, 47197)

Let COS docker validation node test against gci-next-canary

**What this PR does / why we need it**:
This is for COS docker validation node test. We plan to use family gci-next-canary in container-vm-image-staging for future Docker upgration and validation.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #47134

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-06-08 15:46:41 -07:00
Rohit Agarwal 4a5badfafa Move the nvidia installer to the beginning.
When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.
2017-06-08 09:55:14 -07:00
Kubernetes Submit Queue 9c1b2aa9b5 Merge pull request #46743 from Random-Liu/bump-up-npd
Automatic merge from submit-queue

Bump up npd version to v0.4.0

Fixes #47070.

Bump up npd version to [v0.4.0](https://github.com/kubernetes/node-problem-detector/releases/tag/v0.4.0).

```release-note
Bump up Node Problem Detector version to v0.4.0, which added support of parsing log from /dev/kmsg and ABRT.
```

/cc @dchen1107 @ajitak
2017-06-08 08:24:18 -07:00
Jing Xu 426d44ded4 Fix local capacity isolation test
Fix issue #47128, also add flaky tag for this evition test
2017-06-08 06:30:29 -07:00
Kubernetes Submit Queue 6ee028249b Merge pull request #46385 from rickypai/rpai/host_mapping_node_e2e_test
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)

add e2e node test for Pod hostAliases feature

**What this PR does / why we need it**: adds node e2e test for #45148 

tests requested in https://github.com/kubernetes/kubernetes/issues/43632#issuecomment-298434125

**Release note**:
```release-note
NONE
```

@yujuhong @thockin
2017-06-07 13:31:00 -07:00
Random-Liu 1d3979190c Bump up npd version to v0.4.0 2017-06-06 16:30:02 -07:00
Kubernetes Submit Queue 6ed4bc7b97 Merge pull request #46828 from cblecker/links-update
Automatic merge from submit-queue (batch tested with PRs 46718, 46828, 46988)

Update docs/ links to point to main site

**What this PR does / why we need it**:
This updates various links to either point to kubernetes.io or to the kubernetes/community repo instead of the legacy docs/ tree in k/k
Pre-requisite for #46813

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```

@kubernetes/sig-docs-maintainers @chenopis @ahmetb @thockin
2017-06-06 11:43:18 -07:00
Kubernetes Submit Queue 3338d784ba Merge pull request #46899 from mindprince/issue-46889-node-e2e-gpu-cos-fix
Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)

Wait for cloud-init to finish before starting tests.

This fixes #46889.


**Release note**:
```release-note
NONE
```
2017-06-06 05:22:42 -07:00
Christoph Blecker 1bdc7a29ae
Update docs/ URLs to point to proper locations 2017-06-05 22:13:54 -07:00
Jing Xu 0b13aee0c0 Add EmptyDir Volume and local storage for container overlay Isolation
This PR adds two features:
1. add support for isolating the emptyDir volume use. If user
sets a size limit for emptyDir volume, kubelet's eviction manager
monitors its usage
and evict the pod if the usage exceeds the limit.
2. add support for isolating the local storage for container overlay. If
the container's overly usage exceeds the limit defined in container
spec, eviction manager will evict the pod.
2017-06-05 12:05:48 -07:00
Rohit Agarwal 1561f55c4c Wait for cloud-init to finish before starting tests.
This fixes #46889.
2017-06-05 10:50:24 -07:00
Kubernetes Submit Queue 6fef1a1deb Merge pull request #46810 from vishh/gpu-cos-image-validation
Automatic merge from submit-queue (batch tested with PRs 46734, 46810, 46759, 46259, 46771)

Update the COS kernel sha for node e2e gpu installer

cc @mindprince

Relevant COS image - https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/jenkins/image-config-serial.yaml#L19
2017-06-05 06:51:23 -07:00
Kubernetes Submit Queue e6c74bbaaf Merge pull request #46221 from FengyunPan/close-file
Automatic merge from submit-queue

Close file after os.Open()

None
2017-06-03 04:42:00 -07:00
Kubernetes Submit Queue b8c9ee8abb Merge pull request #46456 from jingxu97/May/allocatable
Automatic merge from submit-queue

Add local storage (scratch space) allocatable support

This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.

This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306

**Release note**:

```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable 
```
2017-06-03 00:24:29 -07:00
Kubernetes Submit Queue d063ce213f Merge pull request #46801 from dashpole/summary_container_restart
Automatic merge from submit-queue

[Flaky PR Test] Fix summary test

fixes issue: #46797 

As we can see in the [example failure build log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet/4319/build-log.txt), the summary containers are pinging google 100s of times a second.  This causes the summary container to be killed occasionally, and fail the test.  The summary containers are only supposed to ping every 10 seconds according to the current test.  As it turns out, we were missing a semicolon, and were not sleeping between pings.  For background, we ping google to generate network traffic, so that the summary test can validate network metrics.

This PR adds the semicolon to make the container sleep between calls, and decreases the sleep time from 10 seconds to 1 second, as 1 call / 10 seconds did not produce enough activity.

cc @kubernetes/kubernetes-build-cops @dchen1107
2017-06-02 18:02:19 -07:00
Ricky Pai 4e7fed4479 e2e node test for PodSpec HostAliases 2017-06-02 17:01:44 -07:00
Kubernetes Submit Queue a6f0033164 Merge pull request #46238 from yguo0905/package-validator
Automatic merge from submit-queue (batch tested with PRs 46648, 46500, 46238, 46668, 46557)

Support validating package versions in node conformance test

**What this PR does / why we need it**:

This PR adds a package validator in node conformance test for checking whether the locally installed packages meet the image spec.

**Special notes for your reviewer**:

The image spec for GKE (which has the package spec) will be in a separate PR. Then we will publish a new node conformance test image for GKE whose name should use the convention in https://github.com/kubernetes/kubernetes/issues/45760 and have `gke` in it.


**Release note**:
```
NONE
```
2017-06-02 15:20:47 -07:00
Ke Wu cb28ed1f95 Let COS docker validation node tests against gci-next-canary
We plan to use family gci-next-canary in container-vm-image-staging
for future Docker upgration and validation.
2017-06-02 14:52:01 -07:00
Vishnu kannan d45286c575 update cos kernel sha for node e2e GPU installer
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-06-01 17:09:18 -07:00
Jing Xu 943fc53bf7 Add predicates check for local storage request
This PR adds the check for local storage request when admitting pods. If
the local storage request exceeds the available resource, pod will be
rejected.
2017-06-01 15:57:50 -07:00
Jing Xu dd67e96c01 Add local storage (scratch space) allocatable support
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
2017-06-01 15:57:50 -07:00
David Ashpole d1545e1e47 add semicolon 2017-06-01 13:32:59 -07:00
Yang Guo ecf214729d Support validating package versions in node conformance test 2017-05-30 17:44:40 -07:00
David Ashpole e2718f3bc5 fix crossbuild, verify container restarts, and restart only once 2017-05-30 13:15:22 -07:00
Kubernetes Submit Queue 54a47a6f1d Merge pull request #46308 from dashpole/summary_container_restart
Automatic merge from submit-queue (batch tested with PRs 46429, 46308, 46395, 45867, 45492)

Summary Test looks at pods that have containers that restart.

Occasionally, the node can report extra containers that had been restarted through the summary API.
This test change tests a pod that restarts, and hopefully should allow us to reproduce and debug this behavior.

/assign @dchen1107 

/release-note-none
2017-05-25 22:42:04 -07:00
Kubernetes Submit Queue ed8843406e Merge pull request #46303 from Random-Liu/fix-cos-image-project
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)

Fix cos image project to cos-cloud.

Addressed https://github.com/kubernetes/kubernetes/pull/45136#discussion_r118092211.

@vishh @yujuhong @dchen1107
2017-05-24 23:19:09 -07:00
Kubernetes Submit Queue 8d88c55231 Merge pull request #46311 from dashpole/disable_ubuntu_gpu_test
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)

Dont attach a GPU to ubuntu test machines for node e2e serial tests

This should fix flakes in the e2e_node serial suite.

@vishh I think this is what you were asking for...

/assign @vishh
2017-05-24 23:19:07 -07:00
Kubernetes Submit Queue b71ca6691b Merge pull request #46309 from Random-Liu/move-docker-validation-to-separate-project
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)

Move docker validation test to separate project.

Docker validation test is leaking VMs because new docker version `DOCKER_VERSION=17.05.0-c` totally breaks the new gci image `GCE_IMAGES=gci-test-60-9579-0-0` with the `gci-docker-version` metadata specified.

The test successfully created the instance, but timed out when checking VM aliveness, and leaked the VM.

I've cleaned up all leaked VMs. This PR moves docker validation node e2e test into a separate project to not influencing other node e2e test.

@kewu1992 We should fix the docker automated validation test.

/cc @dchen1107 @yujuhong @abgworrall
2017-05-24 23:19:05 -07:00
David Ashpole 1a6572fc6c summary test now tests a pod that has containers that have restarted 2017-05-24 13:27:57 -07:00
Random-Liu 82f588b483 Fix cos image project to cos-cloud. 2017-05-23 15:12:03 -07:00
David Ashpole 8341d544f3 remove unused test properties 2017-05-23 14:39:18 -07:00
David Ashpole 20eb016597 dont attach a GPU to ubuntu machines 2017-05-23 14:34:18 -07:00
Random-Liu dc023144a3 Move docker validation test to separate project. 2017-05-23 14:07:15 -07:00
FengyunPan 287f703d3a Close file after os.Open() 2017-05-22 21:51:11 +08:00
Vishnu kannan 86b5edb79a Update COS version to m59
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-20 21:17:19 -07:00
Vishnu kannan 1e77594958 Adding an installer script that installs Nvidia drivers in Container Optimized OS
Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-20 21:17:19 -07:00
Kubernetes Submit Queue 112ed869c7 Merge pull request #46053 from dashpole/test_eviction_metrics
Automatic merge from submit-queue (batch tested with PRs 46033, 46122, 46053, 46018, 45981)

Log age of stats used for evictions during eviction tests

I recently added prometheus metrics for the age of the metrics used for evictions #43031.  It would be nice to surface these during eviction tests, so I can better assess how old stats are, and whether or not the age of stats causes extra evictions.

This isnt super-high priority, and can be done after code-freeze, since it is a testing improvement.  Feel free to take a look whenever either of you has time.

/assign @mtaufen 
/assign @Random-Liu
2017-05-19 23:29:28 -07:00
Kubernetes Submit Queue 51f3ac1b99 Merge pull request #45004 from feiskyer/hostnetwork
Automatic merge from submit-queue

Add node e2e tests for hostNetwork

**What this PR does / why we need it**:

Add node e2e tests for hostNetwork.

**Which issue this PR fixes**

Part of #44118.

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```

/assign @Random-Liu @yujuhong
2017-05-19 01:18:07 -07:00
David Ashpole 0bd0d705e3 log age of stats used for evictions during eviction tests 2017-05-18 13:51:23 -07:00
Kubernetes Submit Queue 23f0fe8632 Merge pull request #45901 from Random-Liu/fix-node-e2e
Automatic merge from submit-queue (batch tested with PRs 44520, 45253, 45838, 44685, 45901)

Fix node e2e panic when not using image config file.

In https://github.com/kubernetes/kubernetes/pull/45430, `resources` field in image config is a pointer, and we only initialize it when using image config file.

However, we still have test specifying images directly without image config file, this will cause those test to panic. See:
* https://k8s-testgrid.appspot.com/google-docker#kubelet
* https://k8s-testgrid.appspot.com/google-docker#e2e-cos

This PR fixes this.

@vishh @mtaufen @kewu1992
2017-05-16 21:28:02 -07:00
Kubernetes Submit Queue 85775105f1 Merge pull request #44520 from dashpole/test_eviction_fix
Automatic merge from submit-queue (batch tested with PRs 44520, 45253, 45838, 44685, 45901)

Ensure ordering of using dynamic kubelet config and setting up tests.

This PR simply places the body of the eviction test within its own context.  This ensures that the kubelet config is set before the pods are created, and that the kubelet config is reverted only after the pods are deleted.
2017-05-16 21:27:54 -07:00
Pengfei Ni f9eafea8bf Add node e2e tests for hostNetwork 2017-05-17 10:31:17 +08:00