Commit Graph

913 Commits (381729f1fb88e01d35b7429b86476ca6304b639e)

Author SHA1 Message Date
Chao Xu 60604f8818 run hack/update-all 2017-06-22 11:31:03 -07:00
Chao Xu f4989a45a5 run root-rewrite-v1-..., compile 2017-06-22 10:25:57 -07:00
mbohlool c91a12d205 Remove all references to types.UnixUserID and types.UnixGroupID 2017-06-21 04:09:07 -07:00
Kubernetes Submit Queue cc645a8c6f Merge pull request #46327 from supereagle/mark-network-plugin-dir-deprecated
Automatic merge from submit-queue (batch tested with PRs 46327, 47166)

mark --network-plugin-dir deprecated for kubelet

**What this PR does / why we need it**:

**Which issue this PR fixes** : fixes #43967

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-06-19 11:23:54 -07:00
Kubernetes Submit Queue b6faf34862 Merge pull request #47530 from mindprince/issue-47388-remove-dead-code
Automatic merge from submit-queue (batch tested with PRs 47530, 47679)

Use cos-stable-59-9460-64-0 instead of cos-beta-59-9460-20-0.

Remove dead code that has now moved to another repo as part of #47467

**Release note**:
```release-note
NONE
```

/sig node
2017-06-16 20:57:58 -07:00
Rohit Agarwal 3a86c97cf6 Use cos-stable-59-9460-64-0 instead of cos-beta-59-9460-20-0.
- It contains a fix for ipaliasing.
- It contains a fix which decouples GPU driver installation from kernel
version.

Remove dead code that has now moved to another repo as part of #47467
2017-06-16 13:48:50 -07:00
Bowei Du 1ed4afca80 Fix hardcoded CIDR in the validation_test
The ideal fix is to not hardcode these values.

fixes #47479
2017-06-15 22:15:56 -07:00
Kubernetes Submit Queue d797c219b3 Merge pull request #47260 from yguo0905/perf-dash
Automatic merge from submit-queue (batch tested with PRs 47470, 47260, 47411, 46852, 46135)

Logs node e2e perf data to standalone json files

Fixes the node-dash-perf issue in https://github.com/kubernetes/kubernetes/issues/44003.

- Move perf data types to `test/e2e/perftype/perftype.go` so that the node-perf-dash can depend on.
- Logs the perf data to standalone json files so that node-perf-dash can consume it easily. A sample run of `ci-kubernetes-node-kubelet-benchmark` is at https://console.cloud.google.com/storage/browser/ygg-gke-dev-bucket/e2e-node-test/ci-kubernetes-node-kubelet-benchmark/1.

The corresponding changes in node-perf-dash is at https://github.com/kubernetes/contrib/pull/2628.

**Release note**:
`None`

/sig node
/area node-e2e
/assign @Random-Liu
2017-06-14 12:52:18 -07:00
Rohit Agarwal 9c0bf19f80 Use cos-stable-59-9460-60-0 and newer installer for GPU node e2e tests. 2017-06-13 15:36:20 -07:00
Kubernetes Submit Queue f4d2c7b931 Merge pull request #46441 from dashpole/eviction_time
Automatic merge from submit-queue

Shorten eviction tests, and increase test suite timeout

After #43590, the eviction manager is less aggressive when evicting pods.  Because of that, many runs in the flaky suite time out.
To shorten the inode eviction test, I have lowered the eviction threshold.
To shorten the allocatable eviction test, I now set KubeReserved = NodeMemoryCapacity - 200Mb, so that any pod using 200Mb will be evicted.  This shortens this test from 40 minutes, to 10 minutes.
While this should be enough to not hit the flaky suite timeout anymore, it is better to keep lower individual test timeouts than a lower suite timeout, since hitting the suite timeout means that even successful test runs are not reported.

/assign @Random-Liu @mtaufen 

issue: #31362
2017-06-13 12:58:22 -07:00
Yang Guo 29b2db5af3 Logs node e2e perf data to standalone json files 2017-06-12 14:27:56 -07:00
David Ashpole 3365cca78a shorten eviction testst and lengthen flaky suite timeout 2017-06-12 12:56:45 -07:00
Rohit Agarwal f7a563435f Fix bad check in node e2e tests for GPUs.
When no nvidia device was attached, the -ne check had a syntax error:

    sh: -ne: argument expected

This resulted in 'Success' being echoed and the test passing incorrectly.
This was found while debugging issue #47216
2017-06-11 19:25:35 -07:00
Kubernetes Submit Queue 3040cba17d Merge pull request #47144 from jingxu97/May/emptyDir
Automatic merge from submit-queue

Fix local capacity isolation test
2017-06-09 12:17:19 -07:00
Kubernetes Submit Queue f75478875a Merge pull request #47113 from feiskyer/cri
Automatic merge from submit-queue

Kubelet: rename cri package name to pkg/kubelet/apis/cri/v1alpha1/runtime

**What this PR does / why we need it**:

We have moved CRI from api/v1alpha1/runtime to apis/cri/v1alpha1, which changed the package name of CRI. This would cause a significant problem: old-versioned runtime (based on CRI in v1.6) doesn't work with latest kubelet v1.7, and vice versa.

This PR renames cri package name to `pkg/kubelet/apis/cri/v1alpha1/runtime` for fixing the problem.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: 

fixes #47012

**Special notes for your reviewer**:

Should be included in v1.7.

**Release note**:

```release-note
CRI has been moved to package `pkg/kubelet/apis/cri/v1alpha1/runtime`.
```
2017-06-09 10:08:36 -07:00
Kubernetes Submit Queue 3a5df705fe Merge pull request #47190 from mindprince/faster-node-e2e-gci
Automatic merge from submit-queue

Move the nvidia installer to the beginning.

When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.

/sig node
/area node-e2e

```release-note
NONE
```
2017-06-09 09:19:16 -07:00
Pengfei Ni 22e99504d7 Update CRI references 2017-06-09 10:16:40 +08:00
Kubernetes Submit Queue 3a96c31de5 Merge pull request #46885 from kewu1992/test_gci_next_canary
Automatic merge from submit-queue (batch tested with PRs 46885, 47197)

Let COS docker validation node test against gci-next-canary

**What this PR does / why we need it**:
This is for COS docker validation node test. We plan to use family gci-next-canary in container-vm-image-staging for future Docker upgration and validation.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #47134

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-06-08 15:46:41 -07:00
Rohit Agarwal 4a5badfafa Move the nvidia installer to the beginning.
When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.
2017-06-08 09:55:14 -07:00
Kubernetes Submit Queue 9c1b2aa9b5 Merge pull request #46743 from Random-Liu/bump-up-npd
Automatic merge from submit-queue

Bump up npd version to v0.4.0

Fixes #47070.

Bump up npd version to [v0.4.0](https://github.com/kubernetes/node-problem-detector/releases/tag/v0.4.0).

```release-note
Bump up Node Problem Detector version to v0.4.0, which added support of parsing log from /dev/kmsg and ABRT.
```

/cc @dchen1107 @ajitak
2017-06-08 08:24:18 -07:00
Jing Xu 426d44ded4 Fix local capacity isolation test
Fix issue #47128, also add flaky tag for this evition test
2017-06-08 06:30:29 -07:00
Kubernetes Submit Queue 6ee028249b Merge pull request #46385 from rickypai/rpai/host_mapping_node_e2e_test
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)

add e2e node test for Pod hostAliases feature

**What this PR does / why we need it**: adds node e2e test for #45148 

tests requested in https://github.com/kubernetes/kubernetes/issues/43632#issuecomment-298434125

**Release note**:
```release-note
NONE
```

@yujuhong @thockin
2017-06-07 13:31:00 -07:00
Random-Liu 1d3979190c Bump up npd version to v0.4.0 2017-06-06 16:30:02 -07:00
Kubernetes Submit Queue 6ed4bc7b97 Merge pull request #46828 from cblecker/links-update
Automatic merge from submit-queue (batch tested with PRs 46718, 46828, 46988)

Update docs/ links to point to main site

**What this PR does / why we need it**:
This updates various links to either point to kubernetes.io or to the kubernetes/community repo instead of the legacy docs/ tree in k/k
Pre-requisite for #46813

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```

@kubernetes/sig-docs-maintainers @chenopis @ahmetb @thockin
2017-06-06 11:43:18 -07:00
Kubernetes Submit Queue 3338d784ba Merge pull request #46899 from mindprince/issue-46889-node-e2e-gpu-cos-fix
Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)

Wait for cloud-init to finish before starting tests.

This fixes #46889.


**Release note**:
```release-note
NONE
```
2017-06-06 05:22:42 -07:00
Christoph Blecker 1bdc7a29ae
Update docs/ URLs to point to proper locations 2017-06-05 22:13:54 -07:00
Jing Xu 0b13aee0c0 Add EmptyDir Volume and local storage for container overlay Isolation
This PR adds two features:
1. add support for isolating the emptyDir volume use. If user
sets a size limit for emptyDir volume, kubelet's eviction manager
monitors its usage
and evict the pod if the usage exceeds the limit.
2. add support for isolating the local storage for container overlay. If
the container's overly usage exceeds the limit defined in container
spec, eviction manager will evict the pod.
2017-06-05 12:05:48 -07:00
Rohit Agarwal 1561f55c4c Wait for cloud-init to finish before starting tests.
This fixes #46889.
2017-06-05 10:50:24 -07:00
Kubernetes Submit Queue 6fef1a1deb Merge pull request #46810 from vishh/gpu-cos-image-validation
Automatic merge from submit-queue (batch tested with PRs 46734, 46810, 46759, 46259, 46771)

Update the COS kernel sha for node e2e gpu installer

cc @mindprince

Relevant COS image - https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/jenkins/image-config-serial.yaml#L19
2017-06-05 06:51:23 -07:00
Kubernetes Submit Queue e6c74bbaaf Merge pull request #46221 from FengyunPan/close-file
Automatic merge from submit-queue

Close file after os.Open()

None
2017-06-03 04:42:00 -07:00
Kubernetes Submit Queue b8c9ee8abb Merge pull request #46456 from jingxu97/May/allocatable
Automatic merge from submit-queue

Add local storage (scratch space) allocatable support

This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.

This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306

**Release note**:

```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable 
```
2017-06-03 00:24:29 -07:00
Kubernetes Submit Queue d063ce213f Merge pull request #46801 from dashpole/summary_container_restart
Automatic merge from submit-queue

[Flaky PR Test] Fix summary test

fixes issue: #46797 

As we can see in the [example failure build log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet/4319/build-log.txt), the summary containers are pinging google 100s of times a second.  This causes the summary container to be killed occasionally, and fail the test.  The summary containers are only supposed to ping every 10 seconds according to the current test.  As it turns out, we were missing a semicolon, and were not sleeping between pings.  For background, we ping google to generate network traffic, so that the summary test can validate network metrics.

This PR adds the semicolon to make the container sleep between calls, and decreases the sleep time from 10 seconds to 1 second, as 1 call / 10 seconds did not produce enough activity.

cc @kubernetes/kubernetes-build-cops @dchen1107
2017-06-02 18:02:19 -07:00
Ricky Pai 4e7fed4479 e2e node test for PodSpec HostAliases 2017-06-02 17:01:44 -07:00
Kubernetes Submit Queue a6f0033164 Merge pull request #46238 from yguo0905/package-validator
Automatic merge from submit-queue (batch tested with PRs 46648, 46500, 46238, 46668, 46557)

Support validating package versions in node conformance test

**What this PR does / why we need it**:

This PR adds a package validator in node conformance test for checking whether the locally installed packages meet the image spec.

**Special notes for your reviewer**:

The image spec for GKE (which has the package spec) will be in a separate PR. Then we will publish a new node conformance test image for GKE whose name should use the convention in https://github.com/kubernetes/kubernetes/issues/45760 and have `gke` in it.


**Release note**:
```
NONE
```
2017-06-02 15:20:47 -07:00
Ke Wu cb28ed1f95 Let COS docker validation node tests against gci-next-canary
We plan to use family gci-next-canary in container-vm-image-staging
for future Docker upgration and validation.
2017-06-02 14:52:01 -07:00
Vishnu kannan d45286c575 update cos kernel sha for node e2e GPU installer
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-06-01 17:09:18 -07:00
Jing Xu 943fc53bf7 Add predicates check for local storage request
This PR adds the check for local storage request when admitting pods. If
the local storage request exceeds the available resource, pod will be
rejected.
2017-06-01 15:57:50 -07:00
Jing Xu dd67e96c01 Add local storage (scratch space) allocatable support
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
2017-06-01 15:57:50 -07:00
David Ashpole d1545e1e47 add semicolon 2017-06-01 13:32:59 -07:00
supereagle dc9f0f9729 mark --network-plugin-dir deprecated for kubelet, and update related bootstrap scripts 2017-06-01 22:06:44 +08:00
Yang Guo ecf214729d Support validating package versions in node conformance test 2017-05-30 17:44:40 -07:00
David Ashpole e2718f3bc5 fix crossbuild, verify container restarts, and restart only once 2017-05-30 13:15:22 -07:00
Kubernetes Submit Queue 54a47a6f1d Merge pull request #46308 from dashpole/summary_container_restart
Automatic merge from submit-queue (batch tested with PRs 46429, 46308, 46395, 45867, 45492)

Summary Test looks at pods that have containers that restart.

Occasionally, the node can report extra containers that had been restarted through the summary API.
This test change tests a pod that restarts, and hopefully should allow us to reproduce and debug this behavior.

/assign @dchen1107 

/release-note-none
2017-05-25 22:42:04 -07:00
Kubernetes Submit Queue ed8843406e Merge pull request #46303 from Random-Liu/fix-cos-image-project
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)

Fix cos image project to cos-cloud.

Addressed https://github.com/kubernetes/kubernetes/pull/45136#discussion_r118092211.

@vishh @yujuhong @dchen1107
2017-05-24 23:19:09 -07:00
Kubernetes Submit Queue 8d88c55231 Merge pull request #46311 from dashpole/disable_ubuntu_gpu_test
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)

Dont attach a GPU to ubuntu test machines for node e2e serial tests

This should fix flakes in the e2e_node serial suite.

@vishh I think this is what you were asking for...

/assign @vishh
2017-05-24 23:19:07 -07:00
Kubernetes Submit Queue b71ca6691b Merge pull request #46309 from Random-Liu/move-docker-validation-to-separate-project
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)

Move docker validation test to separate project.

Docker validation test is leaking VMs because new docker version `DOCKER_VERSION=17.05.0-c` totally breaks the new gci image `GCE_IMAGES=gci-test-60-9579-0-0` with the `gci-docker-version` metadata specified.

The test successfully created the instance, but timed out when checking VM aliveness, and leaked the VM.

I've cleaned up all leaked VMs. This PR moves docker validation node e2e test into a separate project to not influencing other node e2e test.

@kewu1992 We should fix the docker automated validation test.

/cc @dchen1107 @yujuhong @abgworrall
2017-05-24 23:19:05 -07:00
David Ashpole 1a6572fc6c summary test now tests a pod that has containers that have restarted 2017-05-24 13:27:57 -07:00
Random-Liu 82f588b483 Fix cos image project to cos-cloud. 2017-05-23 15:12:03 -07:00
David Ashpole 8341d544f3 remove unused test properties 2017-05-23 14:39:18 -07:00
David Ashpole 20eb016597 dont attach a GPU to ubuntu machines 2017-05-23 14:34:18 -07:00