Commit Graph

218 Commits (ce4fd07b0624d654b5b7c9bda4747b2b7f239876)

Author SHA1 Message Date
Michail Kargakis 69bb4e4c84 test: add/remove myself from tests appropriately 2016-09-15 12:27:05 +02:00
Kubernetes Submit Queue f2951a54f9 Merge pull request #30674 from ivan4th/add-e2e-tests-for-wrapped-volume-race
Automatic merge from submit-queue

Add e2e tests that check for wrapped volume race

This PR adds two new e2e tests that reproduce the race condition fixed in #29641 (see e.g. #29297)

In order to observe the race, you need to revert the PR that fixes it, via e.g.
```
git revert -n df1e925143
```
or
```
curl -sL https://github.com/kubernetes/kubernetes/pull/29641.patch | patch -p1 -R
```

The tests are `[Slow]` because they need to run several passes that involve creating pods with many volumes. They are also `[Serial]` because load on the cluster may affect reproducibility of the race. Each takes about 450s when it fails on a standard GCE cluster created by `go run hack/e2e.go -v --up`. The `git_repo` test takes about 66s when it succeeds (fix PR not reverted), and the `configmap` test takes about 546s in that case, because configmap mounting is slower and the test still requires 3 passes x 5 pods x 50 configmap volumes to fail consistently with the fix PR reverted. These times can probably be reduced, but frankly I've already spent quite a bit of time tuning the numbers to find a balance between reproducibility and speed.

I managed to reproduce the problem more or less reliably for `configMap` and `gitRepo` volumes. I tried to reproduce it for `secret` volumes too, but without success so far, because they use the tmpfs-based `emptyDir` variety. For `downwardAPI` volumes I expect the same reproducibility problems as with `secret` volumes, although I think some e2e races were caused by the bug, e.g. #29633.

The tests create several pods (via an RC) with many volumes and wait for them to become Running. They set node affinity for the pods so that all of them are created on a single node (the first one in the node list). The race condition leads to volume mount failures with slow retries, which causes the test to time out.
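To make the setup concrete, here is a minimal sketch (not the actual e2e test code, which is written in Go) of a ReplicationController manifest whose pods each mount many configMap volumes and are pinned to one node. The function name `build_race_rc`, the image, the mount paths and the node name are hypothetical; the volume names follow the `racey-configmap-N` pattern seen in the kubelet log. The `affinity` field uses the current Kubernetes API form, which is an assumption here: the test at the time may have expressed node affinity differently.

```python
def build_race_rc(node_name, pods=5, volumes_per_pod=50, name="wrapped-volume-race"):
    """Build an RC manifest (as a dict) whose pods each mount many configMap volumes."""
    volumes = [
        {"name": f"racey-configmap-{i}", "configMap": {"name": f"racey-configmap-{i}"}}
        for i in range(volumes_per_pod)
    ]
    mounts = [
        {"name": v["name"], "mountPath": f"/etc/config-{i}"}  # hypothetical mount paths
        for i, v in enumerate(volumes)
    ]
    # Pin every pod to a single node so that all mounts hit the same kubelet.
    affinity = {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": "kubernetes.io/hostname",
                        "operator": "In",
                        "values": [node_name],
                    }]
                }]
            }
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": name},
        "spec": {
            "replicas": pods,
            "selector": {"name": name},
            "template": {
                "metadata": {"labels": {"name": name}},
                "spec": {
                    "affinity": affinity,
                    "containers": [{
                        "name": "test-container",   # hypothetical container
                        "image": "busybox",         # hypothetical image
                        "command": ["sleep", "10000"],
                        "volumeMounts": mounts,
                    }],
                    "volumes": volumes,
                },
            },
        },
    }


rc = build_race_rc("node-1")  # "node-1" stands in for the first node in the node list
passes = 3
total_mounts = passes * rc["spec"]["replicas"] * len(rc["spec"]["template"]["spec"]["volumes"])
print(total_mounts)
```

With the numbers quoted for the `configmap` test (3 passes x 5 pods x 50 volumes), this amounts to 750 volume mounts on a single node per run.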

The test failures look like this:

configmap:
```
• Failure [435.547 seconds]
[k8s.io] Wrapped EmptyDir volumes
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:709
  should not cause race condition when used for configmaps [Serial] [Slow] [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/wrapped_empty_dir.go:170

  Failed waiting for pod wrapped-volume-race-8c097734-6376-11e6-9ffa-5254003793ad-acbtt to enter running state
  Expected error:
      <*errors.errorString | 0xc8201758d0>: {
          s: "timed out waiting for the condition",
      }
      timed out waiting for the condition
  not to have occurred

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/wrapped_empty_dir.go:395
```
You'll see errors like this in the kubelet log on the first node in the cluster:
```
E0816 00:27:23.319431    3510 configmap.go:174] Error creating atomic writer: stat /var/lib/kubelet/pods/e5986355-6347-11e6-a5d7-42010af00002/volumes/kubernetes.io~configmap/racey-configmap-14: no such file or directory
E0816 00:27:23.319478    3510 nestedpendingoperations.go:232] Operation for "\"kubernetes.io/configmap/e5986355-6347-11e6-a5d7-42010af00002-racey-configmap-14\" (\"e5986355-6347-11e6-a5d7-42010af00002\")" failed. No retries permitted until 2016-08-16 00:28:27.319450118 +0000 UTC (durationBeforeRetry 1m4s). Error: MountVolume.SetUp failed for volume "kubernetes.io/configmap/e5986355-6347-11e6-a5d7-42010af00002-racey-configmap-14" (spec.Name: "racey-configmap-14") pod "e5986355-6347-11e6-a5d7-42010af00002" (UID: "e5986355-6347-11e6-a5d7-42010af00002") with: stat /var/lib/kubelet/pods/e5986355-6347-11e6-a5d7-42010af00002/volumes/kubernetes.io~configmap/racey-configmap-14: no such file or directory
```

git_repo:
```
• Failure [455.035 seconds]
[k8s.io] Wrapped EmptyDir volumes
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:709
  should not cause race condition when used for git_repo [Serial] [Slow] [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/wrapped_empty_dir.go:179

  Failed waiting for pod wrapped-volume-race-71b12b3d-6375-11e6-9ffa-5254003793ad-b0slz to enter running state
  Expected error:
      <*errors.errorString | 0xc8201758d0>: {
          s: "timed out waiting for the condition",
      }
      timed out waiting for the condition
  not to have occurred

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/wrapped_empty_dir.go:395
```
Errors in the kubelet log:
```
E0815 23:41:08.670203    3510 nestedpendingoperations.go:232] Operation for "\"kubernetes.io/git-repo/97636bd8-6341-11e6-a5d7-42010af00002-racey-git-repo-8\" (\"97636bd8-6341-11e6-a5d7-42010af00002\")" failed. No retries permitted until 2016-08-15 23:42:12.670181604 +0000 UTC (durationBeforeRetry 1m4s). Error: MountVolume.SetUp failed for volume "kubernetes.io/git-repo/97636bd8-6341-11e6-a5d7-42010af00002-racey-git-repo-8" (spec.Name: "racey-git-repo-8") pod "97636bd8-6341-11e6-a5d7-42010af00002" (UID: "97636bd8-6341-11e6-a5d7-42010af00002") with: failed to exec 'git clone http://10.0.68.35:2345 test': : chdir /var/lib/kubelet/pods/97636bd8-6341-11e6-a5d7-42010af00002/volumes/kubernetes.io~git-repo/racey-git-repo-8: no such file or directory
```

Generally, the races cause unexpected "no such file or directory" errors in kubelet logs, followed by volume mount failures.
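The `durationBeforeRetry 1m4s` in the kubelet log lines above suggests why a failed mount turns into a test timeout: the delay before the next retry grows with each consecutive failure. The sketch below illustrates a doubling backoff under assumed parameters. The 500ms initial delay and the 2m2s cap are assumptions, not values taken from the logs, and `backoff_delays` is a hypothetical helper rather than kubelet code.

```python
from datetime import timedelta

# Assumed parameters (not shown in the logs above).
INITIAL_DELAY = timedelta(milliseconds=500)
MAX_DELAY = timedelta(minutes=2, seconds=2)


def backoff_delays(failures, initial=INITIAL_DELAY, cap=MAX_DELAY):
    """Return the delay imposed after each consecutive failure (doubling, capped)."""
    delays = []
    delay = initial
    for _ in range(failures):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


for n, d in enumerate(backoff_delays(10), start=1):
    print(f"failure {n}: wait {d.total_seconds():g}s")
```

Under these assumptions the eighth consecutive failure yields a 64s (1m4s) wait, matching the value in the log, and a few more failures are enough to exceed the several-hundred-second window the tests wait for pods to become Running.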

I've added the race tests to the e2e test `empty_dir_wrapper.go` ("EmptyDir wrapper volumes"). This test was added in #18445, the same PR that introduced the race bug. The original purpose of the test was to make sure that no conflicts occur between different wrapped emptyDir volumes, so I've replaced "should becomes" with "should not conflict" in the first `It(...)`.
2016-09-11 03:39:21 -07:00
Kubernetes Submit Queue 8780961e94 Merge pull request #32112 from soltysh/test_owners
Automatic merge from submit-queue

Updated test owners and assigned ScheduledJobs to soltysh

I've updated test owners by running `hack/update_owners.py` and assigned all ScheduledJob-related issues to myself.

@fejta ptal
2016-09-09 00:48:14 -07:00
Kubernetes Submit Queue ddcbdcb8c8 Merge pull request #31535 from aveshagarwal/master-e2e-downward-api-issues
Automatic merge from submit-queue

Fix downward api tests to output node allocatable not node capacity

@kubernetes/rh-cluster-infra @derekwaynecarr
2016-09-07 16:25:19 -07:00
Maciej Szulik ac1335c979 Updated test owners and assigned ScheduledJobs to soltysh 2016-09-06 11:38:57 +02:00
Ryan Hitchman 0c80bce7a7 Fix test owners for horizontal pod autoscaling. 2016-08-30 13:30:45 -07:00
Erick Fejta fdb085ff61 Add missing tests 2016-08-29 15:22:06 -07:00
Avesh Agarwal db74d4dbc2 Fix downward api tests to output node allocatable not node capacity 2016-08-26 16:13:24 -04:00
Erick Fejta 5c821c1fed Update test assignments 2016-08-19 18:43:40 -07:00
Ivan Shvedunov 8ff00d17d8 Add e2e tests that check for wrapped volume race
See #29641 for details.
2016-08-17 12:14:14 +03:00
Erick Fejta 17d91dd2ec Assign Probing Container tests to Random-Liu 2016-08-09 17:20:00 -07:00
Kubernetes Submit Queue 7da75631f6 Merge pull request #29956 from david-mcmahon/test_owners
Automatic merge from submit-queue

Remove myself from test ownership.

These are almost certainly not correct, but they are probably more likely owners than I am.
@rmmh @dchen1107 @timstclair @erictune @mtaufen @caesarxuchao @fgrzadkowski @krousey @lavalamp
2016-08-04 00:01:51 -07:00
David McMahon 3a88747ef8 Remove myself from test ownership. 2016-08-03 14:34:31 -07:00
gmarek f1167e9b9c Change the owner of JSON NodeAffinity test 2016-08-03 10:42:07 +02:00
Alex Robinson 0ed8fa5693 Give away my e2e tests. 2016-08-02 22:43:20 +00:00
Ryan Hitchman 5d53b3a686 Update test-owners with new tests, add catch-all assignment to
test-infra team.

A future update to the munger will use this to assign any flake without
an explicit owner to a member of the test-infra team.
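
A minimal sketch of how such a catch-all lookup could work, for illustration only. The column names, the `DEFAULT` row, the `/` separator and the owner names are all hypothetical; the real layout of `test/test_owners.csv` and the munger's logic are not shown here.

```python
import csv
import io
import random

# Hypothetical CSV contents, not the real test/test_owners.csv.
SAMPLE = """name,owner
Wrapped EmptyDir volumes should not conflict,alice
DEFAULT,bob/carol
"""


def load_owners(text):
    """Map each test name to its owner field."""
    return {row["name"]: row["owner"] for row in csv.DictReader(io.StringIO(text))}


def owner_for(test_name, owners, rng=random):
    """Return the explicit owner, or pick a member of the catch-all entry."""
    owner = owners.get(test_name) or owners["DEFAULT"]
    # Assumption: several candidates may be listed, separated by "/".
    return rng.choice(owner.split("/"))


owners = load_owners(SAMPLE)
print(owner_for("Wrapped EmptyDir volumes should not conflict", owners))
print(owner_for("some test without an explicit owner", owners))
```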
2016-08-01 16:02:39 -07:00
Ryan Hitchman 616e938662 Address PR comments, randomly assign owners for new tests. 2016-07-06 13:22:53 -07:00
Ryan Hitchman 3d485098c3 Add test/test_owners.csv, for automatic assignment of test failures.
This file will be read by the munger -- see kubernetes/contrib#1264

This also includes a simple script to do minor automatic updates to the
CSV.
2016-07-01 17:39:14 -07:00