Commit Graph

3072 Commits (c3b124d550122f62101b7fbeffc0edb25c7b6974)

Author SHA1 Message Date
k8s-merge-robot 85de6acadc Merge pull request #23208 from deads2k/fix-version-override
Automatic merge from submit-queue

make storage enablement, serialization, and location orthogonal

This allows a caller (command-line, config, code) to specify multiple separate pieces of config information regarding storage and have them properly composed at runtime.  The information provided is exposed through interfaces to allow alternate implementations, which allows us to change the expression of the config moving forward.  I also fixed up the types to be correct as I moved through.

The same options still exist, but they're composed slightly differently:
 1. specify target etcd servers per Group or per GroupResource
 1. specify storage GroupVersions per Group or per GroupResource
 1. specify etcd prefixes per GroupVersion or per GroupResource
 1. specify that multiple GroupResources share the same location in etcd
 1. enable GroupResources by GroupVersion, by GroupResource whitelist, or by GroupResource blacklist

The `storage.Interface` is built per GroupResource by:
 1. find the set of possible storage GroupResources based on the priority list of cohabitators
 1. choose a GroupResource from the set by looking at which Groups have the resource enabled
 1. find the target etcd server, etcd prefix, and storage encoding based on the GroupResource

The API server can have its resources separately enabled, but for now I've kept them linked.

@liggitt I think we need this (or something like it) to be able to go from config to these interfaces.  Given another round of refactoring, we may be able to reshape these to be more forward driving.

@smarterclayton this is important for rebasing and for a seamless 1.2 to 1.3 migration for us.
2016-04-21 08:24:29 -07:00
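
A minimal sketch of the kind of composition the PR above describes: per-Group defaults for etcd servers, prefixes, and storage encodings, overridden per GroupResource at runtime. The type and method names here are hypothetical illustrations, not the interfaces added by the PR.

```go
package main

import "fmt"

// GroupResource identifies a resource within an API group, e.g. {"batch", "jobs"}.
type GroupResource struct {
	Group, Resource string
}

// Overrides holds optional per-GroupResource settings; zero values fall back to group defaults.
type Overrides struct {
	EtcdServers []string // target etcd servers for this resource
	EtcdPrefix  string   // etcd key prefix for this resource
	Encoding    string   // storage GroupVersion used to serialize this resource
}

// Factory composes group-level defaults with resource-level overrides at runtime.
type Factory struct {
	DefaultEtcdServers []string
	DefaultPrefix      string
	GroupEncoding      map[string]string // group -> storage GroupVersion
	ResourceOverrides  map[GroupResource]Overrides
}

// ConfigFor resolves the effective storage settings for one GroupResource.
func (f *Factory) ConfigFor(gr GroupResource) Overrides {
	cfg := Overrides{
		EtcdServers: f.DefaultEtcdServers,
		EtcdPrefix:  f.DefaultPrefix,
		Encoding:    f.GroupEncoding[gr.Group],
	}
	if o, ok := f.ResourceOverrides[gr]; ok {
		if len(o.EtcdServers) > 0 {
			cfg.EtcdServers = o.EtcdServers
		}
		if o.EtcdPrefix != "" {
			cfg.EtcdPrefix = o.EtcdPrefix
		}
		if o.Encoding != "" {
			cfg.Encoding = o.Encoding
		}
	}
	return cfg
}

func main() {
	f := &Factory{
		DefaultEtcdServers: []string{"http://127.0.0.1:2379"},
		DefaultPrefix:      "/registry",
		GroupEncoding:      map[string]string{"": "v1"},
		ResourceOverrides: map[GroupResource]Overrides{
			{Group: "", Resource: "events"}: {EtcdPrefix: "/registry/events-dedicated"},
		},
	}
	fmt.Println(f.ConfigFor(GroupResource{Group: "", Resource: "events"}))
}
```
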
k8s-merge-robot 35ea9b87b8 Merge pull request #24185 from jsafrane/devel/stabilize-provisioning-e2e
Automatic merge from submit-queue

Increase provisioning test timeouts.

We've encountered flakes in our e2e infrastructure when kubelet took more than one minute to detach a volume used by a deleted pod.

Let's increase the wait period from 1 to 3 minutes. This slows down the test by 2 minutes, but it makes the test more stable.

In addition, when kubelet cannot detach a volume for 3 minutes, let the test wait for an additional recycle controller retry interval (10 minutes) and hope the volume is deleted by then. This should not increase the usual test time; it makes the test stable when kubelet is _extremely_ slow to release the volume.

Fixes: #24161
2016-04-21 06:03:37 -07:00
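
A rough sketch of the two-phase wait described above, assuming a generic polling helper rather than the actual e2e utilities: poll for up to 3 minutes, and only if the volume is still present, wait one more recycle-controller retry interval (10 minutes) before failing.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForDeletion polls check() until it reports the volume is gone or the timeout expires.
func waitForDeletion(check func() (bool, error), interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		gone, err := check()
		if err != nil {
			return err
		}
		if gone {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("timed out waiting for volume deletion")
}

func main() {
	check := func() (bool, error) { return true, nil } // placeholder: query the PV here

	// First give kubelet 3 minutes to detach and the PV to be deleted...
	if err := waitForDeletion(check, 5*time.Second, 3*time.Minute); err != nil {
		// ...then allow one more recycle-controller retry interval (10 minutes).
		if err := waitForDeletion(check, 30*time.Second, 10*time.Minute); err != nil {
			fmt.Println("volume was not deleted:", err)
		}
	}
}
```
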
deads2k 6670b73b18 make storage enablement, serialization, and location orthogonal 2016-04-21 08:18:55 -04:00
deads2k 3be4b690ea create a negotiating serializer that wraps a single serializer 2016-04-21 07:51:59 -04:00
Prashanth Balasubramanian 0ac10c6cc2 PetSet type, apps apigroup 2016-04-20 18:49:31 -07:00
k8s-merge-robot 8a76a1bd36 Merge pull request #24234 from goltermann/vetclean
Automatic merge from submit-queue

Enable go vet.
2016-04-20 14:50:37 -07:00
goltermann 3fa6c6f6d9 Enable vet 2016-04-20 09:48:24 -07:00
k8s-merge-robot 3753e2bded Merge pull request #24180 from AdoHe/reuse_node_port
Automatic merge from submit-queue

Fix unintended change of Service.spec.ports[].nodePort during kubectl apply

Please refer to #23551 for more detail. @bgrant0607 I think this simple fix should be ok to reuse the nodePort. @thockin ptal.

Release note: Fix unintended change of `Service.spec.ports[].nodePort` during `kubectl apply`.
2016-04-20 08:51:40 -07:00
k8s-merge-robot f3f6ffaa28 Merge pull request #24524 from wojtek-t/fix_scheduler_2
Automatic merge from submit-queue

Add RC and container ports to scheduler benchmark

Fix #23263

Ref  #24408
However, scheduler throughput is still ~140 initially, whereas in reality we see 35-40. There is still a significant difference we should understand.

@hongchaodeng @xiang90
2016-04-20 07:18:20 -07:00
Wojciech Tyczynski a4b3f47347 Add RC and container ports to scheduler benchmark 2016-04-20 15:10:57 +02:00
k8s-merge-robot 15ed9dbd02 Merge pull request #23771 from jayunit100/ClusterVerificationFramework
Automatic merge from submit-queue

Cluster Verification Framework

I've spent the last few days looking at the general patterns of verification we have that we tend to reuse in the e2es.  Basically, we need 
 
- label filters
- forEach and WaitFor (where forEach doesn't necessarily waitFor anything).
- timeouts
- multiple phases (reusable definition of state)
- an extensible way to define cluster state that can evolve over time in a data object rather than as a set of parameters that have magic semantics

This PR 
- implements the abstract functionality above declaratively and without hidden semantics.
- addresses the sprawling duplicate methods in #23540, so that we can phase out the wrapper methods and replace them with well defined, extensible semantics for cluster state.
- fixes the recently discovered #23730 issue (where kubectl.go is relying on examples.go, which is obviously wacky) by using the new framework to implement forEachPod in just a couple of lines and migrating the wrapper function into framework.go.

There is some cleanup to do here, but this is seemingly working for a couple of important use cases (the spark, cassandra, ..., and kubectl tests). I played with a few different ideas, and this wound up being the most natural implementation from a usability standpoint.

In any case, I just thought I'd push this up as a first iteration; open to feedback.

@kubernetes/sig-testing @timothysc
2016-04-20 04:23:21 -07:00
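
A minimal sketch, with hypothetical names, of the declarative style the framework above aims for: cluster state is plain data (label filter, allowed phases, timeout), and `ForEach`/`WaitFor` operate on whatever matches it.

```go
package main

import (
	"fmt"
	"time"
)

type Pod struct {
	Name   string
	Labels map[string]string
	Phase  string
}

// StateSpec is a data description of the pods we care about; it can grow new
// fields over time without changing the ForEach/WaitFor signatures.
type StateSpec struct {
	LabelSelector map[string]string
	ValidPhases   []string
	Timeout       time.Duration
}

func (s StateSpec) matches(p Pod) bool {
	for k, v := range s.LabelSelector {
		if p.Labels[k] != v {
			return false
		}
	}
	for _, ph := range s.ValidPhases {
		if p.Phase == ph {
			return true
		}
	}
	return false
}

// ForEach applies fn to every currently matching pod; it does not wait for anything.
func ForEach(pods []Pod, spec StateSpec, fn func(Pod)) {
	for _, p := range pods {
		if spec.matches(p) {
			fn(p)
		}
	}
}

// WaitFor polls until at least n pods match the spec or the timeout expires.
func WaitFor(list func() []Pod, spec StateSpec, n int) ([]Pod, error) {
	deadline := time.Now().Add(spec.Timeout)
	for time.Now().Before(deadline) {
		var matched []Pod
		for _, p := range list() {
			if spec.matches(p) {
				matched = append(matched, p)
			}
		}
		if len(matched) >= n {
			return matched, nil
		}
		time.Sleep(2 * time.Second)
	}
	return nil, fmt.Errorf("timed out waiting for %d pods matching %+v", n, spec.LabelSelector)
}

func main() {
	pods := []Pod{{Name: "spark-master", Labels: map[string]string{"app": "spark"}, Phase: "Running"}}
	spec := StateSpec{LabelSelector: map[string]string{"app": "spark"}, ValidPhases: []string{"Running"}, Timeout: time.Minute}
	ForEach(pods, spec, func(p Pod) { fmt.Println("verifying", p.Name) })
}
```
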
k8s-merge-robot 1c80864913 Merge pull request #24257 from zmerlynn/1000nodes
Automatic merge from submit-queue

Fix DNS test for larger clusters

On GKE, we scale the number of DNS pods based on the cluster size. For
testing on larger clusters, relax the DNS pod check.
2016-04-20 03:49:40 -07:00
AdoHe 16960d3ad2 fix reuse nodePort issue 2016-04-20 02:30:03 -04:00
k8s-merge-robot 86544c2288 Merge pull request #24426 from pwittrock/flaky
Automatic merge from submit-queue

Incremental improvements to kubelet e2e tests

- Add keep-alive to ssh connection
- Don't try to stop services on image-based runs
- Increase jenkins ci timeout to 90 minutes to accommodate unpredictable go build times
- Remove spammy log statement
2016-04-19 22:36:40 -07:00
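
For the keep-alive item above, a sketch of one way to keep an SSH connection alive from Go, assuming `golang.org/x/crypto/ssh`; this illustrates the pattern and is not the test code itself.

```go
// Package sshutil sketches periodic SSH keepalives so long-running test
// sessions don't get dropped by idle timeouts.
package sshutil

import (
	"log"
	"time"

	"golang.org/x/crypto/ssh"
)

// KeepAlive sends an OpenSSH-style keepalive request every interval until stop is closed.
func KeepAlive(client *ssh.Client, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// A global request the server replies to; an error means the connection is gone.
			if _, _, err := client.SendRequest("keepalive@openssh.com", true, nil); err != nil {
				log.Printf("ssh keepalive failed: %v", err)
				return
			}
		case <-stop:
			return
		}
	}
}
```
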
k8s-merge-robot 4638f2f355 Merge pull request #24466 from wojtek-t/fix_scheduler_benchmark
Automatic merge from submit-queue

Improve script for running scheduler benchmarks

Without this change, this script didn't work in my environment; this makes it more consistent with other scripts.

@hongchaodeng @xiang90
2016-04-19 08:43:50 -07:00
Wojciech Tyczynski 762bfa3d97 Improve script for running scheduler benchmarks 2016-04-19 16:23:23 +02:00
k8s-merge-robot db28f73c3b Merge pull request #24282 from goltermann/spelling
Automatic merge from submit-queue

Fix misspellings in comments
2016-04-19 03:47:07 -07:00
k8s-merge-robot 56d7579bfd Merge pull request #24359 from janetkuo/rollover-e2e-no-events
Automatic merge from submit-queue

Avoid relying on events in deployment rollover e2e test

Fixes #22028

@kubernetes/sig-config
2016-04-18 16:51:22 -07:00
Phillip Wittrock 90d2f9ad5e Incremental improvements to kubelet e2e tests
- Add keep-alive to ssh connection
- Don't try to stop services on image-based runs
- Increase jenkins ci timeout to 90 minutes to accommodate unpredictable go build times
- Remove spammy log statement
2016-04-18 13:56:07 -07:00
k8s-merge-robot 5ad27f2720 Merge pull request #23575 from deads2k/shared-cache
Automatic merge from submit-queue

shared controller informers

Related to https://github.com/kubernetes/kubernetes/issues/14978

This demonstrates how controllers which use an `Informer` would be able to share the same watch and store.  A similar "setup and run" approach could be done for an `IndexInformer` to share that cache.  I found adding listeners here to be easier than intercepting at the watch interface (problems with resourceVersion) or the reflector (same plumbing, but you have to fan out to multiple stores).

We could also use the cache we build here to back several of the admission plugins that currently run their own lookup caches.

If there's interest, I can finish out the `SharedInformer` and switch the low-hanging fruit over.

@kubernetes/rh-cluster-infra @smarterclayton @liggitt @wojtek-t
2016-04-18 07:48:29 -07:00
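
A simplified sketch of the sharing idea above, with hypothetical types rather than the real `SharedInformer`: a single watch loop updates one store and fans events out to any number of registered listeners.

```go
package main

import (
	"fmt"
	"sync"
)

type Event struct {
	Type string      // "ADDED", "MODIFIED", "DELETED"
	Obj  interface{} // the object, e.g. a pod
}

type Listener func(Event)

// SharedCache owns a single object store and distributes events to listeners.
type SharedCache struct {
	mu        sync.RWMutex
	store     map[string]interface{}
	listeners []Listener
}

func NewSharedCache() *SharedCache {
	return &SharedCache{store: map[string]interface{}{}}
}

// AddListener registers another consumer of the same underlying watch.
func (c *SharedCache) AddListener(l Listener) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.listeners = append(c.listeners, l)
}

// HandleEvent is called by the single reflector/watch loop; it updates the
// shared store once and then notifies every listener.
func (c *SharedCache) HandleEvent(key string, e Event) {
	c.mu.Lock()
	if e.Type == "DELETED" {
		delete(c.store, key)
	} else {
		c.store[key] = e.Obj
	}
	listeners := append([]Listener(nil), c.listeners...)
	c.mu.Unlock()
	for _, l := range listeners {
		l(e)
	}
}

func main() {
	cache := NewSharedCache()
	cache.AddListener(func(e Event) { fmt.Println("replication controller saw", e.Type) })
	cache.AddListener(func(e Event) { fmt.Println("endpoints controller saw", e.Type) })
	cache.HandleEvent("default/web-1", Event{Type: "ADDED", Obj: "pod web-1"})
}
```
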
k8s-merge-robot d37e6ad332 Merge pull request #24126 from Random-Liu/fix-pull-image
Automatic merge from submit-queue

Fix PullImage and add corresponding node e2e test

Fixes #24101. This is a bug introduced by #23506 (ref #23563).

The root cause of #24101 is described [here](https://github.com/kubernetes/kubernetes/issues/24101#issuecomment-208547623).

This PR
1) Fixes #24101 by decoding the messages returned while pulling an image, and returning an error if any of the messages contains an error.
2) Adds a node e2e test to detect this kind of failure.
3) Moves the presence check out of `ConformanceImage.Remove()` and `ConformanceImage.Pull()`, because sometimes we expect an error to occur in `PullImage()` and `RemoveImage()`, and even when it doesn't, the `Present()` check would still return an error and let the test pass.

@yujuhong @freehan @liangchenye 

Also /cc @resouer, because he is doing the image related functions refactoring.
2016-04-18 07:05:44 -07:00
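
A rough sketch of the decoding step the fix above describes: an image pull returns a stream of JSON progress messages, and a failure can arrive inside a message rather than as a transport error, so each message has to be decoded and checked. The struct below is a simplified stand-in for the real progress message type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// pullMessage mirrors the shape of one progress message in a pull stream.
type pullMessage struct {
	Status string `json:"status"`
	Error  string `json:"error"`
}

// checkPullStream decodes every message in the pull response and returns an
// error if any message carries one, even though the call itself "succeeded".
func checkPullStream(r io.Reader) error {
	dec := json.NewDecoder(r)
	for {
		var m pullMessage
		if err := dec.Decode(&m); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if m.Error != "" {
			return fmt.Errorf("image pull failed: %s", m.Error)
		}
	}
}

func main() {
	stream := strings.NewReader(
		`{"status":"Pulling from library/busybox"}` + "\n" +
			`{"error":"manifest unknown"}` + "\n")
	fmt.Println(checkPullStream(stream))
}
```
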
Jay Vyas 632a0a81d4 Cluster verification framework supporting declarative definition and iteration against pod spectrum
- rebase: ForEach only on Running pods
- add waitFor step in guestbook describe and wrapper
- simplify logs in polling, make panic immediate, give rolled-up stats in
the logs.

Improve logging for failure on ForEach
2016-04-18 10:01:10 -04:00
deads2k f0c33d65b6 start sharing the pod cache and list/watch 2016-04-18 08:51:55 -04:00
Jan Safranek 3137b4cd02 Increase provisioning test timeouts.
We've encountered flakes in our e2e infrastructure when kubelet took more than
one minute to detach a volume used by a deleted pod.

Let's increase the wait period from 1 to 3 minutes. This slows down the test
by 2 minutes, but it makes the test more stable.

In addition, when kubelet cannot detach a volume for 3 minutes, let the test
wait for an additional recycle controller retry interval (10 minutes) and hope the
volume is deleted by then. This should not increase the usual test time; it makes
the test stable when kubelet is _extremely_ slow to release the volume.
2016-04-18 13:06:09 +02:00
k8s-merge-robot b73164792f Merge pull request #24098 from rootfs/e2e-test
Automatic merge from submit-queue

some enhancement to run volume e2e tests
2016-04-17 23:19:10 -07:00
k8s-merge-robot 934d7ebb33 Merge pull request #23968 from yujuhong/privileged_containers
Automatic merge from submit-queue

e2e_node: port privileged pod tests from test/e2e/priviliged.go

The ported test is functionally the same as the original test.
The main difference between the two tests is that the original test relies on
`kubectl` to exec into the container, while the latter directly uses the REST
client of the apiserver. This avoids the need to copy kubectl to the node under
test.
2016-04-17 12:37:12 -07:00
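
A sketch of the exec-over-REST approach above. It only shows how the pod `exec` subresource URL is formed; the real test uses the apiserver's REST client with a streaming exec protocol over the upgraded connection rather than a plain HTTP call.

```go
package main

import (
	"fmt"
	"net/url"
)

// execURL builds the apiserver URL for the pod "exec" subresource, which is
// what kubectl exec hits under the covers.
func execURL(apiserver, namespace, pod, container string, command []string) string {
	u := url.URL{
		Scheme: "https",
		Host:   apiserver,
		Path:   fmt.Sprintf("/api/v1/namespaces/%s/pods/%s/exec", namespace, pod),
	}
	q := u.Query()
	q.Set("container", container)
	q.Set("stdout", "true")
	q.Set("stderr", "true")
	for _, c := range command {
		q.Add("command", c)
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(execURL("127.0.0.1:6443", "e2e-tests", "privileged-pod", "privileged-container",
		[]string{"ip", "link", "add", "dummy1", "type", "dummy"}))
}
```
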
k8s-merge-robot 2b9637da6a Merge pull request #23945 from smarterclayton/move_reset_metrics
Automatic merge from submit-queue

Move /resetMetrics to DELETE /metrics

Reduces the surface area of the API server slightly and allows
downstream components to have deletable metrics. After this change
genericapiserver will *not* have metrics unless the caller defines it
(allows different apiserver implementations to make that choice on their
own).

@wojtek-t
2016-04-17 05:58:26 -07:00
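
A minimal sketch of the routing change above: one handler serves GET /metrics as usual and treats DELETE on the same path as a reset, replacing the separate /resetMetrics endpoint. Handler names are illustrative, not the genericapiserver code.

```go
package main

import (
	"fmt"
	"net/http"
)

// metricsHandler dispatches on the HTTP method: GET serves metrics, DELETE resets them.
func metricsHandler(serve, reset http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet:
			serve(w, r) // normal Prometheus-style scrape
		case http.MethodDelete:
			reset(w, r) // clear/re-register the collected metrics
			fmt.Fprintln(w, "metrics reset")
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	}
}

func main() {
	serve := func(w http.ResponseWriter, r *http.Request) { fmt.Fprintln(w, "# metrics...") }
	reset := func(w http.ResponseWriter, r *http.Request) { /* re-register collectors here */ }
	http.Handle("/metrics", metricsHandler(serve, reset))
	http.ListenAndServe(":8080", nil)
}
```
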
k8s-merge-robot a275a045d1 Merge pull request #23914 from sky-uk/make-etcd-cache-size-configurable
Automatic merge from submit-queue

Make etcd cache size configurable

Instead of the prior 50K limit, allow users to specify a more sensible size for their cluster.

I'm not sure what a sensible default is here. I'm still experimenting on my own clusters. 50 gives me a 270MB max footprint; 50K caused my apiserver to run out of memory as it exceeded 2GB. I believe that number is far too large for most people's use cases.

There are some other fundamental issues that I'm not addressing here:
- Old etcd items are cached and potentially never removed (it stores using modifiedIndex, and doesn't remove the old object when it gets updated)
- Cache isn't LRU, so there's no guarantee the cache remains hot. This makes its performance difficult to predict. More of an issue with a smaller cache size.
- 1.2 etcd entries seem to have a larger memory footprint (I never had an issue in 1.1, even though this cache existed there). I suspect that's due to image lists on the node status.

This is provided as a fix for #23323
2016-04-17 00:06:31 -07:00
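
A generic sketch of turning a hard-coded cache size into a configurable one; the flag name and plumbing below are placeholders, not the exact apiserver flag added by the PR.

```go
package main

import (
	"flag"
	"fmt"
)

// Placeholder flag: lets operators trade cache hit rate for memory footprint
// instead of always paying for 50K cached entries.
var cacheSize = flag.Int("deserialization-cache-size", 50000,
	"Number of deserialized etcd objects to cache in memory.")

func main() {
	flag.Parse()
	// The cache is sized up front; smaller clusters can use e.g. 50 entries
	// (a ~270MB footprint in the PR author's tests) instead of 50K.
	cache := make(map[uint64]interface{}, *cacheSize)
	fmt.Printf("caching up to %d objects (currently %d)\n", *cacheSize, len(cache))
}
```
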
k8s-merge-robot 822618afb5 Merge pull request #23912 from smarterclayton/watch_until
Automatic merge from submit-queue

Add watch.Until, a conditional watch mechanism

A more powerful tool than wait.Poll: it allows a watch interface to drive conditionals that react to changes on a resource or resources. Provides a set of standard conditions that are in common use in the code, and updates e2e to use a few of them.

Extracted from #23567
2016-04-16 21:05:40 -07:00
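
A self-contained sketch of the watch-driven pattern above, using simplified stand-ins for the real watch types: conditions are evaluated against a stream of events, and the call returns as soon as the last condition is satisfied or the timeout expires.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Event is a simplified stand-in for a watch event.
type Event struct {
	Type string
	Obj  interface{}
}

// ConditionFunc inspects an event and reports whether its condition is met.
type ConditionFunc func(Event) (bool, error)

// waitForCondition consumes events until cond is satisfied or the deadline fires.
func waitForCondition(deadline <-chan time.Time, events <-chan Event, cond ConditionFunc) (*Event, error) {
	for {
		select {
		case e, ok := <-events:
			if !ok {
				return nil, errors.New("watch channel closed")
			}
			done, err := cond(e)
			if err != nil {
				return &e, err
			}
			if done {
				return &e, nil
			}
		case <-deadline:
			return nil, errors.New("timed out waiting for condition")
		}
	}
}

// until runs each condition in order against the same event stream, returning
// the last observed event once all conditions have been met.
func until(timeout time.Duration, events <-chan Event, conditions ...ConditionFunc) (*Event, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	var last *Event
	for _, cond := range conditions {
		ev, err := waitForCondition(timer.C, events, cond)
		if ev != nil {
			last = ev
		}
		if err != nil {
			return last, err
		}
	}
	return last, nil
}

func main() {
	events := make(chan Event, 2)
	events <- Event{Type: "ADDED", Obj: "pod created"}
	events <- Event{Type: "MODIFIED", Obj: "pod running"}

	podRunning := func(e Event) (bool, error) { return e.Type == "MODIFIED", nil }
	ev, err := until(time.Second, events, podRunning)
	fmt.Println(ev.Type, err)
}
```
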
k8s-merge-robot 85550921de Merge pull request #24354 from yifan-gu/fix_addon_update_test
Automatic merge from submit-queue

e2e: Fix 'Addon update' to support coreos distro.

cc @kubernetes/sig-testing @kubernetes/sig-node
2016-04-16 07:31:15 -07:00
k8s-merge-robot c528080248 Merge pull request #24339 from pwittrock/flaky
Automatic merge from submit-queue

Don't clean up files on image-based tests since the instance will be deleted anyway and this is flaky on CoreOS.
2016-04-15 20:50:41 -07:00
Clayton Coleman 845e496572 Convert poll in e2e with watch.Until 2016-04-15 22:21:42 -04:00
k8s-merge-robot 03f48e730e Merge pull request #22810 from amygdala/cassandra2
Automatic merge from submit-queue

phase 2 of cassandra example overhaul

Here's the next iteration in overhauling this example, towards https://github.com/kubernetes/kubernetes/issues/20961.  This removes the pod adoption part, but doesn't (yet) otherwise change any of the resources used.

It also includes some README cleanup, and removes some explicit specification of labels in the rc yaml.

This PR doesn't yet add any commentary on how we're using the seed provider (re: https://github.com/kubernetes/kubernetes/issues/20961#issuecomment-190405959 etc.).  Maybe we should add that.

Also: LMK if this PR should include any changes to the links out to the docs.

cc @bgrant0607 @johndmulhausen
2016-04-15 15:41:01 -07:00
Janet Kuo eb96b28004 Avoid relying on events in deployment rollover e2e test 2016-04-15 15:20:41 -07:00
Phillip Wittrock d37222d1d9 kubelet e2e test - format command output as string, not byte string. 2016-04-15 15:17:59 -07:00
Huamin Chen ee9ed4dd7f enhancements to run volume e2e test:
in e2e/volumes.go: give time to allow pod cleanup and volume unmount to happen before the volume server exits;
skip cinder volume test if not running with openstack provider

comment on why we pause before the containerized server is stopped in volume e2e tests; fixes #24100

updates NFS server image to 0.6, per #22529

fix persistent_volume e2e test: test cleanup doesn't expect client pod; delete PV after test

Signed-off-by: Huamin Chen <hchen@redhat.com>
2016-04-15 22:14:23 +00:00
Yifan Gu 656d05c5b4 e2e: Fix 'Addon update' to support coreos distro. 2016-04-15 14:29:08 -07:00
Isaac Hollander McCreery a96bb71bc9 Version-guard Kubectl client Guestbook application test against deployments 2016-04-15 10:51:10 -07:00
Phillip Wittrock 5155df0287 Don't clean up files on image-based tests since the instance will be deleted anyway and this is flaky on CoreOS. Fixed #24297 2016-04-15 10:32:26 -07:00
Clayton Coleman 0f95b91f96 Move /resetMetrics to DELETE /metrics
Reduces the surface area of the API server slightly and allows
downstream components to have deletable metrics. After this change
genericapiserver will *not* have metrics unless the caller defines it
(allows different apiserver implementations to make that choice on their
own).
2016-04-15 11:44:17 -04:00
k8s-merge-robot 2382fec956 Merge pull request #23894 from vishh/oom-score-adj-error
Automatic merge from submit-queue

Do not throw creation errors for containers that fail immediately after being started

Fixes (hopefully) #23607 

cc @dchen1107
2016-04-14 22:29:14 -07:00
Amy Unruh 8846b313dc phase 2 of cassandra example overhaul 2016-04-14 21:55:23 -07:00
k8s-merge-robot 14d9764b77 Merge pull request #24271 from pwittrock/flaky
Automatic merge from submit-queue

Only output log files if tests fail
2016-04-14 21:42:58 -07:00
Daniel Smith 1d96b4c55b Merge pull request #23549 from deads2k/acting-as
add act-as powers
2016-04-14 15:12:45 -07:00
goltermann c226c9435b Fix misspellings in comments.
https://goreportcard.com/report/k8s.io/kubernetes#misspell
2016-04-14 13:57:45 -07:00
k8s-merge-robot d3102a3d8a Merge pull request #24216 from fejta/iperf
Automatic merge from submit-queue

Add Makefile for iperf

@jayunit100 will you please take a look at this? (closes #24204)
2016-04-14 12:37:20 -07:00
Phillip Wittrock e88fdb85c0 Only output log files if tests fail 2016-04-14 12:04:36 -07:00
deads2k ac4c545b91 add act-as powers 2016-04-14 12:49:10 -04:00
Zach Loafman 9b0a4bfdb0 Fix DNS test for larger clusters
On GKE, we scale the number of DNS pods based on the cluster size. For
testing on larger clusters, relax the DNS pod check.
2016-04-14 09:23:30 -07:00
k8s-merge-robot 98c255eb79 Merge pull request #22912 from Random-Liu/kubelet-perf
Automatic merge from submit-queue

Add generalized performance data type in e2e test

For kubernetes/contrib/issues/564 and #15554.

This PR adds two files to the e2e tests:
1) `perftype/perftype.go`: This file contains a generalized performance data type. The type can be pretty-printed in JSON format and analyzed by other performance-analysis tools, such as [Perfdash](https://github.com/kubernetes/contrib/tree/master/perfdash).
2) `perf_util.go`: This file contains functions which convert e2e performance test results into the new performance data type.

The new performance data type is now used in the *Density test, Load test, and Kubelet resource tracking*. It's easy to support other e2e performance tests by adding a new convert function in `perf_util.go`.

@gmarek @yujuhong 
/cc @kubernetes/sig-testing
2016-04-14 03:47:24 -07:00
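
A sketch of what a generalized, JSON-printable performance data type can look like; the field names below are illustrative, and the authoritative definition is `perftype/perftype.go` from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DataItem is one measurement series: values keyed by percentile/metric name,
// a unit, and free-form labels identifying what was measured.
type DataItem struct {
	Data   map[string]float64 `json:"data"`
	Unit   string             `json:"unit"`
	Labels map[string]string  `json:"labels,omitempty"`
}

// PerfData is the top-level document a dashboard such as Perfdash can ingest.
type PerfData struct {
	Version   string     `json:"version"`
	DataItems []DataItem `json:"dataItems"`
}

func main() {
	perf := PerfData{
		Version: "v1",
		DataItems: []DataItem{{
			Data:   map[string]float64{"Perc50": 12.3, "Perc90": 40.1, "Perc99": 88.7},
			Unit:   "ms",
			Labels: map[string]string{"Metric": "pod_startup"},
		}},
	}
	out, _ := json.MarshalIndent(perf, "", "  ")
	fmt.Println(string(out)) // pretty-printed JSON for analysis tools
}
```
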