Generic pod-per-node functionality for testing (2-node test only)
- update the framework to decompose pod vs. svc creation for composition
- remove the hard-coded 2 and point to --scale
Automatic merge from submit-queue
Collect and expose runtime's image storage usage via Kubelet's /stats/summary endpoint
This information is useful to users since docker images are typically not stored on the root filesystem.
Kubelet will also consume this feature in the future to decide if evicting images will help with disk usage on the nodes.
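For context, a rough sketch of the shape this adds to the summary API (field names here are illustrative and may not match the final API):

```go
package stats

// FsStats describes usage of a filesystem, as reported in /stats/summary.
type FsStats struct {
	AvailableBytes *uint64 `json:"availableBytes,omitempty"`
	CapacityBytes  *uint64 `json:"capacityBytes,omitempty"`
	UsedBytes      *uint64 `json:"usedBytes,omitempty"`
}

// RuntimeStats surfaces runtime-level stats; ImageFs covers the filesystem
// the runtime stores images on, which is typically not the root filesystem.
type RuntimeStats struct {
	ImageFs *FsStats `json:"imageFs,omitempty"`
}
```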
cc @kubernetes/sig-node
Automatic merge from submit-queue
Move internal types of job from pkg/apis/extensions to pkg/apis/batch
This addresses the job part of #23216 and is still WIP; I will notify once finished. I'd like to have it in before starting work on ScheduledJob.
@lavalamp @erictune fyi
Automatic merge from submit-queue
Add node e2e test for mirror pod.
This is a node e2e test for mirror pod.
After this get merged, I'll revisit the mirror pod manager PR. #18638
/cc @yujuhong
Automatic merge from submit-queue
Use mCPU as CPU usage unit, add version in PerfData, and fix memory usage bug.
Partially addressed #24436.
This PR:
1) Changes the CPU usage unit to "mCPU".
2) Adds a version to PerfData; perfdash will only support the newest version now.
3) Fixes a silly mistake in calculating the memory usage average.
/cc @vishh
Add tests for watch behavior over both protocols (HTTP and WebSocket)
against all 3 media types. Adopt the
`application/vnd.kubernetes.protobuf;stream=watch` media type for the
content that comes back from a watch call so that it can be
distinguished from a Status result.
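For illustration, a client opts into this encoding by setting the Accept header on the watch request; a minimal sketch using net/http (URL and auth handling are placeholders):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET",
		"https://apiserver:6443/api/v1/pods?watch=true", nil)
	if err != nil {
		panic(err)
	}
	// Ask for the watch-stream protobuf framing; a Status object returned
	// for a failed call is encoded without the stream parameter, so the
	// two can be distinguished.
	req.Header.Set("Accept", "application/vnd.kubernetes.protobuf;stream=watch")
	fmt.Println(req.Header.Get("Accept"))
}
```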
Automatic merge from submit-queue
Add timeout to e2e network connectivity checks
Some e2e tests use wget to check connectivity, and the default e2e
timeout is 900s. This change allows the timeout to be specified on a
check-by-check basis. This will also make the check useful for negative
checks (like those used by openshift to validate isolation) since a
short timeout is suggested where connectivity is not expected.
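As a hedged sketch, the per-check timeout might be threaded through to the wget invocation like this (the helper name is hypothetical; -T is busybox wget's per-attempt timeout in seconds):

```go
package e2e

import "fmt"

// buildWgetCmd is a hypothetical helper: a short -T lets a negative
// connectivity check (where no connection is expected) fail fast instead
// of waiting out the 900s suite timeout.
func buildWgetCmd(target string, timeoutSeconds int) string {
	return fmt.Sprintf("wget -q -T %d -O - %s", timeoutSeconds, target)
}
```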
Automatic merge from submit-queue
make storage enablement, serialization, and location orthogonal
This allows a caller (command-line, config, code) to specify multiple separate pieces of config information regarding storage and have them properly composed at runtime. The information provided is exposed through interfaces to allow alternate implementations, which allows us to change the expression of the config moving forward. I also fixed up the types to be correct as I moved through.
The same options still exist, but they're composed slightly differently:
1. specify target etcd servers per Group or per GroupResource
1. specify storage GroupVersions per Groups or per GroupResource
1. specify etcd prefixes per GroupVersion or per GroupResource
1. specify that multiple GroupResources share the same location in etcd
1. enable GroupResources by GroupVersion or by GroupResource whitelist or GroupResource blacklist
The `storage.Interface` is built per GroupResource (see the sketch after this list) by:
1. find the set of possible storage GroupResources based on the priority list of cohabitators
1. choose a GroupResource from the set by looking at which Groups have the resource enabled
1. find the target etcd server, etcd prefix, and storage encoding based on the GroupResource
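A minimal sketch of what such composition could look like behind an interface (the names below are hypothetical, not the actual types in this PR):

```go
package storagefactory

import "fmt"

// Destination is a hypothetical, composed view of the per-GroupResource
// config: which etcd servers, prefix, and storage GroupVersion to use.
type Destination struct {
	ServerList     []string
	Prefix         string
	StorageVersion string
}

// Factory resolves the orthogonal pieces of config for one GroupResource,
// applying per-Group defaults overridden by per-GroupResource entries.
type Factory struct {
	groupDefaults    map[string]Destination // keyed by group
	resourceOverride map[string]Destination // keyed by "group/resource"
}

func (f *Factory) Destination(group, resource string) (Destination, error) {
	if d, ok := f.resourceOverride[group+"/"+resource]; ok {
		return d, nil
	}
	if d, ok := f.groupDefaults[group]; ok {
		return d, nil
	}
	return Destination{}, fmt.Errorf("no storage config for %s/%s", group, resource)
}
```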
The API server can have its resources separately enabled, but for now I've kept them linked.
@liggitt I think we need this (or something like it) to be able to go from config to these interfaces. Given another round of refactoring, we may be able to reshape these to be more forward driving.
@smarterclayton this is important for rebasing and for a seamless 1.2 to 1.3 migration for us.
Automatic merge from submit-queue
Increase provisioning test timeouts.
We've encountered flakes in our e2e infrastructure when kubelet took more than one minute to detach a volume used by a deleted pod.
Let's increase the wait period from 1 to 3 minutes. This slows down the test by 2 minutes, but it makes the test more stable.
In addition, when kubelet cannot detach a volume for 3 minutes, let the test wait for additional recycle controller retry interval (10 minutes) and hope the volume is deleted by then. This should not increase usual test time, it makes the test stable when kubelet is _extremely_ slow when releasing the volume.
Fixes: #24161
Automatic merge from submit-queue
Fix unintended change of Service.spec.ports[].nodePort during kubectl apply
Please refer #23551 for more detail. @bgrant0607 I think this simple fix should be ok to reuse nodePort. @thockin ptal.
Release note: Fix unintended change of `Service.spec.ports[].nodePort` during `kubectl apply`.
Automatic merge from submit-queue
Add RC and container ports to scheduler benchmark
Fix #23263
Ref #24408
However, scheduler throughput is still ~140 initially, whereas in reality we have 35-40. There is still a significant difference we should understand.
@hongchaodeng @xiang90
Automatic merge from submit-queue
Cluster Verification Framework
I've spent the last few days looking at the general verification patterns we tend to reuse in the e2es. Basically, we need:
- label filters
- forEach and WaitFor (where forEach doesn't necessarily waitFor anything).
- timeouts
- multiple phases (reusable definition of state)
- an extensible way to define cluster state that can evolve over time in a data object rather than as a set of parameters that have magic semantics
This PR
- implements the functionality above declaratively, and without hidden semantics.
- addresses the sprawling duplicate methods in #23540, so that we can phase out the wrapper methods and replace them with well defined, extensible semantics for cluster state.
- fixes the recently discovered #23730 issue (where kubectl.go is relying on examples.go, which is obviously wacky) by using the new framework to implement forEachPod in just a couple of lines and migrating the wrapper function into framework.go.
There is some cleanup to do here, but this seems to be working for a couple of important use cases (the spark, cassandra, ..., kubectl tests). I played with a few different ideas, and this wound up being the most natural implementation from a usability standpoint.
In any case, I just thought I'd push this up as a first iteration; open to feedback.
@kubernetes/sig-testing @timothysc
Automatic merge from submit-queue
Fix DNS test for larger clusters
On GKE, we scale the number of DNS pods based on the cluster size. For
testing on larger clusters, relax the DNS pod check.
Automatic merge from submit-queue
Incremental improvements to kubelet e2e tests
- Add keep-alive to ssh connection
- Don't try to stop services on image-based runs
- Increase the Jenkins CI timeout to 90 minutes to accommodate unpredictable go build times
- Remove spammy log statement
Automatic merge from submit-queue
Improve script for running scheduler benchmarks
Without this change, this script didn't work in my environment; this makes it more consistent with other scripts.
@hongchaodeng @xiang90
Automatic merge from submit-queue
shared controller informers
Related to https://github.com/kubernetes/kubernetes/issues/14978
This demonstrates how controllers which use an `Informer`, would be able to share the same watch and store. A similar "setup and run" approach could be done for an `IndexInformer` to share that cache. I found adding listeners here to be easier than intercepting at the watch interface (problems with resourceVersion) or the reflector (same plumbing, but you have to fan out to multiple stores).
We could also use the cache we build here to back several of the admission plugins that currently run their own lookup caches today.
If there's interest, I can finish out the `SharedInformer` and switch the low hanging fruit over.
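To make the fan-out concrete, here is a toy sketch of the idea (not the proposed API): one reflector feeds a single store, and each registered listener sees every event.

```go
package main

import "fmt"

// sharedInformer fans a single watch stream out to many listeners.
type sharedInformer struct {
	listeners []func(event string, obj interface{})
}

func (s *sharedInformer) AddListener(l func(event string, obj interface{})) {
	s.listeners = append(s.listeners, l)
}

// distribute is what the single reflector would call once per watch event.
func (s *sharedInformer) distribute(event string, obj interface{}) {
	for _, l := range s.listeners {
		l(event, obj)
	}
}

func main() {
	inf := &sharedInformer{}
	inf.AddListener(func(e string, o interface{}) { fmt.Println("controller:", e, o) })
	inf.AddListener(func(e string, o interface{}) { fmt.Println("admission plugin:", e, o) })
	inf.distribute("add", "pod/foo") // one watch event, two consumers
}
```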
@kubernetes/rh-cluster-infra @smarterclayton @liggitt @wojtek-t
Automatic merge from submit-queue
Fix PullImage and add corresponding node e2e test
Fixes #24101. This is a bug introduced by #23506; see also #23563.
The root cause of #24101 is described [here](https://github.com/kubernetes/kubernetes/issues/24101#issuecomment-208547623).
This PR
1) Fixes #24101 by decoding the messages returned while pulling an image, and returning an error if any of the messages contains an error (see the sketch after this list).
2) Adds a node e2e test to detect this kind of failure.
3) Moves the presence check out of `ConformanceImage.Remove()` and `ConformanceImage.Pull()`, because sometimes we expect an error to occur in `PullImage()` or `RemoveImage()`, and even when it doesn't, the `Present()` check would still return an error and let the test pass.
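A hedged sketch of the decoding in 1) (types and helper are illustrative; the real messages come from the docker pull stream):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// pullMessage mirrors the JSON messages streamed during an image pull; a
// failed pull still returns a stream, with the failure carried in an
// "error" field of one of the messages.
type pullMessage struct {
	Status string `json:"status"`
	Error  string `json:"error"`
}

// checkPullStream decodes every message and surfaces the first embedded
// error, which is the essence of the fix.
func checkPullStream(r io.Reader) error {
	dec := json.NewDecoder(r)
	for {
		var m pullMessage
		if err := dec.Decode(&m); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if m.Error != "" {
			return fmt.Errorf("image pull failed: %s", m.Error)
		}
	}
}

func main() {
	stream := `{"status":"Pulling fs layer"}
{"error":"manifest unknown"}`
	fmt.Println(checkPullStream(strings.NewReader(stream)))
}
```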
@yujuhong @freehan @liangchenye
Also /cc @resouer, because he is doing the image related functions refactoring.
- rebase: ForEach only on Running pods
- add a waitFor step in the guestbook describe and wrapper
- simplify logs in polling, make panics immediate, and give rolled-up stats in
the logs.
Improve logging for failure on ForEach
Automatic merge from submit-queue
e2e_node: port privileged pod tests from test/e2e/privileged.go
The ported test is functionally the same as the original test.
The main difference between the two tests is that the original test relies on
`kubectl` to exec into the container, while the latter directly uses the REST
client of the apiserver. This avoids the need to copy kubectl to the node under
test.
Automatic merge from submit-queue
Move /resetMetrics to DELETE /metrics
Reduces the surface area of the API server slightly and allows
downstream components to have deleteable metrics. After this change
genericapiserver will *not* have metrics unless the caller defines it
(allows different apiserver implementations to make that choice on their
own).
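A toy sketch of the resulting surface (not the actual genericapiserver wiring): one path, metrics on GET, reset on DELETE.

```go
package main

import (
	"fmt"
	"net/http"
)

// metricsHandler serves metrics on GET and resets them on DELETE,
// replacing the separate /resetMetrics endpoint.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case "GET":
		fmt.Fprintln(w, "# metrics output would go here")
	case "DELETE":
		// reset the registered metrics here
		w.WriteHeader(http.StatusNoContent)
	default:
		w.WriteHeader(http.StatusMethodNotAllowed)
	}
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	http.ListenAndServe(":8080", nil)
}
```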
@wojtek-t
Automatic merge from submit-queue
Make etcd cache size configurable
Instead of the prior 50K limit, allow users to specify a more sensible size for their cluster.
I'm not sure what a sensible default is here. I'm still experimenting on my own clusters. 50 gives me a 270MB max footprint. 50K caused my apiserver to run out of memory as it exceeded >2GB. I believe that number is far too large for most people's use cases.
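For illustration, a hedged sketch of exposing the size as a flag instead of the compiled-in constant (the flag name is illustrative):

```go
package main

import (
	"flag"
	"fmt"
)

// cacheSize makes the previously hard-coded 50000-entry etcd
// deserialization cache tunable per cluster.
var cacheSize = flag.Int("deserialization-cache-size", 50000,
	"Number of deserialized objects to cache in memory.")

func main() {
	flag.Parse()
	fmt.Println("etcd cache size:", *cacheSize)
}
```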
There are some other fundamental issues that I'm not addressing here:
- Old etcd items are cached and potentially never removed (it stores using modifiedIndex, and doesn't remove the old object when it gets updated)
- Cache isn't LRU, so there's no guarantee the cache remains hot. This makes its performance difficult to predict. More of an issue with a smaller cache size.
- 1.2 etcd entries seem to have a larger memory footprint (I never had an issue in 1.1, even though this cache existed there). I suspect that's due to image lists on the node status.
This is provided as a fix for #23323
Automatic merge from submit-queue
Add watch.Until, a conditional watch mechanism
A more powerful tool than wait.Poll: it allows a watch interface to drive conditionals that react to changes on a resource or resources. Provides a set of standard conditions that are in common use in the code, and updates e2e to use a few of them.
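A hedged sketch of the intended usage, waiting for a pod to become Running (signatures approximate the PR; see pkg/watch for the authoritative ones):

```go
package example

import (
	"time"

	"k8s.io/kubernetes/pkg/api"
	"k8s.io/kubernetes/pkg/watch"
)

// waitForPodRunning drives a condition from watch events instead of
// polling: Until returns when the condition is met, errors, or times out.
func waitForPodRunning(w watch.Interface) error {
	_, err := watch.Until(2*time.Minute, w, func(event watch.Event) (bool, error) {
		pod, ok := event.Object.(*api.Pod)
		if !ok {
			return false, nil
		}
		return pod.Status.Phase == api.PodRunning, nil
	})
	return err
}
```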
Extracted from #23567
Automatic merge from submit-queue
phase 2 of cassandra example overhaul
Here's the next iteration in overhauling this example, towards https://github.com/kubernetes/kubernetes/issues/20961. This removes the pod adoption part, but doesn't (yet) otherwise change any of the resources used.
It also includes some README cleanup, and removes some explicit specification of labels in the rc yaml.
This PR doesn't yet add any commentary on how we're using the seed provider (re: https://github.com/kubernetes/kubernetes/issues/20961#issuecomment-190405959 etc.). Maybe we should add that.
Also: LMK if this PR should include any changes to the links out to the docs.
cc @bgrant0607 @johndmulhausen
In e2e/volumes.go: give time to allow pod cleanup and volume unmount to happen before the volume server exits;
skip the Cinder volume test if not running with the OpenStack provider.
Comment on why we pause before the containerized server is stopped in volume e2e tests; fix #24100.
Update the NFS server image to 0.6, per #22529.
Fix the persistent_volume e2e test: test cleanup doesn't expect the client pod; delete the PV after the test.
Signed-off-by: Huamin Chen <hchen@redhat.com>
Automatic merge from submit-queue
Do not throw creation errors for containers that fail immediately after being started
Fixes (hopefully) #23607
cc @dchen1107
Automatic merge from submit-queue
Add generalized performance data type in e2e test
For kubernetes/contrib/issues/564 and #15554.
This PR added two files in e2e test:
1) `perftype/perftype.go`: This file contains the generalized performance data type. The type can be pretty-printed in JSON format and analyzed by other performance analysis tools, such as [Perfdash](https://github.com/kubernetes/contrib/tree/master/perfdash).
2) `perf_util.go`: This file contains functions which convert e2e performance test results into the new performance data type.
The new performance data type is now used in *Density test, Load test and Kubelet resource tracking*. It's easy to support other e2e performance tests by adding a new conversion function in `perf_util.go`.
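Roughly, the envelope looks like the following; this sketch approximates the perftype package (exact fields and tags may differ):

```go
package perftype

// DataItem is one labeled measurement series; Data maps a metric name or
// percentile to its value, expressed in Unit.
type DataItem struct {
	Data   map[string]float64 `json:"data"`
	Unit   string             `json:"unit"`
	Labels map[string]string  `json:"labels,omitempty"`
}

// PerfData is the versioned envelope that tools like Perfdash consume;
// bumping Version signals an incompatible layout change.
type PerfData struct {
	Version   string     `json:"version"`
	DataItems []DataItem `json:"dataItems"`
}
```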
@gmarek @yujuhong
/cc @kubernetes/sig-testing
Automatic merge from submit-queue
Don't remove kubemark binaries
If you want to experiment a bit with kubemark (e.g. relaunch a component with different flags), it's roughly impossible now. If we keep the binaries, this is much easier.
Automatic merge from submit-queue
Clientset release 1.3
This PR creates the release 1.3 client set. We'll keep updating this client set until we cut release 1.3. In the meantime, the release 1.2 client set will be locked.
@lavalamp
Automatic merge from submit-queue
IPerf container to support network performance testing
Simple iperf container.
Issue:
We want to run iperf from the e2e tests for a network baseline, but there are no gcr images for this.
Solution:
Curate our own iperf container from source in kubernetes and copy it as a top level microservice. So long as these are injected into GCR, we can then run this container from the e2e tests.
cc @sig-testing; this can be used alongside #22869
Automatic merge from submit-queue
e2e: adapt kubelet_perf.go to use the new summary metrics API
This commit switches most functions in kubelet_stats.go to use the new API.
However, the functions that perform one-time resource usage retrieval remain
unchanged to be compatible with resource_usage_gatherer.go. They should be
handled separately.
Also, the new summary API does not provide RSS memory yet, so all memory
checking tests will *always* pass. We plan to add this metric to the API and
restore the functionality of the test.
Automatic merge from submit-queue
Additional go vet fixes
Mostly:
- pass lock by value
- bad syntax for struct tag value
- example functions not formatted properly
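For context, the most common of these findings looks like this; a minimal illustration of the pass-lock-by-value fix (not code from this PR):

```go
package main

import "sync"

type counter struct {
	mu sync.Mutex
	n  int
}

// A value receiver here would copy the struct, and with it the mutex,
// which go vet flags as "passes lock by value"; the pointer receiver
// keeps a single mutex guarding a single counter.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

func main() {
	c := &counter{}
	c.inc()
}
```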
Automatic merge from submit-queue
Ensure object returned by volume getCloudProvider incorporates cloud config
This PR addresses https://github.com/kubernetes/kubernetes/issues/23517.
**Problem**
The existing GCE PD and AWS EBS volume plugin code were fetching cloud provider without specifying a cloud config: `cloudprovider.GetCloudProvider("gce", nil)`
This caused the cloud provider to use default auth mechanism, which is not acceptable for the provisioning controller running on GKE master.
**Fix**
This PR does the following (a rough sketch follows the list):
* Modifies the GCE PD and AWS EBS volume plugin code to use the cloud provider object pre-constructed by the binary with a cloud config.
* Enables the provisioning e2e test for GKE (to catch future issues).
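A hedged sketch of the shape of the fix; the interface below is an illustrative stand-in for the volume plugin host, not the actual API:

```go
package volume

import "errors"

// CloudProviderGetter stands in for the volume plugin host: the binary
// constructs the cloud provider once, from its cloud config file, and
// plugins retrieve that instance.
type CloudProviderGetter interface {
	GetCloudProvider() (interface{}, error)
}

// getCloudProvider sketches the fixed lookup: before the fix, the plugin
// called the package-level constructor with a nil config, which silently
// fell back to default auth.
func getCloudProvider(host CloudProviderGetter) (interface{}, error) {
	if host == nil {
		return nil, errors.New("no volume host configured")
	}
	return host.GetCloudProvider()
}
```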
Thanks to @cjcullen for debugging and finding the root cause! 👍
This should be cherry-picked into the v1.2 branch for the next release.
Automatic merge from submit-queue
genericapiserver: Moving InstallSwaggerAPI to Run
Ref https://github.com/kubernetes/kubernetes/pull/21190#discussion_r57494673
Moving InstallSwaggerAPI() from InstallAPIGroups() to Run(). This allows the use of InstallAPIGroups() multiple times or using InstallAPIGroup() directly.
cc @jianhuiz @kubernetes/sig-api-machinery
Automatic merge from submit-queue
Update port forward e2e for go 1.6
Only close the stdout/stderr pipes from kubectl port-forward when we're truly done with the command,
instead of as soon as runPortForward exits.
Also try to gracefully stop kubectl port-forward via SIGINT, instead of always sending SIGKILL, as
this will help avoid spdy goroutine leaks in the Kubelet.
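A rough sketch of that shutdown order (names and timeout are illustrative):

```go
package main

import (
	"os/exec"
	"syscall"
	"time"
)

// stopGracefully sends SIGINT first so kubectl port-forward can tear down
// its spdy streams cleanly, and falls back to SIGKILL only if the process
// doesn't exit in time.
func stopGracefully(cmd *exec.Cmd) {
	cmd.Process.Signal(syscall.SIGINT)
	done := make(chan struct{})
	go func() { cmd.Wait(); close(done) }()
	select {
	case <-done:
	case <-time.After(10 * time.Second):
		cmd.Process.Kill()
	}
}

func main() {
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	stopGracefully(cmd)
}
```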
Ref #22149
cc @smarterclayton @kubernetes/rh-cluster-infra
We have previously tried building a full cloudprovider in e2e for AWS;
this wasn't the best idea, because e2e runs on a different machine than
normal operations, and often doesn't even run in AWS. In turn, this
meant that the cloudprovider had to do extra work and have extra code,
which we would like to get rid of. Indeed, I got rid of some code which
tolerated not running in AWS, and this broke e2e.
Here is a list of changes along with an explanation of how they work:
1. Add a new string field called TargetSelector to the external version of
extensions Scale type (extensions/v1beta1.Scale). This is a serialized
version of either the map-based selector (in case of ReplicationControllers)
or the unversioned.LabelSelector struct (in case of Deployments and
ReplicaSets).
2. Change the selector field in the internal Scale type (extensions.Scale) to
unversioned.LabelSelector.
3. Add conversion functions to convert from the two external selector fields to a
single internal selector field (see the sketch at the end of this list). The rules for conversion are as follows:
i. If the target resource that this scale targets supports LabelSelector
(Deployments and ReplicaSets), then serialize the LabelSelector and
store the string in the TargetSelector field in the external version
and leave the map-based Selector field as nil.
ii. If the target resource only supports a map-based selector
(ReplicationControllers), then still serialize that selector and
store the serialized string in the TargetSelector field. Also,
set the Selector map field in the external Scale type.
iii. When converting from external to internal version, parse the
TargetSelector string into LabelSelector struct if the string isn't
empty. If it is empty, then check if the Selector map is set and just
assign that map to the MatchLabels component of the LabelSelector.
iv. When converting from internal to external version, serialize the
LabelSelector and store it in the TargetSelector field. If only
the MatchLabel component is set, then also copy that value to
the Selector map field in the external version.
4. HPA now just converts the LabelSelector field to a Selector interface
type to list the pods.
5. Scale Get and Update etcd methods for Deployments and ReplicaSets now
return extensions.Scale instead of autoscaling.Scale.
6. Consequently, the SubresourceGroupVersion override and the "is autoscaling
enabled" check are now removed from pkg/master/master.go
7. Other small changes to labels package, fuzzer and LabelSelector
helpers to piece this all together.
8. Add unit tests to HPA targeting Deployments and ReplicaSets.
9. Add an e2e test to HPA targeting ReplicaSets.
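As a sketch of the conversion rules above (rule iii in particular), with illustrative stand-ins for the external and internal types:

```go
package conversion

import (
	"fmt"
	"strings"
)

// Illustrative stand-ins for the external (extensions/v1beta1) and
// internal Scale selector fields; only the parts relevant to the rules
// above are shown.
type LabelSelector struct {
	MatchLabels map[string]string
}

type externalScaleStatus struct {
	Selector       map[string]string // legacy map-based selector
	TargetSelector string            // serialized LabelSelector
}

type internalScaleStatus struct {
	Selector *LabelSelector
}

// parseSelector is a simplified stand-in for real label-selector parsing;
// it only handles "k=v,..." and ignores match expressions.
func parseSelector(s string) (*LabelSelector, error) {
	out := &LabelSelector{MatchLabels: map[string]string{}}
	for _, pair := range strings.Split(s, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid selector segment %q", pair)
		}
		out.MatchLabels[kv[0]] = kv[1]
	}
	return out, nil
}

// toInternal sketches rule iii: prefer TargetSelector when set, otherwise
// fall back to the map-based Selector.
func toInternal(in externalScaleStatus) (internalScaleStatus, error) {
	if in.TargetSelector != "" {
		sel, err := parseSelector(in.TargetSelector)
		if err != nil {
			return internalScaleStatus{}, err
		}
		return internalScaleStatus{Selector: sel}, nil
	}
	if in.Selector != nil {
		return internalScaleStatus{Selector: &LabelSelector{MatchLabels: in.Selector}}, nil
	}
	return internalScaleStatus{}, nil
}
```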