The image prepulling pod calls docker directly to pull images. If the pod
hasn't finished before running the resource usage tracking test, there'd be a
cpu spike in docker. We'd rather wait and fail if this is the case, before
running the test.
Automatic merge from submit-queue
Kubelet: Add docker operation timeout
For #23563.
Based on #24748, only the last 2 commits are new.
This PR:
1) Add timeout for all docker operations.
2) Add docker operation timeout metrics
3) Cleanup kubelet stats and add runtime operation error and timeout rate monitoring.
4) Monitor runtime operation error and timeout rate in kubelet perf.
@yujuhong
/cc @gmarek Because of the metrics change.
/cc @kubernetes/sig-node
This commit switch most functions in kubelet_stats.go to use the new API.
However, the functions that perform one-time resource usage retrieval remain
unchanged to be compatible with reource_usage_gatherer.go. They should be
handled separately.
Also, the new summary API does not provide the RSS memory yet, so all memory
checking tests will *always* pass. We plan to add this metrics in the API and
restore the functionality of the test.
This allows resource usage monitoring test to launch 100 test pods per node, in
addition to the add-on pods.
Also reduce the test time length since the results over the shorter period are
representative enough.
- remove skip list from conformance-test.sh and filter by the new tag
- remove experimental api tests from conformance test suite
- remove all tests from conformance test suite which are either
restricted to e.g. gce, gke, aws or require SSH
This scopes down the initially ambitious PR:
https://github.com/kubernetes/kubernetes/pull/14960 to replace just
`pause` and `fluentd-elasticsearch` to come through `beta.gcr.io`.
The v2 versions have been pushed under new tags, `pause:2.0` and
`fluentd-elastisearch:1.12`.
NOTE: `beta.gcr.io` will still serve images using v1 until they are repushed with v2. Pulls through `gcr.io` will still work after pushing through `beta.gcr.io`, but will be served over v1 (via compat logic).
Some tests in this test suite expects --max-pods (i.e. the maximum pod capacity
on kubelet) to be greater than default, which applies only to the GCE test
environment. Split the tests into two sets so that we can better categorize
the tests in the jenkins setup, without making the test itself aware of the
environment.