e2e-node tests may use custom system specs for validating nodes to
conform the specs. The functionality is switched on when the tests
are run with this command:
make SYSTEM_SPEC_NAME=gke test-e2e-node
Currently the command fails with the error:
F1228 16:12:41.568836 34514 e2e_node_suite_test.go:106] Failed to load system spec: open /home/rojkov/go/src/k8s.io/kubernetes/k8s.io/kubernetes/cmd/kubeadm/app/util/system/specs/gke.yaml: no such file or directory
Move the spec file under `test/e2e_node/system/specs` and introduce a single
public constant referring the file to use instead of multiple private constants.
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
This previously caused a panic when moving lastKnownGood between two
non-nil values, because we were comparing the interface wrapper instead
of comparing the NodeConfigSources. The case of moving from one non-nil
lastKnownGood config to another doesn't appear to be tested by the e2e
node tests. I added a unit test and an e2e node test to help catch bugs
with this case in the future.
Not all users of the E2E framework want to run cloud-provider specific
tests. By splitting out the code it becomes possible to decide in
a E2E test suite which providers are supported.
This is achieved in two ways:
- the framework calls certain functions through a provider
interface instead of calling specific cloud provider functions
directly
- tests that are cloud-provider specific directly import the
new provider packages
The ingress test utilities are only needed by a few tests. Splitting
them out into a separate package makes the framework simpler for test
suites not using those tests.
Fixes: #66649
Automatic merge from submit-queue (batch tested with PRs 65250, 68241). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Initial node performance testing framework.
This PR adds a framework for node performance testing.
Partially fixes: https://github.com/kubernetes/kubernetes/issues/65249.
Use the following command to run this test:
```sh
make test-e2e-node FOCUS="Node Performance Testing" SKIP="" PARALLELISM=1
```
It has been tested in the following environment:
- n1-standard-16
- Ubuntu 16.04
- docker 17.03.2
Note to reviewers:
This PR won't pass node e2e since the docker images in https://github.com/kubernetes/kubernetes/pull/65251 are required for this to function. The node e2e will fail when trying to pull the required images for testing.
Automatic merge from submit-queue (batch tested with PRs 67736, 68123, 68138). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Port security context NodeConformance e2e_node tests to e2e
**What this PR does / why we need it**:
Port all [NodeConformance] SecurityContext e2e_node tests to e2e/common.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#67032
**Special notes for your reviewer**:
- This PR is a continuing effort to close#67032.
- Removed ContainerRuntime constraint [as discussed](https://github.com/kubernetes/kubernetes/pull/67032#discussion_r214201870).
- Porting all [NodeConformance] tests to e2e/common which do not have node dependencies.
- Does it make sense to port [privileged test](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/security_context_test.go#L558) to e2e/common and remove [NodeFeature:HostAccess] label from test name?
**Release note**:
```release-note
NONE
```
/area conformance
@kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix error link in comment
**What this PR does / why we need it**:
Fix error link in comment
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
/sig node
manager change in commit 7b1ae66.
Also changes the test to make sure node is indeed ready after Kubelet
restart. The previous readiness check may use old API state but
didn't run into the issue due to the delay of waiting for pod restart.
Automatic merge from submit-queue (batch tested with PRs 67100, 67426). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
should check all error in ResourceCollector.Start()
Signed-off-by: yanxuean <yan.xuean@zte.com.cn>
**What this PR does / why we need it**:
1. We should check both errors.
test/e2e_node/resource_collector.go
```
func (r *ResourceCollector) Start() {
// Get the cgroup container names for kubelet and runtime
kubeletContainer, err := getContainerNameForProcess(kubeletProcessName, "")
runtimeContainer, err := getContainerNameForProcess(framework.TestContext.ContainerRuntimeProcessName, framework.TestContext.ContainerRuntimePidFile)
if err == nil {
systemContainers = map[string]string{
stats.SystemContainerKubelet: kubeletContainer,
stats.SystemContainerRuntime: runtimeContainer,
}
}
```
2. redundant compare
The Timestamp.Equal is unlikely to occur, because we have met Timestamp.Before.
```
if oldStats, ok := oldStatsMap[name]; ok && oldStats.Timestamp.Before(newStats.Timestamp) {
if oldStats.Timestamp.Equal(newStats.Timestamp) {
continue
}
r.buffers[name] = append(r.buffers[name], computeContainerResourceUsage(name, oldStats, newStats))
}
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/sig-node
Automatic merge from submit-queue (batch tested with PRs 67100, 67426). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
porting e2e_node lifecycle testcases into e2e folder under common
a) Shifted (and renamed) file existing in e2e_node to e2e/common.
b) Added these tests to the conformance suite:
- "should execute poststart exec hook properly"
- "should execute prestop exec hook properly"
- "should execute poststart http hook properly"
- "should execute prestop http hook properly"
[reference issue](https://github.com/kubernetes/kubernetes/issues/67086) explaining the effort.
Automatic merge from submit-queue (batch tested with PRs 67071, 66906, 66722, 67276, 67039). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
#50102 Task 1: Move apimachinery/pkg/watch.Until into client-go/tools/watch.UntilWithoutRetry
**What this PR does / why we need it**:
This is a split off from https://github.com/kubernetes/kubernetes/pull/50102 to go in smaller pieces.
Moves `apimachinery/pkg/watch.Until` into `client-go/tools/watch.UntilWithoutRetry` and adds context so it is cancelable.
**Release note**:
```release-note
NONE
```
**Dev release note**:
```dev-release-note
`apimachinery/pkg/watch.Until` has been moved to `client-go/tools/watch.UntilWithoutRetry`.
While switching please consider using the new `client-go/tools/watch.UntilWithSync` or `client-go/tools/watch.Until`.
```
/cc @smarterclayton @kubernetes/sig-api-machinery-pr-reviews
/milestone v1.12
/priority important-soon
/kind bug
(bug after the main PR which is this split from)
All e2e test images are now using multi-arch manifests so we should stop
looking up and using images that are specific to runtime.GOARCH
Change-Id: I5f3fd6e9a42b9fb88891c19e28a2dfcf7a14be82
Automatic merge from submit-queue (batch tested with PRs 63572, 67084). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove [Conformance] from tests in test/e2e_node
Conformance tests live inside of test/e2e, none of the tests currently
tagged as `[Conformance]` in test/e2e_node actually get run when you run
the conformance tests with e2e.test (either directly or indirectly with
sonobuoy)
If these tests make sense as both `[NodeConformance]` and `[Conformance]`
tests, they should be ported to test/e2e/common
/kind cleanup
/area conformance
/sig architecture
/cc @bgrant0607 @smarterclayton @timothysc
/sig node
/cc @yujuhong
ref: https://github.com/kubernetes/kubernetes/issues/66875
```release-note
NONE
```
None of these tests actually run as part of e2e testing, which is
the only way conformance tests are kicked off. They should not be
included as part of the conformance suite unless they live in
test/e2e/common
Use the same pattern everywhere in the e2e test
harness, use busybox (from dockerhub) instead
of using the one from k8s.gcr.io registry.
Change-Id: I57c3b867408c1f9478a8909c26744ea0368ff003
Automatic merge from submit-queue (batch tested with PRs 58755, 66414). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use probe based plugin watcher mechanism in Device Manager
**What this PR does / why we need it**:
Uses this probe based utility in the device plugin manager.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56944
**Notes For Reviewers**:
Changes are backward compatible and existing device plugins will continue to work. At the same time, any new plugins that has required support for probing model (Identity service implementation), will also work.
**Release note**
```release-note
Add support kubelet plugin watcher in device manager.
```
/sig node
/area hw-accelerators
/cc /cc @jiayingz @RenaudWasTaken @vishh @ScorpioCPH @sjenning @derekwaynecarr @jeremyeder @lichuqiang @tengqm @saad-ali @chakri-nelluri @ConnorDoyle
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove --cadvisor-port - has been deprecated since v1.10
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56523
**Special notes for your reviewer**:
- Deprecated in https://github.com/kubernetes/kubernetes/pull/59827 (v1.10)
- Disabled in https://github.com/kubernetes/kubernetes/pull/63881 (v1.11)
**Release note**:
```release-note
[action required] The formerly publicly-available cAdvisor web UI that the kubelet started using `--cadvisor-port` is now entirely removed in 1.12. The recommended way to run cAdvisor if you still need it, is via a DaemonSet.
```
Automatic merge from submit-queue (batch tested with PRs 65348, 65599, 65635, 65688, 65691). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
test/e2e_node/system/types_unix: support ZFS
**What this PR does / why we need it**:
Docker validation tests in the case of ZFS used as the graph driver
fail due to "zfs" not being present in the default Docker specification.
Add "zfs" in the GraphDriver slice.
kubeadm relies on the `DockerValidator` and pre-flight checks would fail if the user is using ZFS.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Updates kubernetes/kubeadm#930
**Special notes for your reviewer**:
NONE
/cc @kubernetes/sig-node-pr-reviews
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/cc @kvaps (reported by)
/area node-e2e
/area kubeadm
**Release note**:
```release-note
Unix: support ZFS as a valid graph driver for Docker
```
Docker validation tests in the case of ZFS used as the graph driver
fail due to "zfs" not being present in the default Docker specification.
Add "zfs" in the GraphDriver slice.
Kubelet restart process seems to get a bit slower recently. From running
the gpu_device_plugin e2e_node test on GCE, I saw it took ~37 seconds
for kubelet to start CM DeviceManager after it restarts, and then took
~12 seconds for the gpu device plugin to re-register. As the result,
this e2e_node test fails because the current 10 sec waiting time is too
small. Restarting a container also seems to get slower that it sometimes
exceeds the current 2 min waiting time in ensurePodContainerRestart().
This change increase both waiting time to 5 min to leave enough space
on slower machines.
Automatic merge from submit-queue (batch tested with PRs 65449, 65373, 49410). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add kernel config locations for fedora and atomic
**What this PR does / why we need it**:
* Fedora stores its kernel configs in /usr/lib/modules/$(uname -r)/config
* Fedora/CentOS/RHEL atomic hosts use /usr/lib/ostree-boot/$(uname -r), though this location is deprecated
* The lack of these locations in the validator is causing kubeadm to hang on "failed to parse kernel config" in its preflight checking on fedora and atomic host
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add metadata to kubelet eviction event annotations
**What this PR does / why we need it**:
Add annotations to kubelet eviction events. Annotations include
"offending_containers" : comma-seperated list of containers.
"offending_containers_usage": comma-seperated list of usage.
"starved_resource": v1.ResourceName of the starved resource
**Special notes for your reviewer**:
Adding annotations to events required changing the `EventRecorder` interface to add a `AnnotatedEventf` function, which can add annotations to an event.
**Release note**:
```release-note
NONE
```
/assign @dchen1107
cc @mwielgus @schylek @kgrygiel
Automatic merge from submit-queue (batch tested with PRs 64456, 64457, 64472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
e2e node: mark pod cgroup test as [NodeConformance]
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64456, 64457, 64472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix dynamic kubelet config tests
**What this PR does / why we need it**:
The default for dynamic kubelet config is now true, so if it is unset, assume it is enabled for testing.
**Release note**:
```release-note
NONE
```
/sig node
/kind bug
/priority critical-urgent
/assign @mtaufen
Automatic merge from submit-queue (batch tested with PRs 64175, 63893). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add colon separators to improve readability of test names
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61963, 64279, 64130, 64125, 64049). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add node-exclusive tags to tests in test/e2e_node
Original issue: #59001
Depends on: #64128
Add node-exclusive tags based on the [proposal](https://docs.google.com/document/d/1BdNVUGtYO6NDx10x_fueRh_DLT-SVdlPC_SsXjYCHOE/edit#)
Follow-up PRs will:
- Tag the tests in `test/e2e/common`
- Change the test job configurations to use the new tests
- Remove the unused, non-node-exclusive tags in `test/e2e_node`
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add dynamic config metrics
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
https://github.com/kubernetes/features/issues/281
```release-note
The Kubelet now exports metrics that report the assigned (node_config_assigned), last-known-good (node_config_last_known_good), and active (node_config_active) config sources, and a metric indicating whether the node is experiencing a config-related error (node_config_error). The config source metrics always report the value 1, and carry the node_config_name, node_config_uid, node_config_resource_version, and node_config_kubelet_key labels, which identify the config version. The error metric reports 1 if there is an error, 0 otherwise.
```
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Refactor test utils that deal with Kubelet metrics for clarity
I found these functions hard to understand, because the names did not
accurately reflect their behavior. For example, GetKubeletMetrics
assumed that all of the metrics passed in were measuring latency.
The caller of GetKubeletMetrics was implicitly making this assumption,
but it was not obvious at the call site.
```release-note
NONE
```
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
Automatic merge from submit-queue (batch tested with PRs 63881, 64046, 63409, 63402, 63221). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Kubelet responds to ConfigMap mutations for dynamic Kubelet config
This PR makes dynamic Kubelet config easier to reason about by leaving less room for silent skew scenarios. The new behavior is as follows:
- ConfigMap does not exist: Kubelet reports error status due to missing source
- ConfigMap is created: Kubelet starts using it
- ConfigMap is updated: Kubelet respects the update (but we discourage this pattern, in favor of incrementally migrating to a new ConfigMap)
- ConfigMap is deleted: Kubelet keeps using the config (non-disruptive), but reports error status due to missing source
- ConfigMap is recreated: Kubelet respects any updates (but, again, we discourage this pattern)
This PR also makes a small change to the config checkpoint file tree structure, because ResourceVersion is now taken into account when saving checkpoints. The new structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta
| - assigned (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the assigned config)
| - last-known-good (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the last-known-good config)
| - checkpoints
| - uid1 (dir for versions of object identified by uid1)
| - resourceVersion1 (dir for unpacked files from resourceVersion1)
| - ...
| - ...
```
fixes: #61643
```release-note
The dynamic Kubelet config feature will now update config in the event of a ConfigMap mutation, which reduces the chance for silent config skew. Only name, namespace, and kubeletConfigKey may now be set in Node.Spec.ConfigSource.ConfigMap. The least disruptive pattern for config management is still to create a new ConfigMap and incrementally roll out a new Node.Spec.ConfigSource.
```
I found these functions hard to understand, because the names did not
accurately reflect their behavior. For example, GetKubeletMetrics
assumed that all of the metrics passed in were measuring latency.
The caller of GetKubeletMetrics was implicitly making this assumption,
but it was not obvious at the call site.
Automatic merge from submit-queue (batch tested with PRs 63865, 57849, 63932, 63930, 63936). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Eviction Node e2e test checks for eviction reason
**What this PR does / why we need it**:
Currently, the eviction test simply ensures that pods are marked `Failed`. However, this could occur because of an OOM, rather than an eviction.
To ensure that pods are actually being evicted, check for the Reason in the pod status to ensure it is evicted.
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-node-pr-reviews