Not all users of the E2E framework want to run cloud-provider specific
tests. By splitting out the code it becomes possible to decide in
a E2E test suite which providers are supported.
This is achieved in two ways:
- the framework calls certain functions through a provider
interface instead of calling specific cloud provider functions
directly
- tests that are cloud-provider specific directly import the
new provider packages
The ingress test utilities are only needed by a few tests. Splitting
them out into a separate package makes the framework simpler for test
suites not using those tests.
Fixes: #66649
Automatic merge from submit-queue (batch tested with PRs 65250, 68241). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Initial node performance testing framework.
This PR adds a framework for node performance testing.
Partially fixes: https://github.com/kubernetes/kubernetes/issues/65249.
Use the following command to run this test:
```sh
make test-e2e-node FOCUS="Node Performance Testing" SKIP="" PARALLELISM=1
```
It has been tested in the following environment:
- n1-standard-16
- Ubuntu 16.04
- docker 17.03.2
Note to reviewers:
This PR won't pass node e2e since the docker images in https://github.com/kubernetes/kubernetes/pull/65251 are required for this to function. The node e2e will fail when trying to pull the required images for testing.
Automatic merge from submit-queue (batch tested with PRs 67736, 68123, 68138). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Port security context NodeConformance e2e_node tests to e2e
**What this PR does / why we need it**:
Port all [NodeConformance] SecurityContext e2e_node tests to e2e/common.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#67032
**Special notes for your reviewer**:
- This PR is a continuing effort to close#67032.
- Removed ContainerRuntime constraint [as discussed](https://github.com/kubernetes/kubernetes/pull/67032#discussion_r214201870).
- Porting all [NodeConformance] tests to e2e/common which do not have node dependencies.
- Does it make sense to port [privileged test](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/security_context_test.go#L558) to e2e/common and remove [NodeFeature:HostAccess] label from test name?
**Release note**:
```release-note
NONE
```
/area conformance
@kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix error link in comment
**What this PR does / why we need it**:
Fix error link in comment
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
/sig node
manager change in commit 7b1ae66.
Also changes the test to make sure node is indeed ready after Kubelet
restart. The previous readiness check may use old API state but
didn't run into the issue due to the delay of waiting for pod restart.
Automatic merge from submit-queue (batch tested with PRs 67100, 67426). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
should check all error in ResourceCollector.Start()
Signed-off-by: yanxuean <yan.xuean@zte.com.cn>
**What this PR does / why we need it**:
1. We should check both errors.
test/e2e_node/resource_collector.go
```
func (r *ResourceCollector) Start() {
// Get the cgroup container names for kubelet and runtime
kubeletContainer, err := getContainerNameForProcess(kubeletProcessName, "")
runtimeContainer, err := getContainerNameForProcess(framework.TestContext.ContainerRuntimeProcessName, framework.TestContext.ContainerRuntimePidFile)
if err == nil {
systemContainers = map[string]string{
stats.SystemContainerKubelet: kubeletContainer,
stats.SystemContainerRuntime: runtimeContainer,
}
}
```
2. redundant compare
The Timestamp.Equal is unlikely to occur, because we have met Timestamp.Before.
```
if oldStats, ok := oldStatsMap[name]; ok && oldStats.Timestamp.Before(newStats.Timestamp) {
if oldStats.Timestamp.Equal(newStats.Timestamp) {
continue
}
r.buffers[name] = append(r.buffers[name], computeContainerResourceUsage(name, oldStats, newStats))
}
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/sig-node
Automatic merge from submit-queue (batch tested with PRs 67100, 67426). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
porting e2e_node lifecycle testcases into e2e folder under common
a) Shifted (and renamed) file existing in e2e_node to e2e/common.
b) Added these tests to the conformance suite:
- "should execute poststart exec hook properly"
- "should execute prestop exec hook properly"
- "should execute poststart http hook properly"
- "should execute prestop http hook properly"
[reference issue](https://github.com/kubernetes/kubernetes/issues/67086) explaining the effort.
Automatic merge from submit-queue (batch tested with PRs 67071, 66906, 66722, 67276, 67039). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
#50102 Task 1: Move apimachinery/pkg/watch.Until into client-go/tools/watch.UntilWithoutRetry
**What this PR does / why we need it**:
This is a split off from https://github.com/kubernetes/kubernetes/pull/50102 to go in smaller pieces.
Moves `apimachinery/pkg/watch.Until` into `client-go/tools/watch.UntilWithoutRetry` and adds context so it is cancelable.
**Release note**:
```release-note
NONE
```
**Dev release note**:
```dev-release-note
`apimachinery/pkg/watch.Until` has been moved to `client-go/tools/watch.UntilWithoutRetry`.
While switching please consider using the new `client-go/tools/watch.UntilWithSync` or `client-go/tools/watch.Until`.
```
/cc @smarterclayton @kubernetes/sig-api-machinery-pr-reviews
/milestone v1.12
/priority important-soon
/kind bug
(bug after the main PR which is this split from)
All e2e test images are now using multi-arch manifests so we should stop
looking up and using images that are specific to runtime.GOARCH
Change-Id: I5f3fd6e9a42b9fb88891c19e28a2dfcf7a14be82
Automatic merge from submit-queue (batch tested with PRs 63572, 67084). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove [Conformance] from tests in test/e2e_node
Conformance tests live inside of test/e2e, none of the tests currently
tagged as `[Conformance]` in test/e2e_node actually get run when you run
the conformance tests with e2e.test (either directly or indirectly with
sonobuoy)
If these tests make sense as both `[NodeConformance]` and `[Conformance]`
tests, they should be ported to test/e2e/common
/kind cleanup
/area conformance
/sig architecture
/cc @bgrant0607 @smarterclayton @timothysc
/sig node
/cc @yujuhong
ref: https://github.com/kubernetes/kubernetes/issues/66875
```release-note
NONE
```
None of these tests actually run as part of e2e testing, which is
the only way conformance tests are kicked off. They should not be
included as part of the conformance suite unless they live in
test/e2e/common
Use the same pattern everywhere in the e2e test
harness, use busybox (from dockerhub) instead
of using the one from k8s.gcr.io registry.
Change-Id: I57c3b867408c1f9478a8909c26744ea0368ff003
Automatic merge from submit-queue (batch tested with PRs 58755, 66414). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use probe based plugin watcher mechanism in Device Manager
**What this PR does / why we need it**:
Uses this probe based utility in the device plugin manager.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56944
**Notes For Reviewers**:
Changes are backward compatible and existing device plugins will continue to work. At the same time, any new plugins that has required support for probing model (Identity service implementation), will also work.
**Release note**
```release-note
Add support kubelet plugin watcher in device manager.
```
/sig node
/area hw-accelerators
/cc /cc @jiayingz @RenaudWasTaken @vishh @ScorpioCPH @sjenning @derekwaynecarr @jeremyeder @lichuqiang @tengqm @saad-ali @chakri-nelluri @ConnorDoyle
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove --cadvisor-port - has been deprecated since v1.10
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56523
**Special notes for your reviewer**:
- Deprecated in https://github.com/kubernetes/kubernetes/pull/59827 (v1.10)
- Disabled in https://github.com/kubernetes/kubernetes/pull/63881 (v1.11)
**Release note**:
```release-note
[action required] The formerly publicly-available cAdvisor web UI that the kubelet started using `--cadvisor-port` is now entirely removed in 1.12. The recommended way to run cAdvisor if you still need it, is via a DaemonSet.
```
Automatic merge from submit-queue (batch tested with PRs 65348, 65599, 65635, 65688, 65691). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
test/e2e_node/system/types_unix: support ZFS
**What this PR does / why we need it**:
Docker validation tests in the case of ZFS used as the graph driver
fail due to "zfs" not being present in the default Docker specification.
Add "zfs" in the GraphDriver slice.
kubeadm relies on the `DockerValidator` and pre-flight checks would fail if the user is using ZFS.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Updates kubernetes/kubeadm#930
**Special notes for your reviewer**:
NONE
/cc @kubernetes/sig-node-pr-reviews
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/cc @kvaps (reported by)
/area node-e2e
/area kubeadm
**Release note**:
```release-note
Unix: support ZFS as a valid graph driver for Docker
```
Docker validation tests in the case of ZFS used as the graph driver
fail due to "zfs" not being present in the default Docker specification.
Add "zfs" in the GraphDriver slice.
Kubelet restart process seems to get a bit slower recently. From running
the gpu_device_plugin e2e_node test on GCE, I saw it took ~37 seconds
for kubelet to start CM DeviceManager after it restarts, and then took
~12 seconds for the gpu device plugin to re-register. As the result,
this e2e_node test fails because the current 10 sec waiting time is too
small. Restarting a container also seems to get slower that it sometimes
exceeds the current 2 min waiting time in ensurePodContainerRestart().
This change increase both waiting time to 5 min to leave enough space
on slower machines.
Automatic merge from submit-queue (batch tested with PRs 65449, 65373, 49410). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add kernel config locations for fedora and atomic
**What this PR does / why we need it**:
* Fedora stores its kernel configs in /usr/lib/modules/$(uname -r)/config
* Fedora/CentOS/RHEL atomic hosts use /usr/lib/ostree-boot/$(uname -r), though this location is deprecated
* The lack of these locations in the validator is causing kubeadm to hang on "failed to parse kernel config" in its preflight checking on fedora and atomic host
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add metadata to kubelet eviction event annotations
**What this PR does / why we need it**:
Add annotations to kubelet eviction events. Annotations include
"offending_containers" : comma-seperated list of containers.
"offending_containers_usage": comma-seperated list of usage.
"starved_resource": v1.ResourceName of the starved resource
**Special notes for your reviewer**:
Adding annotations to events required changing the `EventRecorder` interface to add a `AnnotatedEventf` function, which can add annotations to an event.
**Release note**:
```release-note
NONE
```
/assign @dchen1107
cc @mwielgus @schylek @kgrygiel
Automatic merge from submit-queue (batch tested with PRs 64456, 64457, 64472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
e2e node: mark pod cgroup test as [NodeConformance]
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64456, 64457, 64472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix dynamic kubelet config tests
**What this PR does / why we need it**:
The default for dynamic kubelet config is now true, so if it is unset, assume it is enabled for testing.
**Release note**:
```release-note
NONE
```
/sig node
/kind bug
/priority critical-urgent
/assign @mtaufen
Automatic merge from submit-queue (batch tested with PRs 64175, 63893). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add colon separators to improve readability of test names
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61963, 64279, 64130, 64125, 64049). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add node-exclusive tags to tests in test/e2e_node
Original issue: #59001
Depends on: #64128
Add node-exclusive tags based on the [proposal](https://docs.google.com/document/d/1BdNVUGtYO6NDx10x_fueRh_DLT-SVdlPC_SsXjYCHOE/edit#)
Follow-up PRs will:
- Tag the tests in `test/e2e/common`
- Change the test job configurations to use the new tests
- Remove the unused, non-node-exclusive tags in `test/e2e_node`
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add dynamic config metrics
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
https://github.com/kubernetes/features/issues/281
```release-note
The Kubelet now exports metrics that report the assigned (node_config_assigned), last-known-good (node_config_last_known_good), and active (node_config_active) config sources, and a metric indicating whether the node is experiencing a config-related error (node_config_error). The config source metrics always report the value 1, and carry the node_config_name, node_config_uid, node_config_resource_version, and node_config_kubelet_key labels, which identify the config version. The error metric reports 1 if there is an error, 0 otherwise.
```
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Refactor test utils that deal with Kubelet metrics for clarity
I found these functions hard to understand, because the names did not
accurately reflect their behavior. For example, GetKubeletMetrics
assumed that all of the metrics passed in were measuring latency.
The caller of GetKubeletMetrics was implicitly making this assumption,
but it was not obvious at the call site.
```release-note
NONE
```
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
Automatic merge from submit-queue (batch tested with PRs 63881, 64046, 63409, 63402, 63221). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Kubelet responds to ConfigMap mutations for dynamic Kubelet config
This PR makes dynamic Kubelet config easier to reason about by leaving less room for silent skew scenarios. The new behavior is as follows:
- ConfigMap does not exist: Kubelet reports error status due to missing source
- ConfigMap is created: Kubelet starts using it
- ConfigMap is updated: Kubelet respects the update (but we discourage this pattern, in favor of incrementally migrating to a new ConfigMap)
- ConfigMap is deleted: Kubelet keeps using the config (non-disruptive), but reports error status due to missing source
- ConfigMap is recreated: Kubelet respects any updates (but, again, we discourage this pattern)
This PR also makes a small change to the config checkpoint file tree structure, because ResourceVersion is now taken into account when saving checkpoints. The new structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta
| - assigned (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the assigned config)
| - last-known-good (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the last-known-good config)
| - checkpoints
| - uid1 (dir for versions of object identified by uid1)
| - resourceVersion1 (dir for unpacked files from resourceVersion1)
| - ...
| - ...
```
fixes: #61643
```release-note
The dynamic Kubelet config feature will now update config in the event of a ConfigMap mutation, which reduces the chance for silent config skew. Only name, namespace, and kubeletConfigKey may now be set in Node.Spec.ConfigSource.ConfigMap. The least disruptive pattern for config management is still to create a new ConfigMap and incrementally roll out a new Node.Spec.ConfigSource.
```
I found these functions hard to understand, because the names did not
accurately reflect their behavior. For example, GetKubeletMetrics
assumed that all of the metrics passed in were measuring latency.
The caller of GetKubeletMetrics was implicitly making this assumption,
but it was not obvious at the call site.
Automatic merge from submit-queue (batch tested with PRs 63865, 57849, 63932, 63930, 63936). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Eviction Node e2e test checks for eviction reason
**What this PR does / why we need it**:
Currently, the eviction test simply ensures that pods are marked `Failed`. However, this could occur because of an OOM, rather than an eviction.
To ensure that pods are actually being evicted, check for the Reason in the pod status to ensure it is evicted.
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-node-pr-reviews
Updates dynamic Kubelet config to use a structured status, rather than a
node condition. This makes the status machine-readable, and thus more
useful for config orchestration.
Fixes: #56896
Automatic merge from submit-queue (batch tested with PRs 63624, 59847). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
explicit kubelet config key in Node.Spec.ConfigSource.ConfigMap
This makes the Kubelet config key in the ConfigMap an explicit part of
the API, so we can stop using magic key names.
As part of this change, we are retiring ConfigMapRef for ConfigMap.
```release-note
You must now specify Node.Spec.ConfigSource.ConfigMap.KubeletConfigKey when using dynamic Kubelet config to tell the Kubelet which key of the ConfigMap identifies its config file.
```
The names were made invalid for the CgroupName refactor in #62541, so
update them here.
Furthermore, as the new names are now compatible with what
EnforceNodeAllocatable wants, reuse the constants there as well.
Tested:
$ make test-e2e-node REMOTE=true HOSTS=test-cos-beta-67-10575-27-0 FOCUS='Validate Node Allocatable' SKIP='' TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'
• [SLOW TEST:39.488 seconds]
[k8s.io] Node Container Manager [Serial]
Validate Node Allocatable
set's up the node and runs the test
Ran 1 of 261 Specs in 57.348 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 260 Skipped
This makes the Kubelet config key in the ConfigMap an explicit part of
the API, so we can stop using magic key names.
As part of this change, we are retiring ConfigMapRef for ConfigMap.
Automatic merge from submit-queue (batch tested with PRs 61455, 63346, 63130, 63404). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[Device-Plugin]: Extend e2e test to cover node allocatables
**What this PR does / why we need it**:
Extends device plugin e2e to cover node allocatable
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
/sig node
/area hw-accelerators
/cc @jiayingz @vishh @RenaudWasTaken
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use a []string for CgroupName, which is a more accurate internal representation
**What this PR does / why we need it**:
This is purely a refactoring and should bring no essential change in behavior.
It does clarify the cgroup handling code quite a bit.
It is preparation for further changes we might want to do in the cgroup hierarchy. (But it's useful on its own, so even if we don't do any, it should still be considered.)
**Special notes for your reviewer**:
The slice of strings more precisely captures the hierarchic nature of the cgroup paths we use to represent pods and their groupings.
It also ensures we're reducing the chances of passing an incorrect path format to a cgroup driver that requires a different path naming, since now explicit conversions are always needed.
The new constructor `NewCgroupName` starts from an existing `CgroupName`, which enforces a hierarchy where a root is always needed. It also performs checking on the component names to ensure invalid characters ("/" and "_") are not in use.
A `RootCgroupName` for the top of the cgroup hierarchy tree is introduced.
This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.
There's a small TODO in a helper `updateSystemdCgroupInfo` that was introduced to make this commit possible. That logic really belongs in libcontainer, I'm planning to send a PR there to include it there. (The API already takes a field with that information, only that field is only processed in cgroupfs and not systemd driver, we should fix that.)
Tested: By running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs driver) and CentOS 7 (with systemd driver.)
**NOTE**: I only tested this with dockershim, we should double-check that this works with the CRI endpoints too, both in cgroupfs and systemd modes.
/assign @derekwaynecarr
/assign @dashpole
/assign @Random-Liu
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter
This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation.
https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html
```release-note
Use /usr/bin/env in all script shebangs to increase portability.
```
The slice of strings more precisely captures the hierarchic nature of
the cgroup paths we use to represent pods and their groupings.
It also ensures we're reducing the chances of passing an incorrect path
format to a cgroup driver that requires a different path naming, since
now explicit conversions are always needed.
The new constructor NewCgroupName starts from an existing CgroupName,
which enforces a hierarchy where a root is always needed. It also
performs checking on the component names to ensure invalid characters
("/" and "_") are not in use.
A RootCgroupName for the top of the cgroup hierarchy tree is introduced.
This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.
There's a small TODO in a helper updateSystemdCgroupInfo that was
introduced to make this commit possible. That logic really belongs in
libcontainer, I'm planning to send a PR there to include it there.
(The API already takes a field with that information, only that field is
only processed in cgroupfs and not systemd driver, we should fix that.)
Tested by running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs
driver) and CentOS 7 (with systemd driver.)
Automatic merge from submit-queue (batch tested with PRs 58474, 60034, 62101, 63198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix wrong usage of kubelet option
**What this PR does / why we need it**:
"--allow-privileged true" is incorrect usage of boolean option.
It means setting '--allow-priviledged' to its default value plus
non-existing subcommand 'true'.
"--allow-privileged false" is even more confusing as it sets
allow-priviledged flag to its default value 'true'
This is true for any boolean command line option.
Fixed this by using correct syntax --allow-priviledged=true
**Special notes for your reviewer**:
This is a show-stopper for PR #61833
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
E2e path utils
**What this PR does / why we need it**:
A bunch of useful methods for getting k8s paths and stuff are secreted away in `e2e_node`. This PR pulls them out so they can be used in other E2E method.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
This is motivated by the upcoming kubeadm-specific E2E tests. Those tests will be added in a follow-up to this PR.
**Release note**:
```release-note
NONE
```
https://github.com/kubernetes/kubernetes/pull/62913 switched from using a client pool, where each groupVersionResource got its own rest client, to a single client.
This increases the QPS to account for increased requests using a single rest client rate limiter.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
unpack dynamic kubelet config payloads to files
This PR unpacks the downloaded ConfigMap to a set of files on the node.
This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.
This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.
The current store dir structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta (dir for metadata, e.g. which config source is currently assigned, last-known-good)
| - current (a serialized v1 NodeConfigSource object, indicating the assigned config)
| - last-known-good (a serialized v1 NodeConfigSource object, indicating the last-known-good config)
| - checkpoints (dir for config checkpoints)
| - uid1 (dir for unpacked config, identified by uid1)
| - file1
| - file2
| - ...
| - uid2
| - ...
```
There are some likely changes to the above structure before dynamic config goes beta, such as renaming "current" to "assigned" for clarity, and extending the checkpoint identifier to include a resource version, as part of resolving #61643.
```release-note
NONE
```
/cc @luxas @smarterclayton
This PR unpacks the downloaded ConfigMap to a set of files on the node.
This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.
This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.
"--allow-privileged true" is incorrect usage of boolean option.
It means setting '--allow-priviledged' to its default value plus
non-existing subcommand 'true'.
"--allow-privileged false" is even more confusing as it sets
allow-priviledged flag to its default value 'true'
This is true for any boolean command line option.
Fixed this by using correct syntax --allow-priviledged=true
Fixed generating of kubelet command line in addKubeletConfigFlags
function.
Automatic merge from submit-queue (batch tested with PRs 62192, 61866, 62206, 62360). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove rkt references in the codebase
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes restartKubelet in test/e2e_node failure.
Looks like there is some recent change on how we start kubelet service
in test_e2e_node. Fixes restartKubelet() to get right kubelet service
name to cope with the change.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubelet-serial-gce-e2e test failure:
https://k8s-testgrid.appspot.com/wg-resource-management#kubelet-serial-gce-e2e
Thanks a lot to @mindprince for noticing this!
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Looks like there is some recent change on how we start kubelet service
in test_e2e_node. Fixes restartKubelet() to get right kubelet service
name to cope with the change.
Automatic merge from submit-queue (batch tested with PRs 61498, 62030). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Delete in-tree support for NVIDIA GPUs.
This removes the alpha Accelerators feature gate which was deprecated in 1.10 (#57384).
The alternative feature DevicePlugins went beta in 1.10 (#60170).
Fixes#54012
```release-note
Support for "alpha.kubernetes.io/nvidia-gpu" resource which was deprecated in 1.10 is removed. Please use the resource exposed by DevicePlugins instead ("nvidia.com/gpu").
```
Automatic merge from submit-queue (batch tested with PRs 61482, 61740). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make systemd service name for kubelet use a timestamp in e2e-node tests.
**What this PR does / why we need it**:
This makes it easier to figure out which execution was last when looking at the output of `systemd list-units kubelet-*.service`.
We try to find the name of the /tmp/node-e2e-* directory and use the same timestamp if we can. Otherwise, we just call Now() again, which isn't as nice (as the unit name and directory name will not match) but will still produce unit names that will be ordered when launching multiple subsequent executions on the same host.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
N/A
**Special notes for your reviewer**:
Tested using `make test-e2e-node REMOTE=true` and then checking `systemctl list-units kubelet-*.service` on the target host.
```
$ systemctl list-units kubelet-*.service
kubelet-20180326T142016.service loaded active exited /tmp/node-e2e-20180326T142016/kubelet --kubeconfig /tmp/node-e2e-20180326T142016/kubeconfig --root-dir /var/lib/kubelet ...
kubelet-20180326T143550.service loaded active exited /tmp/node-e2e-20180326T143550/kubelet --kubeconfig /tmp/node-e2e-20180326T143550/kubeconfig --root-dir /var/lib/kubelet ...
```
The units are sorted in the order they were launched.
**Release note**:
```release-note
NONE
```
The numbers will only be available when docker.service has its own
memory and cpu cgroups, which doesn't necessarily happen unless the unit
has Delegate=yes configured.
Let's work around that by checking the status of Delegate, in the case
where we are:
* running Docker
* running Systemd
* able to check the status through systemctl
* the status is explicitly Delegate=no (the default)
If all of those are true, let's make CPU and Memory expectations
optional.
Tested: make test-e2e-node REMOTE=true HOSTS=centos-e2e-node FOCUS="Summary API"
This is necessary to show any RootFs usage on systems where the backing
filesystem of overlay2 is xfs.
The current test only created directories (for mount points) in the
upper layer of the overlay. Outside of the mount namespace, only the
directories are visible. When running `du` on those, usually filesystems
will show some usage, but not xfs, which shows a disk usage of 0 for
directories.
Fix this by creating a file in the root directory, outside the volumes,
in order to trigger some disk usage that can be measured by `du`.
Tested: make test-e2e-node REMOTE=true HOSTS=centos-e2e-node FOCUS="Summary API"
Automatic merge from submit-queue (batch tested with PRs 61829, 61908, 61307, 61872, 60100). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
node authorizer sets up access rules for dynamic config
This PR makes the node authorizer automatically set up access rules for
dynamic Kubelet config.
I also added some validation to the node strategy, which I discovered we
were missing while writing this.
This PR is based on another WIP from @liggitt.
```release-note
The node authorizer now automatically sets up rules for Node.Spec.ConfigSource when the DynamicKubeletConfig feature gate is enabled.
```
This makes it easier to figure out which execution was last when looking
at the output of `systemd list-units kubelet-*.service`.
We try to find the name of the /tmp/node-e2e-* directory and use the
same timestamp if we can. Otherwise, we just call Now() again, which
isn't as nice (as the unit name and directory name will not match) but
will still produce unit names that will be ordered when launching
multiple subsequent executions on the same host.
Curl is more ubiquitous than wget. For instance, the GCE centos-7 and
rhel-7 image families ship curl by default, but not wget.
Looking at the shell scripts under cluster/, they tend to use curl more
than wget. (The ones that use wget, such as get-kube.sh, try curl first
and only fallback to wget if it's not available.)
Tested: by running node-e2e-test on Ubuntu, COS and CentOS.
This PR makes the node authorizer automatically set up access rules for
dynamic Kubelet config.
I also added some validation to the node strategy, which I discovered we
were missing while writing this.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Replace package "golang.org/x/net/context" with "context"
**What this PR does / why we need it**:
Replace package "golang.org/x/net/context" with "context"
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60560
**Special notes for your reviewer**:
As of Go 1.7 this package(golang.org/x/net/context) is available in the standard library under the name context. see (https://godoc.org/golang.org/x/net/context)
It is almost machinery replace.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61378, 60915, 61499, 61507, 61478). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Capture pod startup phases as metrics
Learning from https://github.com/kubernetes/kubernetes/issues/60589, we should also start collecting and graphing sub-parts of pod-startup latency.
/sig scalability
/kind feature
/priority important-soon
/cc @wojtek-t
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
more concise to merge the slice
**What this PR does / why we need it**:
more concise to merge the slice
**Special notes for your reviewer**:
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes the races around devicemanager Allocate() and endpoint deletion.
There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.
To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.
Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.
Tested:
Ran GPUDevicePlugin e2e_node test 100 times and all passed now.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/60176
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes the races around devicemanager Allocate() and endpoint deletion.
```
There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.
To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.
Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add node-e2e test for ShareProcessNamespace
**What this PR does / why we need it**: Adds a node-e2e test for kubernetes/features#495
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59554
**Special notes for your reviewer**: This requires a feature gate to be enabled in both the kubelet and API server. I'm not sure which jenkins configs need to be updated (or if these are even still used) so I just updated a pile of them.
opened kubernetes/test-infra#7030 for https://github.com/kubernetes/test-infra/blob/master/jobs/config.json
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix todo:Move function readinessCheck to util
**What this PR does / why we need it**:
fix todo:Move function readinessCheck to util in test/e2e_node/services
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60376, 55584, 60358, 54631, 60291). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove gcloud docker -- since it's deprecated
docker handles this now and it raises an error.
try 3
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60236, 60332, 57375, 60451, 57408). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update device plugin e2e_node test to not changing Kubelet config
as DevicePlugins feature is enabled by default now.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 59159, 60318, 60079, 59371, 57415). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Made a couple API changes to deviceplugin/v1beta1 to avoid future
incompatible API changes:
- Add GetDevicePluginOptions rpc call. This is needed when we switch
from Registration service to probe-based plugin watcher.
- Change AllocateRequest and AllocateResponse to allow device requests
from multiple containers in a pod. Currently only made mechanical
change on the devicemanager and test code to cope with the API but
still issues an Allocate call per container. We can modify the
devicemanager in 1.11 to issue a single Allocate call per pod.
The change will also facilitate incremental API change to communicate
pod level information through Allocate rpc if there is such future
need.
**What this PR does / why we need it**:
Made a couple API changes to deviceplugin/v1beta1 to avoid future incompatible API changes.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/59370
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 60324, 60269, 59771, 60314, 59941). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
expunge the word 'manifest' from Kubelet's config API
The word 'manifest' technically refers to a container-group specification
that predated the Pod abstraction. We should avoid using this legacy
terminology where possible. Fortunately, the Kubelet's config API will
be beta in 1.10 for the first time, so we still had the chance to make
this change.
I left the flags alone, since they're deprecated anyway.
I changed a few var names in files I touched too, but this PR is the
just the first shot, not the whole campaign
(`git grep -i manifest | wc -l -> 1248`).
```release-note
Some field names in the Kubelet's now v1beta1 config API differ from the v1alpha1 API: PodManifestPath is renamed to PodPath, ManifestURL is renamed to PodURL, ManifestURLHeader is renamed to PodURLHeader.
```