Automatic merge from submit-queue (batch tested with PRs 50381, 51307, 49645, 50995, 51523)
Remove deprecated and experimental fields from KubeletConfiguration
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP (maybe not v1 in 1.8, given how close we are, but definitely in 1.9).
It makes far more sense to focus on a clean API in kubeletconfig v2,
than to try and further clean up the existing "API" that everyone
already depends on.
fixes: #51657
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51628, 51637, 51490, 51279, 51302)
Fix pod local ephemeral storage usage calculation
We use podDiskUsage to calculate pod local ephemeral storage which is not correct, because podDiskUsage also contains HostPath volume which is considered as persistent storage
This pr fixes it
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51489
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @jingxu97 @vishh
cc @ddysher
Automatic merge from submit-queue (batch tested with PRs 51513, 51515, 50570, 51482, 51448)
Add PVCRef to VolumeStats
**What this PR does / why we need it**:
For pod volumes that reference a PVC, add a PVCRef to the corresponding
volume stat. This allows metrics to be indexed/queried by PVC name
which is more user-friendly than Pod reference
**Which issue this PR fixes** : [#363](https://github.com/kubernetes/features/issues/363)
**Special notes for your reviewer**:
**Release note**:
```
`VolumeStats` reported by the kubelet stats summary API
(http://<node>:10255/stats/summary) now include a PVCRef
field describing the PVC referenced by the volume (if any).
```
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP.
It makes far more sense to focus on a clean API in kubeletconfig v2,
than to try and further clean up the existing "API" that everyone
already depends on.
For pod volumes that reference a PVC, add a PVCRef to the corresponding
volume stat. This allows metrics to be indexed/queried by PVC name
which is more user-friendly than Pod reference
Automatic merge from submit-queue
Make coreos test images sshd not allow password login.
This will prevent security scanners from triggering.
Configuration is verbatim from:
https://coreos.com/os/docs/latest/customizing-sshd.html
```release-note
NONE
```
Configuration is based on:
https://coreos.com/os/docs/latest/customizing-sshd.html
The specific SSHD config is:
# Use most defaults for sshd configuration.
UsePrivilegeSeparation sandbox
Subsystem sftp internal-sftp
ClientAliveInterval 180
UseDNS no
UsePAM yes
PrintLastLog no # handled by PAM
PrintMotd no # handled by PAM
AuthenticationMethods publickey
This will prevent security scanners from triggering.
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)
Enable overlay2 on cos-m60 in node e2e tests
Ref: https://github.com/kubernetes/kubernetes/issues/42926
- Restart docker with `-s overlay2` in cloud-init before running all node e2e tests. I have to copy the systemd unit file to `/etc/systemd/system` because the `/usr/lib/systemd/system/` is read only.
- Updated node e2e tests to use the new cos-m60 image.
- The name of the cloud init file (`cos-init-live-restore.yaml`) does not indicate overlay2 will be enabled, but I can't just change the name in this PR, since it's referenced in test-infra.
**Release note**:
```
None
```
/assign @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)
implement proposal 34058: hostPath volume type
**What this PR does / why we need it**:
implement proposal #34058
**Which issue this PR fixes** : fixes#46549
**Special notes for your reviewer**:
cc @thockin @luxas @euank PTAL
Automatic merge from submit-queue (batch tested with PRs 50257, 50247, 50665, 50554, 51077)
Replace hard-code "cpu" and "memory" to consts
**What this PR does / why we need it**:
There are many places using hard coded "cpu" and "memory" as resource name. This PR replace them to consts.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
/kind cleanup
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50967, 50505, 50706, 51033, 51028)
Revert "Merge pull request #51008 from kubernetes/revert-50789-fix-scheme"
I'm spinning up a cluster right now to test this fix, but I'm pretty sure this was the problem.
There doesn't seem to be a way to confirm from logs, because AFAICT the logs from the hollow kubelet containers are not collected as part of the kubemark test.
**What this PR does / why we need it**:
This reverts commit f4afdecef8, reversing
changes made to e633a1604f.
This also fixes a bug where Kubemark was still using the core api scheme
to manipulate the Kubelet's types, which was the cause of the initial
revert.
**Which issue this PR fixes**: fixes#51007
**Release note**:
```release-note
NONE
```
/cc @shyamjvs @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 50693, 50831, 47506, 49119, 50871)
Add instance metadata from flag even when using image config.
Also add instance metadata from flag even when we are using image config.
* Sometimes we need to dynamically generate instance metadata, it's troublesome to put them into image config.
* Sometimes we want to apply instance metadata to all images, it's duplicated to add them to each image in the image config.
/assign @yguo0905 Could you help me review this?
This reverts commit f4afdecef8, reversing
changes made to e633a1604f.
This also fixes a bug where Kubemark was still using the core api scheme
to manipulate the Kubelet's types, which was the cause of the initial
revert.
Automatic merge from submit-queue (batch tested with PRs 49847, 49743, 49853, 50225, 50479)
Add node benchmark tests for cos-m60 with docker 1.12.6
Ref: https://github.com/kubernetes/kubernetes/issues/42926
This PR adds a benchmark tests against cos-m60 with docker 1.12.6 on http://node-perf-dash.k8s.io. This test is useful for docker validation -- we can compare the performance of different dockers on the same OS.
cos-m60 comes with docker 1.13.1 by default, so we need to use cloud-init to downgrade the version to 1.12.6.
**Release note**:
```
None
```
/assign @dchen1107
Automatic merge from submit-queue (batch tested with PRs 49725, 50367, 50391, 48857, 50181)
Add e2e test for privileged containers
**What this PR does / why we need it**:
This PR adds node e2e test for privileged containers.
**Which issue this PR fixes**
Part of #44118.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 50016, 49583, 49930, 46254, 50337)
Alpha Dynamic Kubelet Configuration
Feature: https://github.com/kubernetes/features/issues/281
This proposal contains the alpha implementation of the Dynamic Kubelet Configuration feature proposed in ~#29459~ [community/contributors/design-proposals/dynamic-kubelet-configuration.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/dynamic-kubelet-configuration.md).
Please note:
- ~The proposal doc is not yet up to date with this implementation, there are some subtle differences and some more significant ones. I will update the proposal doc to match by tomorrow afternoon.~
- ~This obviously needs more tests. I plan to write several O(soon). Since it's alpha and feature-gated, I'm decoupling this review from the review of the tests.~ I've beefed up the unit tests, though there is still plenty of testing to be done.
- ~I'm temporarily holding off on updating the generated docs, api specs, etc, for the sake of my reviewers 😄~ these files now live in a separate commit; the first commit is the one to review.
/cc @dchen1107 @vishh @bgrant0607 @thockin @derekwaynecarr
```release-note
Adds (alpha feature) the ability to dynamically configure Kubelets by enabling the DynamicKubeletConfig feature gate, posting a ConfigMap to the API server, and setting the spec.configSource field on Node objects. See the proposal at https://github.com/kubernetes/community/blob/master/contributors/design-proposals/dynamic-kubelet-configuration.md for details.
```
Automatic merge from submit-queue
Add waitForFailure for e2e test framework
**What this PR does / why we need it**:
Add waitForFailure for e2e test framework, this could reduce the reliance on logs.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Part of #44118. Refer https://github.com/kubernetes/kubernetes/pull/48858#discussion_r128331726
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50087, 39587, 50042, 50241, 49914)
Add node e2e test for Docker's shared PID namespace
Ref: https://github.com/kubernetes/kubernetes/issues/42926
This PR adds a simple test for the shared PID namespace that's enabled when Docker is 1.13.1+.
/sig node
/area node-e2e
/assign @yujuhong
**Release note**:
```
None
```
Automatic merge from submit-queue (batch tested with PRs 49916, 50050)
Update images used in the node e2e benchmark tests
Ref: https://github.com/kubernetes/kubernetes/issues/42926
- Update the cosbeta image since the new version contains a 'du' command fix that affects Docker performance.
- Add the coreos and ubuntu image that run Docker 1.12.6 so that we will have more data to compare.
**Release note**:
```
None
```
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
Add support for `no_new_privs` via AllowPrivilegeEscalation
**What this PR does / why we need it**:
Implements kubernetes/community#639
Fixes#38417
Adds `AllowPrivilegeEscalation` and `DefaultAllowPrivilegeEscalation` to `PodSecurityPolicy`.
Adds `AllowPrivilegeEscalation` to container `SecurityContext`.
Adds the proposed behavior to `kuberuntime`, `dockershim`, and `rkt`. Adds a bunch of unit tests to ensure the desired default behavior and that when `DefaultAllowPrivilegeEscalation` is explicitly set.
Tests pass locally with docker and rkt runtimes. There are also a few integration tests with a `setuid` binary for sanity.
**Release note**:
```release-note
Adds AllowPrivilegeEscalation to control whether a process can gain more privileges than it's parent process
```
Automatic merge from submit-queue (batch tested with PRs 45813, 49594, 49443, 49167, 47539)
Add node e2e tests for GKE environment
Ref: https://github.com/kubernetes/kubernetes/issues/46891
This PR adds node e2e tests for validating images used on GKE.
- We pass the `SYSTEM_SPEC_NAME` to the node e2e test process via the flag `--system-spec-name` so that we can skip the environment specific tests using `RunIfSystemSpecNameIs()`.
- Also added `SkipIfContainerRuntimeIs()` as the opposite of `RunIfContainerRuntimeIs()`.
**Release note**:
```
None
```
Automatic merge from submit-queue (batch tested with PRs 46913, 48910, 48858, 47160)
Add e2e test for readOnlyRootFilesystem containers
**What this PR does / why we need it**:
This PR adds node e2e test for readOnlyRootFilesystem containers.
**Which issue this PR fixes**
Part of #44118.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)
Use presence of kubeconfig file to toggle standalone mode
Fixes#40049
```release-note
The deprecated --api-servers flag has been removed. Use --kubeconfig to provide API server connection information instead. The --require-kubeconfig flag is now deprecated. The default kubeconfig path is also deprecated. Both --require-kubeconfig and the default kubeconfig path will be removed in Kubernetes v1.10.0.
```
/cc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-node-misc
Automatic merge from submit-queue (batch tested with PRs 48224, 45431, 45946, 48775, 49396)
Update cos-dev image in benchmark tests to cos-dev-61-9759-0-0
Ref: https://github.com/kubernetes/kubernetes/issues/42926
`cos-dev-61-9759-0-0` contains a fix in Linux utility `du` that would affect the measurement of docker performance in kubelet. I'd like to update the benchmark to use the new image.
**Release note**:
```
None
```
/assign @tallclair
/cc @kewu1992 @abgworrall
Automatic merge from submit-queue (batch tested with PRs 48636, 49088, 49251, 49417, 49494)
Fix issues for local storage allocatable feature
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
This PR fixes issue #47809
Replaces use of --api-servers with --kubeconfig in Kubelet args across
the turnup scripts. In many cases this involves generating a kubeconfig
file for the Kubelet and placing it in the correct location on the node.
Automatic merge from submit-queue
remove redundant param in e2e_node/remote
**What this PR does / why we need it**:
* remove redundant param in e2e_node/remote/remote.go
* fix a small typo
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 49316, 46117, 49064, 48073, 49323)
Test Ubuntu image using GKE image spec on master
Ref: https://github.com/kubernetes/kubernetes/issues/46891
This PR changes the files referenced in test-infra for running Ubuntu image tests against GKE system spec on master.
The two properties files are shared by the tests against all k8s branches but the `SYSTEM_SPEC_NAME` is only available on master. This should be fine because the tests in the non master branches will just ignore the unknown env variable.
**Release note**:
```
None
```
/assign @yujuhong
Automatic merge from submit-queue (batch tested with PRs 48377, 48940, 49144, 49062, 49148)
Add cos-dev-61-9733-0-0 to the benchmark tests
Ref: https://github.com/kubernetes/kubernetes/issues/42926
m60 has docker 1.13.1 while m61 has 17.03. This PR adds m61 to the benchmark tests so that we will have more data to compare.
PS: We will support fetching the latest image in an image family in the node e2e tests in the future.
**Release note**:
```
None
```
/assign @yujuhong
/cc @kewu1992 @abgworrall
Automatic merge from submit-queue (batch tested with PRs 48890, 46893, 48872, 48896)
Support customized system spec in the node conformance test and create the GKE system spec
ref: https://github.com/kubernetes/kubernetes/issues/46891
- System specs are located in `test/e2e_node/system/specs`. Created one for validating GKE images in `test/e2e_node/system/specs/gke.yaml`.
- `--image-spec-name` can be used to specify a system spec in node e2e and conformance tests. This option maps to `SYSTEM_SPEC_NAME` in a test properties file, which is the user facing configuration. So, users can specify `SYSTEM_SPEC_NAME=gke` to run the image validation using the GKE system spec.
- If `SYSTEM_SPEC_NAME` is unspecified, the default spec (`system.DefaultSysSpec`) will be used.
- We can also use `make test-e2e-node SYSTEM_SPEC_NAME=gke` to run tests using GKE image spec.
**Release note**:
`None`
Automatic merge from submit-queue
add dockershim checkpoint node e2e test
Add a bunch of disruptive cases to test kubelet/dockershim's checkpoint work flow.
Some steps are quite hacky. Not sure if there is better ways to do things.
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
Automatic merge from submit-queue (batch tested with PRs 48698, 48712, 48516, 48734, 48735)
Name change: s/timstclair/tallclair/
I changed my name, and I'm migrating my user name to be consistent.
Automatic merge from submit-queue (batch tested with PRs 46865, 48661, 48598, 48658, 48614)
Move metrics_grabbert to test/e2e
cc @aleksandra-malinowska
Automatic merge from submit-queue
Changes node e2e tests to use the new Ubuntu image
ref: https://github.com/kubernetes/kubernetes/issues/46891
`ubuntu-docker10` and `ubuntu-docker12` images are deprecated in favor of the new one.
**Release note**:
```
None
```
/sig node
/area node-e2e
/assign @dchen1107
Automatic merge from submit-queue (batch tested with PRs 48074, 47971, 48044, 47514, 47647)
e2e: bump kubelet's resurce usage limit
We don't have per-OS image limits. Bumping these to more generous
numbers to not fail the tests.
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)
Bump up npd version to v0.4.1
```
Bump up npd version to v0.4.1
```
Fixes#47219
Automatic merge from submit-queue (batch tested with PRs 47922, 47195, 47241, 47095, 47401)
Run cAdvisor on the same interface as kubelet
**What this PR does / why we need it**:
cAdvisor currently binds to all interfaces. Currently the only
solution is to use iptables to block access to the port. We
are better off making cAdvisor to bind to the interface that
kubelet uses for better security.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#11710
**Special notes for your reviewer**:
**Release note**:
```release-note
cAdvisor binds only to the interface that kubelet is running on instead of all interfaces.
```
Automatic merge from submit-queue (batch tested with PRs 46327, 47166)
mark --network-plugin-dir deprecated for kubelet
**What this PR does / why we need it**:
**Which issue this PR fixes** : fixes#43967
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47530, 47679)
Use cos-stable-59-9460-64-0 instead of cos-beta-59-9460-20-0.
Remove dead code that has now moved to another repo as part of #47467
**Release note**:
```release-note
NONE
```
/sig node
- It contains a fix for ipaliasing.
- It contains a fix which decouples GPU driver installation from kernel
version.
Remove dead code that has now moved to another repo as part of #47467
Automatic merge from submit-queue
Shorten eviction tests, and increase test suite timeout
After #43590, the eviction manager is less aggressive when evicting pods. Because of that, many runs in the flaky suite time out.
To shorten the inode eviction test, I have lowered the eviction threshold.
To shorten the allocatable eviction test, I now set KubeReserved = NodeMemoryCapacity - 200Mb, so that any pod using 200Mb will be evicted. This shortens this test from 40 minutes, to 10 minutes.
While this should be enough to not hit the flaky suite timeout anymore, it is better to keep lower individual test timeouts than a lower suite timeout, since hitting the suite timeout means that even successful test runs are not reported.
/assign @Random-Liu @mtaufen
issue: #31362
When no nvidia device was attached, the -ne check had a syntax error:
sh: -ne: argument expected
This resulted in 'Success' being echoed and the test passing incorrectly.
This was found while debugging issue #47216
Automatic merge from submit-queue
Kubelet: rename cri package name to pkg/kubelet/apis/cri/v1alpha1/runtime
**What this PR does / why we need it**:
We have moved CRI from api/v1alpha1/runtime to apis/cri/v1alpha1, which changed the package name of CRI. This would cause a significant problem: old-versioned runtime (based on CRI in v1.6) doesn't work with latest kubelet v1.7, and vice versa.
This PR renames cri package name to `pkg/kubelet/apis/cri/v1alpha1/runtime` for fixing the problem.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#47012
**Special notes for your reviewer**:
Should be included in v1.7.
**Release note**:
```release-note
CRI has been moved to package `pkg/kubelet/apis/cri/v1alpha1/runtime`.
```
Automatic merge from submit-queue
Move the nvidia installer to the beginning.
When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.
/sig node
/area node-e2e
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46885, 47197)
Let COS docker validation node test against gci-next-canary
**What this PR does / why we need it**:
This is for COS docker validation node test. We plan to use family gci-next-canary in container-vm-image-staging for future Docker upgration and validation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47134
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cAdvisor currently binds to all interfaces. Currently the only
solution is to use iptables to block access to the port. We
are better off making cAdvisor to bind to the interface that
kubelet uses for better security.
Fixes#11710
When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.
Automatic merge from submit-queue
Bump up npd version to v0.4.0
Fixes#47070.
Bump up npd version to [v0.4.0](https://github.com/kubernetes/node-problem-detector/releases/tag/v0.4.0).
```release-note
Bump up Node Problem Detector version to v0.4.0, which added support of parsing log from /dev/kmsg and ABRT.
```
/cc @dchen1107 @ajitak
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)
add e2e node test for Pod hostAliases feature
**What this PR does / why we need it**: adds node e2e test for #45148
tests requested in https://github.com/kubernetes/kubernetes/issues/43632#issuecomment-298434125
**Release note**:
```release-note
NONE
```
@yujuhong @thockin
Automatic merge from submit-queue (batch tested with PRs 46718, 46828, 46988)
Update docs/ links to point to main site
**What this PR does / why we need it**:
This updates various links to either point to kubernetes.io or to the kubernetes/community repo instead of the legacy docs/ tree in k/k
Pre-requisite for #46813
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-docs-maintainers @chenopis @ahmetb @thockin
Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)
Wait for cloud-init to finish before starting tests.
This fixes#46889.
**Release note**:
```release-note
NONE
```
This PR adds two features:
1. add support for isolating the emptyDir volume use. If user
sets a size limit for emptyDir volume, kubelet's eviction manager
monitors its usage
and evict the pod if the usage exceeds the limit.
2. add support for isolating the local storage for container overlay. If
the container's overly usage exceeds the limit defined in container
spec, eviction manager will evict the pod.
Automatic merge from submit-queue
Add local storage (scratch space) allocatable support
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306
**Release note**:
```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable
```
Automatic merge from submit-queue
[Flaky PR Test] Fix summary test
fixes issue: #46797
As we can see in the [example failure build log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet/4319/build-log.txt), the summary containers are pinging google 100s of times a second. This causes the summary container to be killed occasionally, and fail the test. The summary containers are only supposed to ping every 10 seconds according to the current test. As it turns out, we were missing a semicolon, and were not sleeping between pings. For background, we ping google to generate network traffic, so that the summary test can validate network metrics.
This PR adds the semicolon to make the container sleep between calls, and decreases the sleep time from 10 seconds to 1 second, as 1 call / 10 seconds did not produce enough activity.
cc @kubernetes/kubernetes-build-cops @dchen1107
Automatic merge from submit-queue (batch tested with PRs 46648, 46500, 46238, 46668, 46557)
Support validating package versions in node conformance test
**What this PR does / why we need it**:
This PR adds a package validator in node conformance test for checking whether the locally installed packages meet the image spec.
**Special notes for your reviewer**:
The image spec for GKE (which has the package spec) will be in a separate PR. Then we will publish a new node conformance test image for GKE whose name should use the convention in https://github.com/kubernetes/kubernetes/issues/45760 and have `gke` in it.
**Release note**:
```
NONE
```
This PR adds the check for local storage request when admitting pods. If
the local storage request exceeds the available resource, pod will be
rejected.
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
Automatic merge from submit-queue (batch tested with PRs 46429, 46308, 46395, 45867, 45492)
Summary Test looks at pods that have containers that restart.
Occasionally, the node can report extra containers that had been restarted through the summary API.
This test change tests a pod that restarts, and hopefully should allow us to reproduce and debug this behavior.
/assign @dchen1107
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)
Dont attach a GPU to ubuntu test machines for node e2e serial tests
This should fix flakes in the e2e_node serial suite.
@vishh I think this is what you were asking for...
/assign @vishh
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)
Move docker validation test to separate project.
Docker validation test is leaking VMs because new docker version `DOCKER_VERSION=17.05.0-c` totally breaks the new gci image `GCE_IMAGES=gci-test-60-9579-0-0` with the `gci-docker-version` metadata specified.
The test successfully created the instance, but timed out when checking VM aliveness, and leaked the VM.
I've cleaned up all leaked VMs. This PR moves docker validation node e2e test into a separate project to not influencing other node e2e test.
@kewu1992 We should fix the docker automated validation test.
/cc @dchen1107 @yujuhong @abgworrall
Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue (batch tested with PRs 46033, 46122, 46053, 46018, 45981)
Log age of stats used for evictions during eviction tests
I recently added prometheus metrics for the age of the metrics used for evictions #43031. It would be nice to surface these during eviction tests, so I can better assess how old stats are, and whether or not the age of stats causes extra evictions.
This isnt super-high priority, and can be done after code-freeze, since it is a testing improvement. Feel free to take a look whenever either of you has time.
/assign @mtaufen
/assign @Random-Liu
Automatic merge from submit-queue
Add node e2e tests for hostNetwork
**What this PR does / why we need it**:
Add node e2e tests for hostNetwork.
**Which issue this PR fixes**
Part of #44118.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @Random-Liu @yujuhong
Automatic merge from submit-queue (batch tested with PRs 44520, 45253, 45838, 44685, 45901)
Ensure ordering of using dynamic kubelet config and setting up tests.
This PR simply places the body of the eviction test within its own context. This ensures that the kubelet config is set before the pods are created, and that the kubelet config is reverted only after the pods are deleted.
Automatic merge from submit-queue
Reorganize kubelet tree so apis can be independently versioned
@yujuhong @lavalamp @thockin @bgrant0607
This is an example of how we might reorganize `pkg/kubelet` so the apis it exposes can be independently versioned. This would also provide a logical place to put the `KubeletConfiguration` type, which currently lives in `pkg/apis/componentconfig`; it could live in e.g. `pkg/kubelet/apis/config` instead.
Take a look when you have a chance and let me know what you think. The most significant change in this PR is reorganizing `pkg/kubelet/api` to `pkg/kubelet/apis`, the rest is pretty much updating import paths and `BUILD` files.
Automatic merge from submit-queue
Fix flag formatting errors in the node tests
There were three problems:
- Lack of a trailing space after prepending flags.
- Passing multiple flags in a string to --kubelet-flags seems to confuse
the flag parser; it stops parsing ALL flags as soon as it sees the
second kubelet flag. Fortunately, all instances of --kubelet-flags are
combined together, so we can just pass two of those.
- --feature-gates should be passed to the test framework, which then
forwards it to the kubelet, instead of using --kubelet-flags.
This hopefully fixes the dynamic config test failures on COS, which
started after #45602. (See: https://k8s-testgrid.appspot.com/google-node#kubelet-serial-gce-e2e)
Automatic merge from submit-queue
Add properties file for cos-docker-validation test job
**What this PR does / why we need it**:
This is forked from test/e2e_node/jenkins/docker_validation/jenkins-validation.properties. It is used for COS docker validation test.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```NONE
```
There were three problems:
- Lack of a trailing space after prepending flags.
- Passing multiple flags in a string to --kubelet-flags seems to confuse
the flag parser; it stops parsing ALL flags as soon as it sees the
second kubelet flag. Fortunately, all instances of --kubelet-flags are
combined together, so we can just pass two of those.
- --feature-gates should be passed to the test framework, which then
forwards it to the kubelet, instead of using --kubelet-flags.
This hopefully fixes the dynamic config test failures on COS, which
started after #45602.
Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)
Enable kernel memcg notification for node and cluster GCI/COS testing.
Sets --experimental-kernel-memcg-notification=true when running on the GCI/COS image. It sets this for master and nodes for cluster e2e tests, and for the node in node e2e tests.
Issue #42676
cc @dchen1107 @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)
Migrate the docker client code from dockertools to dockershim
Move docker client code from dockertools to dockershim/libdocker. This includes
DockerInterface (renamed to Interface), FakeDockerClient, etc.
This is part of #43234
Automatic merge from submit-queue (batch tested with PRs 45052, 44983, 41254)
Non-controversial part of #44523
For easier review of #44523, i extracted the non-controversial part out to this PR.
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)
Cleanup some of the tarball producing code for e2e node tests
This is some e2e node cleanup work I found sitting in a local branch while deleting old local git branches. It looks like it's still useful.
Automatic merge from submit-queue (batch tested with PRs 43395, 44960)
e2e-node: refactor lifecycle test to avoid selinux issues
Fixes#42905
Previously, the exec hook tests mounted a HostPath volume from /tmp and touched a file as a indicator that the hook had run. This is prohibited by selinux policy on Fedora/RHEL/Centos.
This PR refactors the test to avoid filesystem indication and use the same indication that the HTTP hooks use; a GET to a http endpoint. The exec hooks run `curl` to hit this endpoint and trigger the indication. This simplifies this test quite a bit as well, removing over 85 lines of code.
REVIEWER NOTE: The diff is a mess on this one. Probably better to just review the new version of the file.
@derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 43000, 44500, 44457, 44553, 44267)
Updates e2e_node test to allow both kubenet and cni to be specified f…
…or the network plugin.
This adds a simple CNI configuration which is added to the node during test setup.
This also modifies the default flags in services/kubelet.go to specify the "cni-bin-dir"
and the "cni-conf-dir" and removes the "network-plugin-dir" flag. This leaves the default
network plugin to kubenet.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
This adds a simple CNI configuration which is added to the node during test setup.
This also modifies the default flags in services/kubelet.go to specify the "cni-bin-dir"
and the "cni-conf-dir" and removes the "network-plugin-dir" flag. This leaves the default
network plugin to kubenet.
Automatic merge from submit-queue
test/e2e*: add/update README.md files
**What this PR does / why we need it**:
This PR is adding `README.md` files with a link to the documentation to all E2E tests.
Automatic merge from submit-queue
node e2e: improve the validate OOM score test for infra containers
The test blindly checked all "pause" processes on the node, assuming
they were all infra containers. This change takes a snapshot of all
existing "pause" processes on the node, and exclude them in the
validation. The test still relies on the fact that it runs exclusively
on the node. If that assumption changes, we will need other methods to
locate the PIDs of the infra containers.
This fixes#37580
Automatic merge from submit-queue
test/e2e_node: prepull images with CRI
Part of https://github.com/kubernetes/kubernetes/issues/40739
- This PR builds on top of #40525 (and contains one commit from #40525)
- The second commit contains a tiny change in the `Makefile`.
- Third commit is a patch to be able to prepull images using the CRI (as opposed to run `docker` to pull images which doesn't make sense if you're using CRI most of the times)
Marked WIP till #40525 makes its way into master
@Random-Liu @lucab @yujuhong @mrunalp @rhatdan
Automatic merge from submit-queue
tests: e2e-node: refactor node-problem-detector test to avoid selinux…
Fixes https://github.com/kubernetes/kubernetes/issues/43401
This test creates a file in /tmp on the host and creates a HostPath volume in the container to so that the host can inject messages into the logfile being read by the node problem detector in the container. However, selinux prohibits the container from reading files out of /tmp on the host.
This PR modifies the test to create the log file in an EmptyDir volume instead, which will be properly labeled for container access.
@derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 43378, 43216, 43384, 43083, 43428)
Darwin won't build: syscall.Sysinfo issue.
**What this PR does / why we need it**: On darwin had problems building and testing because of syscall.Sysinfo_t etc which is a linux specific command.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**: Definitely would like another set of eyes on the bootTime function, it will have to be inaccurate but open to suggestions about improving this for darwin.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42237, 42297, 42279, 42436, 42551)
should replace errors.New(fmt.Sprintf(...)) with fmt.Errorf(...)
Signed-off-by: yupengzte <yu.peng36@zte.com.cn>
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
bazel update
added new files to reflect that only one method has changed between arch types.
forgot to add changes to a commit.
changes made and gfmt run.
changed node_problem_detector to node_problem_detector_linux and made it linux only.
updated bazel
The test blindly checked all "pause" processes on the node, assuming
they were all infra containers. This change takes a snapshot of all
existing "pause" processes on the node, and exclude them in the
validation. The test still relies on the fact that it runs exclusively
on the node. If that assumption changes, we will need other methods to
locate the PIDs of the infra containers.
Automatic merge from submit-queue
Update npd to the official v0.3.0 release.
Update npd to the official release v0.3.0.
This also fixes a npd bug https://github.com/kubernetes/node-problem-detector/pull/98.
@dchen1107 @kubernetes/node-problem-detector-reviewers
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)
Create DefaultPodDeletionTimeout for e2e tests
In our e2e and e2e_node tests, we had a number of different timeouts for deletion.
Recent changes to the way deletion works (#41644, #41456) have resulted in some timeouts in e2e tests. #42661 was the most recent fix for this.
Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished.
For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.
cc @vishh @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)
kubeadm: update docker version for CE and EE
**What this PR does / why we need it**: Update regex for docker version to also capture new CE and EE versions.
**Which issue this PR fixes**: fixes #https://github.com/kubernetes/kubeadm/issues/189
**Special notes for your reviewer**: /cc @jbeda @luxas
**Release note**:
```release-note
NONE
```
This change allows validators to pass warnings as well as errors. This
was needed because of how support for docker 1.13+ and the new EE and CE
versions is currently being handled.
Automatic merge from submit-queue
New e2e node test suite with memcg turned on
The flag --experimental-kernal-memcg-notification was initially added to allow disabling an eviction feature which used memcg notifications to make memory evictions more reactive.
As documented in #37853, memcg notifications increased the likelihood of encountering soft lockups, especially on CVM.
This feature would valuable to turn on, at least for GCI, since soft lockup issues were less prevalent on GCI and appeared (at the time) to be unrelated to memcg notifications.
In the interest of caution, I would like to monitor serial tests on GCI with --experimental-kernal-memcg-notification=true.
cc @vishh @Random-Liu @dchen1107 @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42664, 42687)
[Fix Flaky Tests] E2e Node Flaky test suite runs serially
The [e2e Node Flaky Test Suite](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e&width=20) has been failing with strange errors.
This is because the tests in that suite are meant to be run serially, but are running in parallel, since that was left out of the config. This PR fixes this by changing the Flaky test suite to serial
cc @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)
node e2e: apparmor test should fail instead of panicking
This doesn't fix#42420, but at least stop the test from panicking.