Automatic merge from submit-queue
turn on dynamic config for flaky tests
Added dynamic config to inode eviction node e2e tests in #39546, but did not enable it for flaky tests. This PR enables this feature for the flaky test suite
Automatic merge from submit-queue
Build release tars using bazel
**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.
For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```
**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.
Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. the nodes then use the `$binary.docker_tag` file to find the right image.
With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.
My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)
Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.
Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 39807, 37505, 39844, 39525, 39109)
Made cache.Controller to be interface.
**What this PR does / why we need it**:
#37504
Automatic merge from submit-queue
replace global registry in apimachinery with global registry in k8s.io/kubernetes
We'd like to remove all globals, but our immediate problem is that a shared registry between k8s.io/kubernetes and k8s.io/client-go doesn't work. Since client-go makes a copy, we can actually keep a global registry with other globals in pkg/api for now.
@kubernetes/sig-api-machinery-misc @lavalamp @smarterclayton @sttts
Automatic merge from submit-queue (batch tested with PRs 39684, 39577, 38989, 39534, 39702)
Set PodStatus QOSClass field
This PR continues the work for https://github.com/kubernetes/kubernetes/pull/37968
It converts all local usage of the `qos` package class types to the new API level types (first commit) and sets the pod status QOSClass field in the at pod creation time on the API server in `PrepareForCreate` and in the kubelet in the pod status update path (second commit). This way the pod QOS class is set even if the pod isn't scheduled yet.
Fixes#33255
@ConnorDoyle @derekwaynecarr @vishh
Automatic merge from submit-queue (batch tested with PRs 39695, 37054, 39627, 39546, 39615)
Use Dynamic Config in e2e_node inode eviction test
Alternative solution to #39249. Similar to solution proposed by @vishh in #36828.
@Random-Liu @mtaufen
Automatic merge from submit-queue (batch tested with PRs 36229, 39450)
Bump etcd to 3.0.14 and switch to v3 API in etcd.
Ref #20504
**Release note**:
```release-note
Switch default etcd version to 3.0.14.
Switch default storage backend flag in apiserver to `etcd3` mode.
```
Automatic merge from submit-queue
Node E2E: Set user with `--ssh-user` flag when running remote node e2e.
This PR unblocks https://github.com/kubernetes/test-infra/issues/1348.
In our test environment, we must login test instance as user `jenkins` because of the service account. Node e2e is always using the default user on the host, which works fine till now, because it is always run as `jenkins` in our test environment.
However, now we moved the test runner into a docker container, inside the container user is `root` by default, which will cause error:
```
Permission denied (publickey)
```
This PR added a flag `--ssh-user` to explicitly specify the user used to ssh into test instance. The dockerized test runner can set user to `jenkins` with this flag.
@krzyzacy @ixdy
Automatic merge from submit-queue (batch tested with PRs 38788, 38821, 38829)
Node Conformance Test: Fix node conformance test.
The test suite could build on my desktop. However it is failing on jenkins.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-conformance/1
It turns out that `docker save $IMAGE -o $FILE` only works for docker 1.12. (My desktop is 1.12) For older version docker, we should use `docker save -o $FILE $IMAGE instead`. (Jenkins is using 1.9.1)
@timstclair Could you help me review this short PR? :)
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)
Rename "release_1_5" clientset to just "clientset"
We used to keep multiple releases in the main repo. Now that [client-go](https://github.com/kubernetes/client-go) does the versioning, there is no need to keep releases in the main repo. This PR renames the "release_1_5" clientset to just "clientset", clientset development will be done in this directory.
@kubernetes/sig-api-machinery @deads2k
```release-note
The main repository does not keep multiple releases of clientsets anymore. Please find previous releases at https://github.com/kubernetes/client-go
```
Automatic merge from submit-queue
Inode Eviction Test is Flaky
This Pull Request:
Marks the InodeEviciton test as flaky
Increases the timeout for disk pressure because coreos has nearly 2 million inodes.
Decreases the status polling interval so we can see eviction ordering better.
@Random-Liu
Automatic merge from submit-queue
Fix useless uuid in container log path node e2e
@timstclair pointed out there're nits in original PR, ref: https://github.com/kubernetes/kubernetes/pull/34877
So this patch:
1. removed useless uuid
2. change all those strings to const
Thanks. 🐱
Automatic merge from submit-queue
Build vendored copy of go-bindata and use that in go generate step
**What this PR does / why we need it**: as the title says, uses the vendored version of `go-bindata` rather than expecting developers to `go get` it (when building outside docker).
**Which issue this PR fixes**: fixes#34067, partially addresses #36655
**Special notes for your reviewer**: we still call `go generate` far too many times:
```console
~/.../src/k8s.io/kubernetes $ which go-bindata
~/.../src/k8s.io/kubernetes $ make
+++ [1116 17:35:28] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:29] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:30] Building go targets for linux/amd64:
cmd/libs/go2idl/deepcopy-gen
+++ [1116 17:35:35] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:35] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:36] Building go targets for linux/amd64:
cmd/libs/go2idl/defaulter-gen
+++ [1116 17:35:41] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:41] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:42] Building go targets for linux/amd64:
cmd/libs/go2idl/conversion-gen
+++ [1116 17:35:47] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:47] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:48] Building go targets for linux/amd64:
cmd/libs/go2idl/openapi-gen
+++ [1116 17:35:56] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:56] Generating bindata:
test/e2e/framework/gobindata_util.go
```
Fixing that is a separate effort, though.
cc @sebgoa @ZhangBanger
Automatic merge from submit-queue
Add the system verification test to the kubeadm preflight checks
And refactor the system verification test to accept to write to a specific writer in order to customize the output
This PR is targeting v1.5, PTAL
cc @Random-Liu @dchen1107 @kubernetes/sig-cluster-lifecycle
Automatic merge from submit-queue
move parts of the mega generic run struct out
This splits the main `ServerRunOptions` into composeable pieces that are bindable separately and adds easy paths for composing servers to run delegating authentication and authorization.
@sttts @ncdc alright, I think this is as far as I need to go to make the composing servers reasonable to write. I'll try leaving it here
Automatic merge from submit-queue
Node Conformance Test: Final cleanup for node conformance test.
This PR fits node conformance test with recent change.
* Remove `--manifest-path` because the test will get kubelet configuration through `/configz` now. https://github.com/kubernetes/kubernetes/pull/36919
* Add `$TEST_ARGS` so that we can override arguments inside the container.
* Fix a bug in garbage_collector_test.go which will cause the framework tries to connect docker no matter running the test or not. @dashpole
* Add `${REGISTRY}/node-test:${VERSION}` for convenience.
* Bump up the image version to `0.2`. (the one released with v1.4 is `v0.1`)
I've run the test both with `run_test.sh` script and directly `docker run`. Both of them passed.
After this gets merged, I'll build and push the new test image.
@dchen1107
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Modify GCI mounter to enable NFSv3
In order to make NFSv3 work, mounter needs to start rpcbind daemon. This
change modify mounter's Dockerfile and mounter script to start the
rpcbind daemon if it is not running on the host.
After this change, need to make push the image and update the sha number in Changelog.
Automatic merge from submit-queue
Use netexec container in http lifecycle hook test.
Fixes https://github.com/kubernetes/kubernetes/issues/33636.
The original test is using `"echo -e \"HTTP/1.1 200 OK\n\" | nc -l -p 1234` as a simple http server.
However, it seems that this is not very reliable, which may response before golang thinks it should.
So we get the error:
```
I1106 06:14:13.325397 2096 logs.go:41] Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 200 OK\n\n"; err=<nil>
```
This PR changes the test to use the `netexec` container which is a simple http server written by golang and used in many of our networking e2e test. It should be more reliable.
Mark 1.5 since this is fixing a 1.5 release blocking issue. Mark P0 to match the original issue.
@dchen1107
In order to make NFSv3 work, mounter needs to start rpcbind daemon. This
change modify mounter's Dockerfile and mounter script to start the
rpcbind daemon if it is not running on the host.
After this change, need to make push the image and update the sha number in Changelog.
Automatic merge from submit-queue
fix leaking memory backed volumes of terminated pods
Currently, we allow volumes to remain mounted on the node, even though the pod is terminated. This creates a vector for a malicious user to exhaust memory on the node by creating memory backed volumes containing large files.
This PR removes memory backed volumes (emptyDir w/ medium Memory, secrets, configmaps) of terminated pods from the node.
@saad-ali @derekwaynecarr
Automatic merge from submit-queue
Node E2E: Avoid printing test result twice.
This is a problem since long time ago.
`RunSshCommand` includes the command output to the error. If the command running the test fails, the test output will also be included in the error. [The runner prints both the test output and the error](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/runner/remote/run_remote.go#L270), which leads the test result to be printed twice. (See the [test result](https://storage.googleapis.com/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/10968/build-log.txt) on node tmp-node-e2e-af900a4d-e2e-node-ubuntu-trusty-docker9-v1-image)
This PR changes `RunSshCommand` not to put command output into the error, and leave the caller to decide how to deal with command output when the command fails.
Automatic merge from submit-queue
Garbage collection tests the MaxPerPodContainers and MaxContainers constraints
This is the first version of this test. It tests that containers are garbage collected according to the default configuration.
Automatic merge from submit-queue
Node Conformance & E2E: Get node name from node object.
This PR changes the node e2e test framework to get node name from apiserver instead of test flags.
When a user tried out the node conformance test, he found that node conformance test will not work properly if kubelet is started with `hostname-override`.
The reason is that node conformance test is using [the default node name - `os.Hostname`](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/e2e_node_suite_test.go#L124), which may be different from `hostname-override`. This will cause test pods not scheduled, and eventually test timeout.
We can expose a flag from node conformance test, and let user set node name themselves if they are using `hostname-override` on kubelet. However, let the framework automatically detect it from apiserver is more user friendly.
/cc @kubernetes/sig-node
This PR 1) only changes node e2e test framework; 2) fixes a problem in node conformance test which is a 1.5 feature. @saad-ali Can we have this in 1.5?
Automatic merge from submit-queue
Add e2e node test for log path
fixes#34661
A node e2e test to check if container logs files are properly created with right content.
Since the log files under `/var/log/containers` are actually symbolic of docker containers log files, we can not use a pod to mount them in and do check (symbolic doesn't supported by docker volume).
cc @Random-Liu
Automatic merge from submit-queue
Make GCI nodes mount non tmpfs, ext* & bind mounts using an external mounter
This PR downloads the stage1 & gci-mounter ACIs as part of cluster bring up instead of downloading them dynamically from gcr.io, which was the cause for #36206.
I have also optimized the containerized mounter to pre-load the mounter image once to avoid fetch latency while using it.
Original PR which got reverted: https://github.com/kubernetes/kubernetes/pull/35821
```release-note
GCI nodes use an external mounter script to mount NFS & GlusterFS storage volumes
```
@mtaufen Node e2e is not re-enabled in this PR.
cc @jingxu97
Automatic merge from submit-queue
Node Conformance Test: Containerize the node e2e test
For #30122, #30174.
Based on #32427, #32454.
**Please only review the last 3 commits.**
This PR packages the node e2e test into a docker image:
- 1st commit: Add `NodeConformance` flag in the node e2e framework to avoid starting kubelet and collecting system logs. We do this because:
- There are all kinds of ways to manage kubelet and system logs, for different situation we need to mount different things into the container, run different commands. It is hard and unnecessary to handle the complexity inside the test suite.
- 2nd commit: Remove all `sudo` in the test container. We do this because:
- In most container, there is no `sudo` command, and there is no need to use `sudo` inside the container.
- It introduces some complexity to use `sudo` inside the test. (https://github.com/kubernetes/kubernetes/issues/29211, https://github.com/kubernetes/kubernetes/issues/26748) In fact we just need to run the test suite with `sudo`.
- 3rd commit: Package the test into a docker container with corresponding `Makefile` and `Dockerfile`. We also added a `run_test.sh` script to start kubelet and run the test container. The script is only for demonstration purpose and we'll also use the script in our node e2e framework. In the future, we should update the script to start kubelet in production way (maybe with `systemd` or `supervisord`).
@dchen1107 @vishh
/cc @kubernetes/sig-node @kubernetes/sig-testing
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
``` release-note
Release alpha version node test container gcr.io/google_containers/node-test-ARCH:0.1 for users to verify their node setup.
```
Automatic merge from submit-queue
Rename experimental-runtime-integration-type to experimental-cri
Also rename the field in the component config to `EnableCRI`
Automatic merge from submit-queue
Node Conformance Test: Add system verification
For #30122 and #29081.
This PR introduces system verification test in node e2e and conformance test. It will run before the real test. Once the system verification fails, the test will just fail. The output of the system verification is like this:
```
I0909 23:33:20.622122 2717 validators.go:45] Validating os...
OS: Linux
I0909 23:33:20.623274 2717 validators.go:45] Validating kernel...
I0909 23:33:20.624037 2717 kernel_validator.go:79] Validating kernel version
KERNEL_VERSION: 3.16.0-4-amd64
I0909 23:33:20.624146 2717 kernel_validator.go:93] Validating kernel config
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
I0909 23:33:20.679328 2717 validators.go:45] Validating cgroups...
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
I0909 23:33:20.679454 2717 validators.go:45] Validating docker...
DOCKER_GRAPH_DRIVER: aufs
```
It verifies the system following a predefined `SysSpec`:
``` go
// DefaultSysSpec is the default SysSpec.
var DefaultSysSpec = SysSpec{
OS: "Linux",
KernelVersion: []string{`3\.[1-9][0-9].*`, `4\..*`}, // Requires 3.10+ or 4+
// TODO(random-liu): Add more config
KernelConfig: KernelConfig{
Required: []string{
"NAMESPACES", "NET_NS", "PID_NS", "IPC_NS", "UTS_NS",
"CGROUPS", "CGROUP_CPUACCT", "CGROUP_DEVICE", "CGROUP_FREEZER",
"CGROUP_SCHED", "CPUSETS", "MEMCG",
},
Forbidden: []string{},
},
Cgroups: []string{"cpu", "cpuacct", "cpuset", "devices", "freezer", "memory"},
RuntimeSpec: RuntimeSpec{
DockerSpec: &DockerSpec{
Version: []string{`1\.(9|\d{2,})\..*`}, // Requires 1.9+
GraphDriver: []string{"aufs", "overlay", "devicemapper"},
},
},
}
```
Currently, it only supports:
- Kernel validation: version validation and kernel configuration validation
- Cgroup validation: validating whether required cgroups subsystems are enabled.
- Runtime Validation: currently, only validates docker graph driver.
The validating framework is ready. The specific validation items could be added over time.
@dchen1107
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Per Volume Inode Accounting
Collects volume inode stats using the same find command as cadvisor. The command is "find _path_ -xdev -printf '.' | wc -c". The output is passed to the summary api, and will be consumed by the eviction manager.
This cannot be merged yet, as it depends on changes adding the InodesUsed field to the summary api, and the eviction manager consuming this. Expect tests to fail until this happens.
DEPENDS ON #35137
Automatic merge from submit-queue
[AppArmor] Hold bad AppArmor pods in pending rather than rejecting
Fixes https://github.com/kubernetes/kubernetes/issues/32837
Overview of the fix:
If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then it might contiinuously try to run the pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted, but not allowed to run (and therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) is handled.
A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.
``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```
@kubernetes/sig-node @timothysc @rrati @davidopp