Commit Graph

55 Commits (1e77594958e346e26e4d29fa0d018729fdce1d86)

Author SHA1 Message Date
David Ashpole 0bd0d705e3 log age of stats used for evictions during eviction tests 2017-05-18 13:51:23 -07:00
Michael Taufen cbad320205 Reorganize kubelet tree so apis can be independently versioned 2017-05-12 10:02:33 -07:00
Yu-Ju Hong cf3635c876 Update bazel BUILD files 2017-05-05 11:48:08 -07:00
Chao Xu 958903509c bazel 2017-04-27 09:41:53 -07:00
Chao Xu 4f9591b1de move pkg/api/v1/ref.go and pkg/api/v1/resource.go to subpackages. move some functions in resource.go to pkg/api/v1/node and pkg/api/v1/pod 2017-04-17 11:38:11 -07:00
Mike Danese a05c3c0efd autogenerated 2017-04-14 10:40:57 -07:00
Pengfei Ni a696e86bb0 Add node e2e tests for hostPid 2017-04-07 13:41:58 +08:00
Kubernetes Submit Queue 0a1385178d Merge pull request #43248 from yujuhong/pause_proc
Automatic merge from submit-queue

node e2e: improve the validate OOM score test for infra containers

The test blindly checked all "pause" processes on the node, assuming
they were all infra containers. This change takes a snapshot of all
existing "pause" processes on the node and excludes them from the
validation. The test still relies on the fact that it runs exclusively
on the node; if that assumption changes, we will need other methods to
locate the PIDs of the infra containers.

This fixes #37580
2017-04-03 20:20:53 -07:00
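A minimal sketch of the snapshot-and-exclude idea described in the commit above; the `/proc`-scanning helper is an illustrative assumption, not the test's actual code:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// listPausePIDs scans /proc and returns the PIDs of all processes whose
// command name is "pause". Hypothetical helper for illustration only.
func listPausePIDs() (map[int]bool, error) {
	pids := map[int]bool{}
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a process directory
		}
		comm, err := os.ReadFile(fmt.Sprintf("/proc/%d/comm", pid))
		if err != nil {
			continue // the process may have exited in the meantime
		}
		if strings.TrimSpace(string(comm)) == "pause" {
			pids[pid] = true
		}
	}
	return pids, nil
}

func main() {
	before, _ := listPausePIDs() // snapshot before the test pods start
	// ... start the test pods here ...
	after, _ := listPausePIDs()
	for pid := range after {
		if !before[pid] { // only validate pause processes the test created
			fmt.Printf("validate infra container PID %d\n", pid)
		}
	}
}
```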
Kubernetes Submit Queue 25a87fa19c Merge pull request #40804 from runcom/prepull-cri
Automatic merge from submit-queue

test/e2e_node: prepull images with CRI

Part of https://github.com/kubernetes/kubernetes/issues/40739

- This PR builds on top of #40525 (and contains one commit from #40525)
- The second commit contains a tiny change in the `Makefile`.
- The third commit is a patch to be able to prepull images using the CRI (as opposed to running `docker` to pull images, which doesn't make sense if you're using the CRI most of the time)

Marked WIP till #40525 makes its way into master

@Random-Liu @lucab @yujuhong @mrunalp @rhatdan
2017-04-01 03:08:35 -07:00
Antonio Murdaca 2634f57f7f test/e2e_node: prepull images with CRI
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-04-01 10:18:56 +02:00
Alejandro Escobar ee741f23af added a space
bazel update

added new files to reflect that only one method has changed between arch types.

forgot to add changes to a commit.

changes made and gofmt run.

changed node_problem_detector to node_problem_detector_linux and made it linux only.

updated bazel
2017-03-20 06:40:27 -07:00
Yu-Ju Hong 056e343e03 node e2e: improve the validate OOM score test for infra containers
The test blindly checked all "pause" processes on the node, assuming
they were all infra containers. This change takes a snapshot of all
existing "pause" processes on the node and excludes them from the
validation. The test still relies on the fact that it runs exclusively
on the node; if that assumption changes, we will need other methods to
locate the PIDs of the infra containers.
2017-03-16 15:39:03 -07:00
Kubernetes Submit Queue 2d319bd406 Merge pull request #42204 from dashpole/allocatable_eviction
Automatic merge from submit-queue

Eviction Manager Enforces Allocatable Thresholds

This PR modifies the eviction manager to enforce node allocatable thresholds for memory as described in kubernetes/community#348.
This PR should be merged after #41234. 

cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests @vishh 

**Why is this a bug/regression?**

Kubelet uses `oom_score_adj` to enforce QoS policies, but `oom_score_adj` is based on the overall memory requested, which means that a Burstable pod that requested a lot of memory can lead to OOM kills of Guaranteed pods, violating QoS. Even worse, we have observed system daemons like the kubelet or kube-proxy being killed by the OOM killer.
Without this PR, v1.6 will have node stability issues and a regression in an existing GA feature, `out of resource` handling.
2017-03-03 20:20:12 -08:00
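For context, a sketch of the QoS-based `oom_score_adj` policy the description refers to. The constants and the Burstable formula follow the kubelet's documented policy of the time; treat this as an illustration, not the exact implementation:

```go
package main

import "fmt"

// Illustrative constants following the kubelet's documented QoS policy.
const (
	guaranteedOOMScoreAdj = -998
	besteffortOOMScoreAdj = 1000
)

// containerOOMScoreAdj shows why QoS can be violated: a Burstable
// container's score falls as its memory request grows, so a Burstable pod
// requesting most of the node's memory outranks almost everything else.
func containerOOMScoreAdj(qosClass string, memRequest, memCapacity int64) int64 {
	switch qosClass {
	case "Guaranteed":
		return guaranteedOOMScoreAdj
	case "BestEffort":
		return besteffortOOMScoreAdj
	}
	// Burstable: scale inversely with the fraction of node memory requested.
	adj := 1000 - (1000*memRequest)/memCapacity
	if adj < 2 { // clamp so burstable stays above critical system daemons
		adj = 2
	}
	if adj > 999 {
		adj = 999
	}
	return adj
}

func main() {
	// Requesting 28 of 32 GiB yields a score of 125: safer than most other
	// processes, which is how Guaranteed pods and daemons get killed first.
	fmt.Println(containerOOMScoreAdj("Burstable", 28<<30, 32<<30))
}
```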
Kubernetes Submit Queue 1d97472361 Merge pull request #41928 from Random-Liu/move-npd-test-to-node-e2e
Automatic merge from submit-queue (batch tested with PRs 41984, 41682, 41924, 41928)

Move node problem detector test into node e2e.

Move current NPD e2e test into node e2e.

In fact, the current NPD e2e test is only a functionality test for NPD: it creates a test NPD pod, sets a test configuration, generates test logs, and verifies the test result.
It doesn't exercise the NPD that is actually deployed in the cluster.

So it doesn't actually need to run in cluster e2e. Running it in node e2e will:
1) Make the test easier to run.
2) Make it more lightweight to introduce as a pre/post-submit test in the NPD repo in the future.

Beyond this, I'm working on a cluster e2e test that runs basic functionality and benchmark tests against the real NPD deployed in the cluster. Will send the PR later.

/cc @dchen1107 @kubernetes/node-problem-detector-reviewers
2017-03-02 10:51:18 -08:00
David Ashpole ac612eab8e eviction manager changes for allocatable 2017-03-02 07:36:24 -08:00
Vishnu kannan 318f4e102a adding an e2e for GPUs
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 13:42:08 -08:00
Vishnu Kannan cc5f5474d5 add support for node allocatable phase 2 to kubelet
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-02-27 21:24:44 -08:00
David Ashpole c58970e47c critical pods can preempt other pods to be admitted 2017-02-23 10:31:20 -08:00
Random-Liu 1c8e127973 Move node problem detector test into node e2e. 2017-02-22 14:35:46 -08:00
deads2k c9a008dff3 move util/intstr to apimachinery 2017-01-30 12:46:59 -05:00
Dr. Stefan Schimanski 44ea6b3f30 Update generated files 2017-01-29 21:41:45 +01:00
Dr. Stefan Schimanski a0137e9b28 Update generated files 2017-01-25 19:49:45 +01:00
deads2k b0b156b381 make tools/cache authoritative 2017-01-25 08:29:45 -05:00
Kubernetes Submit Queue b29d9cdbcf Merge pull request #39898 from ixdy/bazel-release-tars
Automatic merge from submit-queue

Build release tars using bazel

**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.

For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```

**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.

Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are imported with the tag created at build time; the nodes then use the `$binary.docker_tag` file to find the right image.

With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.

My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)

Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.

Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.

**Release note**:

```release-note
NONE
```
2017-01-18 14:24:48 -08:00
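A rough sketch of the build-time tagging step the description walks through; the helper, file paths, and binary name are illustrative assumptions:

```go
package main

import (
	"crypto/md5"
	"fmt"
	"os"
)

// writeDockerTag mirrors the build-time step described above: md5sum the
// server binary, save the sum to $binary.docker_tag, and use it as the
// image tag. Hypothetical helper for illustration only.
func writeDockerTag(binaryPath string) (string, error) {
	data, err := os.ReadFile(binaryPath)
	if err != nil {
		return "", err
	}
	sum := fmt.Sprintf("%x", md5.Sum(data))
	if err := os.WriteFile(binaryPath+".docker_tag", []byte(sum), 0o644); err != nil {
		return "", err
	}
	// On cluster startup the node `docker load`s the image tar and reads
	// the .docker_tag file to locate the image by this exact tag.
	return "gcr.io/google_containers/kube-proxy:" + sum, nil
}

func main() {
	image, err := writeDockerTag("_output/kube-proxy")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("image would be tagged as", image)
}
```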
deads2k 77b4d55982 mechanical 2017-01-16 09:35:12 -05:00
Jeff Grafton 14dd0d3bef Add genrule to produce e2e_node.test binary artifact 2017-01-13 14:46:26 -08:00
deads2k 6a4d5cd7cc start the apimachinery repo 2017-01-11 09:09:48 -05:00
Seth Jennings e2402b781b set qos class field in pod status 2017-01-10 16:31:52 -06:00
Jeff Grafton 20d221f75c Enable auto-generating sources rules 2017-01-05 14:14:13 -08:00
Mike Danese 161c391f44 autogenerated 2016-12-29 13:04:10 -08:00
Chao Xu 03d8820edc rename /release_1_5 to /clientset 2016-12-14 12:39:48 -08:00
Mike Danese c87de85347 autoupdate BUILD files 2016-12-12 13:30:07 -08:00
Clayton Coleman 5df8cc39c9 refactor: generated 2016-12-03 19:10:46 -05:00
Tim St. Clair bcf5e434fa Verify misc container in summary test 2016-12-01 14:02:39 -08:00
David Ashpole a232c15a45 Add InodeEviction test: when some pods create many empty files, both in their containers and in volumes, they are evicted before pods that behave normally. 2016-11-28 13:09:40 -08:00
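A toy illustration of the ordering this test asserts, under the assumption that pods are ranked for eviction by inode consumption; the type and helper are made up for the sketch:

```go
package main

import (
	"fmt"
	"sort"
)

// podInodeUsage is a toy stand-in for the per-pod inode stats the
// eviction manager collects; the real ranking logic lives in the kubelet.
type podInodeUsage struct {
	Name   string
	Inodes int64 // inodes consumed by the pod's containers and volumes
}

// rankForInodeEviction orders pods so the heaviest inode consumers are
// evicted first, which is the ordering the InodeEviction test expects.
func rankForInodeEviction(pods []podInodeUsage) []podInodeUsage {
	sort.Slice(pods, func(i, j int) bool { return pods[i].Inodes > pods[j].Inodes })
	return pods
}

func main() {
	pods := []podInodeUsage{{"normal", 120}, {"file-creator", 500000}}
	fmt.Println(rankForInodeEviction(pods)) // file-creator is evicted first
}
```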
Clayton Coleman 35a6bfbcee
generated: refactor 2016-11-23 22:30:47 -06:00
Chao Xu bcc783c594 run hack/update-all.sh 2016-11-23 15:53:09 -08:00
Kubernetes Submit Queue eca9e989a3 Merge pull request #36779 from sjenning/fix-memory-leak-via-terminated-pods
Automatic merge from submit-queue

fix leaking memory backed volumes of terminated pods

Currently, we allow volumes to remain mounted on the node even though the pod is terminated. This creates a vector for a malicious user to exhaust memory on the node by creating memory-backed volumes containing large files.

This PR removes memory-backed volumes (emptyDir w/ medium Memory, secrets, configmaps) of terminated pods from the node.

@saad-ali @derekwaynecarr
2016-11-17 21:29:51 -08:00
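A minimal sketch of the cleanup rule this PR describes, with simplified stand-in types; the real change lives in the kubelet's volume manager:

```go
package main

import "fmt"

// Volume and Pod are simplified stand-ins for the kubelet's own types.
type Volume struct {
	Name   string
	Medium string // "Memory" for tmpfs-backed emptyDir; secrets and
	// configmaps are memory-backed as well
}

type Pod struct {
	Name       string
	Terminated bool
	Volumes    []Volume
}

// memoryBackedVolumesToClean returns the volumes that should be unmounted:
// memory-backed volumes of terminated pods, since leaving them mounted
// lets their contents pin node memory indefinitely.
func memoryBackedVolumesToClean(pods []Pod) []string {
	var victims []string
	for _, p := range pods {
		if !p.Terminated {
			continue
		}
		for _, v := range p.Volumes {
			if v.Medium == "Memory" {
				victims = append(victims, p.Name+"/"+v.Name)
			}
		}
	}
	return victims
}

func main() {
	pods := []Pod{
		{Name: "done", Terminated: true, Volumes: []Volume{{Name: "cache", Medium: "Memory"}}},
		{Name: "running", Terminated: false, Volumes: []Volume{{Name: "cache", Medium: "Memory"}}},
	}
	fmt.Println(memoryBackedVolumesToClean(pods)) // [done/cache]
}
```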
Seth Jennings b80bea4a62 fix leaking memory backed volumes of terminated pods 2016-11-16 10:17:22 -06:00
David Ashpole f6224590f7 Test Container Garbage Collection 2016-11-15 09:15:31 -08:00
Harry Zhang fad1990eaa Fix verify bazel
Remove rootfs and chroot in scripts
2016-11-08 13:01:28 -05:00
Kubernetes Submit Queue 356230f8a1 Merge pull request #36299 from Random-Liu/mark-more-conformance-test
Automatic merge from submit-queue

Node Conformance Test: Mark more conformance tests

For https://github.com/kubernetes/kubernetes/issues/30122.

This PR:
1) Removes unused image test.
2) Marks more conformance tests based on https://docs.google.com/spreadsheets/d/1yib6ypfdWuq8Ikyo-rTcBGHe76Xur7tGqCKD9dkzx0Y/edit?usp=sharing.

Notice that two tests are not marked as conformance for now:
1. **OOM score test:** The test is serial and verifies the host PID directly. The test should start a pod with PID=host and verify from inside the pod. @vishh
2. **Summary api test:** The assumption made in the test doesn't always hold for an arbitrary image; for example, the fs capacity bounds are only [(100mb, 100gb)](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/summary_test.go#L62). @timstclair
3. We should also consider marking the **[cgroup manager test](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/cgroup_manager_test.go)** as a conformance test.

@dchen1107 @vishh @timstclair 
/cc @kubernetes/sig-node
2016-11-07 12:45:40 -08:00
Kubernetes Submit Queue 9534c4f563 Merge pull request #32427 from Random-Liu/system-verification
Automatic merge from submit-queue

Node Conformance Test: Add system verification

For #30122 and #29081.

This PR introduces a system verification test in node e2e and conformance tests. It runs before the real test; if system verification fails, the test fails immediately. The output of the system verification looks like this:

```
I0909 23:33:20.622122    2717 validators.go:45] Validating os...
OS: Linux
I0909 23:33:20.623274    2717 validators.go:45] Validating kernel...
I0909 23:33:20.624037    2717 kernel_validator.go:79] Validating kernel version
KERNEL_VERSION: 3.16.0-4-amd64
I0909 23:33:20.624146    2717 kernel_validator.go:93] Validating kernel config
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
I0909 23:33:20.679328    2717 validators.go:45] Validating cgroups...
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
I0909 23:33:20.679454    2717 validators.go:45] Validating docker...
DOCKER_GRAPH_DRIVER: aufs
```

It verifies the system following a predefined `SysSpec`:

```go
// DefaultSysSpec is the default SysSpec.
var DefaultSysSpec = SysSpec{
    OS:            "Linux",
    KernelVersion: []string{`3\.[1-9][0-9].*`, `4\..*`}, // Requires 3.10+ or 4+
    // TODO(random-liu): Add more config
    KernelConfig: KernelConfig{
        Required: []string{
            "NAMESPACES", "NET_NS", "PID_NS", "IPC_NS", "UTS_NS",
            "CGROUPS", "CGROUP_CPUACCT", "CGROUP_DEVICE", "CGROUP_FREEZER",
            "CGROUP_SCHED", "CPUSETS", "MEMCG",
        },
        Forbidden: []string{},
    },
    Cgroups: []string{"cpu", "cpuacct", "cpuset", "devices", "freezer", "memory"},
    RuntimeSpec: RuntimeSpec{
        DockerSpec: &DockerSpec{
            Version:     []string{`1\.(9|\d{2,})\..*`}, // Requires 1.9+
            GraphDriver: []string{"aufs", "overlay", "devicemapper"},
        },
    },
}
```

Currently, it only supports:
- Kernel validation: version validation and kernel configuration validation.
- Cgroup validation: validates whether the required cgroup subsystems are enabled.
- Runtime validation: currently only validates the docker graph driver.

The validating framework is ready. The specific validation items could be added over time.
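As a usage sketch, matching a kernel version against the spec's `KernelVersion` regex list could look like this (a minimal illustration, not the validator's actual code):

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesAny reports whether version matches at least one of the
// regex patterns from SysSpec.KernelVersion, anchored to the full string.
func matchesAny(version string, patterns []string) bool {
	for _, p := range patterns {
		if regexp.MustCompile("^" + p + "$").MatchString(version) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{`3\.[1-9][0-9].*`, `4\..*`} // requires 3.10+ or 4+
	fmt.Println(matchesAny("3.16.0-4-amd64", patterns)) // true
	fmt.Println(matchesAny("3.9.0", patterns))          // false
}
```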

@dchen1107 
/cc @kubernetes/sig-node
2016-11-06 17:12:39 -08:00
Kubernetes Submit Queue 649c0ddd0e Merge pull request #35342 from timstclair/rejected
Automatic merge from submit-queue

[AppArmor] Hold bad AppArmor pods in pending rather than rejecting

Fixes https://github.com/kubernetes/kubernetes/issues/32837

Overview of the fix:

If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then it might continuously try to run the pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted but not allowed to run (and is therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) are handled.

A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.

```release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```

@kubernetes/sig-node @timothysc @rrati @davidopp
2016-11-05 22:52:26 -07:00
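A minimal sketch of the "soft rejection" idea with simplified stand-in types; the kubelet's real admission result types are more involved:

```go
package main

import "fmt"

// AdmitResult is a simplified stand-in for the kubelet's pod admission
// result; names and fields here are assumptions for illustration.
type AdmitResult struct {
	Admit   bool   // run the pod?
	Pending bool   // "soft rejection": admit, but hold in Pending
	Message string // surfaced in the pod status so users can see why
}

// admitAppArmorPod sketches soft rejection: instead of rejecting a pod
// whose AppArmor profile is not loaded (which the control plane would
// retry, possibly on the same node), admit it but keep it Pending until
// the profile becomes available.
func admitAppArmorPod(profile string, loadedProfiles map[string]bool) AdmitResult {
	if profile == "" || loadedProfiles[profile] {
		return AdmitResult{Admit: true}
	}
	return AdmitResult{
		Admit:   true,
		Pending: true,
		Message: fmt.Sprintf("AppArmor profile %q is not loaded; pod held in Pending", profile),
	}
}

func main() {
	loaded := map[string]bool{"runtime/default": true}
	fmt.Println(admitAppArmorPod("localhost/custom", loaded))
}
```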
Random-Liu 150a04d2fc Remove unused image test. 2016-11-05 22:19:43 -07:00
Kubernetes Submit Queue 56526043d5 Merge pull request #32530 from mtaufen/dynamic-settings-tests
Automatic merge from submit-queue

Utility functions for using dynamic Kubelet configuration from a test

/cc @vishh @dchen1107
2016-11-04 20:24:03 -07:00
Random-Liu f9b50f0949 Update bazel. 2016-11-03 20:38:29 -07:00
Tim St. Clair ec9111d942 Hold bad AppArmor pods in pending rather than rejecting 2016-11-02 11:05:16 -07:00
Michael Taufen 5190a7d72d Add dynamic kubelet configuration utilities to node e2e tests
Also modify dynamic kubelet configuration test to rely on new utility functions
2016-11-02 10:02:21 -07:00
derekwaynecarr 42289c2758 pod and qos level cgroup support 2016-11-02 08:07:04 -04:00