Commit Graph

1465 Commits (b01e2a387d015b36527fa5dc6d4fc6afe764f571)

Author SHA1 Message Date
Michael Taufen fcc1f8e7b6 Move to a structured status for dynamic Kubelet config
Updates dynamic Kubelet config to use a structured status, rather than a
node condition. This makes the status machine-readable, and thus more
useful for config orchestration.

Fixes: #56896
2018-05-15 11:25:12 -07:00
Kubernetes Submit Queue b2fe2a0a6d
Merge pull request #59847 from mtaufen/dkcfg-explicit-keys
Automatic merge from submit-queue (batch tested with PRs 63624, 59847). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

explicit kubelet config key in Node.Spec.ConfigSource.ConfigMap

This makes the Kubelet config key in the ConfigMap an explicit part of
the API, so we can stop using magic key names.
    
As part of this change, we are retiring ConfigMapRef for ConfigMap.


```release-note
You must now specify Node.Spec.ConfigSource.ConfigMap.KubeletConfigKey when using dynamic Kubelet config to tell the Kubelet which key of the ConfigMap identifies its config file.
```
2018-05-09 17:55:13 -07:00
Filipe Brandenburger 48d052fae4 Fix cgroup names in node_container_manager_test.
The names were made invalid for the CgroupName refactor in #62541, so
update them here.

Furthermore, as the new names are now compatible with what
EnforceNodeAllocatable wants, reuse the constants there as well.

Tested:
  $ make test-e2e-node REMOTE=true HOSTS=test-cos-beta-67-10575-27-0 FOCUS='Validate Node Allocatable' SKIP='' TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'
  • [SLOW TEST:39.488 seconds]
  [k8s.io] Node Container Manager [Serial]
    Validate Node Allocatable
      set's up the node and runs the test
  Ran 1 of 261 Specs in 57.348 seconds
  SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 260 Skipped
2018-05-08 16:15:26 -07:00
David Ashpole a5df208866 eviction test ensures failed pods are evicted 2018-05-08 16:08:35 -07:00
Michael Taufen c41cf55a2c explicit kubelet config key in Node.Spec.ConfigSource.ConfigMap
This makes the Kubelet config key in the ConfigMap an explicit part of
the API, so we can stop using magic key names.

As part of this change, we are retiring ConfigMapRef for ConfigMap.
2018-05-08 15:37:26 -07:00
Kubernetes Submit Queue a244d8a48f
Merge pull request #63130 from vikaschoudhary16/dp_e2e_alloc
Automatic merge from submit-queue (batch tested with PRs 61455, 63346, 63130, 63404). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[Device-Plugin]: Extend e2e test to cover node allocatables

**What this PR does / why we need it**:
 Extends device plugin e2e to cover node allocatable
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
/sig node
/area hw-accelerators
/cc @jiayingz @vishh @RenaudWasTaken
2018-05-03 14:24:10 -07:00
vikaschoudhary16 b953f852f5 [Device-Plugin]: Extend e2e test to cover node allocatables 2018-05-03 14:19:29 -04:00
Kubernetes Submit Queue 592c39bccc
Merge pull request #62541 from filbranden/cgroupname1
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use a []string for CgroupName, which is a more accurate internal representation

**What this PR does / why we need it**:

This is purely a refactoring and should bring no essential change in behavior.

It does clarify the cgroup handling code quite a bit.

It is preparation for further changes we might want to do in the cgroup hierarchy. (But it's useful on its own, so even if we don't do any, it should still be considered.)

**Special notes for your reviewer**:

The slice of strings more precisely captures the hierarchic nature of the cgroup paths we use to represent pods and their groupings.

It also ensures we're reducing the chances of passing an incorrect path format to a cgroup driver that requires a different path naming, since now explicit conversions are always needed.

The new constructor `NewCgroupName` starts from an existing `CgroupName`, which enforces a hierarchy where a root is always needed. It also performs checking on the component names to ensure invalid characters ("/" and "_") are not in use.

A `RootCgroupName` for the top of the cgroup hierarchy tree is introduced.

This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.

There's a small TODO in a helper `updateSystemdCgroupInfo` that was introduced to make this commit possible. That logic really belongs in libcontainer, I'm planning to send a PR there to include it there. (The API already takes a field with that information, only that field is only processed in cgroupfs and not systemd driver, we should fix that.)

Tested: By running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs driver) and CentOS 7 (with systemd driver.)

**NOTE**: I only tested this with dockershim, we should double-check that this works with the CRI endpoints too, both in cgroupfs and systemd modes.

/assign @derekwaynecarr 
/assign @dashpole 
/assign @Random-Liu 

**Release note**:

```release-note
NONE
```
2018-05-03 08:16:45 -07:00
Kubernetes Submit Queue b5f61ac129
Merge pull request #62657 from matthyx/master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter

This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation.
https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html
```release-note
Use /usr/bin/env in all script shebangs to increase portability.
```
2018-05-02 19:44:32 -07:00
Filipe Brandenburger b230fb8ac4 Use a []string for CgroupName, which is a more accurate internal representation
The slice of strings more precisely captures the hierarchic nature of
the cgroup paths we use to represent pods and their groupings.

It also ensures we're reducing the chances of passing an incorrect path
format to a cgroup driver that requires a different path naming, since
now explicit conversions are always needed.

The new constructor NewCgroupName starts from an existing CgroupName,
which enforces a hierarchy where a root is always needed. It also
performs checking on the component names to ensure invalid characters
("/" and "_") are not in use.

A RootCgroupName for the top of the cgroup hierarchy tree is introduced.

This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.

There's a small TODO in a helper updateSystemdCgroupInfo that was
introduced to make this commit possible. That logic really belongs in
libcontainer, I'm planning to send a PR there to include it there.
(The API already takes a field with that information, only that field is
only processed in cgroupfs and not systemd driver, we should fix that.)

Tested by running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs
driver) and CentOS 7 (with systemd driver.)
2018-05-01 08:29:06 -07:00
Kubernetes Submit Queue 452b8c9e0d
Merge pull request #62101 from bart0sh/PR0010-e2e_node-kubelet-command-line-fix
Automatic merge from submit-queue (batch tested with PRs 58474, 60034, 62101, 63198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix wrong usage of kubelet option

**What this PR does / why we need it**:

"--allow-privileged true" is incorrect usage of boolean option.
It means setting '--allow-priviledged' to its default value plus
non-existing subcommand 'true'.

"--allow-privileged false" is even more confusing as it sets
allow-priviledged flag to its default value 'true'

This is true for any boolean command line option.

Fixed this by using correct syntax --allow-priviledged=true

**Special notes for your reviewer**:
This is a show-stopper for PR #61833 

**Release note**:

```release-note
NONE
```
2018-04-30 13:24:12 -07:00
Kubernetes Submit Queue e01858c595
Merge pull request #63252 from liztio/e2e_node_utils
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

E2e path utils

**What this PR does / why we need it**:

A bunch of useful methods for getting k8s paths and stuff are secreted away in `e2e_node`. This PR pulls them out so they can be used in other E2E method.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:



**Special notes for your reviewer**:
This is motivated by the upcoming kubeadm-specific E2E tests. Those tests will be added in a follow-up to this PR.

**Release note**:

```release-note
NONE
```
2018-04-27 11:43:15 -07:00
liz 1ec02b1cd5
Move path management from e2e_node to common test/utils directory
enables reuse of these methods for other e2e tests
2018-04-27 11:12:10 -04:00
liz 432b542218
Generated artefacts 2018-04-27 11:11:45 -04:00
Jordan Liggitt 1bddcdcf44
Bump QPS on namespace controller
https://github.com/kubernetes/kubernetes/pull/62913 switched from using a client pool, where each groupVersionResource got its own rest client, to a single client.

This increases the QPS to account for increased requests using a single rest client rate limiter.
2018-04-27 10:11:14 -04:00
David Eads 3632037e60 add easy to use dynamic client 2018-04-25 08:55:26 -04:00
Kubernetes Submit Queue 44b57338d5
Merge pull request #59692 from mtaufen/dkcfg-unpack-configmaps
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

unpack dynamic kubelet config payloads to files

This PR unpacks the downloaded ConfigMap to a set of files on the node.

This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.

This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.

The current store dir structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta (dir for metadata, e.g. which config source is currently assigned, last-known-good)
  | - current (a serialized v1 NodeConfigSource object, indicating the assigned config)
  | - last-known-good (a serialized v1 NodeConfigSource object, indicating the last-known-good config)
| - checkpoints (dir for config checkpoints)
  | - uid1 (dir for unpacked config, identified by uid1)
    | - file1
    | - file2
    | - ...
  | - uid2
  | - ...
```

There are some likely changes to the above structure before dynamic config goes beta, such as renaming "current" to "assigned" for clarity, and extending the checkpoint identifier to include a resource version, as part of resolving #61643.

```release-note
NONE
```

/cc @luxas @smarterclayton
2018-04-24 12:01:37 -07:00
Michael Taufen c9d398d01e unpack dynamic kubelet config payloads to files
This PR unpacks the downloaded ConfigMap to a set of files on the node.

This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.

This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.
2018-04-19 09:18:53 -07:00
Matthias Bertschy 9b15af19b2 Update all script to use /usr/bin/env bash in shebang 2018-04-19 13:20:13 +02:00
Kubernetes Submit Queue dd8f8819e4
Merge pull request #62768 from krzyzacy/clean-up-jenkins
Automatic merge from submit-queue (batch tested with PRs 62445, 62768, 60633). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

clean up *.properties files

ref https://github.com/kubernetes/kubernetes/issues/62754

to double check, is any of the node config yaml files are still being used outside of CI? I'll make a follow up one to clean them up as well.

/assign @Random-Liu @mindprince @yujuhong
2018-04-18 12:25:08 -07:00
Kubernetes Submit Queue 1ddb0e05e5
Merge pull request #62761 from Random-Liu/lower-usage-nano-cores-in-summary
Automatic merge from submit-queue (batch tested with PRs 62761, 62715). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Lower UsageNanoCores boundary in summary api test.

We recently switched to use `p2p` instead of `bridge` in containerd https://github.com/containerd/cri/pull/742.

However, after that switch, the `UsageNanoCores`  becomes lower, and constantly fails the test. An example failure:
* https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/containerd_cri/740/pull-cri-containerd-node-e2e/690/

This is probably because:
1) The test container used in summary test does `ping`. https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/summary_test.go#L352
2) `p2p` is simpler than `bridge`, "Maybe cycles are saved from waiving Mac learning" - @jingax10.

This PR lowers the boundary by 1 magnitude.

Signed-off-by: Lantao Liu <lantaol@google.com>

**Release note**:

```release-note
none
```
2018-04-17 22:38:10 -07:00
Sen Lu 854132fdcc clean up *.properties files 2018-04-17 21:44:32 -07:00
Lantao Liu 002483fe72 Lower UsageNanoCores boundary in summary api test.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-04-17 18:37:51 -07:00
Lantao Liu c86e85c420 Fix extra-log flag for node e2e.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-04-17 21:48:26 +00:00
Lantao Liu 27105c90ec Fix kubelet flags.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-04-16 20:42:40 +00:00
Yu-Ju Hong 9a47bd0b67 Node E2E: Remove the simple mount test
There are EmptyDir volume tests in test/e2e/common already. The test
does not add any more coverage.
2018-04-12 17:05:28 -07:00
Ed Bartosh 7e3d28b30f Fix wrong usage of kubelet options
"--allow-privileged true" is incorrect usage of boolean option.
It means setting '--allow-priviledged' to its default value plus
non-existing subcommand 'true'.

"--allow-privileged false" is even more confusing as it sets
allow-priviledged flag to its default value 'true'

This is true for any boolean command line option.

Fixed this by using correct syntax --allow-priviledged=true

Fixed generating of kubelet command line in addKubeletConfigFlags
function.
2018-04-12 15:19:49 +03:00
Kubernetes Submit Queue 1dc6e87f57
Merge pull request #62206 from yujuhong/rm-rkt-refs
Automatic merge from submit-queue (batch tested with PRs 62192, 61866, 62206, 62360). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove rkt references in the codebase

```release-note
None
```
2018-04-10 23:52:21 -07:00
Kubernetes Submit Queue 3bc1a0a1d0
Merge pull request #60900 from dashpole/eviction_test_no_pressure
Automatic merge from submit-queue (batch tested with PRs 60900, 62215, 62196). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[Flaky test fix] Use memory.force_empty before and after eviction tests

**What this PR does / why we need it**:
(copied from https://github.com/kubernetes/kubernetes/pull/60720):
MemoryAllocatableEviction tests have been somewhat flaky: https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-serial-gce-e2e&include-filter-by-regex=MemoryAllocatable
The failure on the flakes is ["Pod ran to completion"](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/3785#k8sio-memoryallocatableeviction-slow-serial-disruptive-when-we-run-containers-that-should-cause-memorypressure-should-eventually-evict-all-of-the-correct-pods).
Looking at [an example log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/3785/artifacts/tmp-node-e2e-6070a774-cos-stable-63-10032-71-0/kubelet.log) (and search for memory-hog-pod, we can see that this pod fails admission because the allocatable memory threshold has already been crossed.
`eviction manager: thresholds - ignoring grace period: threshold [signal=allocatableMemory.available, quantity=250Mi] observed 242404Ki`

https://github.com/kubernetes/kubernetes/pull/60720 wasn't effective.  To clean-up after each eviction test, and prepare for the next, use memory.force_empty to make the kernel reclaim memory in the allocatable cgroup before and after eviction tests.

**Special notes for your reviewer**:
I tested to make sure this doesn't break Cgroup Manager tests.
It should work on both cgroupfs and systemd based systems, although I have only tested in on cgroupfs.

**Release note**:
```release-note
NONE
```

/assign @yujuhong @Random-Liu 
/sig node
/priority important-soon
/kind bug

its getting a little late in the release cycle, so we can probably wait until after code freeze is lifted for this.
2018-04-06 21:30:06 -07:00
Kubernetes Submit Queue 1e767ddf60
Merge pull request #62135 from jiayingz/kubelet-restart-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes restartKubelet in test/e2e_node failure.

Looks like there is some recent change on how we start kubelet service
in test_e2e_node. Fixes restartKubelet() to get right kubelet service
name to cope with the change.



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubelet-serial-gce-e2e test failure:
https://k8s-testgrid.appspot.com/wg-resource-management#kubelet-serial-gce-e2e
Thanks a lot to @mindprince for noticing this!

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2018-04-06 15:46:19 -07:00
David Ashpole 3254bdc1a4 use memory.force_empty before and after eviction tests 2018-04-06 14:01:11 -07:00
Yu-Ju Hong 59741bdfbd Remove rkt references in the codebase 2018-04-06 12:02:11 -07:00
Manjunath A Kumatagi 1bb810e749 Use pause manifest image 2018-04-06 11:00:50 +05:30
Jiaying Zhang 0138007bdd Fixes restartKubelet in test/e2e_node failure.
Looks like there is some recent change on how we start kubelet service
in test_e2e_node. Fixes restartKubelet() to get right kubelet service
name to cope with the change.
2018-04-04 13:18:08 -07:00
hzxuzhonghu 8cce8bdc85 make kube-apiserver ServerRunOptions setdefault and Validate before use 2018-04-04 11:19:55 +08:00
Kubernetes Submit Queue 043204b1e5
Merge pull request #61498 from mindprince/delete-in-tree-gpu
Automatic merge from submit-queue (batch tested with PRs 61498, 62030). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Delete in-tree support for NVIDIA GPUs.

This removes the alpha Accelerators feature gate which was deprecated in 1.10 (#57384).
The alternative feature DevicePlugins went beta in 1.10 (#60170).

Fixes #54012

```release-note
Support for "alpha.kubernetes.io/nvidia-gpu" resource which was deprecated in 1.10 is removed. Please use the resource exposed by DevicePlugins instead ("nvidia.com/gpu").
```
2018-04-03 02:02:04 -07:00
Rohit Agarwal 87dda3375b Delete in-tree support for NVIDIA GPUs.
This removes the alpha Accelerators feature gate which was deprecated in 1.10.
The alternative feature DevicePlugins went beta in 1.10.
2018-04-02 20:17:01 -07:00
Christoph Blecker 710c8563b4
Fix go vet errors 2018-04-02 17:57:44 -07:00
Kubernetes Submit Queue 99fd98a893
Merge pull request #61740 from filbranden/nodetest1
Automatic merge from submit-queue (batch tested with PRs 61482, 61740). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Make systemd service name for kubelet use a timestamp in e2e-node tests.

**What this PR does / why we need it**:

This makes it easier to figure out which execution was last when looking at the output of `systemd list-units kubelet-*.service`.

We try to find the name of the /tmp/node-e2e-* directory and use the same timestamp if we can. Otherwise, we just call Now() again, which isn't as nice (as the unit name and directory name will not match) but will still produce unit names that will be ordered when launching multiple subsequent executions on the same host.


**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
N/A

**Special notes for your reviewer**:

Tested using `make test-e2e-node REMOTE=true` and then checking `systemctl list-units kubelet-*.service` on the target host.

```
$ systemctl list-units kubelet-*.service
kubelet-20180326T142016.service loaded active exited /tmp/node-e2e-20180326T142016/kubelet --kubeconfig /tmp/node-e2e-20180326T142016/kubeconfig --root-dir /var/lib/kubelet ...
kubelet-20180326T143550.service loaded active exited /tmp/node-e2e-20180326T143550/kubelet --kubeconfig /tmp/node-e2e-20180326T143550/kubeconfig --root-dir /var/lib/kubelet ...
```

The units are sorted in the order they were launched.

**Release note**:

```release-note
NONE
```
2018-03-29 21:10:03 -07:00
Filipe Brandenburger b8c39b7055 In summary_test, make Docker cpu/memory checks optional if unavailable.
The numbers will only be available when docker.service has its own
memory and cpu cgroups, which doesn't necessarily happen unless the unit
has Delegate=yes configured.

Let's work around that by checking the status of Delegate, in the case
where we are:
* running Docker
* running Systemd
* able to check the status through systemctl
* the status is explicitly Delegate=no (the default)

If all of those are true, let's make CPU and Memory expectations
optional.

Tested: make test-e2e-node REMOTE=true HOSTS=centos-e2e-node FOCUS="Summary API"
2018-03-29 18:12:30 -07:00
Filipe Brandenburger 351a70b60e In summary_test, create a file outside the test volume too.
This is necessary to show any RootFs usage on systems where the backing
filesystem of overlay2 is xfs.

The current test only created directories (for mount points) in the
upper layer of the overlay. Outside of the mount namespace, only the
directories are visible. When running `du` on those, usually filesystems
will show some usage, but not xfs, which shows a disk usage of 0 for
directories.

Fix this by creating a file in the root directory, outside the volumes,
in order to trigger some disk usage that can be measured by `du`.

Tested: make test-e2e-node REMOTE=true HOSTS=centos-e2e-node FOCUS="Summary API"
2018-03-29 18:12:29 -07:00
Kubernetes Submit Queue 5ae7bba496
Merge pull request #60100 from mtaufen/node-authz-nodeconfigsource
Automatic merge from submit-queue (batch tested with PRs 61829, 61908, 61307, 61872, 60100). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

node authorizer sets up access rules for dynamic config

This PR makes the node authorizer automatically set up access rules for
dynamic Kubelet config.

I also added some validation to the node strategy, which I discovered we
were missing while writing this.

This PR is based on another WIP from @liggitt.

```release-note
The node authorizer now automatically sets up rules for Node.Spec.ConfigSource when the DynamicKubeletConfig feature gate is enabled.
```
2018-03-29 17:37:18 -07:00
Filipe Brandenburger 76ef9c9074 Make systemd service name for kubelet use a timestamp in e2e-node tests.
This makes it easier to figure out which execution was last when looking
at the output of `systemd list-units kubelet-*.service`.

We try to find the name of the /tmp/node-e2e-* directory and use the
same timestamp if we can. Otherwise, we just call Now() again, which
isn't as nice (as the unit name and directory name will not match) but
will still produce unit names that will be ordered when launching
multiple subsequent executions on the same host.
2018-03-29 11:17:42 -07:00
Filipe Brandenburger 451faff4ef Use curl instead of wget to fetch the CNI tarball in e2e-node test
Curl is more ubiquitous than wget. For instance, the GCE centos-7 and
rhel-7 image families ship curl by default, but not wget.

Looking at the shell scripts under cluster/, they tend to use curl more
than wget. (The ones that use wget, such as get-kube.sh, try curl first
and only fallback to wget if it's not available.)

Tested: by running node-e2e-test on Ubuntu, COS and CentOS.
2018-03-27 09:41:09 -07:00
Michael Taufen ab8dc12333 node authorizer sets up access rules for dynamic config
This PR makes the node authorizer automatically set up access rules for
dynamic Kubelet config.

I also added some validation to the node strategy, which I discovered we
were missing while writing this.
2018-03-27 08:49:45 -07:00
Kubernetes Submit Queue 915798d229
Merge pull request #60563 from hzxuzhonghu/replace-context
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Replace package "golang.org/x/net/context" with "context"

**What this PR does / why we need it**:
Replace package "golang.org/x/net/context" with "context"

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60560

**Special notes for your reviewer**:
As of Go 1.7 this package(golang.org/x/net/context) is available in the standard library under the name context. see (https://godoc.org/golang.org/x/net/context)

It is almost machinery replace. 

**Release note**:

```release-note
NONE
```
2018-03-23 16:34:23 -07:00
Kubernetes Submit Queue 1b6b2ee790
Merge pull request #61478 from shyamjvs/capture-pod-startup-phases-as-metrics
Automatic merge from submit-queue (batch tested with PRs 61378, 60915, 61499, 61507, 61478). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Capture pod startup phases as metrics

Learning from https://github.com/kubernetes/kubernetes/issues/60589, we should also start collecting and graphing sub-parts of pod-startup latency.

/sig scalability
/kind feature
/priority important-soon
/cc @wojtek-t 

```release-note
NONE
```
2018-03-22 07:15:33 -07:00
hzxuzhonghu 70e45eccf2 Replace "golang.org/x/net/context" with "context" 2018-03-22 20:57:14 +08:00
Shyam Jeedigunta 0f0c754eb4 Get rid of duplicate VerifyPodStartupLatency util in node density tests 2018-03-21 16:58:31 +01:00
Shyam Jeedigunta b0dd166fa3 Capture different parts of pod-startup latency as metrics 2018-03-21 16:58:25 +01:00
Lantao Liu 9fc2795d55 Change pods memory boundary.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-03-20 23:24:16 +00:00
Kubernetes Submit Queue c64f19dd1b
Merge pull request #59728 from wgliang/master.append
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

more concise to merge the slice

**What this PR does / why we need it**:
more concise to merge the slice

**Special notes for your reviewer**:
2018-03-19 21:34:30 -07:00
Kubernetes Submit Queue a3f40dd8df
Merge pull request #60856 from jiayingz/race-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes the races around devicemanager Allocate() and endpoint deletion.

There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.

Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.

Tested:
Ran GPUDevicePlugin e2e_node test 100 times and all passed now.



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/60176

**Special notes for your reviewer**:

**Release note**:

```release-note
Fixes the races around devicemanager Allocate() and endpoint deletion.
```
2018-03-12 02:50:13 -07:00
Jiaying Zhang 5514a1f4dd Fixes the races around devicemanager Allocate() and endpoint deletion.
There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.

Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.
2018-03-09 17:00:57 -08:00
Kubernetes Submit Queue ae7be34c32
Merge pull request #60509 from verb/pid-e2e
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add node-e2e test for ShareProcessNamespace

**What this PR does / why we need it**: Adds a node-e2e test for kubernetes/features#495

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #59554

**Special notes for your reviewer**: This requires a feature gate to be enabled in both the kubelet and API server. I'm not sure which jenkins configs need to be updated (or if these are even still used) so I just updated a pile of them.

opened kubernetes/test-infra#7030 for https://github.com/kubernetes/test-infra/blob/master/jobs/config.json

**Release note**:

```release-note
NONE
```
2018-03-05 14:20:14 -08:00
David Ashpole 395bea9d83 increase amount of memory filled by memory allocatable eviction test 2018-03-02 10:00:03 -08:00
Jiaying Zhang 6d7e6599f1 I forgot the fact that the DevicePlugin test itself restarts Kubelet
for testing purpose. Move that test back to Serial but constructs
a smaller test without kubelet restart that we may run during presubmit.
2018-03-01 14:02:09 -08:00
Kubernetes Submit Queue 5d26ef96a8
Merge pull request #59345 from hanxiaoshuai/fixtodo02051
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix todo:Move function readinessCheck to util

**What this PR does / why we need it**:
fix todo:Move function readinessCheck to util in test/e2e_node/services
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-02-28 08:06:34 -08:00
Kubernetes Submit Queue 5be121aca7
Merge pull request #60376 from mikedanese/fixup
Automatic merge from submit-queue (batch tested with PRs 60376, 55584, 60358, 54631, 60291). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

remove gcloud docker -- since it's deprecated

docker handles this now and it raises an error.

try 3

```release-note
NONE
```
2018-02-28 03:37:21 -08:00
Kubernetes Submit Queue 2023c019eb
Merge pull request #60451 from jiayingz/e2e_node_enable
Automatic merge from submit-queue (batch tested with PRs 60236, 60332, 57375, 60451, 57408). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update device plugin e2e_node test to not changing Kubelet config

as DevicePlugins feature is enabled by default now.



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2018-02-28 01:12:32 -08:00
Mike Danese c0b7364563 remove gcloud docker -- since it's deprecated 2018-02-28 00:24:27 -08:00
Lee Verberne b02f1f2ce3 Add node-e2e test for ShareProcessNamespace 2018-02-28 09:15:56 +01:00
Ryan Hitchman 8aa3ca3cbb Add a few "+build linux" tags where appropriate. 2018-02-27 13:53:32 -08:00
Ryan Hitchman e04b91facf Remove unused variables (only assigned to) from test code.
This is revealed by the go/types package, which is stricter than
the Go compiler about unused variables. See also: golang/go#8560
2018-02-27 13:45:31 -08:00
Jiaying Zhang fee083feac Update device plugin e2e_node test to not changing Kubelet config
as DevicePlugins feature is enabled by default now.
2018-02-26 22:45:44 -08:00
Kubernetes Submit Queue e31c8a2252
Merge pull request #60318 from jiayingz/api-change
Automatic merge from submit-queue (batch tested with PRs 59159, 60318, 60079, 59371, 57415). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Made a couple API changes to deviceplugin/v1beta1 to avoid future

incompatible API changes:
- Add GetDevicePluginOptions rpc call. This is needed when we switch
  from Registration service to probe-based plugin watcher.
- Change AllocateRequest and AllocateResponse to allow device requests
  from multiple containers in a pod. Currently only made mechanical
  change on the devicemanager and test code to cope with the API but
  still issues an Allocate call per container. We can modify the
  devicemanager in 1.11 to issue a single Allocate call per pod.
  The change will also facilitate incremental API change to communicate
  pod level information through Allocate rpc if there is such future
  need.



**What this PR does / why we need it**:
Made a couple API changes to deviceplugin/v1beta1 to avoid future incompatible API changes.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/59370

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2018-02-24 21:19:33 -08:00
Kubernetes Submit Queue 720c29b3e8
Merge pull request #60314 from mtaufen/kubelet-manifest-is-oldspeak
Automatic merge from submit-queue (batch tested with PRs 60324, 60269, 59771, 60314, 59941). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

expunge the word 'manifest' from Kubelet's config API

The word 'manifest' technically refers to a container-group specification
that predated the Pod abstraction. We should avoid using this legacy
terminology where possible. Fortunately, the Kubelet's config API will
be beta in 1.10 for the first time, so we still had the chance to make
this change.

I left the flags alone, since they're deprecated anyway.

I changed a few var names in files I touched too, but this PR is the
just the first shot, not the whole campaign
(`git grep -i manifest | wc -l -> 1248`).

```release-note
Some field names in the Kubelet's now v1beta1 config API differ from the v1alpha1 API: PodManifestPath is renamed to PodPath, ManifestURL is renamed to PodURL, ManifestURLHeader is renamed to PodURLHeader.
```
2018-02-24 20:01:46 -08:00
Kubernetes Submit Queue 829ada8e30
Merge pull request #57965 from xiangpengzhao/cleanup-feature-gates
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update test framework featuregates type

**What this PR does / why we need it**:
A cleanup following #53025 and #57962.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
ref: #53025
and #57962.

**Special notes for your reviewer**:
but yeah, not sure if it's worthy to do this :)

**Release note**:

```release-note
NONE
```
2018-02-24 07:34:19 -08:00
Jiaying Zhang 07beac6004 Made a couple API changes to deviceplugin/v1beta1 to avoid future
incompatible changes:
- Add GetDevicePluginOptions rpc call. This is needed when we switch
  from Registration service to probe-based plugin watcher.
- Change AllocateRequest and AllocateResponse to allow device requests
  from multiple containers in a pod. Currently only made mechanical
  change on the devicemanager and test code to cope with the API but
  still issues an Allocate call per container. We can modify the
  devicemanager in 1.11 to issue a single Allocate call per pod.
  The change will also facilitate incremental API change to communicate
  pod level information through Allocate rpc if there is such future
  need.
2018-02-23 16:15:09 -08:00
Michael Taufen b4bddcc998 expunge the word 'manifest' from Kubelet's config API
The word 'manifest' technically refers to a container-group specification
that predated the Pod abstraction. We should avoid using this legacy
terminology where possible. Fortunately, the Kubelet's config API will
be beta in 1.10 for the first time, so we still had the chance to make
this change.

I left the flags alone, since they're deprecated anyway.

I changed a few var names in files I touched too, but this PR is the
just the first shot, not the whole campaign
(`git grep -i manifest | wc -l -> 1248`).
2018-02-23 11:44:06 -08:00
Lantao Liu faa581c5cb Add node e2e test for log rotation. 2018-02-23 01:42:35 +00:00
Lantao Liu 313e8717f6 Generated code 2018-02-23 01:42:35 +00:00
Kubernetes Submit Queue 270148d7d9
Merge pull request #58684 from hzxuzhonghu/default-enabled-admission
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

set default enabled admission plugins by official document

**What this PR does / why we need it**:

https://kubernetes.io/docs/admin/admission-controllers/#is-there-a-recommended-set-of-admission-controllers-to-use

recommend  running the following set of admission controllers 
```
If you previously had not set the `--admission-control` flag, your cluster behavior may change (to be more standard).  See [https://kubernetes.io/docs/admin/admission-controllers/] for explanation of admission control.
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Set default enabled admission plugins `NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota`
```
2018-02-22 05:24:44 -08:00
Kubernetes Submit Queue 714b19ee75
Merge pull request #57583 from MorrisLaw/bugfix/logf-newline
Automatic merge from submit-queue (batch tested with PRs 60158, 60156, 58111, 57583, 60055). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bugfix/logf newline

**What this PR does / why we need it**:
Removes all redundant new lines being passed into the `Logf()` function. This involved going through code in both `test/e2e` and `test/e2e_node`, finding the newline redundancies in calls to `Logf()` and removing them.

**Which issue(s) this PR fixes**:
Fixes [#57102](https://github.com/kubernetes/kubernetes/issues/57102)

**Release note**:

```release-note
NONE
```
2018-02-21 22:10:34 -08:00
hzxuzhonghu 27f3fd2d79 set default enabled admission plugins by official document 2018-02-22 11:02:02 +08:00
Kubernetes Submit Queue 687c651dfd
Merge pull request #59884 from mikedanese/remove-deprecated-proxy
Automatic merge from submit-queue (batch tested with PRs 58716, 59977, 59316, 59884, 60117). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

remove deprecated /proxy paths

These were deprecated in v1.2.
ref https://github.com/kubernetes/kubernetes/issues/59885
```release-note
kube-apiserver: the root /proxy paths have been removed (deprecated since v1.2). Use the /proxy subresources on objects that support HTTP proxying.
```

@kubernetes/sig-api-machinery-api-reviews
2018-02-21 15:40:45 -08:00
vikaschoudhary16 e64517cd74 Migrate deviceplugin api from v1alpha to v1beta1 2018-02-21 01:26:20 -05:00
vikaschoudhary16 defcab81d5 Invoke PreStart RPC call before container start, if desired by plugin
Signed-off-by: vikaschoudhary16 <vichoudh@redhat.com>
2018-02-21 01:25:24 -05:00
Mike Danese 7b4722964d remove deprecated /proxy paths
These were depercated in v1.2.
2018-02-20 14:42:19 -08:00
Kubernetes Submit Queue 96ec318718
Merge pull request #59842 from ixdy/update-rules_go-02-2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

 Update bazelbuild/rules_go, kubernetes/repo-infra, and gazelle dependencies

**What this PR does / why we need it**: updates our bazelbuild/rules_go dependency in order to bump everything to go1.9.4. I'm separating this effort into two separate PRs, since updating rules_go requires a large cleanup, removing an attribute from most build rules.

**Release note**:

```release-note
NONE
```
2018-02-19 22:23:05 -08:00
Jeremy L. Morris e724886ad5 Removed newlines from e2e log statements. 2018-02-17 22:25:38 -05:00
David Ashpole 960856f4e8 collect metrics on the /kubepods cgroup on-demand 2018-02-17 12:32:40 -08:00
Kubernetes Submit Queue 1e5a58416b
Merge pull request #59989 from mtaufen/fix-e2e-node-tests
Automatic merge from submit-queue (batch tested with PRs 59927, 59989, 59950). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix e2e node setKubeletConfiguration helper

The helper should have been using `apiequality.Semantic.DeepEqual`,
instead of `reflect.DeepEqual`. Previously, nil vs empty containers
were treated as not equal, but they should be considered equal for
objects managed by Kubernetes API machinery, like KubeletConfiguration.

This should fix the failing eviction tests.

```release-note
NONE
```
2018-02-16 17:42:33 -08:00
Kubernetes Submit Queue 270ed995f4
Merge pull request #59841 from dashpole/metrics_after_reclaim
Automatic merge from submit-queue (batch tested with PRs 59683, 59964, 59841, 59936, 59686). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Reevaluate eviction thresholds after reclaim functions

**What this PR does / why we need it**:
When the node comes under `DiskPressure` due to inodes or disk space, the eviction manager runs garbage collection functions to clean up dead containers and unused images.
Currently, we use the strategy of trying to measure the disk space and inodes freed by garbage collection.  However, as #46789 and #56573 point out, there are gaps in the implementation that can cause extra evictions even when they are not required.  Furthermore, for nodes which frequently cycle through images, it results in a large number of evictions, as running out of inodes always causes an eviction.

This PR changes this strategy to call the garbage collection functions and ignore the results.  Then, it triggers another collection of node-level metrics, and sees if the node is still under DiskPressure.
This way, we can simply observe the decrease in disk or inode usage, rather than trying to measure how much is freed.

**Which issue(s) this PR fixes**:
Fixes #46789
Fixes #56573
Related PR #56575

**Special notes for your reviewer**:
This will look cleaner after #57802  removes arguments from [makeSignalObservations](https://github.com/kubernetes/kubernetes/pull/57802/files#diff-9e5246d8c78d50ce4ba440f98663f3e9R719).

**Release note**:
```release-note
NONE
```

/sig node
/kind bug
/priority important-soon
cc @kubernetes/sig-node-pr-reviews
2018-02-16 16:31:33 -08:00
Michael Taufen 26cc4ff55c Fix e2e node setKubeletConfiguration helper
The helper should have been using `apiequality.Semantic.DeepEqual`,
instead of `reflect.DeepEqual`. Previously, nil vs empty containers
were treated as not equal, but they should be considered equal for
objects managed by Kubernetes API machinery, like KubeletConfiguration.

This should fix the failing eviction tests.
2018-02-16 14:53:27 -08:00
Jeff Grafton ef56a8d6bb Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
Kubernetes Submit Queue eac5bc0035
Merge pull request #57136 from k82cn/k8s_54313
Automatic merge from submit-queue (batch tested with PRs 57136, 59920). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Updated PID pressure node condition.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #54313 

**Release note**:

```release-note
Updated PID pressure node condition
```
2018-02-16 10:35:33 -08:00
David Ashpole e0830d0b71 reevaluate eviction thresholds after reclaim functions 2018-02-16 08:35:24 -08:00
Kubernetes Submit Queue c03edcc58e
Merge pull request #53833 from mtaufen/kubeletconfig-to-beta
Automatic merge from submit-queue (batch tested with PRs 59353, 59905, 53833). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Graduate kubeletconfig API group to beta

Regarding https://github.com/kubernetes/features/issues/281, this PR moves the kubeletconfig API group to beta. 

After #53088, the KubeletConfiguration type should not contain any deprecated or experimental fields, and we should not have to remove any more fields from the type before graduating it to beta. 

We need the community to double check for two things, however:
1. Are there any fields currently in the KubeletConfiguration type that you were going to mark deprecated this quarter, but haven't yet?
2. Are there any fields currently in the KubeletConfiguration type that are experimental or alpha, but were not explicitly denoted as such?

Please comment on this PR if you can answer "yes" to either of those two questions. Please cc anyone with a stake in the kubeletconfig API, so we get as much coverage as possible.

/cc @thockin @dchen1107 @Random-Liu @yujuhong @dashpole @tallclair @vishh @abw @freehan @dnardo @bowei @MrHohn @luxas @liggitt @ncdc @derekwaynecarr @mikedanese 

@kubernetes/sig-network-pr-reviews, @kubernetes/sig-node-pr-reviews 

```release-note
action required: The `kubeletconfig` API group has graduated from alpha to beta, and the name has changed to `kubelet.config.k8s.io`. Please use `kubelet.config.k8s.io/v1beta1`, as `kubeletconfig/v1alpha1` is no longer available. 
```

**TODO:**
- [x] Move experimental/non-gated-alpha/soon-to-be-deprecated fields to `KubeletFlags`
  - [x] #53088
  - [x] #54154
  - [x] #54160
  - [x] #55562
  - [x] #55983
  - [x] #57851
- [x] Lift embedded structure out of strings
  - [x] #53025
  - [x] #54643
  - [x] #54823
  - [x] #55254
- [x] Resolve relative paths against the location config files are loaded from
  - [x] #55648 
- [x] Rename to `kubelet.config.k8s.io`
- [x] Comments
  - [x] Make sure existing comments at least read sensibly.
  - [x] Note default values in comments on the versioned struct.
  - [x] Remove any reference to default values in comments on the internal struct.
- [x] Most fields should be `+optional` and `omitempty`. Add where necessary. ~Where omitted, explicitly comment.~ Edit: We should not distinguish between nil and empty, see below items.
- [x] Ensure defaults are specified via `pkg/kubelet/apis/kubelet.config.k8s.io/v1beta1/defaults.go`, not `cmd/kubelet/app/options/options.go`.
  - [x] #57770
- [x] Ensure kubeadm does not persist v1alpha1 KubeletConfiguration objects (or feature-gates this functionality)
- [x] Don't make a distinction between empty and nil, because of #43203.
  - [x] #59515
  - [x] #59681
- [x] Take the opportunity to fix insecure Kubelet defaults @tallclair 
  - [x] #59666
- [x] Remove CAdvisorPort from KubeletConfiguration wrt #56523.
  - [x] #59580
- [x] Hide `ConfigTrialDuration` until we're more sure what to do with it.
   - [x] #59628
- [x] Fix `// default: x` comments after rebasing on recent changes.
2018-02-15 11:06:40 -08:00
Michael Taufen d8cc440dd6 Rename ConfigOK to KubeletConfigOk
This is a more accurate name for the condition, as it describes the
status of the Kubelet's configuration.

Also cleans up capitalization of internal names.
2018-02-14 19:36:52 -08:00
Michael Taufen 9ebaf5e7d2 Move the kubeletconfig v1alpha1 API to beta, rename to kubelet.config.k8s.io 2018-02-14 17:30:22 -08:00
Kubernetes Submit Queue 24be88a8c5
Merge pull request #59511 from derekwaynecarr/hugepage-e2e
Automatic merge from submit-queue (batch tested with PRs 59653, 58812, 59582, 59665, 59511). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add node e2e tests to verify HugePages feature

**What this PR does / why we need it**:
Add node e2e tests to verify HugePages feature.

**Special notes for your reviewer**:
Test follows same pattern as pod container manager tests.

**Release note**:
```release-note
NONE
```
2018-02-13 11:12:54 -08:00
Kubernetes Submit Queue 9de5839944
Merge pull request #59681 from mtaufen/kc-empty-eviction-hard
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Ignore 0% and 100% eviction thresholds

Primarily, this gives a way to explicitly disable eviction, which is
necessary to use omitempty on EvictionHard.
See: https://github.com/kubernetes/kubernetes/pull/53833#discussion_r166672137

As justification for this approach, neither 0% nor 100% make sense as
eviction thresholds; in the "less-than" case, you can't have less than
0% of a resource and 100% perpetually evicts; in the
"greater-than" case (assuming we ever add a resource with this
semantic), the reasoning is the reverse (not more than 100%, 0%
perpetually evicts).

```release-note
Eviction thresholds set to 0% or 100% are now ignored.
```
2018-02-13 09:48:11 -08:00
Michael Taufen 21dbbe14f2 Ignore 0% and 100% eviction thresholds
Primarily, this gives a way to explicitly disable eviction, which is
necessary to use omitempty on EvictionHard.
See: https://github.com/kubernetes/kubernetes/pull/53833#discussion_r166672137

As justification for this approach, neither 0% nor 100% make sense as
eviction thresholds; in the "less-than" case, you can't have less than
0% of a resource and 100% perpetually evicts; in the
"greater-than" case (assuming we ever add a resource with this
semantic), the reasoning is the reverse (not more than 100%, 0%
perpetually evicts).
2018-02-12 14:13:00 -08:00
Wang Guoliang 31aad75316 more concise to merge the array 2018-02-11 21:27:11 +08:00
Kubernetes Submit Queue 317853c90c
Merge pull request #59464 from dixudx/fix_all_typos
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix all the typos across the project

**What this PR does / why we need it**:
There are lots of typos across the project. We should avoid small PRs on fixing those annoying typos, which is time-consuming and low efficient.

This PR does fix all the typos across the project currently. And with #59463, typos could be avoided when a new PR gets merged.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
/sig testing
/area test-infra
/sig release
/cc @ixdy 
/assign @fejta 

**Release note**:

```release-note
None
```
2018-02-10 22:12:45 -08:00
Di Xu 48388fec7e fix all the typos across the project 2018-02-11 11:04:14 +08:00
Michael Taufen 9c8eab96d0 Bury KubeletConfiguration.ConfigTrialDuration for now 2018-02-08 21:41:21 -08:00
Derek Carr 6e0a52e7ff Add node e2e to verify hugepages feature 2018-02-08 10:37:04 -05:00
Kubernetes Submit Queue fb340a4695
Merge pull request #57824 from thockin/gcr-vanity
Automatic merge from submit-queue (batch tested with PRs 57824, 58806, 59410, 59280). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

2nd try at using a vanity GCR name

The 2nd commit here is the changes relative to the reverted PR.  Please focus review attention on that.

This is the 2nd attempt.  The previous try (#57573) was reverted while we
figured out the regional mirrors (oops).
    
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest.  To publish
an image, push k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today).  For now the staging is an alias to
gcr.io/google_containers (the legacy URL).
    
When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.
    
We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it.  Nice and
visible, easy to keep track of.

xref https://github.com/kubernetes/release/issues/281

TL;DR:
  *  The new `staging-k8s.gcr.io` is where we push images.  It is literally an alias to `gcr.io/google_containers` (the existing repo) and is hosted in the US.
  * The contents of `staging-k8s.gcr.io` are automatically synced to `{asia,eu,us)-k8s.gcr.io`.
  * The new `k8s.gcr.io` will be a read-only alias to whichever regional repo is closest to you.
  * In the future, images will be promoted from `staging` to regional "prod" more explicitly and auditably.

 ```release-note
Use "k8s.gcr.io" for pulling container images rather than "gcr.io/google_containers".  Images are already synced, so this should not impact anyone materially.
    
Documentation and tools should all convert to the new name. Users should take note of this in case they see this new name in the system.
```
2018-02-08 03:29:32 -08:00
Tim Hockin 3586986416 Switch to k8s.gcr.io vanity domain
This is the 2nd attempt.  The previous was reverted while we figured out
the regional mirrors (oops).

New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest.  To publish
an image, push k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today).  For now the staging is an alias to
gcr.io/google_containers (the legacy URL).

When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.

We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it.  Nice and
visible, easy to keep track of.
2018-02-07 21:14:19 -08:00
Kubernetes Submit Queue cf7073a831
Merge pull request #58973 from verb/cri-enum
Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update Container Runtime Interface to use enumerated namespace modes

**What this PR does / why we need it**: This updates the CRI as described in the [Shared PID Namespace](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-pid-namespace.md#container-runtime-interface-changes) proposal. This change to the alpha API is not backwards compatible: implementations of the CRI will need to update to the new API version.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
WIP #1615

**Special notes for your reviewer**:
/assign @yujuhong 

**Release note**:

```release-note
[action-required] The Container Runtime Interface (CRI) version has increased from v1alpha1 to v1alpha2. Runtimes implementing the CRI will need to update to the new version, which configures container namespaces using an enumeration rather than booleans.
```
2018-02-07 12:00:47 -08:00
Lee Verberne e10042d22f Increment CRI version from v1alpha1 to v1alpha2
This also incorporates the version string into the package name so
that incompatibile versions will fail to connect.

Arbitrary choices:
- The proto3 package name is runtime.v1alpha2. The proto compiler
  normally translates this to a go package of "runtime_v1alpha2", but
  I renamed it to "v1alpha2" for consistency with existing packages.
- kubelet/apis/cri is used as "internalapi". I left it alone and put the
  public "runtimeapi" in kubelet/apis/cri/runtime.
2018-02-07 09:06:26 +01:00
hangaoshuai 5fac1dd6c9 remove a todo which is out of date 2018-02-07 09:31:15 +08:00
hangaoshuai 7898534720 fix todo:Move function readinessCheck to util 2018-02-05 20:16:05 +08:00
Kubernetes Submit Queue 7fb445c92d
Merge pull request #59094 from jianglingxia/jlx-013114
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

reopen #58913 Fix TODO move GetPauseImageNameForHostArch func

**What this PR does / why we need it**:
reopen #58913 Fix TODO move GetPauseImageNameForHostArch func,because of I squash to a single commit wrong,so recommit one,and close the #58913 
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
/assign @liggitt 
**Release note**:

```release-note
NONE
```
2018-02-03 22:36:09 -08:00
Kubernetes Submit Queue 7e33ffba10
Merge pull request #58728 from dashpole/cadvisor_testing
Automatic merge from submit-queue (batch tested with PRs 57683, 59116, 58728, 59140, 58976). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use node-e2e framework for testing cadvisor

**What this PR does / why we need it**:
With cadvisor checked out in your gopath, we can now run cadvisor integration tests: `make test-e2e-node TEST_SUITE=cadvisor`.  
This has a number of advantages:
 * we can use the same images to test both, configured the same way.
 * we will now get cadvisor logs from the integration test.
 * we can now use the familiar node-e2e arguments to specify images to test with cadvisor
 * no more managing snowflake VMs for cadvisor.

**Special notes for your reviewer**:
cadvisor doesnt currently produce junit* files, so I removed that as a requirement.
This wont actually work until https://github.com/google/cadvisor/pull/1868 is merged as well.

Related issue:
https://github.com/kubernetes/test-infra/issues/190

**Release note**:
```release-note
NONE
```

/assign @Random-Liu 
/sig node
/priority important-soon
/kind cleanup
2018-02-01 07:04:40 -08:00
Kubernetes Submit Queue f96ac05774
Merge pull request #59062 from mtaufen/fix-pod-pids-limit
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix PodPidsLimit and ConfigTrialDuration on internal KubeletConfig type

They should both follow the convention of not being a pointer on the internal type. 

This required adding a conversion function between `int64` and `*int64`. A side effect is this removes a warning in the generated code for the apps API group.

@dims

```release-note
NONE
```
2018-02-01 01:45:55 -08:00
xiangpengzhao abdfa7f35f Update test framework featuregates type. 2018-02-01 17:15:12 +08:00
David Ashpole 17e8d8c040 use node-e2e framework for testing cadvisor 2018-01-31 10:14:54 -08:00
jianglingxia 76e90061a2 reopen #58913 Fix TODO move GetPauseImageNameForHostArch func 2018-01-31 15:06:32 +08:00
Kubernetes Submit Queue 84408378f9
Merge pull request #58174 from filbranden/ipcs1
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes for HostIPC tests to work when Docker has SELinux support enabled.

**What this PR does / why we need it**:

Fixes for HostIPC tests to work when Docker has SELinux support enabled.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

N/A

**Special notes for your reviewer**:

The core of the matter is to use `ipcs` from util-linux rather than the one from busybox. The typical SELinux policy has enough to allow Docker containers (running under svirt_lxc_net_t SELinux type) to access IPC information by reading the contents of the files under /proc/sysvipc/, but not by using the shmctl etc. syscalls.

The `ipcs` implementation in busybox will use `shmctl(0, SHM_INFO, ...)` to detect whether it can read IPC info (see source code [here](https://git.busybox.net/busybox/tree/util-linux/ipcs.c?h=1_28_0#n138)), while the one in util-linux will prefer to read from the /proc files directly if they are available (see source code [here](https://github.com/karelzak/util-linux/blob/v2.27.1/sys-utils/ipcutils.c#L108)).

It turns out the SELinux policy doesn't allow the shmctl syscalls in an unprivileged container, while access to it through the /proc interface is fine. (One could argue this is a bug in the SELinux policy, but getting it fixed on stable OSs is hard, and it's not that hard for us to test it with an util-linux `ipcs`, so I propose we do so.)

This PR also contains a refactor of the code setting IpcMode, since setting it in the "common options" function is misleading, as on containers other than the sandbox, it ends up always getting overwritten, so let's only set it to "host" in the Sandbox.

It also has a minor fix for the `ipcmk` call, since support for size suffix was only introduced in recent versions of it.

**Release note**:

```release-note
NONE
```
2018-01-30 17:18:52 -08:00
Michael Taufen da41a6e793 Fix PodPidsLimit and ConfigTrialDuration on internal KubeletConfig type
They should both follow the convention of not being a pointer on the
internal type. This required adding a conversion function between
`int64` and `*int64`.

A side effect is this removes a warning in the generated code for the
apps API group.
2018-01-30 11:43:41 -08:00
Kubernetes Submit Queue 9fa96264f9
Merge pull request #58996 from Random-Liu/enable-feature-gate-by-default
Automatic merge from submit-queue (batch tested with PRs 57467, 58996). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Set generate-kubelet-config-file to true by default.

This should fix the flaky suite.
https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-flaky-gce-e2e

@mtaufen /cc @kubernetes/sig-node-bugs 

Signed-off-by: Lantao Liu <lantaol@google.com>



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
none
```
2018-01-29 19:03:35 -08:00
Kubernetes Submit Queue fc0e07465f
Merge pull request #57467 from dashpole/move_eviction_tests
Automatic merge from submit-queue (batch tested with PRs 57467, 58996). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove flaky label from Eviction tests

**What this PR does / why we need it**:
All eviction tests in the flaky suite are no longer flaky.  Remove the flaky label to move them from the flaky suite to the serial suite.
I removed the QoS-based memory eviction test since it does not reflect the current eviction strategy.

**Release note**:
```release-note
NONE
```
/assign @mtaufen @Random-Liu 
/sig node
/priority important-soon
/kind cleanup
2018-01-29 19:03:32 -08:00
Kubernetes Submit Queue f519cba47f
Merge pull request #58980 from Random-Liu/fix-qps-set
Automatic merge from submit-queue (batch tested with PRs 58899, 58980). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix setting qps in density test.

Current QPS setting code doesn't work. All the density tests with higher QPS are failing. https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-benchmark-gce-e2e

We should use existing helper function `tempSetCurrentKubeletConfig` to set QPS.

@kubernetes/sig-node-bugs @mtaufen 

**Release note**:

```release-note
none
```
2018-01-29 16:45:34 -08:00
Lantao Liu e7531ca6c8 Set generate-kubelet-config-file to true by default.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-01-30 00:20:35 +00:00
Kubernetes Submit Queue 85e435d35a
Merge pull request #58777 from filbranden/nnp1
Automatic merge from submit-queue (batch tested with PRs 58777, 58978, 58977, 58775). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Skip NoNewPrivileges test when SELinux is enabled

**What this PR does / why we need it**:

A bug in the SELinux policy prevented NoNewPrivileges from working on Docker with SELinux support enabled.

The problem has been fixed upstream (see projectatomic/container-selinux#45)

But hasn't been backported yet (a fix might come in RHEL 7.5)

For now, let's skip the NoNewPrivileges test when SELinux support is enabled in Docker.

Tested:

- Before this commit, the test fails:
```
    $ make test-e2e-node REMOTE=true FOCUS="allow privilege escalation"
    (on a host with SELinux enabled)

    • [SLOW TEST:22.798 seconds] (passed)
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should allow privilege escalation when true

    • Failure [16.539 seconds]
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should not allow privilege escalation when false [It]

        wait for pod "alpine-nnp-false-aef03e47-0090-11e8-886f-42010af00009" to success
        Expected success, but got an error:
            <*errors.errorString | 0xc4204e26d0>: {
                s: "pod \"alpine-nnp-false-aef03e47-0090-11e8-886f-42010af00009\" failed with reason: \"\", message: \"\"",
            }
            pod "alpine-nnp-false-aef03e47-0090-11e8-886f-42010af00009" failed with reason: "", message: ""

    • [SLOW TEST:26.572 seconds] (passed)
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should allow privilege escalation when not explicitly set and uid != 0

    Ran 3 of 257 Specs in 45.364 seconds
    FAIL! -- 2 Passed | 1 Failed | 0 Pending | 254 Skipped

    Ginkgo ran 1 suite in 49.389123442s
    Test Suite Failed
```
- After this commit, the test is skipped:
```
    $ make test-e2e-node REMOTE=true FOCUS="allow privilege escalation"
    (on a host with SELinux enabled)

    S [SKIPPING] in Spec Setup (BeforeEach) [12.452 seconds]
    S [SKIPPING] in Spec Setup (BeforeEach) [16.298 seconds]
    S [SKIPPING] in Spec Setup (BeforeEach) [18.183 seconds]

    Ran 0 of 257 Specs in 39.174 seconds
    SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 257 Skipped

    Ginkgo ran 1 suite in 43.570630357s
    Test Suite Passed
```
- No changes when SELinux is disabled:
```
    $ make test-e2e-node REMOTE=true FOCUS="allow privilege escalation"
    (on a host with SELinux disabled)

    • [SLOW TEST:15.013 seconds]
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should not allow privilege escalation when false

    • [SLOW TEST:19.155 seconds]
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should allow privilege escalation when true

    • [SLOW TEST:21.087 seconds]
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should allow privilege escalation when not explicitly set and uid != 0

    Ran 3 of 259 Specs in 38.560 seconds
    SUCCESS! -- 3 Passed | 0 Failed | 0 Pending | 256 Skipped

    Ginkgo ran 1 suite in 41.937918928s
    Test Suite Passed
```




**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
N/A

**Special notes for your reviewer**:
N/A

**Release note**:

```release-note
NONE
```
2018-01-29 14:59:36 -08:00
Lantao Liu 3d51577327 Fix setting qps in density test. 2018-01-29 19:41:31 +00:00
Kubernetes Submit Queue 5792214647
Merge pull request #58760 from mtaufen/kc-remove-kubeletconfigfile-gate
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Removal of KubeletConfigFile feature gate: Step 1

This feature gate was redundant with the `--config` flag, which already
enables/disables loading Kubelet config from a file.

Since the gate guarded an alpha feature, removing it is not a violation
of our API guidelines.

Some stuff in `kubernetes/test-infra` currently sets the gate,
so removing will be a 3 step process:
1. This PR, which makes the gate a no-op.
2. Stop setting the gate in `kubernetes/test-infra`.
3. Completely remove the gate (this PR will get the release note).

```release-note
NONE
```
2018-01-26 14:35:25 -08:00
Filipe Brandenburger 46a83c2883 Use ipc-utils container in HostIPC tests.
This ensures the `ipcs` command from util-linux will be used, which
succeeds when Docker is running with SELinux enabled (while the one from
busybox fails.)

Tested: On a host with Docker running with SELinux enabled:

  $ make test-e2e-node REMOTE=true FOCUS="host IPC"

  • [SLOW TEST:17.272 seconds] (passed)
  [k8s.io] Security Context
    when creating a pod in the host IPC namespace
      should show the shared memory ID in the host IPC containers

  • [SLOW TEST:20.419 seconds] (passed)
  [k8s.io] Security Context
    when creating a pod in the host IPC namespace
      should not show the shared memory ID in the non-hostIPC containers

  Ran 2 of 257 Specs in 43.934 seconds
  SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 255 Skipped
2018-01-25 11:09:16 -08:00
Filipe Brandenburger 67869273a8 Don't assume ipcmk command supports size suffix.
Expand the use of "1M" to the corresponding number of bytes, since
support for size suffix was only added to `ipcmk` in util-linux 2.27
which is not yet available in some Linux distributions.

Tested by running `make test-e2e-node` against distributions with ipcmk
that supports and doesn't support the suffix syntax, all of them passed.
2018-01-25 11:09:16 -08:00
Filipe Brandenburger 6d30b026ba Skip NoNewPrivileges test when SELinux is enabled
A bug in the SELinux policy prevented NoNewPrivileges from working on
Docker with SELinux support enabled.

The problem has been fixed upstream:
https://github.com/projectatomic/container-selinux/issues/45

But hasn't been backported yet (a fix might come in RHEL 7.5)

For now, let's skip the NoNewPrivileges test when SELinux support is
enabled in Docker.

Tested:

- Before this commit, the test fails:

    $ make test-e2e-node REMOTE=true FOCUS="allow privilege escalation"
    (on a host with SELinux enabled)

    • [SLOW TEST:22.798 seconds] (passed)
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should allow privilege escalation when true

    • Failure [16.539 seconds]
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should not allow privilege escalation when false [It]

        wait for pod "alpine-nnp-false-aef03e47-0090-11e8-886f-42010af00009" to success
        Expected success, but got an error:
            <*errors.errorString | 0xc4204e26d0>: {
                s: "pod \"alpine-nnp-false-aef03e47-0090-11e8-886f-42010af00009\" failed with reason: \"\", message: \"\"",
            }
            pod "alpine-nnp-false-aef03e47-0090-11e8-886f-42010af00009" failed with reason: "", message: ""

    • [SLOW TEST:26.572 seconds] (passed)
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should allow privilege escalation when not explicitly set and uid != 0

    Ran 3 of 257 Specs in 45.364 seconds
    FAIL! -- 2 Passed | 1 Failed | 0 Pending | 254 Skipped

    Ginkgo ran 1 suite in 49.389123442s
    Test Suite Failed

- After this commit, the test is skipped:

    $ make test-e2e-node REMOTE=true FOCUS="allow privilege escalation"
    (on a host with SELinux enabled)

    S [SKIPPING] in Spec Setup (BeforeEach) [12.452 seconds]
    S [SKIPPING] in Spec Setup (BeforeEach) [16.298 seconds]
    S [SKIPPING] in Spec Setup (BeforeEach) [18.183 seconds]

    Ran 0 of 257 Specs in 39.174 seconds
    SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 257 Skipped

    Ginkgo ran 1 suite in 43.570630357s
    Test Suite Passed

- No changes when SELinux is disabled:

    $ make test-e2e-node REMOTE=true FOCUS="allow privilege escalation"
    (on a host with SELinux disabled)

    • [SLOW TEST:15.013 seconds]
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should not allow privilege escalation when false

    • [SLOW TEST:19.155 seconds]
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should allow privilege escalation when true

    • [SLOW TEST:21.087 seconds]
    [k8s.io] Security Context
      when creating containers with AllowPrivilegeEscalation
        should allow privilege escalation when not explicitly set and uid != 0

    Ran 3 of 259 Specs in 38.560 seconds
    SUCCESS! -- 3 Passed | 0 Failed | 0 Pending | 256 Skipped

    Ginkgo ran 1 suite in 41.937918928s
    Test Suite Passed
2018-01-25 09:11:22 -08:00
Connor Doyle e5667cf426 Rename package deviceplugin => devicemanager. 2018-01-24 22:32:43 -08:00
Michael Taufen 6443b6f543 Removal of KubeletConfigFile feature gate: Step 1
This feature gate was redundant with the `--config` flag, which already
enables/disables loading Kubelet config from a file.

Since the gate guarded an alpha feature, removing it is not a violation
of our API guidelines.

Some stuff in `kubernetes/test-infra` currently sets the gate,
so removing will be a 3 step process:
1. This PR, which makes the gate a no-op.
2. Stop setting the gate in `kubernetes/test-infra`.
3. Completely remove the gate.
2018-01-24 10:19:15 -08:00
Filipe Brandenburger e98ba5021e Skip log path tests when they are expected to fail.
The log path test is not expected to pass unless the Docker is using the
JSON logging driver, since that's what the log path is trying to find.
When Docker is using the journald logging driver, there will be no JSON
files in the logging directories for it to find.

Furthermore, when SELinux support is enabled in the Docker daemon,
SELinux will prevent processes running inside Docker containers from
accessing the log files owned by Docker (which is what this test is
trying to accomplish), so let's also skip this test in case SELinux
support is enabled.

Tested:

- With Docker daemon started using --log-driver=journald:

    S [SKIPPING] in Spec Setup (BeforeEach) [8.193 seconds]
    [k8s.io] ContainerLogPath
      Pod with a container
        printed log to stdout
          should print log to correct log path [BeforeEach]
          Jan  3 18:33:44.869: Skipping because Docker daemon is using a logging driver other than "json-file": journald

- With Docker daemon started using --selinux-enabled:

    S [SKIPPING] in Spec Setup (BeforeEach) [8.488 seconds]
    [k8s.io] ContainerLogPath
      Pod with a container
        printed log to stdout
          should print log to correct log path [BeforeEach]
          Jan  3 18:35:58.909: Skipping because Docker daemon is running with SELinux support enabled

- With Docker started using JSON logging driver and with SELinux disabled:

    • [SLOW TEST:16.352 seconds]  (passed)
    [k8s.io] ContainerLogPath
      Pod with a container
        printed log to stdout
          should print log to correct log path
    Ran 1 of 256 Specs in 36.428 seconds
    SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 255 Skipped
2018-01-19 10:51:13 -08:00
Kubernetes Submit Queue bda841fa7b
Merge pull request #58323 from miaoyq/benchmark-non-docker-specific
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Benchmark test non docker specific

**What this PR does / why we need it**:
This will make benchmark test generic to all container runtimes

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58321

**Special notes for your reviewer**:

**Release note**:

```release-note
none
```
/cc @Random-Liu
2018-01-17 14:17:26 -08:00
Kubernetes Submit Queue d72631b6da
Merge pull request #57344 from balajismaniam/cpuman-test-del-state-file
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix policy conflict in the CPU manager node e2e test.

**What this PR does / why we need it**:
After graduation of the CPU manager feature to Beta, the CPU manager `none` policy is ON by default. But when the CPU manager is set to use `static` policy in the node e2e test, there will always be a conflict with the policy checkpointed in the disk. This PR fixes that by deleting the state file where required. 

Manually tested in an `n1-standard-4` instance with `Ubuntu 16.04` image on GCP, which is the same machine and image type as one of the configs used in the node e2e tests. 

Use the following command to run the test locally:
`make test-e2e-node TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS="CPU Manager" SKIP="" PARALLELISM=1`

CC @ConnorDoyle @derekwaynecarr
2018-01-16 15:02:10 -08:00
Kubernetes Submit Queue a1921f665b
Merge pull request #56232 from ConnorDoyle/add-balaji-connor-node-e2e-approvers
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add balajismaniam and ConnorDoyle to node-e2e approvers.

**What this PR does / why we need it**:

- Add balajismaniam and ConnorDoyle to node-e2e approvers.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
_Rationale:_ We are maintaining node e2e tests for the CPU manager component, and would also like to help with the rest of review load in this package. Both Balaji and I are approvers for the cpumanager and cpuset packages in the Kubelet container manager.

**Release note**:
```release-note
NONE
```
2018-01-16 11:41:10 -08:00
David Ashpole a436a3fe26 remove flaky label from eviction tests 2018-01-16 11:22:17 -08:00
Yanqiang Miao 3660563e22 Benchmark non docker specific
Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>
2018-01-16 15:38:35 +08:00
Da K. Ma 9a78753144 Updated PID pressure node condition.
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
2018-01-14 18:26:00 +08:00
Mike Danese 1e2b644260 cluster: move logging library to hack/
it's used once in cluster and used a bunch in hack/ and build/
2018-01-13 16:37:50 -08:00
Kubernetes Submit Queue 3107fb4f0d
Merge pull request #58193 from Random-Liu/fix-resource-collector-panic
Automatic merge from submit-queue (batch tested with PRs 58216, 58193, 53033, 58219, 55921). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use GinkgoRecover to avoid panic.

See this in the test:
```
I0111 14:28:31.010] panic: 
I0111 14:28:31.010] Your test failed.
I0111 14:28:31.010] Ginkgo panics to prevent subsequent assertions from running.
I0111 14:28:31.011] Normally Ginkgo rescues this panic so you shouldn't see it.
I0111 14:28:31.011] 
I0111 14:28:31.011] But, if you make an assertion in a goroutine, Ginkgo can't capture the panic.
I0111 14:28:31.011] To circumvent this, you should call
I0111 14:28:31.011] 
I0111 14:28:31.012] 	defer GinkgoRecover()
I0111 14:28:31.012] 
I0111 14:28:31.012] at the top of the goroutine that caused this panic.
I0111 14:28:31.012] 
I0111 14:28:31.012] 
I0111 14:28:31.012] goroutine 1028 [running]:
I0111 14:28:31.013] k8s.io/kubernetes/vendor/github.com/onsi/ginkgo.Fail(0xc421098000, 0xb0, 0xc420da24c8, 0x1, 0x1)
I0111 14:28:31.013] 	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/ginkgo_dsl.go:255 +0xda
I0111 14:28:31.014] k8s.io/kubernetes/vendor/github.com/onsi/gomega/internal/assertion.(*Assertion).match(0xc4220bd700, 0x9e897e0, 0xa123640, 0x0, 0x0, 0x0, 0x0, 0xa123640)
I0111 14:28:31.014] 	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:69 +0x1ef
I0111 14:28:31.014] k8s.io/kubernetes/vendor/github.com/onsi/gomega/internal/assertion.(*Assertion).NotTo(0xc4220bd700, 0x9e897e0, 0xa123640, 0x0, 0x0, 0x0, 0xc4220bd700)
I0111 14:28:31.015] 	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:43 +0xae
I0111 14:28:31.015] k8s.io/kubernetes/test/e2e_node.deletePodsSync.func1(0xc421485220, 0xc421321680, 0xc421517180)
I0111 14:28:31.015] 	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e_node/resource_collector.go:382 +0x320
I0111 14:28:31.015] created by k8s.io/kubernetes/test/e2e_node.deletePodsSync
I0111 14:28:31.016] 	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e_node/resource_collector.go:375 +0x9e
```

e.g.: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-cri-containerd-node-e2e-serial/17/build-log.txt

**Release note**:

```release-note
none
```
2018-01-13 03:34:50 -08:00
Kubernetes Submit Queue 6d55ffbf84
Merge pull request #54792 from ScorpioCPH/add-stub-device-plugin-for-e2e
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add stub device plugin for conformance e2e test

**What this PR does / why we need it**:

Add stub device plugin for conformance e2e test
- extend [device_plugin_stub](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/deviceplugin/device_plugin_stub.go) to support e2e test
- add test suite with this device-plugin-stub
- simulate more use cases by deploying some pods to request these resources

**Which issue this PR fixes**:

fixes #52861

**Special notes for your reviewer**:

@vishh @jiayingz PTAL.

**Release note**:

```release-note
None
```
2018-01-12 04:05:55 -08:00
Kubernetes Submit Queue a97ce942e9
Merge pull request #57812 from ScorpioCPH/double-check-set-kubelet-config
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[Alpha: DynamicKubeletConfig] Double check before setKubeletConfiguration

**What this PR does / why we need it**:

Double check the `newCfg` is not equal to the `oldCfg` before we call `setKubeletConfiguration(newCfg)` in `tempSetCurrentKubeletConfig()`.

**Which issue(s) this PR fixes**:
Fixes https://github.com/kubernetes/kubernetes/issues/57701

**Special notes for your reviewer**:

/area kubelet
/sig node
/assign @mtaufen 
/cc @vishh @jiayingz @derekwaynecarr @dchen1107 @liggitt 
PTAL, Thanks!

**Release note**:

```release-note
NONE
```
2018-01-12 00:00:37 -08:00
Lantao Liu 4d6817dd71 Use GinkgoRecover to avoid panic. 2018-01-12 06:51:29 +00:00
Penghao Cen 671c4eb2b7 Add e2e test logic for device plugin 2018-01-11 14:41:45 +08:00
Balaji Subramaniam 032fa206af Fix policy conflict in the CPU manager node e2e test. 2018-01-10 09:38:00 -08:00
Penghao Cen 386c077dc6 Move common functions together 2018-01-10 09:47:05 +08:00
Kubernetes Submit Queue ecd525d8aa
Merge pull request #57976 from Random-Liu/node-e2e-non-docker-specific
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Node e2e non docker specific

Fixes https://github.com/kubernetes/kubernetes/issues/57977.

Make node e2e test generic to container runtimes.

With this change, other than tests with `[Feature:Docker]`, all tests can run against all CRI container runtimes.

Note that this PR also marks cpu manager test as `Serial`, because it restarts kubelet during the test. It doesn't cause problem in regular node e2e suite today, because it is skipped if node has less than 2 CPUs, which is the case for our test environment. /cc @balajismaniam 

@yujuhong @mrunalp @feiskyer 
/cc @dashpole @balajismaniam @bprashanth Because I addressed your comments.
/cc @kubernetes/sig-node-pr-reviews 
**Release note**:

```release-note
none
```
2018-01-09 17:26:40 -08:00
Lantao Liu e05a5b9f7a Remove unnecessary docker specific logic in node e2e test. 2018-01-09 22:59:17 +00:00
Lantao Liu f64c508e2e Add getCRIClient and set default values for CRI related flags 2018-01-09 22:59:17 +00:00
Michael Taufen 5caf26fa84 Move some old security controls to KubeletFlags and mark them deprecated 2018-01-09 10:18:36 -08:00
Kubernetes Submit Queue 2d2fa02337
Merge pull request #57638 from mtaufen/kc-e2e-node-file
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

e2e node framework can generate a base kubelet config file

Fixes #57980

This allows the e2e node test framework to generate a Kubelet config file containing the defaults it would typically pass to a test via flags, rather than passing these defaults as flags.

```release-note
NONE
```
2018-01-08 20:38:22 -08:00
Joe Betz 6a0c69e971 Fix build and test errors from etcd 3.2.13 upgrade 2018-01-07 08:22:08 -08:00
Yanqiang Miao 19fb0da059 Make sure is not nil
Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>
2018-01-06 11:33:11 +08:00
Michael Taufen 90814b3a97 e2e node framework can generate a base kubelet config file
This allows the e2e node test framework to generate a kubelet config
file containing the defaults it would typically pass to a test via
flags, rather than passing these defaults as flags.
2018-01-04 16:47:04 -08:00
Michael Taufen dd74a39700 Make ConfigOK status messages more human readable by including the API path to the object instead of the UID 2018-01-03 16:05:43 -08:00
Penghao Cen 6cf819165f Double check before setKubeletConfiguration 2018-01-03 16:58:12 +08:00
Jeff Grafton efee0704c6 Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
Tim Hockin e9dd8a68f6 Revert k8s.gcr.io vanity domain
This reverts commit eba5b6092a.

Fixes https://github.com/kubernetes/kubernetes/issues/57526
2017-12-22 14:36:16 -08:00
Kubernetes Submit Queue 8f8579d439
Merge pull request #57392 from chrisglass/gke-spec-tweaks
Automatic merge from submit-queue (batch tested with PRs 57532, 57392). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Gke spec tweaks

**What this PR does / why we need it**:
This PR removes two unnecessary requirements for the GKE node image validation spec:

- The "vim" package doesn't need to be installed, as a vim environment is already available and the full "vim" pacakge installation takes some precious disk space.
- The linux headers are not needed for the kubernetes node and cluster tests to succeed, and again, take unnecessary disk space.

**Special notes for your reviewer**:
None.

**Release note**:

```release-note
NONE
```
2017-12-21 20:00:34 -08:00
Kubernetes Submit Queue fb4dc98016
Merge pull request #57532 from mtaufen/fix-hyphen-dkcfg-test
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

dynamic config test: use a hyphen between the config name and the unique suffix

These are painful to read right now due to the lack of hyphen.

```release-note
NONE
```
2017-12-21 19:27:31 -08:00
Michael Taufen 33d4fe4074 dynamic config test: use a hyphen between the config name and the unique suffix 2017-12-21 16:05:50 -08:00
Michael Taufen 6ee191ab74 Refactor kubelet config controller bootstrap process
This makes the bootstrap feel much more linear and as a result it is
easier to read.

Also simplifies status reporting for local config.
2017-12-21 15:24:56 -08:00
Kubernetes Submit Queue 2a1cdfffaa
Merge pull request #57221 from mtaufen/kc-event
Automatic merge from submit-queue (batch tested with PRs 57434, 57221, 57417, 57474, 57481). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Send an event just before the Kubelet restarts to use a new config

**What this PR does / why we need it**:
This PR makes the Kubelet send events for configuration changes. This makes it much easier to see a recent history of configuration changes for the Kubelet. 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56895

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```

/cc @dchen1107 @liggitt @dashpole
2017-12-20 17:42:37 -08:00
Kubernetes Submit Queue 09e5ce3056
Merge pull request #57434 from yguo0905/fix-gke-env-test
Automatic merge from submit-queue (batch tested with PRs 57434, 57221, 57417, 57474, 57481). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Node e2e test: do not return error if Docker's check-config.sh fails

**What this PR does / why we need it**:

This PR fixes the issue in https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2enode-ubuntudev2-k8sbeta-gkespec/1#k8sio-gke-system-requirements-conformance-featuregkeenv-the-docker-configuration-validation-should-pass

https://github.com/moby/moby/pull/28007 changed the Docker's check-config.sh to return a non-zero exit code if the check fails. But we expect certain configs to be missing, so we should ignore the exit code.

This needs to be cherry-picked into 1.9.

**Release note**:

```release-note
None
```

/area node-e2e
/kind bug
/assign @dchen1107
2017-12-20 17:42:34 -08:00
Michael Taufen d5d7d6d684 Send an event just before the Kubelet restarts to use a new config 2017-12-20 13:02:55 -08:00
Kubernetes Submit Queue 666b95743a
Merge pull request #54278 from foxyriver/fix-exit
Automatic merge from submit-queue (batch tested with PRs 54278, 56259, 56762). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

make sure of deleting archive

**What this PR does / why we need it**:

Exit() causes the current program to exit with the given status code, but deferred function does not run.
2017-12-20 12:32:32 -08:00
Yang Guo 7e1d74ead2 node_e2e: do not return error if Docker's check-config.sh fails 2017-12-19 20:50:09 -08:00
Wojciech Tyczynski 4e8526dc6b
Revert "Version bump to etcd v3.2.11, grpc v1.7.5" 2017-12-19 15:25:06 +01:00
Kubernetes Submit Queue a7b404ec7f
Merge pull request #57160 from jpbetz/etcd-client-3.2.11
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Version bump to etcd v3.2.11, grpc v1.7.5

Fix https://github.com/kubernetes/kubernetes/issues/56114: Update to etcd client 3.2.11

Version bumps:

- etcd from 3.1.10 to 3.2.11
- grpc from 1.3.0 to 1.7.5
- grpc-gateway from v1.1.0-25-g84398b9 to v1.3.0

TODO:

- [x] Apply etcd [3.2 client upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md)
- [x] Apply grpc API changes in 1.6.0 and 1.7.0 [release notes](https://github.com/grpc/grpc-go/releases)
- [x] bbolt was pulled in transitively, why? We have tests that embed etcd, so we must vendor the etcd server and all it's dependencies.
- [x] Upgrade to containerd v1.0.0? Currently kubernetes depends on containerd v1.0.0-beta.2-159-g27d450a0 which depends on grpc v1.3.0, but  containerd v1.0.0 depends on grpc 1.7.2. Not needed. The containerd grpc upgrade required [no code changes](ce3e32680d).
- [x] Fix all failing tests
- [x] Ensure we can safely upgrade grpc to 1.7.5 given that docker and cAdvisor still depend on grpc 1.3.0 (both in the versions we vend and on master for both projects). Should we hold off on this change until we have a docker release that uses gprc 1.7.x?
- [x] Wait for grpc 1.7.5 to be released (it will include https://github.com/grpc/grpc-go/pull/1747). Once released, bump grpc version in this PR and remove workarounds in `hack/godep-save.sh`.

Transitive dependencies on grpc:
- docker depends on grpc, but according to the package dependency graph (`go list -f '{{ .Deps }}'`) there are no dependencies from kubernetes to grpc via docker packages.
- containerd v1.0.0 depends on grpc 1.7.2, we should upgrade to containerd v1.0.0 soon, this can be done in a separate PR
- cadvisor depends on grpc 1.3.0 on master, it should upgrade it to grpc 1.7.5, this can be done in a separate PR

**Release note**:

```release-note
Upgrade to etcd client 3.2.11 and grpc 1.7.5 to improve HA etcd cluster stability.
```
2017-12-19 02:14:36 -08:00
Chris Glass 3e3385821a Do not require the linux headers to be installed.
The linux headers take significant disk space and are not necessary to
run kubernetes on a GKE node. User logging on to a node can trivially
install the kernel headers should they need to by running "apt-get install
linux-headers-$(uname -r)".

Signed-off-by: Chris Glass <chris.glass@canonical.com>
2017-12-19 10:58:13 +01:00
Chris Glass be41d1e2ce Do not require the vim package to be installed
The minimal Ubuntu image used on GKE nodes provides the vim editor as
part of system packages, as "vim.tiny". People logging on the nodes have a vim
environment available despite the "vim" package not being installed.

Signed-off-by: Chris Glass <chris.glass@canonical.com>
2017-12-19 10:58:13 +01:00
Kubernetes Submit Queue c16a336eaa
Merge pull request #55475 from ScorpioCPH/fix-e2e-local-test
Automatic merge from submit-queue (batch tested with PRs 55475, 57155, 57260, 57222). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix e2e local test

**What this PR does / why we need it**:

Fix some issue on local e2e_node test: `Can't start e2e service "kubelet"`

**Which issue(s) this PR fixes**:

Fixes https://github.com/kubernetes/kubernetes/issues/54622

**Special notes for your reviewer**:

**Release note**:

```release-note

```
/sig node
2017-12-18 19:45:38 -08:00
Kubernetes Submit Queue 6eb6955371
Merge pull request #56108 from tpepper/trivial
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

e2e_node: small fixes to setup_host.sh for Ubuntu Trusty

**What this PR does / why we need it**:
Two small fixes for how nsenter is built from source and installed on Ubuntu Trusty:
1. Use mktemp for the creating the build directory instead of a hard coded name
2. Use current (2.31) util-linux instead of 3+ year old version

**Which issue(s) this PR fixes**:

Fixes #56106 

**Special notes for your reviewer**:

See https://github.com/kubernetes/kubernetes/issues/56106 for some thoughts on other ways to address this.  My patch for util-linux 2.31 may just be a band-aid?

**Release note**:
```release-note
NONE
```
2017-12-18 16:36:26 -08:00
Joe Betz 94f2ed6849 Fix build and test errors from etcd 3.2.11 upgrade 2017-12-18 16:17:42 -08:00
Tim Pepper 3f4294cdf9 e2e_node: use newer util-linux
The e2e_node test environment setup script is hard coded to pull down a
quite old version of util-linux in order to build nsenter on trusty,
which sadly is well known to not include an nsenter executable.

While "just a test", it's unfortunate to be building from really old
util-linux sources when newer are available.

Signed-off-by: Tim Pepper <tpepper@vmware.com>
2017-12-18 13:18:36 -08:00
Tim Pepper 5256e26f68 e2e_node: use mktemp when building nsenter on trusty
There's a bit of a hack in place to insure nsenter is present on Ubuntu
trusty, which doesn't otherwise include it.  This downloads util-linux
to a hard coded directory in /tmp which is a bad practice.  Even though
"this is just a test case" it should properly use mktemp.

Signed-off-by: Tim Pepper <tpepper@vmware.com>
2017-12-18 13:18:36 -08:00
Tim Hockin f7be352a67 gcloud docker now auths k8s.gcr.io by default 2017-12-18 09:18:34 -08:00
Tim Hockin eba5b6092a Use k8s.gcr.io vanity domain for container images 2017-12-18 09:18:34 -08:00
Kubernetes Submit Queue 0a940bfd10
Merge pull request #56250 from ingvagabund/extend-node-e2e-test-suite-with-containerized-kubelet
Automatic merge from submit-queue (batch tested with PRs 56250, 56809, 56812, 56792, 56724). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Extend node e2e test suite with containerized Kubelet

How to run it?
```sh
make test-e2e-node \
    TEST_ARGS='--kubelet-containerized=true --hyperkube-image=hyperkube-amd64:1.9 --kubelet-flags="<FLAGS>"' \
    FOCUS="Conformance"
```
2017-12-16 07:46:38 -08:00
Kubernetes Submit Queue 8415e0c608
Merge pull request #56661 from xiangpengzhao/move-kubelet-constants
Automatic merge from submit-queue (batch tested with PRs 56410, 56707, 56661, 54998, 56722). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Move some kubelet constants to a common place

**What this PR does / why we need it**:
More context, see: https://github.com/kubernetes/kubernetes/issues/56516
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56516
[thanks @ixdy for verifying this!]

**Special notes for your reviewer**:
@ixdy how can I verify #56516 against this locally?

/cc @ixdy @mtaufen 

**Release note**:

```release-note
NONE
```
2017-12-16 05:46:35 -08:00
Kubernetes Submit Queue 6c5eb50c8d
Merge pull request #55984 from derekwaynecarr/summary-tests
Automatic merge from submit-queue (batch tested with PRs 55954, 56037, 55866, 55984, 54994). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubelet summary api test updates

**What this PR does / why we need it**
Fixes https://github.com/kubernetes/kubernetes/issues/55985
Improve the accuracy of the test as follows:
- ensure memory bound checks for unconstrained group are limited by actual node capacity
- grow the fs capacity bounds so we can run on larger drives (i.e. my dev laptop)

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2017-12-13 23:25:59 -08:00
Kubernetes Submit Queue 7b0e6b7f14
Merge pull request #56909 from dashpole/fix_local_storage_test
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[Test Fix] Fix Broken LocalStorageEviction test

**What this PR does / why we need it**:
In #56643, I added a new test case, but did not rename the pod.  The test failed because the new pod can not be created since there is a name collision.
It fixes these new test failures: https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-flaky-gce-e2e&include-filter-by-regex=LocalStorageCapacityIsolationEviction

**Special notes for your reviewer**:
I tested it this time

**Release note**:
```release-note
NONE
```

/assign @yujuhong 
/sig node
/kind bug
/priority critical-urgent
2017-12-07 01:30:05 -08:00
Kubernetes Submit Queue f875ee596d
Merge pull request #56519 from jiayingz/e2e-node-readiness
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Make sure node is ready before calling getLocalNode to fix test failure.

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/56518

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2017-12-06 18:29:19 -08:00
David Ashpole 239217f84a fix test 2017-12-06 16:55:28 -08:00
David Ashpole 7294a1d8fa make requirements less precise for disk eviction test 2017-12-06 12:58:53 -08:00
Kubernetes Submit Queue a48f11c225
Merge pull request #55448 from yguo0905/spec-change-for-new-kernel
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix GKE system spec for OS images with kernel version 4.10+

Two changes are required for validating images with kernel version 4.10+.

1. Update the spec in gke.yaml to allow 4.10+ kernel.
2. Update the GKE environment docker validation to allow `CONFIG_DEVPTS_MULTIPLE_INSTANCES` to be missing in >= 4.8 kernel because this option has been removed in 4.8.

**Release note**:

```
None
```

/assign @dchen1107
2017-12-06 08:20:24 -08:00
xiangpengzhao 1f2262e6b0 Move some kubelet constants to a common place. 2017-12-01 11:24:04 +08:00
Kubernetes Submit Queue f9c93154fc
Merge pull request #56497 from rphillips/fixes/fix_disk_metrics_coreos
Automatic merge from submit-queue (batch tested with PRs 56497, 56500, 55018, 56544, 56425). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

e2e: eviction test redirect dd stderr

**What this PR does / why we need it**: Redirects `dd` stderr to /dev/null

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56234 

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2017-11-29 15:25:58 -08:00
Jiaying Zhang 8d9a2e09c4 Make sure node is ready before calling getLocalNode to fix test failure. 2017-11-28 15:18:17 -08:00
Kubernetes Submit Queue 8226973ae8
Merge pull request #52144 from andyxning/fix_network_value_for_stats_summary
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix network value for stats summary for multiple network interfaces

This PR is part of [Heapster #1788](https://github.com/kubernetes/heapster/pull/1788). 

The original reason is when there are more than one none `lo`, `docker0`, `veth` network interfaces instead of just one `eth0`, the network interface value is only partial and does not correct. For now, summary stats api only gets the eth0 network interface values.

The original issues about this can be find in [Heapster #1058](https://github.com/kubernetes/heapster/issues/1058) and [Cadvisor #1593](https://github.com/google/cadvisor/issues/1593).

```release-note
Fix stats summary network value when multiple network interfaces are available.
```

/cc @DirectXMan12 @piosz @xiangpengzhao @vishh @timstclair
2017-11-28 14:59:08 -08:00
Ryan Phillips 769b9eed35 e2e: eviction test redirect dd stderr
* redirect stderr to /dev/null

Fixes #56234
2017-11-28 08:42:17 -06:00
Jan Chaloupka 607e863f85 WIP: extend node e2e test suite with containerized Kubelet 2017-11-28 15:29:27 +01:00
Aaron Crickenberger 040b80d9a7 Add [sig-node] to some unowned e2e_node tests
Follow the SIGDescribe pattern used in test/e2e/foo tests
2017-11-27 11:35:44 -05:00
Kubernetes Submit Queue 1098a3012a
Merge pull request #56228 from balajismaniam/cpuman-test-feature-tag
Automatic merge from submit-queue (batch tested with PRs 52767, 55065, 55148, 56228, 56221). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add Feature tag to CPU Manager node e2e test.

**What this PR does / why we need it**: Adds a `Feature` tag to the CPU manager node e2e tests.

CC @ConnorDoyle
2017-11-22 19:49:39 -08:00
Balaji Subramaniam 7acdf3e15e Add Feature tag to CPU Manager node e2e test. 2017-11-22 11:49:35 -08:00
Connor Doyle 6b05ac10f0 Add balajismaniam, ConnorDoyle node-e2e approvers 2017-11-22 10:01:14 -08:00
Jing Xu a66ee2eb3f Add pod-level metric for CPU and memory stats
This PR adds the pod-level metrics for CPU and memory stats. cAdvisor
can get all pod cgroup information so we can add this pod-level CPU and
memory stats information from the corresponding pod cgroup
2017-11-22 09:25:23 -08:00
Jordan Liggitt ae7dccf2e9
Revert "Kubelet flags take precedence over config from files/ConfigMaps"
This reverts commit cbebb61450.
2017-11-21 23:55:43 -05:00
Kubernetes Submit Queue d7a96d5e88
Merge pull request #56097 from mtaufen/kc-file-e2e-test
Automatic merge from submit-queue (batch tested with PRs 51494, 56097, 56072, 56175). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Kubelet flags take precedence over config from files/ConfigMaps

Changes the Kubelet configuration flag precedence order so that flags
take precedence over config from files/ConfigMaps.

See:
https://docs.google.com/document/d/18-MsChpTkrMGCSqAQN9QGgWuuFoK90SznBbwVkfZryo/

Also modifies e2e node test suite to transform all relevant Kubelet flags into
a config file before starting tests when the KubeletConfigFile feature gate is
true, and turns on the KubeletConfigFile gate for all e2e node tests.
This allows the alpha dynamic Kubelet config feature to continue to 
work in tests after the precedence change.

fixes #56171

Related: https://github.com/kubernetes/features/issues/281


```release-note
CLI flags passed to the Kubelet now take precedence over Kubelet config files and dynamic Kubelet config. This helps ensure backwards compatible behavior across Kubelet binary updates.
```
2017-11-21 19:49:27 -08:00
Kubernetes Submit Queue 754017bef4
Merge pull request #56105 from balajismaniam/enable-cpuman-only-when-not-skipped
Automatic merge from submit-queue (batch tested with PRs 55340, 55329, 56168, 56170, 56105). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Enable cpu manager only if the node e2e test is not skipped.

**What this PR does / why we need it**: This PR enables cpu manager in Kubelet only if the node e2e tests are not skipped. This change fixes the failures seen in https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-serial-gce-e2e. 

Fixes #56144
2017-11-21 18:56:39 -08:00
Kubernetes Submit Queue 3bb6eeeb07
Merge pull request #55340 from jiayingz/metrics
Automatic merge from submit-queue (batch tested with PRs 55340, 55329, 56168, 56170, 56105). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Adds device plugin allocation latency metric.

For #53497


**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2017-11-21 18:56:29 -08:00
Kubernetes Submit Queue 94a8d81172
Merge pull request #55447 from jingxu97/Nov/podmetric
Automatic merge from submit-queue (batch tested with PRs 55812, 55752, 55447, 55848, 50984). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add Pod-level local ephemeral storage metric in Summary API

This PR adds pod-level ephemeral storage metric into Summary API.
Pod-level ephemeral storage usage is the sum of all containers and local
ephemeral volume including EmptyDir (if not backed up by memory or
hugepages), configueMap, and downwardAPI.
Address issue #55978

**Release note**:
```release-note
Add pod-level local ephemeral storage metric in Summary API. Pod-level ephemeral storage reports the total filesystem usage for the containers and emptyDir volumes in the measured Pod.
```
2017-11-21 17:57:34 -08:00
Michael Taufen cbebb61450 Kubelet flags take precedence over config from files/ConfigMaps
Changes the Kubelet configuration flag precedence order so that flags
take precedence over config from files/ConfigMaps.

See issue #56171 for more details.

Also modifies e2e node test suite to transform all relevant Kubelet
flags into a config file before starting tests when the
KubeletConfigFile feature gate is true, and turns on the
KubeletConfigFile gate for all e2e node tests. This allows the alpha
dynamic Kubelet config feature to continue to work in tests after
the precedence change.
2017-11-21 16:02:27 -08:00
Kubernetes Submit Queue 03b7d77be4
Merge pull request #54316 from dashpole/disk_request_eviction
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Take disk requests into account during evictions

fixes #54314

This PR is part of the local storage feature, and it makes the eviction manager take disk requests into account during disk evictions.
This uses the same eviction strategy as we do for memory.
Disk requests are only considered when the LocalStorageCapacityIsolation feature gate is enabled.  This is enforced by adding a check for the feature gate in getRequests().
I have added unit testing to ensure that previous behavior is preserved when the feature gate is disabled.
Most of the changes are testing.  Reviewers should focus on changes in **eviction/helpers.go**

/sig node
/assign @jingxu97  @vishh
2017-11-21 14:31:47 -08:00
Jiaying Zhang 048bafdd0b Adds device plugin registration count metric and allocation latency metric. 2017-11-21 13:44:10 -08:00
Balaji Subramaniam 16e0f12253 Enable cpu manager only if the test is not skipped.
- Also, if KubeReserved is nil, allocate a map.
2017-11-21 10:48:54 -08:00