Commit Graph

4633 Commits (9f81cf096bfa56d22d345ff2f97473e03bc354d8)

Author SHA1 Message Date
Vishnu kannan ad743a922a remove dead code in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Vishnu kannan ff158090b3 use active pods instead of runtime pods in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Vishnu Kannan 8ed9bff073 handle container restarts for GPUs
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
tanshanshan 26ab52a3cb fix 2017-03-13 10:00:19 +08:00
Kubernetes Submit Queue 59aa924a9b Merge pull request #42642 from fraenkel/envfrom
Automatic merge from submit-queue

Invalid environment var names are reported and pod starts

When processing EnvFrom items, all invalid keys are collected and
reported as a single event.

The Pod is allowed to start.

fixes #42583
2017-03-10 17:37:31 -08:00
timchenxiaoyu c295514443 accurate hint 2017-03-10 16:41:51 +08:00
Kubernetes Submit Queue d790851c8f Merge pull request #42694 from dchen1107/master
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Dropped docker 1.9.x support. Changed the minimumDockerAPIVersion to

1.22

cc/ @Random-Liu @yujuhong 

We talked about dropping docker 1.9.x support for a while. I just realized that we haven't really done it yet. 

```release-note
Dropped the support for docker 1.9.x and the belows. 
```
2017-03-09 15:07:00 -08:00
Dawn Chen 69eaea2fcc Merge pull request #42779 from dashpole/fix_status
[Bug Fix] Allow Status Updates for Pods that can be deleted
2017-03-09 13:23:00 -08:00
David Ashpole e3e0bc6ce0 do not skip pods that can be deleted 2017-03-09 09:35:50 -08:00
Kubernetes Submit Queue 9cfc4f1a10 Merge pull request #42739 from yujuhong/created_time
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)

FakeDockerClient: add creation timestamp

This fixes #42736
2017-03-09 02:51:38 -08:00
Kubernetes Submit Queue 4cf553f78e Merge pull request #42767 from Random-Liu/cleanup-infra-container-on-error
Automatic merge from submit-queue (batch tested with PRs 42768, 42760, 42771, 42767)

Stop sandbox container when hit network error.

Fixes https://github.com/kubernetes/kubernetes/issues/42698.

This PR stops the sandbox container when hitting a network error.
This PR also adds a unit test for it.

I'm not sure whether we should try teardown pod network after `SetUpPod` failure. We don't do that in dockertools https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/docker_manager.go#L2276.

@yujuhong @freehan
2017-03-09 00:08:01 -08:00
Michael Fraenkel c4d07466e8 Invalid environment var names are reported and pod starts
When processing EnvFrom items, all invalid keys are collected and
reported as a single event.

The Pod is allowed to start.
2017-03-09 07:21:53 +00:00
Kubernetes Submit Queue 6fac75c80a Merge pull request #42768 from yujuhong/fix_sandbox_listing
Automatic merge from submit-queue

dockershim: Fix the race condition in ListPodSandbox

In ListPodSandbox(), we
 1. List all sandbox docker containers
 2. List all sandbox checkpoints. If the checkpoint does not have a
    corresponding container in (1), we return partial result based on
    the checkpoint.

The problem is that new PodSandboxes can be created between step (1) and
(2). In those cases, we will see the checkpoints, but not the sandbox
containers. This leads to strange behavior because the partial result
from the checkpoint does not include some critical information. For
example, the creation timestamp'd be zero, and that would cause kubelet's
garbage collector to immediately remove the sandbox.

This change fixes that by getting the list of checkpoints before listing
all the containers (since in RunPodSandbox we create them in the reverse
order).
2017-03-08 21:33:31 -08:00
Kubernetes Submit Queue ec46846a25 Merge pull request #38691 from xiangpengzhao/fix-empty-logpath
Automatic merge from submit-queue (batch tested with PRs 42211, 38691, 42737, 42757, 42754)

Only create the symlink when container log path exists

When using `syslog` logging driver instead of `json-file`, there will not be container log files such as `<containerID-json.log>`. We should not create symlink in this case.
2017-03-08 18:52:26 -08:00
timchenxiaoyu 0bfbd40d4c fix some typo 2017-03-09 09:34:43 +08:00
Random-Liu 2690461cbb Stop sandbox container when hit network error. 2017-03-08 17:28:42 -08:00
Eric Paris df590da6ab Return early from eviction debug helpers if !glog.V(3)
Should keep us from running a bunch of loops needlessly.
2017-03-08 20:19:52 -05:00
Yu-Ju Hong 38d8da1215 FakeDockerClient: add creation timestamp
This is necessary for kubemark to work correctly.
2017-03-08 17:11:16 -08:00
timchenxiaoyu 767719ea9c fix across typo 2017-03-09 09:07:21 +08:00
Yu-Ju Hong 8328a66bdf dockershim: Fix the race condition in ListPodSandbox
In ListPodSandbox(), we
 1. List all sandbox docker containers
 2. List all sandbox checkpoints. If the checkpoint does not have a
    corresponding container in (1), we return partial result based on
    the checkpoint.

The problem is that new PodSandboxes can be created between step (1) and
(2). In those cases, we will see the checkpoints, but not the sandbox
containers. This leads to strange behavior because the partial result
from the checkpoint does not include some critical information. For
example, the creation timestamp'd be zero, and that would cause kubelet's
garbage collector to immediately remove the sandbox.

This change fixes that by getting the list of checkpoints before listing
all the containers (since in RunPodSandbox we create them in the reverse
order).
2017-03-08 17:02:34 -08:00
Yu-Ju Hong 1095652cb8 Add more logs to help debugging 2017-03-08 12:27:49 -08:00
xiangpengzhao 7fed242d55 Only create the symlink when container log path exists 2017-03-08 01:36:48 -05:00
Kubernetes Submit Queue 5bc7387b3c Merge pull request #42169 from ncdc/pprof-trace
Automatic merge from submit-queue (batch tested with PRs 42692, 42169, 42173)

Add pprof trace support

Add support for `/debug/pprof/trace`

Can wait for master to reopen for 1.7.

cc @smarterclayton @wojtek-t @gmarek @timothysc @jeremyeder @kubernetes/sig-scalability-pr-reviews
2017-03-07 20:10:26 -08:00
Dawn Chen ab790b6a3a Dropped docker 1.9.x support. Changed the minimumDockerAPIVersion to
1.22
2017-03-07 17:07:07 -08:00
Kubernetes Submit Queue 1ed3aa6750 Merge pull request #42264 from yujuhong/kubemark_cri
Automatic merge from submit-queue

kubemark: enable CRI for the hollow nodes

This fixes #41488
2017-03-07 13:04:39 -08:00
Matthew Wong 1dabce9815 Print dereferenced pod status fields when logging status update 2017-03-07 15:00:54 -05:00
Yu-Ju Hong a0f90e1490 Use FakeDockerPuller to bypass auth/keyring logic in tests 2017-03-07 10:11:49 -08:00
Yu-Ju Hong 516848c37d Various fixes for the fake docker client
* Properly return ImageNotFoundError
 * Support inject "Images" or "ImageInspects" and keep both in sync.
 * Remove the FakeDockerPuller and let FakeDockerClient subsumes its
   functinality. This reduces the overhead to maintain both objects.
 * Various small fixes and refactoring of the testing utils.
2017-03-07 10:11:49 -08:00
Kubernetes Submit Queue 5cc6a4e269 Merge pull request #42609 from intelsdi-x/test-out-of-oir
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)

Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available (e2e disabled).

#41870 was reverted because it introduced an e2e test flake. This is the same code with the e2e for OIR disabled again.

We can attempt to enable the e2e test cases one-by-one in follow-up PRs, but it would be preferable to get the main fix merged in time for 1.6 since OIR is broken on master (see #41861).

cc @timothysc
2017-03-07 08:10:46 -08:00
Andy Goldstein b011529d8a Add pprof trace support
Add pprof trace support and --enable-contention-profiling to those
components that don't already have it.
2017-03-07 10:10:42 -05:00
Kubernetes Submit Queue a1c5d1b80f Merge pull request #42585 from derekwaynecarr/cgroup-flake
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)

provide active pods to cgroup cleanup

**What this PR does / why we need it**:
This PR provides more information for when a pod cgroup is considered orphaned.  The running pods cache is based on the runtime's view of the world.  we create pod cgroups before containers so we should just be looking at activePods.

**Which issue this PR fixes**
Fixes https://github.com/kubernetes/kubernetes/issues/42431
2017-03-06 22:20:11 -08:00
Kubernetes Submit Queue 31db570a00 Merge pull request #42497 from derekwaynecarr/lower_cgroup_names
Automatic merge from submit-queue

cgroup names created by kubelet should be lowercased

**What this PR does / why we need it**:
This PR modifies the kubelet to create cgroupfs names that are lowercased.  This better aligns us with the naming convention for cgroups v2 and other cgroup managers in ecosystem (docker, systemd, etc.)

See: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
"2-6-2. Avoid Name Collisions"

**Special notes for your reviewer**:
none

**Release note**:
```release-note
kubelet created cgroups follow lowercase naming conventions
```
2017-03-06 20:43:03 -08:00
Connor Doyle 364dbc0ca5 Revert "Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.""
- This reverts commit 60758f3fff.
- Disabled opaque integer resource end-to-end tests.
2017-03-06 17:48:09 -08:00
Derek Carr 5ce298c9aa provide active pods to cgroup cleanup 2017-03-06 17:37:26 -05:00
Dawn Chen 60758f3fff Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available." 2017-03-06 14:27:17 -08:00
Kubernetes Submit Queue 0fad9ce5e2 Merge pull request #41870 from intelsdi-x/test-out-of-oir
Automatic merge from submit-queue (batch tested with PRs 31783, 41988, 42535, 42572, 41870)

Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.

This appears to be a regression since v1.5.0 in scheduler behavior for opaque integer resources, reported in https://github.com/kubernetes/kubernetes/issues/41861.

- [X] Add failing e2e test to trigger the regression
- [x] Restore previous behavior (pods pending due to insufficient OIR get scheduled once sufficient OIR becomes available.)
2017-03-06 11:30:24 -08:00
Derek Carr 48d822eafe cgroup names created by kubelet should be lowercased 2017-03-06 11:19:21 -05:00
Seth Jennings ccd87fca3f kubelet: add cgroup manager metrics 2017-03-06 08:53:47 -06:00
Harry Zhang bc644f9e04 Use pod sandbox id in checkpoint 2017-03-06 10:46:26 +08:00
Kubernetes Submit Queue 4bbf98850f Merge pull request #42500 from vishh/fix-gpu-init
Automatic merge from submit-queue

[Bug] Fix gpu initialization in Kubelet

Kubelet incorrectly fails if `AllAlpha=true` feature gate is enabled with container runtimes that are not `docker`.

Replaces #42407
2017-03-04 20:28:08 -08:00
Connor Doyle 8a42189690 Fix unbounded growth of cached OIRs in sched cache
- Added schedulercache.Resource.SetOpaque helper.
- Amend kubelet allocatable sync so that when OIRs are removed from capacity
  they are also removed from allocatable.
- Fixes #41861.
2017-03-04 09:26:22 -08:00
Kubernetes Submit Queue f9ccee7714 Merge pull request #42435 from dashpole/timestamps_for_fsstats
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)

[Bug Fix]: Avoid evicting more pods than necessary by adding Timestamps for fsstats and ignoring stale stats

Continuation of #33121.  Credit for most of this goes to @sjenning.  I added volume fs timestamps.

**why is this a bug** 
This PR attempts to fix part of https://github.com/kubernetes/kubernetes/issues/31362 which results in multiple pods getting evicted unnecessarily whenever the node runs into resource pressure. This PR reduces the chances of such disruptions by avoiding reacting to old/stale metrics.
Without this PR, kubernetes nodes under resource pressure will cause unnecessary disruptions to user workloads. 
This PR will also help deflake a node e2e test suite.

The eviction manager currently avoids evicting pods if metrics are old.  However, timestamp data is not available for filesystem data, and this causes lots of extra evictions.
See the [inode eviction test flakes](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e) for examples.
This should probably be treated as a bugfix, as it should help mitigate extra evictions.

cc: @kubernetes/sig-storage-pr-reviews  @kubernetes/sig-node-pr-reviews @vishh @derekwaynecarr @sjenning
2017-03-03 23:21:48 -08:00
Kubernetes Submit Queue 51a3d7b663 Merge pull request #42397 from feiskyer/fix-42396
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)

Kubelet: return container runtime's version instead of CRI's one

**What this PR does / why we need it**:

With CRI enabled by default, kubelet reports the version of CRI instead of container runtime version. This PR fixes this problem.

**Which issue this PR fixes** 

Fixes #42396.

**Special notes for your reviewer**:

Should also cherry-pick to 1.6 branch.

**Release note**:

```release-note
NONE
```

cc @yujuhong  @kubernetes/sig-node-bugs
2017-03-03 23:21:46 -08:00
Kubernetes Submit Queue 2d319bd406 Merge pull request #42204 from dashpole/allocatable_eviction
Automatic merge from submit-queue

Eviction Manager Enforces Allocatable Thresholds

This PR modifies the eviction manager to enforce node allocatable thresholds for memory as described in kubernetes/community#348.
This PR should be merged after #41234. 

cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests @vishh 

** Why is this a bug/regression**

Kubelet uses `oom_score_adj` to enforce QoS policies. But the `oom_score_adj` is based on overall memory requested, which means that a Burstable pod that requested a lot of memory can lead to OOM kills for Guaranteed pods, which violates QoS. Even worse, we have observed system daemons like kubelet or kube-proxy being killed by the OOM killer.
Without this PR, v1.6 will have node stability issues and regressions in an existing GA feature `out of Resource` handling.
2017-03-03 20:20:12 -08:00
Kubernetes Submit Queue 9cc5480918 Merge pull request #41149 from sjenning/qos-memory-limits
Automatic merge from submit-queue (batch tested with PRs 41919, 41149, 42350, 42351, 42285)

kubelet: enable qos-level memory limits

```release-note
Experimental support to reserve a pod's memory request from being utilized by pods in lower QoS tiers.
```

Enables the QoS-level memory cgroup limits described in https://github.com/kubernetes/community/pull/314

**Note: QoS level cgroups have to be enabled for any of this to take effect.**

Adds a new `--experimental-qos-reserved` flag that can be used to set the percentage of a resource to be reserved at the QoS level for pod resource requests.

For example, `--experimental-qos-reserved="memory=50%`, means that if a Guaranteed pod sets a memory request of 2Gi, the Burstable and BestEffort QoS memory cgroups will have their `memory.limit_in_bytes` set to `NodeAllocatable - (2Gi*50%)` to reserve 50% of the guaranteed pod's request from being used by the lower QoS tiers.

If a Burstable pod sets a request, its reserve will be deducted from the BestEffort memory limit.

The result is that:
- Guaranteed limit matches root cgroup at is not set by this code
- Burstable limit is `NodeAllocatable - Guaranteed reserve`
- BestEffort limit is `NodeAllocatable - Guaranteed reserve - Burstable reserve`

The only resource currently supported is `memory`; however, the code is generic enough that other resources can be added in the future.

@derekwaynecarr @vishh
2017-03-03 16:44:39 -08:00
Vishnu kannan 038585626d fix gpu initialization
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-03 12:13:01 -08:00
Klaus Ma 41c4426a30 Removed un-necessary empty line. 2017-03-03 19:43:48 +08:00
David Ashpole a90c7951d4 add volume timestamps 2017-03-02 15:01:59 -08:00
Seth Jennings cc50aa9dfb kubelet: enable qos-level memory request reservation 2017-03-02 15:04:13 -06:00
Seth Jennings c5faf1c156 kubelet: eviction: add timestamp to FsStats 2017-03-02 11:20:24 -08:00
David Ashpole ac612eab8e eviction manager changes for allocatable 2017-03-02 07:36:24 -08:00
Kubernetes Submit Queue 00c0c8332f Merge pull request #42273 from smarterclayton/evaluate_probes
Automatic merge from submit-queue (batch tested with PRs 41672, 42084, 42233, 42165, 42273)

ExecProbes should be able to do simple env var substitution

For containers that don't have bash, we should support env substitution
like we do on command and args. However, without major refactoring
valueFrom is not supportable from inside the prober. For now, implement
substitution based on hardcoded env and leave TODOs for future work.

Improves the state of #40846, will spawn a follow up issue for future refactoring after CRI settles down
2017-03-02 03:20:29 -08:00
Kubernetes Submit Queue 5ee6ba2f59 Merge pull request #42223 from Random-Liu/dockershim-better-implement-cri
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)

CRI: Make dockershim better implements CRI.

When thinking about CRI Validation test, I found that `PodSandboxStatus.Linux.Namespaces.Options.HostPid` and `PodSandboxStatus.Linux.Namespaces.Options.HostIpc` are not populated. Although they are not used by kuberuntime now, we should populate them to conform to CRI.

/cc @yujuhong @feiskyer
2017-03-02 00:59:19 -08:00
Pengfei Ni 1986b78e0e Version(): return runtime version instead of CRI 2017-03-02 14:42:37 +08:00
Kubernetes Submit Queue fa0387c9fe Merge pull request #42195 from Random-Liu/cri-support-non-json-logging
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)

Use `docker logs` directly if the docker logging driver is not `json-file`

Fixes https://github.com/kubernetes/kubernetes/issues/41996.

Post the PR first, I still need to manually test this, because we don't have test coverage for journald logging pluggin.

@yujuhong @dchen1107 
/cc @kubernetes/sig-node-pr-reviews
2017-03-01 20:08:08 -08:00
Kubernetes Submit Queue dfe05e0512 Merge pull request #41753 from derekwaynecarr/burstable-cpu-shares
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)

Burstable QoS cgroup has cpu shares assigned

**What this PR does / why we need it**:
This PR sets the Burstable QoS cgroup cpu shares value to the sum of the pods cpu requests in that tier.  We need it for proper evaluation of CPU shares in the new QoS hierarchy.

**Special notes for your reviewer**:
It builds against the framework proposed for https://github.com/kubernetes/kubernetes/pull/41833
2017-03-01 15:30:34 -08:00
Kubernetes Submit Queue ddd8b5c1cf Merge pull request #41644 from derekwaynecarr/ensure-pod-cgroup-deleted
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)

Ensure pod cgroup is deleted prior to deletion of pod

**What this PR does / why we need it**:
This PR ensures that the kubelet removes the pod cgroup sandbox prior to deletion of a pod from the apiserver.   We need this to ensure that the default behavior in the kubelet is to not leak resources.
2017-03-01 15:30:30 -08:00
Kubernetes Submit Queue d5ff69468e Merge pull request #29378 from vefimova/docker_resolv
Automatic merge from submit-queue

Re-writing of the resolv.conf file generated by docker

Fixes #17406 

Docker 1.12 will contain feature "The option --dns and --net=host should not be mutually exclusive" (docker/docker#22408)
This patch adds optional support for this ability in kubelet (for now in case of "hostNetwork: true" set all dns settings are ignored if any).
To enable feature use newly added kubelet flag: --allow-dns-for-hostnet=true
2017-03-01 14:19:08 -08:00
Derek Carr 21a899cf85 Ensure pod cgroup is deleted prior to deletion of pod 2017-03-01 15:29:36 -05:00
Derek Carr 1947e76e91 Set Burstable QOS Cgroup cpu.shares 2017-03-01 14:51:34 -05:00
Random-Liu 7c261bfed7 Use `docker logs` directly if the docker logging driver is not
supported.
2017-03-01 10:50:11 -08:00
Yu-Ju Hong 1759b87ffe Generate valid container id in fake docker client. 2017-03-01 10:33:08 -08:00
vefimova fc8a37ec86 Added ability for Docker containers to set usage of dns settings along with hostNetwork is true
Introduced chages:
   1. Re-writing of the resolv.conf file generated by docker.
      Cluster dns settings aren't passed anymore to docker api in all cases, not only for pods with host network:
      the resolver conf will be overwritten after infra-container creation to override docker's behaviour.

   2. Added new one dnsPolicy - 'ClusterFirstWithHostNet', so now there are:
      - ClusterFirstWithHostNet - use dns settings in all cases, i.e. with hostNet=true as well
      - ClusterFirst - use dns settings unless hostNetwork is true
      - Default

Fixes #17406
2017-03-01 17:10:00 +00:00
Hemant Kumar 2d3008fc56 Implement support for mount options in PVs
Add support for mount options via annotations on PVs
2017-03-01 11:50:40 -05:00
Kubernetes Submit Queue ed479163fa Merge pull request #42116 from vishh/gpu-experimental-support
Automatic merge from submit-queue

Extend experimental support to multiple Nvidia GPUs

Extended from #28216

```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with  support for multiple Nvidia GPUs. 
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```

1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```

TODO:

- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
2017-03-01 04:52:50 -08:00
Kubernetes Submit Queue f68c824f95 Merge pull request #42139 from Random-Liu/unify-fake-runtime-helper
Automatic merge from submit-queue (batch tested with PRs 41921, 41695, 42139, 42090, 41949)

Unify fake runtime helper in kuberuntime, rkt and dockertools.

Addresses https://github.com/kubernetes/kubernetes/pull/42081#issuecomment-282429775.

Add `pkg/kubelet/container/testing/fake_runtime_helper.go`, and change `kuberuntime`, `rkt` and `dockertools` to use it.

@yujuhong This is a small unit test refactoring PR. Could you help me review it?
2017-03-01 04:10:04 -08:00
Kubernetes Submit Queue 1351324bed Merge pull request #41833 from sjenning/qos-refactor
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702)

kubelet: cm: refactor QoS logic into seperate interface

This commit has no functional change.  It refactors the QoS cgroup logic into a new `QOSContainerManager` interface to allow for better isolation for QoS cgroup features coming down the pike.

This is a breakout of the refactoring component of my QoS memory limits PR https://github.com/kubernetes/kubernetes/pull/41149 which will need to be rebased on top of this.

@vishh @derekwaynecarr
2017-03-01 01:44:10 -08:00
Kubernetes Submit Queue 9f3343df40 Merge pull request #42015 from dashpole/min_timeout_eviction
Automatic merge from submit-queue (batch tested with PRs 42162, 41973, 42015, 42115, 41923)

Increase Min Timeout for kill pod

Should mitigate #41347, which describes flakes in the inode eviction test due to "GracePeriodExceeded" errors.

When we use gracePeriod == 0, as we do in eviction, the pod worker currently sets a timeout of 2 seconds to kill a pod.
We are hitting this timeout fairly often during eviction tests, causing extra pods to be evicted (since the eviction manager "fails" to evict that pod, and kills the next one).

This PR increases the timeout from 2 seconds to 4, although we could increase it even more if we think that would be appropriate.

cc @yujuhong @vishh @derekwaynecarr
2017-02-28 22:06:01 -08:00
Kubernetes Submit Queue 91e1933f9f Merge pull request #42149 from Random-Liu/check-infra-container-image-existence
Automatic merge from submit-queue (batch tested with PRs 42216, 42136, 42183, 42149, 36828)

Check infra container image existence before pulling.

Fixes https://github.com/kubernetes/kubernetes/issues/42040.

This PR:
* Fixes https://github.com/kubernetes/kubernetes/issues/42040 by checking image existence before pulling.
* Add unit test for it.
* Fix a potential panic at https://github.com/kubernetes/kubernetes/compare/master...Random-Liu:check-infra-container-image-existence?expand=1#diff-e2eefa11d78ba95197ce406772c18c30R421.

@yujuhong
2017-02-28 21:17:02 -08:00
Clayton Coleman ce62f3d4a0
ExecProbes should be able to do simple env var substitution
For containers that don't have bash, we should support env substitution
like we do on command and args. However, without major refactoring
valueFrom is not supportable from inside the prober. For now, implement
substitution based on hardcoded env and leave TODOs for future work.
2017-02-28 22:46:04 -05:00
Vishnu kannan 13582a65aa fix a bug in nvidia gpu allocation and added unit test
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 13:42:08 -08:00
Vishnu kannan 2554b95994 Map nvidia devices one to one.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 13:42:08 -08:00
Vishnu kannan 318f4e102a adding an e2e for GPUs
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 13:42:08 -08:00
Vishnu kannan 69acb02394 use feature gate instead of flag to control support for GPUs
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 13:42:07 -08:00
Vishnu kannan 3b0a408e3b improve gpu integration
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 11:27:53 -08:00
Hui-Zhi 57c77ffbdd Add support for multiple nvidia gpus 2017-02-28 11:24:48 -08:00
Seth Jennings b9adb66426 kubelet: cm: refactor QoS logic into seperate interface 2017-02-28 09:19:29 -06:00
Jan Safranek d7d039dba2 Make kubelet never delete files on mounted filesystems
With bug #27653, kubelet could remove mounted volumes and delete user data.
The bug itself is fixed, however our trust in kubelet is significantly lower.
Let's add an extra version of RemoveAll that does not cross mount boundary
(rm -rf --one-file-system).

It calls lstat(path) three times for each removed directory - once in
RemoveAllOneFilesystem and twice in IsLikelyNotMountPoint, however this way
it's platform independent and the directory that is being removed by kubelet
should be almost empty.
2017-02-28 14:32:07 +01:00
timchenxiaoyu 4772931e63 fix reconcile typo 2017-02-28 13:50:25 +08:00
Vishnu kannan 9b4a8f7464 fix eviction helper function description
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-27 21:24:45 -08:00
Derek Carr a7684569fb Fix get all pods from cgroups logic 2017-02-27 21:24:45 -08:00
Vishnu Kannan cc5f5474d5 add support for node allocatable phase 2 to kubelet
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-02-27 21:24:44 -08:00
Vishnu Kannan 70e340b045 adding kubelet flags for node allocatable phase 2
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-02-27 21:24:44 -08:00
Random-Liu 0351629517 Make dockershim better implements CRI. 2017-02-27 20:37:49 -08:00
Random-Liu 29a063e62e Check infra container image existence before pulling. 2017-02-27 10:59:36 -08:00
David Ashpole 6daa2f2ef3 increase timeout 2017-02-27 10:59:24 -08:00
Minhan Xia f006c8bcd3 teach kubenet to use annotation instead of pod object for traffic shaper 2017-02-27 10:11:09 -08:00
Minhan Xia 947e0e1bf5 pass pod annotation to SetUpPod 2017-02-27 10:09:45 -08:00
Kubernetes Submit Queue be724ba3c1 Merge pull request #42111 from feiskyer/sandbox
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)

Remove SandboxReceived event

This PR removes SandboxReceived event in sync pod.

> This event seems somewhat meaningless, and clouds the event records for a pod. Do we actually need it? Pulling and pod received on the node are very relevant, this seems much less so. Would suggest we either remove it, or turn it into a message that clearly indicates why it has value.

Refer d65309399a (commitcomment-21052453).

cc @smarterclayton @yujuhong
2017-02-27 04:10:28 -08:00
Kubernetes Submit Queue d1f5331102 Merge pull request #41804 from chakri-nelluri/flex
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)

Add support for attacher/detacher interface in Flex volume

Add support for attacher/detacher interface in Flex volume
This change breaks backward compatibility and requires to be release noted.

```release-note
Flex volume plugin is updated to support attach/detach interfaces. It broke backward compatibility. Please update your drivers and implement the new callouts. 
```
2017-02-27 04:10:25 -08:00
Random-Liu 0deec63d1a Unify fake runtime helper in kuberuntime, rkt and dockertools. 2017-02-27 01:43:37 -08:00
Kubernetes Submit Queue 077e67eb77 Merge pull request #42076 from freehan/network-owner
Automatic merge from submit-queue (batch tested with PRs 42058, 41160, 42065, 42076, 39338)

add OWNER file to kubelet/network
2017-02-27 01:30:05 -08:00
Kubernetes Submit Queue bc650d7ec7 Merge pull request #42055 from derekwaynecarr/sandbox-cgroup-parent
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)

dockershim puts pause container in pod cgroup

**What this PR does / why we need it**:
The CRI was not launching the pause container in the pod level cgroup.  The non-CRI code path was.
2017-02-27 00:16:55 -08:00
Kubernetes Submit Queue a8b629d4ee Merge pull request #41701 from vishh/evict-non-static-critical-pods
Automatic merge from submit-queue

Admit critical pods under resource pressure

And evict critical pods that are not static.

Depends on #40952.

For #40573
2017-02-26 13:43:10 -08:00
Kubernetes Submit Queue 16f87fe7d8 Merge pull request #40952 from dashpole/premption
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)

Guaranteed admission for Critical Pods

This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.

cc: @vishh
2017-02-26 12:57:59 -08:00
Kubernetes Submit Queue 2eef3b1a14 Merge pull request #41957 from liggitt/mirror-pod-secrets
Automatic merge from submit-queue (batch tested with PRs 41814, 41922, 41957, 41406, 41077)

Use consistent helper for getting secret names from pod

Kubelet secret-manager and mirror-pod admission both need to know what secrets a pod spec references. Eventually, a node authorizer will also need to know the list of secrets.

This creates a single (well, double, because api versions) helper that can be used to traverse the secret names referenced from a pod, optionally short-circuiting (for places that are just looking to see if any secrets are referenced, like admission, or are looking for a particular secret ref, like authorization)

Fixes:
* secret manager not handling secrets used by env/envFrom in initcontainers
* admission allowing mirror pods with secret references

@smarterclayton @wojtek-t
2017-02-26 10:22:51 -08:00
Kubernetes Submit Queue 8e531de1d5 Merge pull request #41946 from freehan/hostport-fix
Automatic merge from submit-queue (batch tested with PRs 41621, 41946, 41941, 41250, 41729)

bug fix for hostport-syncer

fix a bug introduced by the previous refactoring of hostport-syncer.  https://github.com/kubernetes/kubernetes/pull/39443
and fix some nits
2017-02-26 06:46:55 -08:00
Kubernetes Submit Queue 28a8d783e6 Merge pull request #41621 from derekwaynecarr/best-effort-qos-shares
Automatic merge from submit-queue

BestEffort QoS class has min cpu shares

**What this PR does / why we need it**:
BestEffort QoS class is given the minimum amount of CPU shares per the QoS design.
2017-02-26 06:32:43 -08:00
Pengfei Ni 245dad86b4 Remove SandboxReceived event 2017-02-26 09:30:00 +08:00
Kubernetes Submit Queue 067f92e789 Merge pull request #41801 from riverzhang/patch-1
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)

Fix  some typos

**Release note**:

```release-note
```
2017-02-25 05:02:53 -08:00
Kubernetes Submit Queue 8e6af485f9 Merge pull request #41918 from ncdc/shared-informers-14-scheduler
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)

Switch scheduler to use generated listers/informers

Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).

I think this can wait until master is open for 1.7 pulls, given that we're close to the 1.6 freeze.

After this and #41482 go in, the only code left that references legacylisters will be federation, and 1 bit in a stateful set unit test (which I'll clean up in a follow-up).

@resouer I imagine this will conflict with your equivalence class work, so one of us will be doing some rebasing 😄 

cc @wojtek-t @gmarek  @timothysc @jayunit100 @smarterclayton @deads2k @liggitt @sttts @derekwaynecarr @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-scalability-pr-reviews
2017-02-25 02:17:55 -08:00
Chakravarthy Nelluri 0d2af70e95 Add support for attacher/detacher interface in Flex volume 2017-02-24 20:18:06 -05:00
Random-Liu 8380148d48 Remove extra operations when generating pod sandbox configuration. 2017-02-24 15:06:03 -08:00
Minhan Xia 727c3f28e5 add OWNER file to kubelet/network 2017-02-24 11:41:13 -08:00
Derek Carr 0449b008a8 dockershim puts pause container in pod cgroup 2017-02-24 11:30:06 -05:00
Kubernetes Submit Queue 4c1b875ca0 Merge pull request #39196 from resouer/omit-dot
Automatic merge from submit-queue

kubelet config should ignore file start with dots

Fixes: #39156

Ignore files started with dot.
2017-02-24 05:30:21 -08:00
David Ashpole c58970e47c critical pods can preempt other pods to be admitted 2017-02-23 10:31:20 -08:00
Andy Goldstein 9d8d6ad16c Switch scheduler to use generated listers/informers
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
2017-02-23 09:57:12 -05:00
Kubernetes Submit Queue 17175b24a2 Merge pull request #40007 from JulienBalestra/rktnetes-systemd-ops-helpers
Automatic merge from submit-queue (batch tested with PRs 41812, 41665, 40007, 41281, 41771)

Kubelet-rkt: Add useful informations for Ops on the Kubelet Host

Create a Systemd SyslogIdentifier inside the [Service]
Create a Systemd Description inside the [Unit]

**What this PR does / why we need it**:

#### Overview
Logged against the host, it's difficult to identify who's who.
This PR add useful information to quickly get straight to the point with the **DESCRIPTION** field:

```
systemctl list-units "k8s*"
UNIT                                             LOAD   ACTIVE SUB     DESCRIPTION
k8s_b5a9bdf7-e396-4989-8df0-30a5fda7f94c.service loaded active running kube-controller-manager-172.20.0.206
k8s_bec0d8a1-dc15-4b47-a850-e09cf098646a.service loaded active running nginx-daemonset-gxm4s
k8s_d2981e9c-2845-4aa2-a0de-46e828f0c91b.service loaded active running kube-apiserver-172.20.0.206
k8s_fde4b0ab-87f8-4fd1-b5d2-3154918f6c89.service loaded active running kube-scheduler-172.20.0.206

```

#### Overview and Journal

Always on the host, to easily retrieve the pods logs, this PR add a SyslogIdentifier named as the PodBaseName.


```
# A DaemonSet prometheus-node-exporter is running on the Kubernetes Cluster
systemctl list-units "k8s*" | grep prometheus-node-exporter
k8s_c60a4b1a-387d-4fce-afa1-642d6f5716c1.service loaded active running prometheus-node-exporter-85cpp

# Get the logs from the prometheus-node-exporter DaemonSet 
journalctl -t prometheus-node-exporter | wc -l
278
```

Sadly the `journalctl` flag `-t` / `--identifier` doesn't allow a pattern to catch the logs.

Also this field improve any queries made by any tools who exports the Journal (E.g: ES, Kibana):
```
{
	"__CURSOR" : "s=86fd390d123b47af89bb15f41feb9863;i=164b2c27;b=7709deb3400841009e0acc2fec1ebe0e;m=1fe822ca4;t=54635e6a62285;x=b2d321019d70f36f",
	"__REALTIME_TIMESTAMP" : "1484572200411781",
	"__MONOTONIC_TIMESTAMP" : "8564911268",
	"_BOOT_ID" : "7709deb3400841009e0acc2fec1ebe0e",
	"PRIORITY" : "6",
	"_UID" : "0",
	"_GID" : "0",
	"_SYSTEMD_SLICE" : "system.slice",
	"_SELINUX_CONTEXT" : "system_u:system_r:kernel_t:s0",
	"_MACHINE_ID" : "7bbb4401667243da81671e23fd8a2246",
	"_HOSTNAME" : "Kubelet-Host",
	"_TRANSPORT" : "stdout",
	"SYSLOG_FACILITY" : "3",
	"_COMM" : "ld-linux-x86-64",
	"_CAP_EFFECTIVE" : "3fffffffff",
	"SYSLOG_IDENTIFIER" : "prometheus-node-exporter",
	"_PID" : "88827",
	"_EXE" : "/var/lib/rkt/pods/run/c60a4b1a-387d-4fce-afa1-642d6f5716c1/stage1/rootfs/usr/lib64/ld-2.21.so",
	"_CMDLINE" : "stage1/rootfs/usr/lib/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/systemd-nspawn [....]",
	"_SYSTEMD_CGROUP" : "/system.slice/k8s_c60a4b1a-387d-4fce-afa1-642d6f5716c1.service",
	"_SYSTEMD_UNIT" : "k8s_c60a4b1a-387d-4fce-afa1-642d6f5716c1.service",
	"MESSAGE" : "[ 8564.909237] prometheus-node-exporter[115]: time=\"2017-01-16T13:10:00Z\" level=info msg=\" - time\" source=\"node_exporter.go:157\""
}
```
2017-02-23 00:11:38 -08:00
Kubernetes Submit Queue 0d5a638d24 Merge pull request #41665 from freehan/cri-checkpoint-fix
Automatic merge from submit-queue (batch tested with PRs 41812, 41665, 40007, 41281, 41771)

initialize directory while creating checkpoint file store

fixes: #41616 
ref: https://github.com/kubernetes/kubernetes/issues/41225
2017-02-23 00:11:35 -08:00
Jordan Liggitt a5526304bc
Use consistent helper for getting secret names from pod 2017-02-23 00:40:17 -05:00
Kubernetes Submit Queue c36eee2a0c Merge pull request #41784 from dixudx/fix_issue_41746
Automatic merge from submit-queue (batch tested with PRs 41146, 41486, 41482, 41538, 41784)

fix issue #41746

**What this PR does / why we need it**:

**Which issue this PR fixes** : fixes #41746 

**Special notes for your reviewer**:

cc @feiskyer
2017-02-22 21:09:38 -08:00
Kubernetes Submit Queue 6024f56f80 Merge pull request #38957 from aveshagarwal/master-taints-tolerations-api-fields
Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373)

Change taints/tolerations to api fields

This PR changes current implementation of taints and tolerations from annotations to API fields. Taint and toleration are now part of `NodeSpec` and `PodSpec`, respectively. The annotation keys: `scheduler.alpha.kubernetes.io/tolerations` and `scheduler.alpha.kubernetes.io/taints`  have been removed.

**Release note**:
Pod tolerations and node taints have moved from annotations to API fields in the PodSpec and NodeSpec, respectively. Pod tolerations and node taints that are defined in the annotations will be ignored. The annotation keys: `scheduler.alpha.kubernetes.io/tolerations` and `scheduler.alpha.kubernetes.io/taints`  have been removed.
2017-02-22 19:59:31 -08:00
Minhan Xia 6b34343946 bug fix for hostport-syncer 2017-02-22 16:38:09 -08:00
Kubernetes Submit Queue d1687d2f67 Merge pull request #41349 from derekwaynecarr/enable-pod-cgroups
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)

Enable pod level cgroups by default

**What this PR does / why we need it**:
It enables pod level cgroups by default.

**Special notes for your reviewer**:
This is intended to be enabled by default on 2/14/2017 per the plan outlined here:
https://github.com/kubernetes/community/pull/314

**Release note**:
```release-note
Each pod has its own associated cgroup by default.
```
2017-02-22 08:12:37 -08:00
Avesh Agarwal 9b640838a5 Change taint/toleration annotations to api fields. 2017-02-22 09:27:42 -05:00
Tim Hockin 98d693e9d2 Merge pull request #39837 from foxyriver/modify-comment
modify-comment
2017-02-21 16:33:47 -06:00
JulienBalestra 7de2d51f90 gofmt rkt.go, rkt_test.go 2017-02-21 23:06:13 +01:00
Derek Carr 43ae6f49ad Enable per pod cgroups, fix defaulting of cgroup-root when not specified 2017-02-21 16:34:22 -05:00
Derek Carr 7fe105ebc7 stop double encoding systemd style cgroup names 2017-02-21 16:34:21 -05:00
Kubernetes Submit Queue b201ac2f8f Merge pull request #41457 from alejandroEsc/ae/kubelet/debug2
Automatic merge from submit-queue

Log that debug handlers have been turned on.

**What this PR does / why we need it**: PR allows user to have a message in logs that debug handlers are on. It should allow the operator to know and automate a check for the case where debug has been left on. 

**Release note**:
```
NONE
```
2017-02-21 10:51:32 -08:00
riverzhang 5156b7f8cf Fix some typos 2017-02-21 07:15:40 -06:00
Di Xu 49098d08b7 fix issue #41746 2017-02-21 18:41:27 +08:00
Jeff Peeler 8fb1b71c66 Implements projected volume driver
Proposal: kubernetes/kubernetes#35313
2017-02-20 12:56:04 -05:00
Derek Carr 9a1e30f776 BestEffort QoS class has min cpu shares 2017-02-20 12:28:00 -05:00
Julien Balestra 89e1382dd9 Remove else if else 2017-02-20 18:24:41 +01:00
Julien Balestra ff8fbd4c8b Fix a typo 2017-02-20 18:16:41 +01:00
Harry Zhang cba9a90fd1 Ignore file start with dots 2017-02-20 21:49:42 +08:00
Vishnu kannan 26f9598279 admit critical pods under resource pressure\n evict critical pods that are not static
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-19 19:19:09 -08:00
Kubernetes Submit Queue 7236af6162 Merge pull request #39373 from apprenda/fix_configmap
Automatic merge from submit-queue (batch tested with PRs 39373, 41585, 41617, 41707, 39958)

Fix ConfigMaps for Windows

**What this PR does / why we need it**: ConfigMaps were broken for Windows as the existing code used linux specific file paths. Updated the code in `kubelet_getters.go` to use `path/filepath` to get the directories. Also reverted back the code in `secret.go` as updating `kubelet_getters.go` to use `path/filepath` also fixes `secrets`

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/39372

```release-note
Fix ConfigMap for Windows Containers.
```

cc: @pires
2017-02-19 13:50:37 -08:00
Jacob Simpson 855627e5cb Rotate the kubelet certificate when about to expire.
Changes the kubelet so it doesn't use the cert/key files directly for
starting the TLS server. Instead the TLS server reads the cert/key from
the new CertificateManager component, which is responsible for
requesting new certificates from the Certificate Signing Request API on
the API Server.
2017-02-17 17:42:35 -08:00
Minhan Xia 4f21b0280d initialize directory while creating checkpoint file store 2017-02-17 16:56:46 -08:00
Kubernetes Submit Queue 7bbafd259c Merge pull request #41626 from derekwaynecarr/improve-kubelet-volume-logging
Automatic merge from submit-queue (batch tested with PRs 41649, 41658, 41266, 41371, 41626)

Understand why kubelet cannot cleanup orphaned pod dirs

**What this PR does / why we need it**:
Understand if we are unable to clean up orphaned pod directories due to a failure to read the directory versus paths still existing to improve ability to debug error situations.
2017-02-17 16:38:41 -08:00
Jacob Simpson b9f3e91041 Split `RequestNodeCertificate` function.
Split the `RequestNodeCertificate` function so it can be called with
different arguments.
2017-02-17 07:40:48 -08:00
Derek Carr f1b7621f42 kubelet volumes cleanupOrphanedPodDirs does not distinguish error from found volume paths 2017-02-17 09:07:54 -05:00
Kubernetes Submit Queue 98d1cffe05 Merge pull request #37036 from dcbw/docker-gc-teardown-pods
Automatic merge from submit-queue (batch tested with PRs 40505, 34664, 37036, 40726, 41595)

dockertools: call TearDownPod when GC-ing infra pods

The docker runtime doesn't tear down networking when GC-ing pods.
rkt already does so make docker do it too. To ensure this happens,
infra pods are now always GC-ed rather than gating them by
containersToKeep.

This prevents IPAM from leaking when the pod gets killed for
some reason outside kubelet (like docker restart) or when pods
are killed while kubelet isn't running.

Fixes: https://github.com/kubernetes/kubernetes/issues/14940
Related: https://github.com/kubernetes/kubernetes/pull/35572
2017-02-16 17:05:12 -08:00
Kubernetes Submit Queue 05c05de798 Merge pull request #41569 from yujuhong/add_healthcheck
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)

Report node not ready on failed PLEG health check

Report node not ready if PLEG health check fails.
2017-02-16 15:49:18 -08:00
Kubernetes Submit Queue 6376ad134d Merge pull request #39606 from NickrenREN/kubelet-pod
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)

optimize killPod() and syncPod() functions

make sure that one of the two arguments must be non-nil: runningPod, status ,just like the function note says
and judge the return value in syncPod() function before setting podKilled
2017-02-16 15:49:17 -08:00
Kubernetes Submit Queue 4515f72824 Merge pull request #38101 from CaoShuFeng/haripin_nsenter
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)

[hairpin] fix argument of nsenter

**Release note**:

```release-note
None
```

We should use:
	nsenter --net=netnsPath -- -F some_command
instend of:
	nsenter -n netnsPath -- -F some_command
Because "nsenter -n netnsPath" get an error output:
	# nsenter -n /proc/67197/ns/net ip addr
	nsenter: neither filename nor target pid supplied for ns/net

If we really want use -n, we need to use -n in such format:
	# sudo nsenter -n/proc/67197/ns/net ip addr
2017-02-16 15:49:10 -08:00
Dan Williams 20e1cdb97c dockertools: tear down dead infra containers in SyncPod()
Dead infra containers may still have network resources allocated to
them and may not be GC-ed for a long time.  But allowing SyncPod()
to restart an infra container before the old one is destroyed
prevents network plugins from carrying the old network details
(eg IPAM) over to the new infra container.
2017-02-16 13:51:19 -06:00
Dan Williams 4d7d7faa81 dockertools: clean up networking when garbage-collecting pods
The docker runtime doesn't tear down networking when GC-ing pods.
rkt already does so make docker do it too. To ensure this happens,
networking is always torn down for the container even if the
container itself is not deleted.

This prevents IPAM from leaking when the pod gets killed for
some reason outside kubelet (like docker restart) or when pods
are killed while kubelet isn't running.

Fixes: https://github.com/kubernetes/kubernetes/issues/14940
Related: https://github.com/kubernetes/kubernetes/pull/35572
2017-02-16 13:51:19 -06:00
Dan Williams dc2fd511ab dockertools: use network PluginManager to synchronize pod network operations
We need to tear down networking when garbage collecting containers too,
and GC is run from a different goroutine in kubelet.  We don't want
container network operations running for the same pod concurrently.
2017-02-16 13:51:19 -06:00
Dan Williams 4c3cc67385 rkt: use network PluginManager to synchronize pod network operations 2017-02-16 13:51:19 -06:00
Dan Williams aafd5c9ef6 dockershim: use network PluginManager to synchronize pod network operations 2017-02-16 13:51:19 -06:00
Dan Williams 60525801c1 kubelet/network: move mock network plugin to pkg/kubelet/network/testing 2017-02-16 13:48:32 -06:00
Dan Williams 5633d7423a kubelet: add network plugin manager with per-pod operation locking
The PluginManager almost duplicates the network plugin interface, but
not quite since the Init() function should be called by whatever
actually finds and creates the network plugin instance.  Only then
does it get passed off to the PluginManager.

The Manager synchronizes pod-specific network operations like setup,
teardown, and pod network status.  It passes through all other
operations so that runtimes don't have to cache the network plugin
directly, but can use the PluginManager as a wrapper.
2017-02-16 13:48:32 -06:00
Kubernetes Submit Queue 3c606cdd20 Merge pull request #41456 from dashpole/pod_volume_cleanup
Automatic merge from submit-queue (batch tested with PRs 41466, 41456, 41550, 41238, 41416)

Delay Deletion of a Pod until volumes are cleaned up

#41436 fixed the bug that caused #41095 and #40239 to have to be reverted.  Now that the bug is fixed, this shouldn't cause problems.

 @vishh @derekwaynecarr @sjenning @jingxu97 @kubernetes/sig-storage-misc
2017-02-16 10:14:05 -08:00
Yu-Ju Hong 5bb43a3a24 Report node not ready on failed PLEG health check 2017-02-16 09:00:22 -08:00
Kubernetes Submit Queue 11bf535e03 Merge pull request #41434 from freehan/cri-kubenet-error
Automatic merge from submit-queue (batch tested with PRs 41531, 40417, 41434)

[CRI] beef up network teardown in  StopPodSandbox

1. Added CheckpointNotFound error to allow dockershim to conduct error handling
2. Retry network teardown if failed

ref: https://github.com/kubernetes/kubernetes/issues/41225
2017-02-15 23:01:09 -08:00
Kubernetes Submit Queue ddf4a0cad5 Merge pull request #40417 from jsravn/fix-reconciler-external-updates-race
Automatic merge from submit-queue (batch tested with PRs 41531, 40417, 41434)

Always detach volumes in operator executor

**What this PR does / why we need it**:

Instead of marking a volume as detached immediately in Kubelet's
reconciler, delegate the marking asynchronously to the operator
executor. This is necessary to prevent race conditions with other
operations mutating the same volume state.

An example of one such problem:

1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
   operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler reaches detach volume section, detects volume is no longer in desired state of world,
   removes it from volumes in use
7. MountVolume tries to mark mount, throws an error because
   volume is no longer in actual state of world list. After this, kubelet isn't aware of the mount
   so doesn't try to unmount again.
8. controller-manager tries to detach the volume, this fails because it
   is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.



**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #32881, fixes ##37854 (maybe)

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-02-15 23:01:07 -08:00
Alejandro Escobar 024d750370 making log statement clearer. 2017-02-15 19:49:52 -08:00
NickrenREN b40e575076 optimize killPod() and syncPod() functions
make sure that one of the two arguments must be non-nil: runningPod, status ,just like the function note says
and judge the return value in syncPod() function before setting podKilled
2017-02-16 09:13:23 +08:00
Kubernetes Submit Queue d60d8a7b92 Merge pull request #41104 from apprenda/kubeadm_client-go_move
Automatic merge from submit-queue

kubeadm: Migrate to client-go

**What this PR does / why we need it**: Finish the migration for kubeadm to use client-go wherever possible

**Which issue this PR fixes**: fixes #https://github.com/kubernetes/kubeadm/issues/52

**Special notes for your reviewer**: /cc @luxas @pires 

**Release note**:
```release-note
NONE
```
2017-02-15 16:01:22 -08:00
Kubernetes Submit Queue a1afc024cb Merge pull request #34931 from nhlfr/cadvisor-container-info-table
Automatic merge from submit-queue

kubelet: Make cadvisor GetContainerInfo tests table driven
2017-02-15 15:14:28 -08:00
Derek McQuay 70e7d64b46 kubeadm: moved import to client-go, where possible
Some imports dont exist yet (or so it seems) in client-go (examples
being:

  - "k8s.io/kubernetes/pkg/api/validation"
  - "k8s.io/kubernetes/pkg/util/initsystem"
  - "k8s.io/kubernetes/pkg/util/node"

one change in kubelet to import to client-go
2017-02-15 13:06:15 -08:00
Kubernetes Submit Queue 3bc575c91f Merge pull request #33550 from rtreffer/kubelet-allow-multiple-dns-server
Automatic merge from submit-queue

Allow multipe DNS servers as comma-seperated argument for kubelet --dns

This PR explores how kubectls "--dns" could be extended to specify multiple DNS servers for in-cluster PODs. Testing on the local libvirt-coreos cluster shows that multiple DNS server are injected without issues.

Specifying multiple DNS servers increases resilience against
- Packet drops
- Single server failure

I am debugging services that do 50+ DNS requests for a single incoming interactive request, thus highly increase the chance of a slowdown (+5s) due to a single packet drop. Switching to two DNS servers will reduce the impact of the issues (roughly +1s on glibc, 0s on musl, error-rate goes down to error-rate^2).

Note that there is no need to change any runtime related code as far as I know. In the case of "default" dns the /etc/resolv.conf is parsed and multiple DNS server are send to the backend anyway. This only adds the same capability for the clusterFirst case.

I've heard from @thockin that multiple DNS entries are somehow considered. I've no idea what was considered, though. This is what I would like to see for our production use, though.

```release-note
NONE
```
2017-02-15 12:45:32 -08:00
Minhan Xia 4ca2642dd3 update bazel 2017-02-15 10:06:49 -08:00
Minhan Xia 3cc837878f retry StopPodSandbox on Network teardown failure 2017-02-15 10:06:41 -08:00
David Ashpole 1d38818326 Revert "Merge pull request #41202 from dashpole/revert-41095-deletion_pod_lifecycle"
This reverts commit ff87d13b2c, reversing
changes made to 46becf2c81.
2017-02-15 08:44:03 -08:00
Michal Rostecki 4ed087e01f kubelet: Make cadvisor GetContainerInfo tests table driven 2017-02-15 16:15:21 +01:00
Kubernetes Submit Queue dd696683b7 Merge pull request #40647 from NickrenREN/secretManager
Automatic merge from submit-queue (batch tested with PRs 41360, 41423, 41430, 40647, 41352)

optimize NewSimpleSecretManager and cleanupOrphanedPodCgroups
2017-02-15 05:06:11 -08:00
Kubernetes Submit Queue d47ffa08c7 Merge pull request #41423 from yujuhong/better_logging
Automatic merge from submit-queue (batch tested with PRs 41360, 41423, 41430, 40647, 41352)

kubelet: reduce extraneous logging for pods using host network

For pods using the host network, kubelet/shim should not log
error/warning messages when determining the pod IP address.
2017-02-15 05:06:08 -08:00
Kubernetes Submit Queue 3a6fa67f59 Merge pull request #39179 from NickrenREN/killpod
Automatic merge from submit-queue (batch tested with PRs 41196, 41252, 41300, 39179, 41449)

record ReduceCPULimits result err info if err returned

record ReduceCPULimits result err info if err returned for debug
2017-02-15 04:14:15 -08:00
Kubernetes Submit Queue a57967f47b Merge pull request #41436 from dashpole/status_bug
Automatic merge from submit-queue

Fix bug in status manager TerminatePod

In TerminatePod, we previously pass pod.Status to updateStatusInternal.  This is a bug, since it is the original status that we are given.  Not only does it skip updates made to container statuses, but in some cases it reverted the pod's status to an earlier version, since it was being passed a stale status initially.

This was the case in #40239 and #41095.  As shown in #40239, the pod's status is set to running after it is set to failed, occasionally causing very long delays in pod deletion since we have to wait for this to be corrected.

This PR fixes the bug, adds some helpful debugging statements, and adds a unit test for TerminatePod (which for some reason didnt exist before?).

@kubernetes/sig-node-bugs @vish @Random-Liu
2017-02-14 21:03:31 -08:00
Kubernetes Submit Queue c485e76fe0 Merge pull request #41378 from yujuhong/enable_cri
Automatic merge from submit-queue

Make EnableCRI default to true

This change makes kubelet to use the CRI implementation by default,
unless the users opt out explicitly by using --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.

Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release. If both flags are specified,
the --enable-cri flag overrides the --experimental-cri flag.
2017-02-14 19:22:36 -08:00
Minhan Xia 012acad32e add CheckpointNotFound error 2017-02-14 16:55:51 -08:00
Kubernetes Submit Queue fe4a254a70 Merge pull request #41176 from tanshanshan/fix-little2
Automatic merge from submit-queue

fix comment 

**What this PR does / why we need it**:

fix comment 

Thanks.

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-02-14 16:41:45 -08:00
Yu-Ju Hong fb94f441ce Set EnableCRI to true by default
This change makes kubelet to use the CRI implementation by default,
unless the users opt out explicitly by using --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.

Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release.
2017-02-14 16:15:51 -08:00
Yu-Ju Hong 77286c38d3 kubelet: reduce extraneous logging for pods using host network
For pods using the host network, kubelet/shim should not log
error/warning messages when determining the pod IP address.
2017-02-14 16:09:42 -08:00
Alejandro Escobar 5d71eb4b05 added log message to capture that log handlers have been turned on. 2017-02-14 14:58:59 -08:00
David Ashpole c612e09acd use the status we modify, not original 2017-02-14 13:36:20 -08:00
Yu-Ju Hong 9fa1ad29fd kubelet: handle containers in the "created" state 2017-02-14 07:51:35 -08:00
NickrenREN 31bfefca3c optimize NewSimpleSecretManager and cleanupOrphanedPodCgroups
remove NewSimpleSecretManager second return value and cleanupOrphanedPodCgroups's return since they will never return err
2017-02-14 09:47:05 +08:00
James Ravn 9992bd23c2 Mark detached only if no pending operations
To safely mark a volume detached when the volume controller manager is used.

An example of one such problem:

1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
   operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler detects volume is no longer in desired state of world,
   removes it from volumes in use
7. MountVolume tries to mark volume in use, throws an error because
   volume is no longer in actual state of world list.
8. controller-manager tries to detach the volume, this fails because it
   is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.
2017-02-13 11:51:44 +00:00
Pengfei Ni 81b9064ca4 Fix typo of defualt 2017-02-11 22:28:24 +08:00
Random-Liu 8177030957 Change timeout for ExecSync, RunPodSandbox and PullImage. 2017-02-10 16:09:19 -08:00
Kubernetes Submit Queue e9de1b0221 Merge pull request #40992 from k82cn/rm_empty_line
Automatic merge from submit-queue (batch tested with PRs 41236, 40992)

Removed unnecessarly empty line.
2017-02-10 05:38:42 -08:00
Kubernetes Submit Queue 8188c3cca4 Merge pull request #40796 from wojtek-t/use_node_ttl_in_secret_manager
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)

Implement TTL controller and use the ttl annotation attached to node in secret manager

For every secret attached to a pod as volume, Kubelet is trying to refresh it every sync period. Currently Kubelet has a ttl-cache of secrets of its pods and the ttl is set to 1 minute. That means that in large clusters we are targetting (5k nodes, 30pods/node), given that each pod has a secret associated with ServiceAccount from its namespaces, and with large enough number of namespaces (where on each node (almost) every pod is from a different namespace), that resource in ~30 GETs to refresh all secrets every minute from one node, which gives ~2500QPS for GET secrets to apiserver.

Apiserver cannot keep up with it very easily.

Desired solution would be to watch for secret changes, but because of security we don't want a node watching for all secrets, and it is not possible for now to watch only for secrets attached to pods from my node.

So as a temporary solution, we are introducing an annotation that would be a suggestion for kubelet for the TTL of secrets in the cache and a very simple controller that would be setting this annotation based on the cluster size (the large cluster is, the bigger ttl is). 
That workaround mean that only very local changes are needed in Kubelet, we are creating a well separated very simple controller, and once watching "my secrets" will be possible it will be easy to remove it and switch to that. And it will allow us to reach scalability goals.

@dchen1107 @thockin @liggitt
2017-02-10 00:04:44 -08:00
Kubernetes Submit Queue 76b39431d3 Merge pull request #41147 from derekwaynecarr/improve-eviction-logs
Automatic merge from submit-queue (batch tested with PRs 41074, 41147, 40854, 41167, 40045)

Add debug logging to eviction manager

**What this PR does / why we need it**:
This PR adds debug logging to eviction manager.

We need it to help users understand when/why eviction manager is/is not making decisions to support information gathering during support.
2017-02-09 17:41:41 -08:00
Kubernetes Submit Queue f5c07157a8 Merge pull request #41092 from yujuhong/cri-docker1_10
Automatic merge from submit-queue (batch tested with PRs 41037, 40118, 40959, 41084, 41092)

CRI node e2e: add tests for docker 1.10
2017-02-09 16:44:44 -08:00
David Ashpole b224f83c37 Revert "[Kubelet] Delay deletion of pod from the API server until volumes are deleted" 2017-02-09 08:45:18 -08:00
Wojciech Tyczynski 6c0535a939 Use secret TTL annotation in secret manager 2017-02-09 13:53:32 +01:00
tanshanshan 46b1256d34 fix wrong 2017-02-09 12:19:32 +08:00
Kubernetes Submit Queue 42d8d4ca88 Merge pull request #40948 from freehan/cri-hostport
Automatic merge from submit-queue (batch tested with PRs 40873, 40948, 39580, 41065, 40815)

[CRI] Enable Hostport Feature for Dockershim

Commits:
1. Refactor common hostport util logics and add more tests

2. Add HostportManager which can ADD/DEL hostports instead of a complete sync.

3. Add Interface for retreiving portMappings information of a pod in Network Host interface. 
Implement GetPodPortMappings interface in dockerService. 

4. Teach kubenet to use HostportManager
2017-02-08 14:14:43 -08:00
Derek Carr 0171121486 Add debug logging to eviction manager 2017-02-08 15:01:12 -05:00
Yu-Ju Hong f96611ac45 dockershim: set the default cgroup driver 2017-02-08 10:22:19 -08:00
Minhan Xia be9eca6b51 teach kubenet to use hostport_manager 2017-02-08 09:35:04 -08:00
Minhan Xia bd05e1af2b add portmapping getter into network host 2017-02-08 09:35:04 -08:00
Minhan Xia 8e7219cbb4 add hostport manager 2017-02-08 09:26:52 -08:00
Minhan Xia aabdaa984f refactor hostport logic 2017-02-08 09:26:52 -08:00
David Ashpole 67cb2704c5 delete volumes before pod deletion 2017-02-08 07:34:49 -08:00
Kubernetes Submit Queue c6f30d3750 Merge pull request #41105 from Random-Liu/fix-kuberuntime-log
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)

Let ReadLogs return when there is a read error.

Fixes a bug in kuberuntime log.

Today, @yujuhong found that once we cancel `kubectl logs -f` with `Ctrl+C`, kuberuntime will keep complaining:
```
27939 kuberuntime_logs.go:192] Failed with err write tcp 10.240.0.4:10250->10.240.0.2:53913: write: broken pipe when writing log for log file "/var/log/pods/5bb76510-ed71-11e6-ad02-42010af00002/busybox_0.log": &{timestamp:{sec:63622095387 nsec:625309193 loc:0x484c440} stream:stdout log:[84 117 101 32 70 101 98 32 32 55 32 50 48 58 49 54 58 50 55 32 85 84 67 32 50 48 49 55 10]}
```

This is because kuberuntime keeps writing to the connection even though it is already closed. Actually, kuberuntime should return and report error whenever there is a writing error.

Ref the [docker code](3a4ae1f661/pkg/stdcopy/stdcopy.go (L159-L167))

I'm still creating the cluster and verifying this fix. Will post the result here after that.

/cc @yujuhong @kubernetes/sig-node-bugs
2017-02-08 00:49:51 -08:00
Kubernetes Submit Queue 1b7bdde40b Merge pull request #38796 from chentao1596/kubelet-cni-log
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)

kubelet/network-cni-plugin: modify the log's info

**What this PR does / why we need it**:

  Checking the startup logs of kubelet, i can always find a error like this:
    "E1215 10:19:24.891724    2752 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d"
 
 It will appears, neither i use cni network-plugin or not. 
After analysis codes, i thought it should be a warn log, because it will not produce any actions like as exit or abort, and just ignored when not any valid plugins exit.

 thank you!
2017-02-08 00:49:44 -08:00
Kubernetes Submit Queue 843e6d1cc3 Merge pull request #40770 from apilloud/clientset_interface
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)

Use Clientset interface in KubeletDeps

**What this PR does / why we need it**:
This replaces the Clientset struct with the equivalent interface for the KubeClient injected via KubeletDeps. This is useful for testing and for accessing the Node and Pod status event stream without an API server.

**Special notes for your reviewer**:
Follow up to #4907

**Release note**:

`NONE`
2017-02-07 22:12:39 -08:00
Kubernetes Submit Queue 51b8eb9424 Merge pull request #40946 from yujuhong/docker_sep
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)

dockershim: set security option separators based on the docker version

Also add a version cache to avoid hitting the docker daemon frequently.

This is part of #38164
2017-02-07 22:12:37 -08:00
Random-Liu 65190e2a72 Let ReadLogs return when there is a read error. 2017-02-07 15:43:48 -08:00
Yu-Ju Hong e66dd63b05 Add OWNERS to the dockertools package 2017-02-07 11:31:37 -08:00
Yu-Ju Hong d8e29e782f dockershim: set security option separators based on the docker version
Also add a version cache to avoid hitting the docker daemon frequently.
2017-02-07 11:06:40 -08:00
Kubernetes Submit Queue 98f97496ef Merge pull request #40903 from yujuhong/security_opts
Automatic merge from submit-queue (batch tested with PRs 40971, 41027, 40709, 40903, 39369)

Set docker opt separator correctly for SELinux options

This is based on @pmorie's commit from #40179
2017-02-06 20:57:17 -08:00
Yu-Ju Hong 05c3b8c1cf Set docker opt separator correctly for SELinux options 2017-02-06 14:47:30 -08:00
Kubernetes Submit Queue d4bcf3ede5 Merge pull request #40951 from yujuhong/fix_cri_portforward
Automatic merge from submit-queue (batch tested with PRs 40930, 40951)

Fix CRI port forwarding

Websocket support was introduced #33684, which broke the CRI
implementation. This change fixes it.
2017-02-06 14:27:05 -08:00
Klaus Ma cc26fe6ee9 Removed unnecessarly empty line. 2017-02-06 11:10:34 +08:00
NickrenREN a62da3e22b record ReduceCPULimits result err info if possible
record ReduceCPULimits result err info if possible
2017-02-04 21:12:39 +08:00
Kubernetes Submit Queue a777a8e3ba Merge pull request #39972 from derekwaynecarr/pod-cgroups-default
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)

Rename experimental-cgroups-per-pod flag

**What this PR does / why we need it**:
1. Rename `experimental-cgroups-per-qos` to `cgroups-per-qos`
1. Update hack/local-up-cluster to match `CGROUP_DRIVER` with docker runtime if used.

**Special notes for your reviewer**:
We plan to roll this feature out in the upcoming release.  Previous node e2e runs were running with this feature on by default.  We will default this feature on for all e2es next week.

**Release note**:
```release-note
Rename --experiemental-cgroups-per-qos to --cgroups-per-qos
```
2017-02-04 04:43:08 -08:00
Kubernetes Submit Queue 4796c7b409 Merge pull request #40727 from Random-Liu/handle-cri-in-place-upgrade
Automatic merge from submit-queue

CRI: Handle cri in-place upgrade

Fixes https://github.com/kubernetes/kubernetes/issues/40051.

## How does this PR restart/remove legacy containers/sandboxes?
With this PR, dockershim will convert and return legacy containers and infra containers as regular containers/sandboxes. Then we can rely on the SyncPod logic to stop the legacy containers/sandboxes, and the garbage collector to remove the legacy containers/sandboxes.

To forcibly trigger restart:
* For infra containers, we manually set `hostNetwork` to opposite value to trigger a restart (See [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L389))
* For application containers, they will be restarted with the infra container.
## How does this PR avoid extra overhead when there is no legacy container/sandbox?
For the lack of some labels, listing legacy containers needs extra `docker ps`. We should not introduce constant performance regression for legacy container cleanup. So we added the `legacyCleanupFlag`:
* In `ListContainers` and `ListPodSandbox`, only do extra `ListLegacyContainers` and `ListLegacyPodSandbox` when `legacyCleanupFlag` is `NotDone`.
* When dockershim starts, it will check whether there are legacy containers/sandboxes.
  * If there are none, it will mark `legacyCleanupFlag` as `Done`.
  * If there are any, it will leave `legacyCleanupFlag` as `NotDone`, and start a goroutine periodically check whether legacy cleanup is done.
This makes sure that there is overhead only when there are legacy containers/sandboxes not cleaned up yet.

## Caveats
* In-place upgrade will cause kubelet to restart all running containers.
* RestartNever container will not be restarted.
* Garbage collector sometimes keep the legacy containers for a long time if there aren't too many containers on the node. In that case, dockershim will keep performing extra `docker ps` which introduces overhead.
  * Manually remove all legacy containers will fix this.
  * Should we garbage collect legacy containers/sandboxes in dockershim by ourselves? /cc @yujuhong 
* Host port will not be reclaimed for the lack of checkpoint for legacy sandboxes. https://github.com/kubernetes/kubernetes/pull/39903 /cc @freehan 

/cc @yujuhong @feiskyer @dchen1107 @kubernetes/sig-node-api-reviews 
**Release note**:

```release-note
We should mention the caveats of in-place upgrade in release note.
```
2017-02-03 22:17:56 -08:00
Kubernetes Submit Queue f20b4fc67f Merge pull request #40655 from vishh/flag-gate-critical-pod-annotation
Automatic merge from submit-queue

Optionally avoid evicting critical pods in kubelet

For #40573

```release-note
When feature gate "ExperimentalCriticalPodAnnotation" is set, Kubelet will avoid evicting pods in "kube-system" namespace that contains a special annotation - `scheduler.alpha.kubernetes.io/critical-pod`
This feature should be used in conjunction with the rescheduler to guarantee availability for critical system pods - https://kubernetes.io/docs/admin/rescheduler/
```
2017-02-03 16:22:26 -08:00
Yu-Ju Hong bb0eb3c33e Fix CRI port forwarding
Websocket support was introduced #33684, which broke the CRI
implementation. This change fixes it.
2017-02-03 15:29:49 -08:00
Derek Carr 04a909a257 Rename cgroups-per-qos flag to not be experimental 2017-02-03 17:10:53 -05:00
Andrew Pilloud 3f8505022c Use clientset.Interface for KubeClient 2017-02-03 07:36:16 -08:00
Kubernetes Submit Queue 2bb1e75815 Merge pull request #40863 from kubernetes/sttts-big-genericapiserver-move
Automatic merge from submit-queue (batch tested with PRs 40795, 40863)

Move pkg/genericapiserver and pkg/storage to k8s.io/apiserver

approved based on #40363

These must merge first:
- [x] genericvalidation https://github.com/kubernetes/kubernetes/pull/40810
- [x] openapi https://github.com/kubernetes/kubernetes/pull/40829
- [x] episode 7 https://github.com/kubernetes/kubernetes/pull/40853
2017-02-03 03:48:50 -08:00
Kubernetes Submit Queue 0dcc04d698 Merge pull request #40795 from wojtek-t/use_caching_manager
Automatic merge from submit-queue (batch tested with PRs 40795, 40863)

Use caching secret manager in kubelet

I just found that this is in my local branch I'm using for testing, but not in master :)
2017-02-03 03:48:48 -08:00
Dr. Stefan Schimanski 6af3210d6f Update generated files 2017-02-03 08:15:46 +01:00
Dr. Stefan Schimanski 80b96b441b Mechanical import fixup: pkg/storage 2017-02-03 07:33:43 +01:00
Random-Liu b9cf8ebe77 Update bazel. 2017-02-02 15:36:24 -08:00
Random-Liu 626680d289 Add unit test for legacy container cleanup 2017-02-02 15:36:24 -08:00
Random-Liu 14940edaad Add legacy container cleanup 2017-02-02 15:36:24 -08:00
Kubernetes Submit Queue c84f3abca1 Merge pull request #40571 from jcbsmpsn/close-watch
Automatic merge from submit-queue

Release API watch resources when done.
2017-02-02 14:10:10 -08:00
Wojciech Tyczynski 6f3acb801d Fix bug in secret manager. 2017-02-02 21:39:26 +01:00
Vishnu Kannan c967ab7b99 Avoid evicting critical pods in Kubelet if a special feature gate is enabled
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-02-02 11:32:20 -08:00
Vishnu Kannan 6ddb528446 Revert "Sort critical pods before admission"
This reverts commit b7409e0038.
2017-02-02 10:41:24 -08:00
Vishnu Kannan ffd7dda234 Revert "Kubelet admits critical pods even under memory pressure"
This reverts commit afd676d94c.
2017-02-02 10:41:24 -08:00
Vishnu Kannan a3ae8c2b21 Revert "assign -998 as the oom_score_adj for critical pods."
This reverts commit 53931fbce4.
2017-02-02 10:41:21 -08:00
Vishnu Kannan b8a63537dd Revert "Don't evict static pods"
This reverts commit 1743c6b6ab.
2017-02-02 10:29:12 -08:00
Minhan Xia 51526d3103 Add checkpointHandler to DockerService 2017-02-02 10:19:34 -08:00
Minhan Xia 344d2f591f add checkpoint structures for dockershim 2017-02-02 10:18:37 -08:00
Jacob Simpson cf31d9413e Release API watch resources when done. 2017-02-02 10:17:08 -08:00
Wojciech Tyczynski ec6a95a665 Use caching secret manager in kubelet 2017-02-02 15:32:07 +01:00
Kubernetes Submit Queue 0477100f98 Merge pull request #33684 from fraenkel/port_forward_ws
Automatic merge from submit-queue

Add websocket support for port forwarding

#32880

**Release note**:
```release-note
Port forwarding can forward over websockets or SPDY.
```
2017-02-01 23:19:02 -08:00
Michael Fraenkel f07f5a4cc3 Generated code 2017-02-01 18:03:47 -07:00
Michael Fraenkel 93c11422e4 CRI Portforward needs to forward websocket ports
- adjust ports to int32
- CRI flows the websocket ports as query params

- Do not validate ports since the protocol is unknown
  SPDY flows the ports as headers and websockets uses query params
- Only flow query params if there is at least one port query param
2017-02-01 18:03:42 -07:00
Kubernetes Submit Queue 4bffae39cb Merge pull request #40574 from yujuhong/mv_securitycontext
Automatic merge from submit-queue

securitycontext: move docker-specific logic into kubelet/dockertools

This change moves the code specific to docker to kubelet/dockertools,
while leaving the common utility functions at its current package
(pkg/securitycontext).

When we deprecate dockertools in the future, the code will be moved to
pkg/kubelet/dockershim instead.
2017-02-01 15:31:49 -08:00
Rene Treffer 42ff859c27 Allow multipe DNS servers as comma-seperated argument for --dns
Depending on an exact cluster setup multiple dns may make sense.
Comma-seperated lists of DNS server are quite common as DNS servers
are always plain IPs.
2017-02-01 22:38:40 +01:00
Kubernetes Submit Queue 7165fe6e9e Merge pull request #39145 from NickrenREN/podpair
Automatic merge from submit-queue (batch tested with PRs 40758, 39145, 40776)

remove duplicate function notes
2017-02-01 13:30:39 -08:00
Kubernetes Submit Queue cb758738f9 Merge pull request #40265 from feiskyer/cri-verify
Automatic merge from submit-queue

CRI: verify responses from remote runtime

Closes #40264.
2017-02-01 08:41:15 -08:00
Michael Fraenkel beb53fb71a Port forward over websockets
- split out port forwarding into its own package

Allow multiple port forwarding ports
- Make it easy to determine which port is tied to which channel
- odd channels are for data
- even channels are for errors

- allow comma separated ports to specify multiple ports

Add  portfowardtester 1.2 to whitelist
2017-02-01 06:32:04 -07:00
Kubernetes Submit Queue 76550cf2de Merge pull request #40710 from deads2k/client-21-record
Automatic merge from submit-queue (batch tested with PRs 40638, 40742, 40710, 40718, 40763)

move client/record

An attempt at moving client/record to client-go.  It's proving very stubborn and needs a lot manual intervention and near as I can tell, no one actually gets any benefit from the sink and source complexity it adds.

@sttts @caesarchaoxu
2017-01-31 20:40:45 -08:00
Kubernetes Submit Queue d399924b69 Merge pull request #40638 from yujuhong/rm_label
Automatic merge from submit-queue

kuberuntime: remove the kubernetesManagedLabel label

The CRI shim should be responsible for returning only those
containers/sandboxes created through CRI. Remove this label in kubelet.
2017-01-31 19:40:20 -08:00
Kubernetes Submit Queue 32d79c3461 Merge pull request #39443 from freehan/cri-hostport
Automatic merge from submit-queue (batch tested with PRs 40111, 40368, 40342, 40274, 39443)

refactor hostport logic

Refactor hostport logic to avoid using api.Pod object directly.
2017-01-31 19:18:44 -08:00
deads2k a106d9f848 switch kubelet to use external (client-go) object references for events 2017-01-31 19:15:33 -05:00
deads2k 8a12000402 move client/record 2017-01-31 19:14:13 -05:00
Kubernetes Submit Queue 31df7e411c Merge pull request #40527 from php-coder/docker_manager_cleanup
Automatic merge from submit-queue (batch tested with PRs 40527, 40738, 39366, 40609, 40748)

pkg/kubelet/dockertools/docker_manager.go: removing unused stuff

This PR removes unused constants and variables. I checked that neither kubernetes nor openshift code aren't using them.
2017-01-31 15:49:37 -08:00
Kubernetes Submit Queue fe01eef0bb Merge pull request #39242 from NickrenREN/kuberuntime-manager
Automatic merge from submit-queue (batch tested with PRs 40392, 39242, 40579, 40628, 40713)

optimize podSandboxChanged() function and fix some function notes
2017-01-31 01:16:51 -08:00
Kubernetes Submit Queue 564016bd9f Merge pull request #38443 from NickrenREN/vol-mgr-getVolumeInUse
Automatic merge from submit-queue (batch tested with PRs 38443, 40145, 40701, 40682)

fix GetVolumeInUse() function

Since we just want to get volume name info, each volume name just need to added once.  desiredStateOfWorld.GetVolumesToMount() will return volume and pod binding info,
if one volume is mounted to several pods, the volume name will be return several times. That is not what we want in this function.
We can add a new function to only get the volume name info  or judge whether the volume name is added to the desiredVolumesMap array.
2017-01-30 20:59:38 -08:00
Minhan Xia 548a6122c5 rename HostportHandler to HostportSyncer 2017-01-30 16:30:06 -08:00
deads2k c9a008dff3 move util/intstr to apimachinery 2017-01-30 12:46:59 -05:00
Dr. Stefan Schimanski 44ea6b3f30 Update generated files 2017-01-29 21:41:45 +01:00
Dr. Stefan Schimanski 79adb99a13 pkg/api: move Semantic equality to k8s.io/apimachinery/pkg/api/equality 2017-01-29 21:41:45 +01:00
Dr. Stefan Schimanski 88d9829ad5 pkg/kubelet/rkt: adapt to new appc/spec 2017-01-29 21:41:45 +01:00
Dr. Stefan Schimanski bc6fdd925d pkg/api/resource: move to apimachinery 2017-01-29 21:41:44 +01:00
Kubernetes Submit Queue 4bba610565 Merge pull request #40605 from deads2k/generic-32-movehttpstream
Automatic merge from submit-queue

pkg/util: move httpstream to k8s.io/apimachinery

pick one commit from @sttts's pull https://github.com/kubernetes/kubernetes/pull/40426

This blocks some client-go splitting, so I'm picking it out and merging it separately.  It's not my commit, so its not a self-lgtm in that sense.

approved based on https://github.com/kubernetes/kubernetes/issues/40363
2017-01-29 05:15:22 -08:00