Commit Graph

3873 Commits (5658addb9bf6eddbb9f6ddfa215c1ee3f34b263a)

Author SHA1 Message Date
Kubernetes Submit Queue c4b33f3be3 Merge pull request #37661 from yujuhong/always_add_pods
Automatic merge from submit-queue

kubelet: don't reject pods without adding them to the pod manager

kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources). The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.

This should fix #37658
2016-11-30 21:59:12 -08:00
Kubernetes Submit Queue 2ed490e15b Merge pull request #37255 from jingxu97/Nov/nfshung
Automatic merge from submit-queue

remove checking mount point in cleanupOrphanedPodDirs

To avoid the NFS hang problem, remove the mountpoint checking code in
cleanupOrphanedPodDirs(). This removal should still be safe because the function checks whether any directories remain under the pod's volume directory and, if so, does not delete the pod directory.

Note: After removing the mountpoint check code in cleanupOrphanedPodDirs(), the directories might not be cleaned up in the following situation:
1. delete pod, kubelet reconciler unmounts the volume directory successfully
2. before reconciler tries to delete the volume directory, kubelet gets restarted
3. since volume directories still exist under the pod directory (but are not mounted), cleanupOrphanedPodDirs() will not clean them up.

Will work on a follow-up PR to solve the above issue.
2016-11-30 21:11:13 -08:00
Yu-Ju Hong 69caf533f0 kubelet: don't reject pods without adding them to the pod manager
kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources). The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.

Also check if the pod to-be-admitted has terminated or not. In the case where
it has terminated, skip the admission process completely.
2016-11-30 18:05:17 -08:00
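The ordering described in the commit above (cache the pod first, skip admission for terminated pods, only then admit or reject) might look roughly like the sketch below. All types and names here (`Pod`, `podCache`, `handleAddition`) are hypothetical stand-ins for illustration, not the kubelet's actual pod manager API.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal stand-in for the real pod object.
type Pod struct {
	UID        string
	Terminated bool
}

// podCache is a hypothetical stand-in for the pod manager cache.
type podCache struct {
	pods map[string]*Pod
}

func newPodCache() *podCache { return &podCache{pods: map[string]*Pod{}} }

// Add records the pod in the cache.
func (c *podCache) Add(p *Pod) { c.pods[p.UID] = p }

// Has reports whether the cache knows about the pod.
func (c *podCache) Has(uid string) bool {
	_, ok := c.pods[uid]
	return ok
}

// handleAddition caches the pod before making any admission decision, so a
// rejected pod is still known to the cache and its status update is not
// dropped. Terminated pods skip admission entirely.
func handleAddition(c *podCache, p *Pod, admit func(*Pod) bool) string {
	c.Add(p)
	if p.Terminated {
		return "skipped"
	}
	if !admit(p) {
		return "rejected"
	}
	return "admitted"
}

func main() {
	cache := newPodCache()
	rejectAll := func(*Pod) bool { return false }

	outcome := handleAddition(cache, &Pod{UID: "pod-a"}, rejectAll)
	fmt.Println(outcome, "still cached:", cache.Has("pod-a"))
}
```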
Kubernetes Submit Queue 737edd02a4 Merge pull request #35258 from feiskyer/package-aliase
Automatic merge from submit-queue

Fix package aliases to follow golang convention

Some package aliases do not align with the golang convention https://blog.golang.org/package-names. This PR fixes them. It also adds a verify script and presubmit checks.

Fixes #35070.

cc/ @timstclair @Random-Liu
2016-11-30 16:39:46 -08:00
Jing Xu 041fa6477b remove checking mount point in cleanupOrphanedPodDirs
To avoid the NFS hang problem, remove the mountpoint checking code in
cleanupOrphanedPodDirs(). This removal should still be safe.
2016-11-30 13:46:39 -08:00
Kubernetes Submit Queue ef079a316e Merge pull request #37535 from yarntime/fix_typo_in_volume_manager
Automatic merge from submit-queue

fix typo in volume_manager

fix typo in volume_manager.
2016-11-30 01:26:36 -08:00
Pengfei Ni f584ed4398 Fix package aliases to follow golang convention 2016-11-30 15:40:50 +08:00
Chao Xu 8554d8e0db fix concurrent read/write to map error caused by SetInitContainersAndStatuses in kubelet 2016-11-28 11:56:21 -08:00
yarntime 8447b9f940 fix typo in volume_manager 2016-11-28 11:46:22 +08:00
Clayton Coleman 35a6bfbcee generated: refactor 2016-11-23 22:30:47 -06:00
Chao Xu bcc783c594 run hack/update-all.sh 2016-11-23 15:53:09 -08:00
Chao Xu b50367cbdc remove v1.Semantics 2016-11-23 15:53:09 -08:00
Chao Xu 5e1adf91df cmd/kubelet 2016-11-23 15:53:09 -08:00
Derek Carr 1ec69f658c Fix cross-build for memcg notification 2016-11-23 12:36:04 -05:00
Kubernetes Submit Queue f8d8831c71 Merge pull request #32577 from sjenning/memcg-notification-wip
Automatic merge from submit-queue

kubelet: eviction: add memcg threshold notifier to improve eviction responsiveness

This PR adds the ability for the eviction code to get immediate notification from the kernel when the available memory in the root cgroup falls below a user-defined threshold, controlled by setting the `memory.available` signal with the `--eviction-hard` flag.

By itself, this PR doesn't change anything, because the frequency at which new stats can be obtained is currently controlled by the cadvisor housekeeping interval.  That being the case, the call to `synchronize()` by the notification loop will very likely get stale stats and not act any more quickly than it does now.

However, whenever cadvisor does get on-demand stat gathering ability, this will improve eviction responsiveness by getting async notification of the root cgroup memory state rather than relying on polling cadvisor.

@vishh @derekwaynecarr @kubernetes/rh-cluster-infra
2016-11-22 19:05:54 -08:00
gmarek 795961f7e7 Add more logging around Pod deletion 2016-11-21 11:20:48 +01:00
Pengfei Ni 8322e5091e CRI: address known issues of seccomp 2016-11-19 08:35:13 +08:00
Kubernetes Submit Queue eca9e989a3 Merge pull request #36779 from sjenning/fix-memory-leak-via-terminated-pods
Automatic merge from submit-queue

fix leaking memory backed volumes of terminated pods

Currently, we allow volumes to remain mounted on the node, even though the pod is terminated.  This creates a vector for a malicious user to exhaust memory on the node by creating memory backed volumes containing large files.

This PR removes memory backed volumes (emptyDir w/ medium Memory, secrets, configmaps) of terminated pods from the node.

@saad-ali @derekwaynecarr
2016-11-17 21:29:51 -08:00
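As a rough illustration of the selection rule in the PR above, the sketch below picks the memory backed volumes (emptyDir with medium Memory, secrets, configmaps) of a terminated pod. The `Volume` type, its string fields, and the function names are hypothetical and chosen only for this example; they are not the Kubernetes API types.

```go
package main

import "fmt"

// Volume is a hypothetical, simplified volume description.
type Volume struct {
	Name   string
	Kind   string // e.g. "emptyDir", "secret", "configMap", "hostPath"
	Medium string // only meaningful for emptyDir, e.g. "Memory"
}

// isMemoryBacked follows the list in the PR description: emptyDir with
// medium Memory, secrets, and configmaps.
func isMemoryBacked(v Volume) bool {
	switch v.Kind {
	case "secret", "configMap":
		return true
	case "emptyDir":
		return v.Medium == "Memory"
	}
	return false
}

// volumesToRemove returns the names of volumes that should be removed from
// the node. Nothing is selected while the pod is still running.
func volumesToRemove(podTerminated bool, vols []Volume) []string {
	var names []string
	if !podTerminated {
		return names
	}
	for _, v := range vols {
		if isMemoryBacked(v) {
			names = append(names, v.Name)
		}
	}
	return names
}

func main() {
	vols := []Volume{
		{Name: "scratch", Kind: "emptyDir", Medium: "Memory"},
		{Name: "data", Kind: "emptyDir"},
		{Name: "token", Kind: "secret"},
	}
	fmt.Println(volumesToRemove(true, vols))
}
```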
Kubernetes Submit Queue 4241a42ad5 Merge pull request #36965 from Random-Liu/fix-truncate-hostname
Automatic merge from submit-queue

Fix hostname truncate.

Fixes https://github.com/kubernetes/kubernetes/issues/36951.

This PR will keep truncating the hostname until the ending character is valid.

/cc @kubernetes/sig-node 

Mark v1.5 because this is a bug fix.
/cc @saad-ali
2016-11-17 02:03:31 -08:00
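The truncation rule in the PR above ("keep truncating the hostname until the ending character is valid") could be sketched as below. The function name, the explicit `maxLen` parameter, and the assumption that `-` and `.` are the invalid ending characters are illustrative choices, not details taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// truncateHostname cuts hostname to at most maxLen bytes and then keeps
// dropping trailing characters until the last one is valid. Hostnames are
// assumed to be ASCII, so byte slicing is safe here. Treating '-' and '.'
// as the invalid ending characters is an assumption of this sketch.
func truncateHostname(hostname string, maxLen int) (string, error) {
	if len(hostname) <= maxLen {
		return hostname, nil
	}
	truncated := strings.TrimRight(hostname[:maxLen], "-.")
	if truncated == "" {
		return "", errors.New("hostname is empty after truncation")
	}
	return truncated, nil
}

func main() {
	name, err := truncateHostname("web-frontend-0", 4)
	fmt.Println(name, err)
}
```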
Random-Liu e9f1b0f972 Fix hostname truncate. 2016-11-16 18:09:31 -08:00
Yu-Ju Hong 5c90908eb0 dockershim: remove container upon naming conflicts
We have observed that, after failing to create a container due to "device or
resource busy", docker may end up having inconsistent internal state. One
symptom is that docker will not report the existence of the "failed to create"
container, but if kubelet tries to create a new container with the same name,
docker will error out with a naming conflict message.

To work around this, this commit parses the creation error message and, if there
is a naming conflict, attempts to remove the existing container.
2016-11-16 10:20:16 -08:00
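A minimal sketch of the error-message parsing described in the commit above. The exact wording of docker's conflict message and the regular expression are assumptions made for illustration; the real message format may differ between docker versions, and `conflictingContainerID` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// conflictRE matches an assumed docker naming-conflict message and captures
// the ID of the container that currently holds the name.
var conflictRE = regexp.MustCompile(`Conflict\. .+ is already in use by container "?([0-9a-z]+)"?`)

// conflictingContainerID returns the ID of the container that holds the
// requested name, if msg looks like a naming conflict.
func conflictingContainerID(msg string) (string, bool) {
	m := conflictRE.FindStringSubmatch(msg)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Example message; its wording is an assumption for this sketch.
	msg := `Conflict. The name "/k8s_app" is already in use by container 3f2a9c. ` +
		`You have to remove (or rename) that container to be able to reuse that name.`

	if id, ok := conflictingContainerID(msg); ok {
		// A caller would attempt to remove this container before the
		// next creation attempt.
		fmt.Println("would remove container", id)
	}
}
```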
Seth Jennings b80bea4a62 fix leaking memory backed volumes of terminated pods 2016-11-16 10:17:22 -06:00
Kubernetes Submit Queue 193622b31f Merge pull request #36728 from feiskyer/sysctls-docs
Automatic merge from submit-queue

CRI: add docs for sysctls

#34830 adds the `sysctls` feature to CRI, based on sandbox annotations. This PR adds docs for it.

@yujuhong @timstclair @jonboulle
2016-11-16 02:58:42 -08:00
Kubernetes Submit Queue f4a7b64bf1 Merge pull request #36542 from Random-Liu/clarify-cri-user
Automatic merge from submit-queue

CRI: Clarify User in CRI.

Addressed https://github.com/kubernetes/kubernetes/pull/36423#issuecomment-259343135.

This PR clarifies the user related fields in CRI.

One question is that:
What is the meaning of the `run_as_user` field in `LinuxSandboxSecurityContext`?
* **Is it user on the host?** Then it doesn't make sense, user shouldn't care about what users are on the host.
* **Is it user inside the infra container image?** This is how the field is currently used. However, Infra container is docker specific, I'm not sure whether we should expose this in CRI.
* **Is it the default user inside the pod?** It tells the runtime that if a container (the infra container, or some other helper container such as a streaming container) does not specify its `user`, the default "sandbox user" should be used. Then how can we guarantee that the infra or helper container image has that `user`?
* **It doesn't make sense?** If we remove it, we are relying on the shim to set right user (maybe always root) for infra or helper containers (if there will be any in the future), I'm not sure whether this is what we expect.

@yujuhong @feiskyer @jonboulle @yifan-gu 
/cc @kubernetes/sig-node
2016-11-16 01:45:37 -08:00
Random-Liu 2ce5deb6fd Add separate username field in CRI and use it. 2016-11-15 16:50:02 -08:00
Random-Liu c79b8afe5b Clarify user fields in CRI 2016-11-15 16:50:02 -08:00
mdshuai 2189acdd4f [kubelet]update --cgroups-per-qos to --experimental-cgroups-per-qos 2016-11-15 15:55:47 +08:00
Kubernetes Submit Queue 3245e8b355 Merge pull request #36767 from vishh/rename-cgroups-flags
Automatic merge from submit-queue

[kubelet] rename --cgroups-per-qos to --experimental-cgroups-per-qos

This reflects the true nature of "cgroups per qos" feature.

```release-note
 * Rename `--cgroups-per-qos` to `--experimental-cgroups-per-qos` in Kubelet
```
2016-11-14 17:35:19 -08:00
Kubernetes Submit Queue c5c461df38 Merge pull request #36664 from yujuhong/fix_comments
Automatic merge from submit-queue

dockershim: clean up comments
2016-11-14 14:26:51 -08:00
Vishnu kannan 9066253491 [kubelet] rename --cgroups-per-qos to --experimental-cgroups-per-qos to reflect the true nature of that feature
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-11-14 14:06:39 -08:00
Yu-Ju Hong b73dfe02b5 dockershim: clean up comments 2016-11-14 12:03:00 -08:00
pweil- d0d78f478c experimental host user ns defaulting 2016-11-14 10:16:03 -05:00
Pengfei Ni 38955897f7 CRI: add docs for sysctls 2016-11-14 12:19:52 +08:00
Kubernetes Submit Queue 8b37baa926 Merge pull request #36616 from jingxu97/Nov/reconstruct-fix
Automatic merge from submit-queue

fix issue in reconstruct volume data when kubelet restarts

During state reconstruction when kubelet restarts, outerVolumeSpecName
cannot be recovered by scanning the disk directories. But this
information is used by volume manager to check whether pod's volume is
mounted or not. There are two possible cases:
1. pod is not deleted during kubelet restarts so that desired state
should have the information. reconciler.updateState() will use this
information to update.
2. pod is deleted during this period, reconciler has to use
InnerVolumeSpecName, but it should be ok since this information will not
be used for volume cleanup (umount)
2016-11-11 11:38:59 -08:00
Kubernetes Submit Queue 6ec02394ab Merge pull request #36448 from jonboulle/criclean
Automatic merge from submit-queue

CRI: general grammar/spelling/consistency cleanup

No semantic changes, but a lot of shuffling of docstrings to make things
more consistent. In particular, standardise on the zeroth-article (i.e.
prefer `// Version` to `// The version`) and ending all docstrings with
periods.


(This knowingly conflicts with #36446 and intentionally omits changing the
Annotations field - I'll rebase this or that respectively as necessary.)
2016-11-10 17:10:12 -08:00
Jing Xu c124830278 fix issue in reconstruct volume data when kubelet restarts
During state reconstruction when kubelet restarts, outerVolumeSpecName
cannot be recovered by scanning the disk directories. But this
information is used by volume manager to check whether pod's volume is
mounted or not. There are two possible cases:
1. pod is not deleted during kubelet restarts so that desired state
should have the information. reconciler.updateState() will use this
information to update.
2. pod is deleted during this period, reconciler has to use
InnerVolumeSpecName, but it should be ok since this information will not
be used for volume cleanup (umount)
2016-11-10 16:23:55 -08:00
Kubernetes Submit Queue 89ebb2af43 Merge pull request #36551 from timstclair/cvm-system
Automatic merge from submit-queue

Fix getting cgroup pids

Fixes https://github.com/kubernetes/kubernetes/issues/35214, https://github.com/kubernetes/kubernetes/issues/33232

Verified manually, but I didn't have time to run all the e2e's yet (will check it in the morning).

This should be cherry-picked into 1.4, and merged into 1.5 (/cc @saad-ali )

```release-note
Fix fetching pids running in a cgroup, which caused problems with OOM score adjustments & setting the /system cgroup ("misc" in the summary API).
```

/cc @kubernetes/sig-node
2016-11-10 14:50:11 -08:00
Tim St. Clair 3aaa6fca88 BUILD changes for cgroup pids 2016-11-10 13:08:39 -08:00
Tim St. Clair cb588e823c Fix getting cgroup pids 2016-11-10 13:08:17 -08:00
Kubernetes Submit Queue 44f672e5e2 Merge pull request #34877 from resouer/e2e-log-path
Automatic merge from submit-queue

Add e2e node test for log path

fixes #34661

A node e2e test to check that container log files are properly created with the right content.

Since the log files under `/var/log/containers` are actually symbolic links to the docker containers' log files, we cannot use a pod to mount them in and check them (symbolic links are not supported by docker volumes).

cc @Random-Liu
2016-11-10 08:35:59 -08:00
Kubernetes Submit Queue 9bdff48d5e Merge pull request #36253 from timstclair/klet-stream-config-pr
Automatic merge from submit-queue

Use indirect streaming path for remote CRI shim

Last step for https://github.com/kubernetes/kubernetes/issues/29579

- Wire through the remote indirect streaming methods in the docker remote shim
- Add the docker streaming server as a handler at `<node>:10250/cri/{exec,attach,portforward}`
- Disable legacy streaming for dockershim

Note: This requires PR https://github.com/kubernetes/kubernetes/pull/34987 to work.

Tested manually on an E2E cluster.

/cc @euank @feiskyer @kubernetes/sig-node
2016-11-09 23:29:18 -08:00
Rajat Ramesh Koujalagi d81e216fc6 Better messaging for missing volume components on host to perform mount 2016-11-09 15:16:11 -08:00
Kubernetes Submit Queue 06fa13efd1 Merge pull request #36455 from dims/fix-issue-36454
Automatic merge from submit-queue

Fix build break

Problem introduced in #31996

Fixes #36454
2016-11-09 10:41:54 -08:00
Kubernetes Submit Queue 6515e3573e Merge pull request #34818 from nebril/eviction-test-cleanup
Automatic merge from submit-queue

Cleanup kubelet eviction manager tests

It cleans up kubelet eviction manager tests

Extracted parts of tests that were similar to each other into functions
2016-11-09 02:36:46 -08:00
Jonathan Boulle 37150b6abd CRI: general grammar/spelling/consistency cleanup
No semantic changes, but a lot of shuffling of docstrings to make things
more consistent. In particular, standardise on the zeroth-article (i.e.
prefer `// Version` to `// The version`) and ending all docstrings with
periods.
2016-11-09 07:37:01 +01:00
Kubernetes Submit Queue b600533794 Merge pull request #36423 from Random-Liu/support-root-nobody
Automatic merge from submit-queue

CRI: Support string user name.

https://github.com/kubernetes/kubernetes/pull/33239 and https://github.com/kubernetes/kubernetes/pull/34811 combined together broke the cri e2e test. https://k8s-testgrid.appspot.com/google-gce#gci-gce-cri

The reason is that:
1) In dockershim and dockertools, we assume that `Image.Config.User` is an integer. However, when a user builds the image with `USER nobody:nobody` or `USER root:root`, the field becomes `nobody:nobody` or `root:root`. This makes dockershim always return an error.
2) The new kube-dns-autoscaler image is using `USER nobody:nobody`. (See https://github.com/kubernetes-incubator/cluster-proportional-autoscaler/blob/master/Dockerfile.in#L21)

This doesn't break the normal e2e test, because in dockertools [we only inspect image uid if `RunAsNonRoot` is set](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/docker_manager.go#L2333-L2338), which is just a coincidence. However, in kuberuntime, [we always inspect image uid first](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_container.go#L141).

This PR adds literal `root` and `nobody` support. One problem is that `nobody` is not the same across OS distros. Usually it is `65534`, but some distros do not follow that. For example, Fedora uses `99`. (See https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/Q5GCKZ7Q7PAUQW66EV7IBJGSRJWYXBBH/?sort=date)

Possible solution:
* Option 1: ~~Just use `65534`. This is fine because currently we only need to know whether the user is root or not.~~ Actually, we need to pass the user id to runtime when creating a container.
* Option 2: Return the uid as string in CRI, and let kuberuntime handle the string directly.

This PR is using option 1.

@yujuhong @feiskyer 
/cc @kubernetes/sig-node
/cc @MrHohn
2016-11-08 20:24:31 -08:00
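Option 1 from the PR above (map the literal names `root` and `nobody` to fixed uids, otherwise expect a number) can be sketched as follows. The function name `uidFromImageUser` and the handling of the `user:group` form are assumptions for illustration; as the PR notes, `65534` for `nobody` does not hold on every distro.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// uidFromImageUser converts an image user string such as "1000",
// "1000:1000", "root:root", or "nobody:nobody" into a numeric uid.
// Only the literal names root and nobody are supported, and nobody is
// mapped to 65534, which is the usual value but not universal.
func uidFromImageUser(user string) (int64, error) {
	// Keep only the user part of "user:group".
	name := strings.SplitN(user, ":", 2)[0]
	switch name {
	case "root":
		return 0, nil
	case "nobody":
		return 65534, nil
	}
	uid, err := strconv.ParseInt(name, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("unsupported image user %q: %w", user, err)
	}
	return uid, nil
}

func main() {
	for _, u := range []string{"nobody:nobody", "root:root", "1000", "daemon"} {
		uid, err := uidFromImageUser(u)
		fmt.Println(u, uid, err)
	}
}
```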
Random-Liu 99ee3f4b76 Add non-numeric user name support. 2016-11-08 16:07:29 -08:00
Davanum Srinivas cf9e9505f3 Fix build break
Problem introduced in #31996

Fixes #36454
2016-11-08 14:23:33 -05:00
Tim St. Clair 7badc1d226 Use indirect streaming path for dockershim & remote CRI runtime 2016-11-08 10:58:38 -08:00
Tim St. Clair 0f028ff660 Remove legacy dockershim streaming 2016-11-08 10:58:38 -08:00