Commit Graph

4870 Commits (60604f8818aecbc9c3736fbc32747cc0a535bc80)

Author SHA1 Message Date
Chao Xu 60604f8818 run hack/update-all 2017-06-22 11:31:03 -07:00
Chao Xu f2d3220a11 run root-rewrite-import-client-go-api-types 2017-06-22 11:30:59 -07:00
Chao Xu cde4772928 run ./root-rewrite-all-other-apis.sh, then run make all, pkg/... compiles 2017-06-22 11:30:52 -07:00
Chao Xu f4989a45a5 run root-rewrite-v1-..., compile 2017-06-22 10:25:57 -07:00
Kubernetes Submit Queue 03014f486c Merge pull request #47824 from mbohlool/revert2
Automatic merge from submit-queue (batch tested with PRs 47851, 47824, 47858, 46099)

Revert 44714 manually

#44714 broke backward compatibility for the old swagger spec that kubectl still uses. The decision on #47448 was to revert this change, but the change was not automatically revertible. Here I semi-manually removed all references to UnixUserID and UnixGroupID and updated the generated files accordingly.

Please wait for the tests to pass before reviewing, as some tests may still be failing.

Fixes #47448

Adding release note just because the original PR has a release note. If possible, we should remove both release notes as they cancel each other.

**Release note**: (removed by caesarxuchao)

UnixUserID and UnixGroupID are reverted to int64 to keep backward compatibility.
2017-06-21 15:21:14 -07:00
Kubernetes Submit Queue 2f4df7ffa6 Merge pull request #47819 from verult/AlphaStorageStatus
Automatic merge from submit-queue (batch tested with PRs 34515, 47236, 46694, 47819, 47792)

Adding alpha feature gate to node statuses from local storage capacity isolation.

**What this PR does / why we need it**: The Capacity.storage node attribute should not be exposed since it's part of an alpha feature. Added a feature gate.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #47809 

There should be a test for new statuses in the alpha feature. Will include in a different PR.
2017-06-21 13:30:17 -07:00
mbohlool 70c4fe7f4f update generated files 2017-06-21 04:09:08 -07:00
mbohlool c91a12d205 Remove all references to types.UnixUserID and types.UnixGroupID 2017-06-21 04:09:07 -07:00
Kubernetes Submit Queue 8316bbc14c Merge pull request #47818 from Random-Liu/change-cri-package-name
Automatic merge from submit-queue (batch tested with PRs 45268, 47573, 47632, 47818)

Change CRI package name to runtime.

Fixes https://github.com/kubernetes/kubernetes/issues/47814.

@yujuhong @feiskyer /cc @kubernetes/sig-node-bugs
2017-06-20 18:19:02 -07:00
Cheng Xing de3bf36b61 Fixing node statuses related to local storage capacity isolation.
- Wrapping all node statuses from local storage capacity isolation under an alpha feature check. Currently there should not be any storage statuses.
- Replaced all "storage" statuses with "storage.kubernetes.io/scratch". "storage" should never be exposed as a status.
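The gating described above can be sketched in Go as follows. This is a minimal illustration, not the kubelet's code: `nodeCapacity` and its boolean parameter are hypothetical stand-ins for the real feature-gate check, and only the resource name `storage.kubernetes.io/scratch` comes from the commit message.

```go
package main

import "fmt"

// scratchResource is the status name used by the commit message above.
const scratchResource = "storage.kubernetes.io/scratch"

// nodeCapacity builds a capacity map for a node. It is a hypothetical helper:
// alphaStorageEnabled stands in for the kubelet's feature-gate lookup.
func nodeCapacity(alphaStorageEnabled bool, cpuMilli, scratchBytes int64) map[string]int64 {
	capacity := map[string]int64{"cpu": cpuMilli}
	// Only expose the storage status when the alpha feature is switched on.
	if alphaStorageEnabled {
		capacity[scratchResource] = scratchBytes
	}
	return capacity
}

func main() {
	fmt.Println(nodeCapacity(false, 4000, 100<<30))
	fmt.Println(nodeCapacity(true, 4000, 100<<30))
}
```

With the gate off, the map carries no storage entry at all, which matches the stated intent that there should currently be no storage statuses.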
2017-06-20 17:34:59 -07:00
Random-Liu d779e9c956 Change CRI package name to runtime. 2017-06-20 15:43:11 -07:00
Kubernetes Submit Queue cfdbc9c028 Merge pull request #46731 from rmmh/test-only-once
Automatic merge from submit-queue

Don't rerun certificate manager tests 1000 times.

**What this PR does / why we need it**:
Running every testcase 1000 times needlessly bloats the logs.

**Release note**:
```release-note
NONE
```
2017-06-19 17:13:06 -07:00
Kubernetes Submit Queue a73bf4e917 Merge pull request #40284 from chentao1596/sliceutils-unittest
Automatic merge from submit-queue (batch tested with PRs 47669, 40284, 47356, 47458, 47701)

add unit test cases for kubelet.util.sliceutils

What this PR does / why we need it:
I could not find any unit tests for this file, so I added some. Thank you!

Fixes #47001
2017-06-19 15:24:59 -07:00
Kubernetes Submit Queue 098e1df3b6 Merge pull request #47290 from jhorwit2/jah/hostpath-psp-backstep-check
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)

validate host paths on the kubelet for backsteps

**What this PR does / why we need it**:

This PR adds validation on the kubelet to ensure the host path does not contain backsteps that could allow the volume to escape the PSP's allowed host paths. Currently, validation is done in the API server; however, that does not account for a mismatch between the operating systems of the kubelet and the API server.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #47107

**Special notes for your reviewer**:

cc @liggitt

**Release note**:


```release-note
Paths containing backsteps (for example, "../bar") are no longer allowed in hostPath volume paths, or in volumeMount subpaths
```
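A backstep check of the kind described in this PR might look like the sketch below. `hasBacksteps` is a hypothetical name, and normalising both separator styles is an assumption made here to reflect the OS-mismatch concern; the kubelet's real validation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// hasBacksteps reports whether any component of path is "..".
// Hypothetical helper for illustration only.
func hasBacksteps(path string) bool {
	// Treat both "/" and "\" as separators so the result does not depend
	// on the operating system that performs the check.
	normalized := strings.ReplaceAll(path, "\\", "/")
	for _, part := range strings.Split(normalized, "/") {
		if part == ".." {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"/var/lib/data", "/var/lib/../etc", "..\\bar", "/foo..bar"} {
		fmt.Printf("%q backsteps=%v\n", p, hasBacksteps(p))
	}
}
```

Comparing whole path components, rather than searching for the substring `..`, keeps names such as `foo..bar` valid.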
2017-06-16 19:57:01 -07:00
Josh Horwitz 48b3fb84ab do not allow backsteps in host volume plugin
Fixes #47107
2017-06-16 16:48:24 -04:00
Jacob Simpson 334de1cbe1 Auto approve kubelet certificate signing requests. 2017-06-16 08:47:12 -07:00
Kubernetes Submit Queue 509bf69a2d Merge pull request #47612 from freehan/hostport-bug-fix
Automatic merge from submit-queue (batch tested with PRs 47523, 47438, 47550, 47450, 47612)

append KUBE-HOSTPORTS to system chains instead of prepend

Bug fix for conflicting iptables rules between hostport and kube-proxy
2017-06-15 18:54:08 -07:00
Minhan Xia e6add2072b append KUBE-HOSTPORTS to system chains instead of prepend 2017-06-15 12:06:46 -07:00
Kubernetes Submit Queue 562e721ece Merge pull request #47462 from derekwaynecarr/strip-container-id-events
Automatic merge from submit-queue

Strip container id from events

**What this PR does / why we need it**:
reduces spam events from kubelet in bad pod scenarios

**Which issue this PR fixes**:
relates to https://github.com/kubernetes/kubernetes/issues/47366

**Special notes for your reviewer**:
pods in permanent failure states created unique events

**Release note**:
```release-note
None
```
2017-06-14 23:26:01 -07:00
Casey Callendrello 14ad62b924 cni: Don't try and map ports with an unset HostPort 2017-06-14 22:31:42 +02:00
Derek Carr 36619fa217 surface rpc error desc only in events 2017-06-13 23:42:15 -04:00
Derek Carr 4a5a221d8f parse executable not found error 2017-06-13 23:31:56 -04:00
Derek Carr a02f10fa3a Strip containerID from events to reduce spam 2017-06-13 23:31:56 -04:00
Kubernetes Submit Queue 22dc980aa4 Merge pull request #46823 from dcbw/fix-up-runtime-GetNetNS2
Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276)

kubelet/network: report but tolerate errors returned from GetNetNS() v2

Runtimes should never return "" and nil errors, since network plugin
drivers need to treat netns differently in different cases. So return
errors when we can't get the netns, and fix up the plugins to do the
right thing.

Namely, we don't need a NetNS on pod network teardown. We do need
a netns for pod Status checks and for network setup.

V2: don't return errors from getIP(), since they will block pod status :(  Just log them.  But even so, this still fixes the original problem by ensuring we don't log errors when the network isn't ready.

@freehan @yujuhong 

Fixes: https://github.com/kubernetes/kubernetes/issues/42735
Fixes: https://github.com/kubernetes/kubernetes/issues/44307
2017-06-13 13:55:50 -07:00
Kubernetes Submit Queue 17244ea5d9 Merge pull request #47124 from andyxning/remove_sync_loop_health_check
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)

fix sync loop health check

This PR logs an error when the kubelet's sync loop falls behind, instead of failing the sync loop healthz check.

The reason is that the kubelet cannot run the sync loop, and therefore cannot update the sync loop time, when there is a runtime error such as a Docker hang.

In the current implementation, a runtime error stops the kubelet from syncing, so the kubelet's sync loop time is not updated. As a result, the kubelet also returns a non-200 response status code from the healthz endpoint whenever there is a runtime error. This contradicts #37865, which prevents the kubelet from being killed when Docker hangs.

**Release note**:
```release-note
fix sync loop health check by separating runtime errors
```

/cc @yujuhong @Random-Liu @dchen1107
2017-06-12 18:19:51 -07:00
Dan Williams f76cc7642c dockershim: don't spam logs with pod IP errors before networking is ready
GenericPLEG's 1s relist() loop races against pod network setup.  It
may be called after the infra container has started but before
network setup is done, since PLEG and the runtime's SyncPod() run
in different goroutines.

Track network setup status and don't bother trying to read the pod's
IP address if networking is not yet ready.

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1434950

Mar 22 12:18:17 ip-172-31-43-89 atomic-openshift-node: E0322
   12:18:17.651013   25624 docker_manager.go:378] NetworkPlugin
   cni failed on the status hook for pod 'pausepods22' - Unexpected
   command output Device "eth0" does not exist.
2017-06-12 15:07:38 -05:00
Dan Williams 45dffed8ac kubelet/network: return but tolerate errors returned from GetNetNS()
Runtimes should never return "" and nil errors, since network plugin
drivers need to treat netns differently in different cases.  So return
errors when we can't get the netns, and fix up the plugins to do the
right thing.

Namely, we don't need a NetNS on pod network teardown.  We do need
a netns for pod Status checks and for network setup.
2017-06-12 14:46:13 -05:00
Dan Williams 72710b7542 Revert "Return empty network namespace if the infra container has exited"
This reverts commit fee4c9a7d9.

This is not the correct fix for the problem, and it causes other problems,
such as continuous errors like:

docker_sandbox.go:234] NetworkPlugin cni failed on the status hook for pod
"someotherdc-1-deploy_default": Unexpected command output nsenter: cannot
open : No such file or directory with error: exit status 1

Because GetNetNS() is returning an empty network namespace.  That is
not helpful nor should really be allowed; that's what the error return
from GetNetNS() is for.
2017-06-12 14:46:13 -05:00
Dong Liu a82b8f1094 Fix hostconfig device map logic in dockershim. 2017-06-12 11:15:46 +08:00
Andy Xie 96cb43993a fix sync loop health check 2017-06-10 11:25:59 +08:00
Dawn Chen 2a5ac62dd4 Merge pull request #47212 from MrHohn/kubelet-iptables-lock
Make kubelet touch iptables lock file during initialization
2017-06-09 16:44:00 -07:00
Zihong Zheng d5c9d27ed7 Make kubelet touch iptables lock file during initialization 2017-06-09 09:34:48 -07:00
Pengfei Ni 22e99504d7 Update CRI references 2017-06-09 10:16:40 +08:00
Pengfei Ni 9cc2ecc347 CRI: rename package name to pkg/kubelet/apis/cri/v1alpha1/runtime 2017-06-09 10:13:34 +08:00
Kubernetes Submit Queue 69a9759d90 Merge pull request #46744 from karataliu/wincri4
Automatic merge from submit-queue

Support windows in dockershim

**What this PR does / why we need it**:
This is the 2nd part for https://github.com/kubernetes/kubernetes/issues/45927 .

The non-CRI implementation dockertools was removed from kubelet v1.7.
Part of the previous work for supporting Windows containers lies in v1.6 dockertools; this PR ports it to dockershim.

Main reference file in v1.6 dockertools windows support:
https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager_windows.go

**Which issue this PR fixes**
45927, for now catching up the implementation of v1.6

**Special notes for your reviewer**:
The code change includes 4 parts, put them together as we discussed in https://github.com/kubernetes/kubernetes/pull/46089

1. Update go-winio package to a newer version
  'go-winio' package is used by docker client.
  This change is to bring the support for Go v1.8, specifically included in the PR: https://github.com/Microsoft/go-winio/pull/48 
Otherwise it will produce a lot of errors like those in: https://github.com/fsouza/go-dockerclient/issues/648 

2. Add os dependent getSecurityOpts helper method. 
seccomp not supported on windows
  Corresponding code in v1.6: https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager_windows.go#L78

3. Add updateCreateConfig.
Allow user specified network mode setting. This is to be compatible with what kube-proxy package does on Windows. 
  Also, there is a Linux section in both sandbox config and container config: LinuxPodSandboxConfig, LinuxContainerConfig.
And that section later goes to Config and HostConfig section under docker container createConfig. Ideally hostconfig section should be dependent on host os, while config should depend on container image os.
  To simplify the case, here it assumes that windows host supports windows type container image only. It needs to be updated when kubernetes is to support windows host running linux container image or the like.
  Corresponding code in v1.6: https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager_windows.go#L57

4. Add podIpCache in dockershim. 
  The v1.6 Windows implementation still does not use a sandbox, and thus only allows a single container to be exposed.
  This adds a cache for saving the container IP, to adapt to the new CRI API.
Corresponding code in v1.6:
No sandbox: https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager_windows.go#L66
Use container id as pod ip: https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager.go#L2727

**Release note**:
2017-06-07 20:03:19 -07:00
Kubernetes Submit Queue 56baaaae73 Merge pull request #46087 from tianshapjq/gpu-info-error-in-restart
Automatic merge from submit-queue (batch tested with PRs 45877, 46846, 46630, 46087, 47003)

gpusInUse info error when kubelet restarts

**What this PR does / why we need it**:
In my test, I found 2 errors in the nvidia_gpu_manager.go.
1. The number of activePods in gpusInUse() equals 0 when the kubelet restarts. It seems the Start() method was called before pod recovery, which caused this error. So I decided not to call gpusInUse() in the Start() function, and to let it happen when a new pod needs to be created.
2. The container.ContainerID in line 242 returns the ID in the format "docker://<container_id>", which makes the client fail to inspect the container by ID. We have to strip the "docker://" prefix.

**Special notes for your reviewer**:

**Release note**:

```
Avoid assigning the same GPU to multiple containers.
```
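The prefix handling from point 2 can be illustrated with a short Go sketch. `rawContainerID` is a hypothetical helper name; the PR's actual change may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// dockerPrefix is the runtime prefix mentioned in the PR description.
const dockerPrefix = "docker://"

// rawContainerID strips the runtime prefix so the ID can be passed to a
// Docker client. IDs without the prefix are returned unchanged.
// Hypothetical helper for illustration.
func rawContainerID(id string) string {
	return strings.TrimPrefix(id, dockerPrefix)
}

func main() {
	fmt.Println(rawContainerID("docker://0123abcd"))
	fmt.Println(rawContainerID("0123abcd"))
}
```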
2017-06-07 17:55:50 -07:00
Kubernetes Submit Queue 9567a4dfb6 Merge pull request #46846 from carlory/fix
Automatic merge from submit-queue (batch tested with PRs 45877, 46846, 46630, 46087, 47003)

func parseEndpointWithFallbackProtocol should check if protocol of endpoint is empty

**What this PR does / why we need it**:
func parseEndpointWithFallbackProtocol should check if protocol of endpoint is empty
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #45927
NONE
**Special notes for your reviewer**:
NONE
**Release note**:

```release-note
NONE
```
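A fallback-protocol parse of the kind named in this PR could be sketched like this. The function names and the `"proto://addr"` splitting rule are assumptions for illustration and do not reproduce the kubelet's `parseEndpointWithFallbackProtocol`.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEndpoint splits "proto://addr" into its two parts. When no "://"
// separator is present, the protocol is reported as empty.
func splitEndpoint(endpoint string) (protocol, addr string) {
	if i := strings.Index(endpoint, "://"); i >= 0 {
		return endpoint[:i], endpoint[i+len("://"):]
	}
	return "", endpoint
}

// parseWithFallback applies fallbackProtocol only when the endpoint carries
// no protocol of its own. Hypothetical sketch, not the kubelet function.
func parseWithFallback(endpoint, fallbackProtocol string) (protocol, addr string, err error) {
	protocol, addr = splitEndpoint(endpoint)
	if protocol == "" {
		protocol = fallbackProtocol
	}
	if addr == "" {
		return "", "", fmt.Errorf("endpoint %q has no address", endpoint)
	}
	return protocol, addr, nil
}

func main() {
	fmt.Println(parseWithFallback("/var/run/dockershim.sock", "unix"))
	fmt.Println(parseWithFallback("tcp://localhost:3735", "unix"))
}
```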
2017-06-07 17:55:46 -07:00
Kubernetes Submit Queue 69342bd1df Merge pull request #43005 from cmluciano/cml/consolidatesysctl
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)

Consolidate sysctl commands for kubelet

**What this PR does / why we need it**:
These commands are important enough to be in the Kubelet itself.
By default, Ubuntu 14.04 and Debian Jessie have these set to 200 and
20000. Without this setting, nodes are limited in the number of
containers that they can start.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #26005 

**Special notes for your reviewer**:
I had a difficult time writing tests for this. It is trivial to create a fake sysctl for testing, but the Kubelet does not have any tests for the prior settings.
**Release note**:

```release-note
```
2017-06-07 13:30:54 -07:00
Ryan Hitchman 49987707a7 Don't rerun certificate manager tests as subtests 1000 times.
Instead, run the core verification repeatedly.
2017-06-06 13:32:04 -07:00
Kubernetes Submit Queue 0538023e86 Merge pull request #47009 from yujuhong/run-as-non-root
Automatic merge from submit-queue (batch tested with PRs 46775, 47009)

kuberuntime: check the value of RunAsNonRoot when verifying

The verification function is fixed to check the value of RunAsNonRoot,
not just the existence of it. Also adds unit tests to verify the correct
behavior.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #46996

**Release note**:

```release-note
Fix the bug where container cannot run as root when SecurityContext.RunAsNonRoot is false.
```
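The difference between checking the existence and checking the value of `RunAsNonRoot` can be shown with a small sketch. The `securityContext` type below is a pared-down, hypothetical stand-in for the real API type, and `mustRunAsNonRoot` is an illustrative name.

```go
package main

import "fmt"

// securityContext is a simplified stand-in for the API type; only the
// field relevant to this fix is included.
type securityContext struct {
	RunAsNonRoot *bool
}

// mustRunAsNonRoot returns true only when the field is set AND true.
// Testing only "RunAsNonRoot != nil" would wrongly treat an explicit
// false as a request to run as non-root.
func mustRunAsNonRoot(sc *securityContext) bool {
	return sc != nil && sc.RunAsNonRoot != nil && *sc.RunAsNonRoot
}

func main() {
	yes, no := true, false
	fmt.Println(mustRunAsNonRoot(&securityContext{RunAsNonRoot: &yes}))
	fmt.Println(mustRunAsNonRoot(&securityContext{RunAsNonRoot: &no}))
	fmt.Println(mustRunAsNonRoot(&securityContext{}))
}
```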
2017-06-06 07:17:39 -07:00
carlory 7831085e13 func parseEndpointWithFallbackProtocol should check if protocol of endpoint is empty. 2017-06-06 17:28:15 +08:00
Yu-Ju Hong d152e20f41 Address the comments 2017-06-05 19:51:55 -07:00
Yu-Ju Hong 07a67c252c kuberuntime: check the value of RunAsNonRoot when verifying
The verification function is fixed to check the value of RunAsNonRoot,
not just the existence of it. Also adds unit tests to verify the correct
behavior.
2017-06-05 18:03:32 -07:00
Jing Xu 0b13aee0c0 Add EmptyDir Volume and local storage for container overlay Isolation
This PR adds two features:
1. add support for isolating the emptyDir volume use. If the user
sets a size limit for an emptyDir volume, kubelet's eviction manager
monitors its usage and evicts the pod if the usage exceeds the limit.
2. add support for isolating the local storage for container overlay. If
the container's overlay usage exceeds the limit defined in the container
spec, the eviction manager will evict the pod.
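The first rule above can be sketched in Go as a usage-versus-limit comparison. The type and function names are hypothetical, and treating a zero limit as "no limit set" is an assumption of this sketch rather than something the commit states.

```go
package main

import "fmt"

// emptyDirUsage is a hypothetical record of one pod's emptyDir usage.
type emptyDirUsage struct {
	Pod        string
	UsedBytes  int64
	LimitBytes int64 // 0 means no size limit was set (assumption of this sketch)
}

// podsToEvict returns the pods whose emptyDir usage exceeds their limit.
func podsToEvict(usages []emptyDirUsage) []string {
	var pods []string
	for _, u := range usages {
		if u.LimitBytes > 0 && u.UsedBytes > u.LimitBytes {
			pods = append(pods, u.Pod)
		}
	}
	return pods
}

func main() {
	usages := []emptyDirUsage{
		{Pod: "web", UsedBytes: 50, LimitBytes: 100},
		{Pod: "cache", UsedBytes: 150, LimitBytes: 100},
		{Pod: "batch", UsedBytes: 900, LimitBytes: 0},
	}
	fmt.Println(podsToEvict(usages))
}
```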
2017-06-05 12:05:48 -07:00
Kubernetes Submit Queue af64e0b8c9 Merge pull request #46759 from zjj2wry/kubelet
Automatic merge from submit-queue (batch tested with PRs 46734, 46810, 46759, 46259, 46771)

Improve code coverage for pkg/kubelet/images/image_gc_manager

**What this PR does / why we need it**:
#39559 #40780

code coverage from 74.5% to 77.4%

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-06-05 06:51:25 -07:00
Kubernetes Submit Queue 3fdf6c3d14 Merge pull request #45896 from dashpole/disk_pressure_reclaim
Automatic merge from submit-queue

Delete all dead containers and sandboxes when under disk pressure.

This PR modifies the eviction manager to add dead container and sandbox garbage collection as a resource reclaim function for disk.  It also modifies the container GC logic to allow pods that are terminated, but not deleted to be removed.

It still does not delete containers that are less than the minGcAge.  This should prevent nodes from entering a permanently bad state if the entire disk is occupied by pods that are terminated (in the state failed, or succeeded), but not deleted.

There are two improvements we should consider making in the future:

- Track the disk space and inodes reclaimed by deleting containers.  We currently do not track this, and it prevents us from determining if deleting containers resolves disk pressure.  So we may still evict a pod even if we are able to free disk space by deleting dead containers.
- Once we can track disk space and inodes reclaimed, we should consider only deleting the containers we need to in order to relieve disk pressure.  This should help avoid a scenario where we try and delete a massive number of containers all at once, and overwhelm the runtime.

/assign @vishh 
cc @derekwaynecarr 

```release-note
Disk Pressure triggers the deletion of terminated containers on the node.
```
2017-06-03 23:43:46 -07:00
Kubernetes Submit Queue b641aedcac Merge pull request #46371 from sjenning/fix-liveness-probe-reset
Automatic merge from submit-queue

reset resultRun on pod restart

xref https://bugzilla.redhat.com/show_bug.cgi?id=1455056

There is currently an issue where, if the pod is restarted due to liveness probe failures exceeding failureThreshold, the failure count is not reset on the probe worker.  When the pod restarts, if the liveness probe fails even once, the pod is restarted again, not honoring failureThreshold on the restart.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - sleep
    - "3600"
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      timeoutSeconds: 1
      periodSeconds: 3
      successThreshold: 1
      failureThreshold: 5
  terminationGracePeriodSeconds: 0
```

Before this PR:
```
$ kubectl create -f busybox-probe-fail.yaml 
pod "busybox" created
$ kubectl get pod -w
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   0          4s
busybox   1/1       Running   1         24s
busybox   1/1       Running   2         33s
busybox   0/1       CrashLoopBackOff   2         39s
```

After this PR:
```
$ kubectl create -f busybox-probe-fail.yaml
$ kubectl get pod -w
NAME      READY     STATUS              RESTARTS   AGE
busybox   0/1       ContainerCreating   0          2s
busybox   1/1       Running   0         4s
busybox   1/1       Running   1         27s
busybox   1/1       Running   2         45s
```

```release-note
Fix kubelet reset liveness probe failure count across pod restart boundaries
```

Restarts now happen at even intervals.
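The counter reset this PR describes can be sketched as below. `probeWorker` here is a simplified, hypothetical model with only the two fields needed to show the behaviour; the kubelet's real probe worker carries much more state.

```go
package main

import "fmt"

// probeWorker is a simplified model of a liveness probe worker.
type probeWorker struct {
	failureThreshold int
	resultRun        int // consecutive failures seen so far
}

// observe records one probe result and reports whether the container
// should be restarted.
func (w *probeWorker) observe(success bool) (restart bool) {
	if success {
		w.resultRun = 0
		return false
	}
	w.resultRun++
	if w.resultRun < w.failureThreshold {
		return false
	}
	// Reset the run so the restarted container gets the full
	// failureThreshold again instead of restarting on its first failure.
	w.resultRun = 0
	return true
}

func main() {
	w := &probeWorker{failureThreshold: 5}
	for i := 1; i <= 10; i++ {
		if w.observe(false) {
			fmt.Printf("probe %d: restart\n", i)
		}
	}
}
```

With a probe that always fails and `failureThreshold: 5`, this model restarts on the 5th and 10th probes, which corresponds to the evenly spaced restarts noted above.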

@derekwaynecarr
2017-06-03 15:15:49 -07:00
Kubernetes Submit Queue ebb4b0f7c6 Merge pull request #46494 from xiangpengzhao/fix-pod-manifest
Automatic merge from submit-queue (batch tested with PRs 46782, 46719, 46339, 46609, 46494)

Do not log the content of pod manifest if parsing fails.

**What this PR does / why we need it**:
- ~~only accepts text/plain config file~~
- ~~not log config file content when it's invalid~~

Do not log the content of pod manifest if parsing fails.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #46493

**Special notes for your reviewer**:
/cc @yujuhong 

@sig-node-reviewers

**Release note**:

```release-note
NONE
```
2017-06-03 12:32:42 -07:00
Kubernetes Submit Queue 747b3b1b0c Merge pull request #46609 from abhinavdahiya/fix_inconsistent_path_order_cni
Automatic merge from submit-queue (batch tested with PRs 46782, 46719, 46339, 46609, 46494)

Fix inconsistency in finding cni binaries

Fixes [#46476]

Signed-off-by: Abhinav Dahiya <abhinav.dahiya@coreos.com>



**What this PR does / why we need it**:
This fixes the inconsistency in finding the appropriate cni binaries. 

Currently `lo` cniNetwork follows vendorCniDir > binDir whereas default for all others is binDir > vendorCniDir. This PR makes vendorCniDir > binDir as default behavior.

**Why we need it**:
Hypercube right now ships cni binaries in /opt/cni/bin.
To use the latest version of calico, you need to override kubelet's /opt/cni/bin from the host, which means all other cni plugins (flannel, loopback, etc.) have to be mounted from the host too. Giving the vendor dir higher priority allows easy installation of newer versions of plugins.
2017-06-03 12:32:41 -07:00
Kubernetes Submit Queue 0bcd9602b4 Merge pull request #46620 from enxebre/kuberuntime-test-coverage
Automatic merge from submit-queue (batch tested with PRs 46620, 46732, 46773, 46772, 46725)

Improving test coverage for kubelet/kuberuntime.

**What this PR does / why we need it**:
Increases test coverage for kubelet/kuberuntime 
https://github.com/kubernetes/kubernetes/issues/46123

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/46123

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-06-03 11:39:38 -07:00