Commit Graph

4884 Commits (857ace8951a0579afbc3ede2631f50481831cdc5)

Author SHA1 Message Date
Kubernetes Submit Queue df41787b1a Merge pull request #47823 from sjenning/fix-waiting-reason
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

don't pass CRI error through to waiting state reason

Raw gRPC errors are getting into the `Reason` field of the container status `State`, causing it to be output inline on a `kubectl get pod`

xref https://bugzilla.redhat.com/show_bug.cgi?id=1449820

Basically the issue is that the err and msg are reversed in `startContainer()`.  The msg is short and the err is long.  It should be the other way around.

This PR changes `startContainer()` to return a short error that becomes the Reason and the extracted gPRC error description that becomes the Message.

@derekwaynecarr @smarterclayton @eparis
2017-06-26 15:29:33 -07:00
Kubernetes Submit Queue 7800b3ffef Merge pull request #47152 from ublubu/cloud-addresses
Automatic merge from submit-queue

kubelet should let cloud-controller-manager set the node addresses

*Before this change:*

1. cloud-controller-manager sets all the addresses for a node.
2. kubelet on that node replaces these addresses with an incomplete set. (i.e. replace InternalIP and Hostname and delete all other addresses--ExternalIP, etc.)

*After this change:*

kubelet doesn't touch its node's addresses when there is an external cloudprovider.

Fixes #47155

```release-note
NONE
```
2017-06-24 09:31:15 -07:00
Kubernetes Submit Queue d95a8bf66b Merge pull request #47783 from NickrenREN/containerruntime
Automatic merge from submit-queue (batch tested with PRs 47694, 47772, 47783, 47803, 47673)

Make different container runtimes constant

Make different container runtimes constant to avoid hardcode

**Release note**:

```release-note
NONE
```
2017-06-23 08:29:28 -07:00
Kubernetes Submit Queue fcfbfecdfd Merge pull request #47856 from mikedanese/bootstrap-resume
Automatic merge from submit-queue (batch tested with PRs 47915, 47856, 44086, 47575, 47475)

kubelet should resume csr bootstrap

Right now the kubelet creates a new csr object with the same key every
time it restarts during the bootstrap process. It should resume with the
old csr object if it exists. To do this the name of the csr object must
be stable.

Issue https://github.com/kubernetes/kubernetes/issues/47855
2017-06-23 04:06:20 -07:00
Kubernetes Submit Queue 3adb6c630b Merge pull request #47414 from karataliu/wincri5.devwin
Automatic merge from submit-queue (batch tested with PRs 47227, 47119, 46280, 47414, 46696)

Move seccomp helper methods and tests to platform-specific files.

**What this PR does / why we need it**:
Seccomp helper methods are for linux only, move them to linux-specific helper file.

As discussed in https://github.com/kubernetes/kubernetes/pull/46744

**Which issue this PR fixes** 

**Special notes for your reviewer**:

**Release note**:
2017-06-22 23:59:26 -07:00
Kubernetes Submit Queue 467705be00 Merge pull request #47195 from dims/bind-cadvisor-on-kubelet-interface
Automatic merge from submit-queue (batch tested with PRs 47922, 47195, 47241, 47095, 47401)

Run cAdvisor on the same interface as kubelet

**What this PR does / why we need it**:

cAdvisor currently binds to all interfaces. Currently the only
solution is to use iptables to block access to the port. We
are better off making cAdvisor to bind to the interface that
kubelet uses for better security.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

Fixes #11710

**Special notes for your reviewer**:

**Release note**:

```release-note
cAdvisor binds only to the interface that kubelet is running on instead of all interfaces.
```
2017-06-22 21:33:27 -07:00
Dong Liu 4df4ea2bea Move seccomp helper methods and tests to platform-specific files. 2017-06-23 09:49:06 +08:00
Kubernetes Submit Queue dd126ae19c Merge pull request #38431 from NickrenREN/newVolumeMgr-return
Automatic merge from submit-queue

Modify NewVolumeManager() function return value
2017-06-22 16:43:29 -07:00
Mike Danese 627c414c1b kubelet should resume csr bootstrap
Right now the kubelet creates a new csr object with the same key every
time it restarts during the bootstrap process. It should resume with the
old csr object if it exists. To do this the name of the csr object must
be stable. Also using a list watch here eliminates a race condition
where a watch event is missed and the kubelet stalls.
2017-06-22 23:45:15 +02:00
Chao Xu 60604f8818 run hack/update-all 2017-06-22 11:31:03 -07:00
Chao Xu f2d3220a11 run root-rewrite-import-client-go-api-types 2017-06-22 11:30:59 -07:00
Chao Xu cde4772928 run ./root-rewrite-all-other-apis.sh, then run make all, pkg/... compiles 2017-06-22 11:30:52 -07:00
Chao Xu f4989a45a5 run root-rewrite-v1-..., compile 2017-06-22 10:25:57 -07:00
Kubernetes Submit Queue 03014f486c Merge pull request #47824 from mbohlool/revert2
Automatic merge from submit-queue (batch tested with PRs 47851, 47824, 47858, 46099)

Revert 44714 manually

#44714 broke backward compatibility for old swagger spec that kubectl still uses. The decision on #47448 was to revert this change but the change was not automatically revertible. Here I semi-manually remove all references to UnixUserID and UnixGroupID and updated generated files accordingly.

Please wait for tests to pass then review that as there may still be tests that are failing.

Fixes #47448

Adding release note just because the original PR has a release note. If possible, we should remove both release notes as they cancel each other.

**Release note**: (removed by caesarxuchao)

UnixUserID and UnixGroupID is reverted back as int64 to keep backward compatibility.
2017-06-21 15:21:14 -07:00
Kubernetes Submit Queue 2f4df7ffa6 Merge pull request #47819 from verult/AlphaStorageStatus
Automatic merge from submit-queue (batch tested with PRs 34515, 47236, 46694, 47819, 47792)

Adding alpha feature gate to node statuses from local storage capacity isolation.

**What this PR does / why we need it**: The Capacity.storage node attribute should not be exposed since it's part of an alpha feature. Added an feature gate.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #47809 

There should be a test for new statuses in the alpha feature. Will include in a different PR.
2017-06-21 13:30:17 -07:00
mbohlool 70c4fe7f4f update generated files 2017-06-21 04:09:08 -07:00
mbohlool c91a12d205 Remove all references to types.UnixUserID and types.UnixGroupID 2017-06-21 04:09:07 -07:00
Seth Jennings 9fcc25d1ed don't pass CRI error through to waiting state reason 2017-06-20 23:34:08 -05:00
Kubernetes Submit Queue 8316bbc14c Merge pull request #47818 from Random-Liu/change-cri-package-name
Automatic merge from submit-queue (batch tested with PRs 45268, 47573, 47632, 47818)

Change CRI package name to runtime.

Fixes https://github.com/kubernetes/kubernetes/issues/47814.

@yujuhong @feiskyer /cc @kubernetes/sig-node-bugs
2017-06-20 18:19:02 -07:00
Cheng Xing de3bf36b61 Fixing node statuses related to local storage capacity isolation.
- Wrapping all node statuses from local storage capacity isolation under an alpha feature check. Currently there should not be any storage statuses.
- Replaced all "storage" statuses with "storage.kubernetes.io/scratch". "storage" should never be exposed as a status.
2017-06-20 17:34:59 -07:00
Random-Liu d779e9c956 Change CRI package name to runtime. 2017-06-20 15:43:11 -07:00
NickrenREN 6de7e3f3dc Make different container runtimes constant 2017-06-20 19:58:39 +08:00
Kubernetes Submit Queue cfdbc9c028 Merge pull request #46731 from rmmh/test-only-once
Automatic merge from submit-queue

Don't rerun certificate manager tests 1000 times.

**What this PR does / why we need it**:
Running every testcase 1000 times needlessly bloats the logs.

**Release note**:
```release-note
NONE
```
2017-06-19 17:13:06 -07:00
Kubernetes Submit Queue a73bf4e917 Merge pull request #40284 from chentao1596/sliceutils-unittest
Automatic merge from submit-queue (batch tested with PRs 47669, 40284, 47356, 47458, 47701)

add unit test cases for kubelet.util.sliceutils

What this PR does / why we need it:
I have not found any unit test case for this file, so i do it, thank you!

Fixes #47001
2017-06-19 15:24:59 -07:00
NickrenREN 312cd1bbe6 Modify NewVolumeManager() function return value
Since function NewVolumeManager() will always return vm and nil, we do not need the second return value, it will always be nil.
2017-06-17 23:33:12 +08:00
Kubernetes Submit Queue 098e1df3b6 Merge pull request #47290 from jhorwit2/jah/hostpath-psp-backstep-check
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)

validate host paths on the kubelet for backsteps

**What this PR does / why we need it**:

This PR adds validation on the kubelet to ensure the host path does not contain backsteps that could allow the volume to escape the PSP's allowed host paths. Currently, there is validation done at in API server; however, that does not account for mismatch of OS's on the kubelet vs api server. 

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #47107

**Special notes for your reviewer**:

cc @liggitt

**Release note**:


```release-note
Paths containing backsteps (for example, "../bar") are no longer allowed in hostPath volume paths, or in volumeMount subpaths
```
2017-06-16 19:57:01 -07:00
Josh Horwitz 48b3fb84ab do not allow backsteps in host volume plugin
Fixes #47107
2017-06-16 16:48:24 -04:00
Jacob Simpson 334de1cbe1 Auto approve kubelet certificate signing requests. 2017-06-16 08:47:12 -07:00
Kubernetes Submit Queue 509bf69a2d Merge pull request #47612 from freehan/hostport-bug-fix
Automatic merge from submit-queue (batch tested with PRs 47523, 47438, 47550, 47450, 47612)

append KUBE-HOSTPORTS to system chains instead of prepend

Bug fix for conflicting iptables rules between hostport and kube-proxy
2017-06-15 18:54:08 -07:00
Minhan Xia e6add2072b append KUBE-HOSTPORTS to system chains instead of prepend 2017-06-15 12:06:46 -07:00
Kubernetes Submit Queue 562e721ece Merge pull request #47462 from derekwaynecarr/strip-container-id-events
Automatic merge from submit-queue

Strip container id from events

**What this PR does / why we need it**:
reduces spam events from kubelet in bad pod scenarios

**Which issue this PR fixes**:
relates to https://github.com/kubernetes/kubernetes/issues/47366

**Special notes for your reviewer**:
pods in permanent failure states created unique events

**Release note**:
```release-note
None
```
2017-06-14 23:26:01 -07:00
Casey Callendrello 14ad62b924 cni: Don't try and map ports with an unset HostPort 2017-06-14 22:31:42 +02:00
Derek Carr 36619fa217 surface rpc error desc only in events 2017-06-13 23:42:15 -04:00
Derek Carr 4a5a221d8f parse executable not found error 2017-06-13 23:31:56 -04:00
Derek Carr a02f10fa3a Strip containerID from events to reduce spam 2017-06-13 23:31:56 -04:00
Kubernetes Submit Queue 22dc980aa4 Merge pull request #46823 from dcbw/fix-up-runtime-GetNetNS2
Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276)

kubelet/network: report but tolerate errors returned from GetNetNS() v2

Runtimes should never return "" and nil errors, since network plugin
drivers need to treat netns differently in different cases. So return
errors when we can't get the netns, and fix up the plugins to do the
right thing.

Namely, we don't need a NetNS on pod network teardown. We do need
a netns for pod Status checks and for network setup.

V2: don't return errors from getIP(), since they will block pod status :(  Just log them.  But even so, this still fixes the original problem by ensuring we don't log errors when the network isn't ready.

@freehan @yujuhong 

Fixes: https://github.com/kubernetes/kubernetes/issues/42735
Fixes: https://github.com/kubernetes/kubernetes/issues/44307
2017-06-13 13:55:50 -07:00
Kubernetes Submit Queue 17244ea5d9 Merge pull request #47124 from andyxning/remove_sync_loop_health_check
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)

fix sync loop health check

This PR will do error logging about the fall behind sync for kubelet instead of sync loop healthz checking.

The reason is kubelet can not do sync loop and therefore can not update sync loop time when there is any runtime error, such as docker hung. 

When there is any runtime error, according to current implementation, kubelet will not do sync operation and thus kubelet's sync loop time will not be updated. This will make when there is any runtime error, kubelet will also return non 200 response status code when accessing healthz endpoint. This is contrary with #37865 which prevents kubelet from being killed when docker hangs.

**Release note**:
```release-note
fix sync loop health check with seperating runtime errors
```

/cc @yujuhong @Random-Liu @dchen1107
2017-06-12 18:19:51 -07:00
Dan Williams f76cc7642c dockershim: don't spam logs with pod IP errors before networking is ready
GenericPLEG's 1s relist() loop races against pod network setup.  It
may be called after the infra container has started but before
network setup is done, since PLEG and the runtime's SyncPod() run
in different goroutines.

Track network setup status and don't bother trying to read the pod's
IP address if networking is not yet ready.

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1434950

Mar 22 12:18:17 ip-172-31-43-89 atomic-openshift-node: E0322
   12:18:17.651013   25624 docker_manager.go:378] NetworkPlugin
   cni failed on the status hook for pod 'pausepods22' - Unexpected
   command output Device "eth0" does not exist.
2017-06-12 15:07:38 -05:00
Dan Williams 45dffed8ac kubelet/network: return but tolerate errors returned from GetNetNS()
Runtimes should never return "" and nil errors, since network plugin
drivers need to treat netns differently in different cases.  So return
errors when we can't get the netns, and fix up the plugins to do the
right thing.

Namely, we don't need a NetNS on pod network teardown.  We do need
a netns for pod Status checks and for network setup.
2017-06-12 14:46:13 -05:00
Dan Williams 72710b7542 Revert "Return empty network namespace if the infra container has exited"
This reverts commit fee4c9a7d9.

This is not the correct fix for the problem; and it causes other problems
like continuous:

docker_sandbox.go:234] NetworkPlugin cni failed on the status hook for pod
"someotherdc-1-deploy_default": Unexpected command output nsenter: cannot
open : No such file or directory with error: exit status 1

Because GetNetNS() is returning an empty network namespace.  That is
not helpful nor should really be allowed; that's what the error return
from GetNetNS() is for.
2017-06-12 14:46:13 -05:00
Dong Liu a82b8f1094 Fix hostconfig device map logic in dockershim. 2017-06-12 11:15:46 +08:00
Andy Xie 96cb43993a fix sync loop health check 2017-06-10 11:25:59 +08:00
Dawn Chen 2a5ac62dd4 Merge pull request #47212 from MrHohn/kubelet-iptables-lock
Make kubelet touch iptables lock file during initialization
2017-06-09 16:44:00 -07:00
Zihong Zheng d5c9d27ed7 Make kubelet touch iptables lock file during initialization 2017-06-09 09:34:48 -07:00
Pengfei Ni 22e99504d7 Update CRI references 2017-06-09 10:16:40 +08:00
Pengfei Ni 9cc2ecc347 CRI: rename package name to pkg/kubelet/apis/cri/v1alpha1/runtime 2017-06-09 10:13:34 +08:00
Davanum Srinivas 7e5c43a042 Run cAdvisor on the same interface as kubelet
cAdvisor currently binds to all interfaces. Currently the only
solution is to use iptables to block access to the port. We
are better off making cAdvisor to bind to the interface that
kubelet uses for better security.

Fixes #11710
2017-06-08 16:43:38 -04:00
Kubernetes Submit Queue 69a9759d90 Merge pull request #46744 from karataliu/wincri4
Automatic merge from submit-queue

Support windows in dockershim

**What this PR does / why we need it**:
This is the 2nd part for https://github.com/kubernetes/kubernetes/issues/45927 .

The non-cri implementation dockertools was removed from kubelet v1.7 .
Part of previous work for supporting windows container lies in v1.6 dockertools, this PR is to port them to dockershim.

Main reference file in v1.6 dockertools windows support:
https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager_windows.go

**Which issue this PR fixes**
45927, for now catching up the implementation of v1.6

**Special notes for your reviewer**:
The code change includes 4 parts, put them together as we discussed in https://github.com/kubernetes/kubernetes/pull/46089

1. Update go-winio package to a newer version
  'go-winio' package is used by docker client.
  This change is to bring the support for Go v1.8, specifically included in the PR: https://github.com/Microsoft/go-winio/pull/48 
Otherwise it will produce a lot of error like in: https://github.com/fsouza/go-dockerclient/issues/648 

2. Add os dependent getSecurityOpts helper method. 
seccomp not supported on windows
  Corresponding code in v1.6: https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager_windows.go#L78

3. Add updateCreateConfig.
Allow user specified network mode setting. This is to be compatible with what kube-proxy package does on Windows. 
  Also, there is a Linux section in both sandbox config and container config: LinuxPodSandboxConfig, LinuxContainerConfig.
And that section later goes to Config and HostConfig section under docker container createConfig. Ideally hostconfig section should be dependent on host os, while config should depend on container image os.
  To simplify the case, here it assumes that windows host supports windows type container image only. It needs to be updated when kubernetes is to support windows host running linux container image or the like.
  Corresponding code in v1.6: https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager_windows.go#L57

4. Add podIpCache in dockershim. 
  For v1.6 windows implementation, it still does not use sandbox, thus only allow single container to be exposed.
  Here added a cache for saving container IP, to get adapted to the new CRI api.
Corresponding code in v1.6:
No sandbox: https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager_windows.go#L66
Use container id as pod ip: https://github.com/kubernetes/kubernetes/blob/v1.6.4/pkg/kubelet/dockertools/docker_manager.go#L2727

**Release note**:
2017-06-07 20:03:19 -07:00
ublubu 46465c0a5a Kubelet doesn't override addrs from Cloud provider 2017-06-07 22:27:18 -04:00
Kubernetes Submit Queue 56baaaae73 Merge pull request #46087 from tianshapjq/gpu-info-error-in-restart
Automatic merge from submit-queue (batch tested with PRs 45877, 46846, 46630, 46087, 47003)

gpusInUse info error when kubelet restarts

**What this PR does / why we need it**:
In my test, I found 2 errors in the nvidia_gpu_manager.go.
1. the number of activePods in gpusInUse() equals to 0 when kubelet restarts. It seems the Start() method was called before pods recovery which caused this error. So I decide not to call gpusInUse() in the Start() function, just let it happen when new pod needs to be created.
2. the container.ContainerID in line 242 returns the id in format of "docker://<container_id>", this will make the client failed to inspect the container by id. We have to erase the prefix of "docker://".

**Special notes for your reviewer**:

**Release note**:

```
Avoid assigning the same GPU to multiple containers.
```
2017-06-07 17:55:50 -07:00