Commit Graph

6571 Commits (df439192d7d09f107b444ed1784fc2681a00f0fa)

Author SHA1 Message Date
Michael Taufen 8e217f7102 port setNodeStatusImages to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen b7ec333f01 port setNodeStatusDaemonEndpoints to Setter abstraction 2018-07-16 09:09:47 -07:00
Michael Taufen 59bb21051e port setNodeStatusVersionInfo to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen 596fa89af0 port setNodeStatusMachineInfo to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen aa94a3ba4e lift node-info setters into defaultNodeStatusFuncs
Instead of hiding these behind a helper, we just register them in a
uniform way. We are careful to keep the call-order of the setters the
same, though we can consider re-ordering in a future PR to achieve
fewer appends.
2018-07-16 09:09:47 -07:00
Michael Taufen 2df7e1ad5c port setNodeVolumesInUseStatus to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen 3e03e0611e port setNodeReadyCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen e0b6ae219f port setNodePIDPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen b26e4dfa7f port setNodeDiskPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen f057c9a4ae port setNodeMemoryPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen c33f321acd port setNodeOODCondition to Setter abstraction 2018-07-16 09:09:47 -07:00
Michael Taufen 15b03b8c0c port setNodeAddress to Setter abstraction, port test
also put cloud_request_manager.go in its own package
2018-07-16 09:09:47 -07:00
Michael Taufen a3cbbbd931 move call to defaultNodeStatusFuncs to after the rest of the Kubelet is constructed 2018-07-16 09:03:13 -07:00
Michael Taufen 08c94e0616 add nodestatus package with Setter abstraction for composable node constructors 2018-07-16 09:03:13 -07:00
Michael Taufen d245e72bae remove incorrect comment referencing removed functionality
The cbr0 configuration behavior this comment references was removed in #34906
2018-07-16 09:03:13 -07:00
linyouchong 6ff285bce3 fix nil pointer dereference in node_container_manager#enforceExistingCgroup 2018-07-14 10:42:42 +08:00
Joonyoung Park e6d02e9410 fix metrics help comment
pod_start_latency_microseconds is not broken down by podname.
2018-07-13 10:26:35 +09:00
Kubernetes Submit Queue 337dfe0a9c
Merge pull request #65594 from liggitt/node-csr-addresses-2
Automatic merge from submit-queue (batch tested with PRs 65052, 65594). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Derive kubelet serving certificate CSR template from node status addresses

xref https://github.com/kubernetes/features/issues/267
fixes #55633

Builds on https://github.com/kubernetes/kubernetes/pull/65587

* Makes the cloud provider authoritative when recording node status addresses
* Makes the node status addresses authoritative for the kube-apiserver determining how to speak to a kubelet (stops paying attention to the hostname label when determining how to reach a kubelet, which was only done to support kubelets < 1.5)
* Updates kubelet certificate rotation to be driven from node status
  * Avoids needing to compute node addresses a second time, and differently, in order to request serving certificates.
  * Allows the kubelet to react to changes in its status addresses by updating its serving certificate
  * Allows the kubelet to be driven by external cloud providers recording node addresses on the node status

test procedure:
```sh
# setup
export FEATURE_GATES=RotateKubeletServerCertificate=true
export KUBELET_FLAGS="--rotate-server-certificates=true --cloud-provider=external"

# cleanup from previous runs
sudo rm -fr /var/lib/kubelet/pki/

# startup
hack/local-up-cluster.sh

# wait for a node to register, verify it didn't set addresses
kubectl get nodes 
kubectl get node/127.0.0.1 -o jsonpath={.status.addresses}

# verify the kubelet server isn't available, and that it didn't populate a serving certificate
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
ls -la /var/lib/kubelet/pki

# set an address on the node
curl -X PATCH http://localhost:8080/api/v1/nodes/127.0.0.1/status \
  -H "Content-Type: application/merge-patch+json" \
  --data '{"status":{"addresses":[{"type":"Hostname","address":"localhost"}]}}'

# verify a csr was submitted with the right SAN, and approve it
kubectl describe csr
kubectl certificate approve csr-...

# verify the kubelet connection uses a cert that is properly signed and valid for the specified hostname, but NOT the IP
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
curl --cacert _output/certs/server-ca.crt -v https://127.0.0.1:10250/pods
ls -la /var/lib/kubelet/pki

# set an hostname and IP address on the node
curl -X PATCH http://localhost:8080/api/v1/nodes/127.0.0.1/status \
  -H "Content-Type: application/merge-patch+json" \
  --data '{"status":{"addresses":[{"type":"Hostname","address":"localhost"},{"type":"InternalIP","address":"127.0.0.1"}]}}'

# verify a csr was submitted with the right SAN, and approve it
kubectl describe csr
kubectl certificate approve csr-...

# verify the kubelet connection uses a cert that is properly signed and valid for the specified hostname AND IP
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
curl --cacert _output/certs/server-ca.crt -v https://127.0.0.1:10250/pods
ls -la /var/lib/kubelet/pki
```

```release-note
* kubelets that specify `--cloud-provider` now only report addresses in Node status as determined by the cloud provider
* kubelet serving certificate rotation now reacts to changes in reported node addresses, and will request certificates for addresses set by an external cloud provider
```
2018-07-11 22:25:07 -07:00
Seth Jennings f2a7654978 move feature gate checks inside IsCriticalPod 2018-07-11 16:10:05 -05:00
Kubernetes Submit Queue 0972ce1acc
Merge pull request #65649 from rsc/fix-printf
Automatic merge from submit-queue (batch tested with PRs 66076, 65792, 65649). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubernetes: fix printf format errors

These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.

```release-note
NONE
```
2018-07-11 14:09:08 -07:00
jiaxuanzhou 6ac4a8588e fix bug for garbage collection 2018-07-11 09:33:08 +08:00
Russ Cox 2bd91dda64 kubernetes: fix printf format errors
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.

Lubomir I. Ivanov <neolit123@gmail.com>
applied ammend for:
  pkg/cloudprovider/provivers/vsphere/nodemanager.go
2018-07-11 00:10:15 +03:00
Kubernetes Submit Queue 421789328f
Merge pull request #65997 from tallclair/writer
Automatic merge from submit-queue (batch tested with PRs 66030, 65997). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove unused io util writer & volume host GetWriter()

Cleanup unused code.
Fixes https://github.com/kubernetes/kubernetes/issues/16971

**Release note**:
```release-note
NONE
```

/kind cleanup
/sig storage
2018-07-10 12:46:09 -07:00
Jordan Liggitt 7828e5d0f9
Make cloud provider authoritative for node status address reporting 2018-07-10 14:33:48 -04:00
Jordan Liggitt db9d3c2d10
Derive kubelet serving certificate CSR template from node status addresses 2018-07-10 14:33:48 -04:00
Kubernetes Submit Queue 13f9c26fd7
Merge pull request #65902 from wojtek-t/kube_proxy_less_allocations_2
Automatic merge from submit-queue (batch tested with PRs 65902, 65781). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Avoid unnecessary allocations in kube-proxy
2018-07-09 23:07:01 -07:00
Kubernetes Submit Queue 55620e2be6
Merge pull request #65987 from Random-Liu/fix-pod-worker-deadlock
Automatic merge from submit-queue (batch tested with PRs 65987, 65962). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix pod worker deadlock.

Preemption will stuck forever if `killPodNow` timeout once. The sequence is:
* `killPodNow` create the response channel (size 0) and send it to pod worker.
* `killPodNow` timeout and return.
*  Pod worker finishes killing the pod, and tries to send back response via the channel.

However, because the channel size is 0, and the receiver has exited, the pod worker will stuck forever.

In @jingxu97's case, this causes a critical system pod (apiserver) unable to come up, because the csi pod can't be preempted.

I checked the history, and the bug was introduced 2 years ago 6fefb428c1.

I think we should at least cherrypick this to `1.11` since preemption is beta and enabled by default in 1.11.

@kubernetes/sig-node-bugs @derekwaynecarr @dashpole @yujuhong 
Signed-off-by: Lantao Liu <lantaol@google.com>

```release-note
none
```
2018-07-09 16:53:59 -07:00
Tim Allclair b1012b2543
Remove unused io util writer & volume host GetWriter() 2018-07-09 14:09:48 -07:00
Lantao Liu 0f4c739b2c Fix pod worker deadlock.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-07-09 11:45:26 -07:00
Kubernetes Submit Queue f70410959d
Merge pull request #65226 from ingvagabund/store-cloud-provider-latest-node-addresses
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Store the latest cloud provider node addresses

**What this PR does / why we need it**:
Buffer the recently retrieved node address so they can be used as soon as the next node status update is run.


**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #65814

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
2018-07-09 10:47:07 -07:00
Kubernetes Submit Queue e943d09fa3
Merge pull request #63194 from m1093782566/cni-ts
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Adding traffic shaping support for CNI network driver

**What this PR does / why we need it**:

Adding traffic shaping support for CNI network driver - it's also a sub-task of kubenet deprecation work.

Design document is available here: https://github.com/kubernetes/community/pull/1893

**Which issue(s) this PR fixes**:
Fixes #

**Special notes for your reviewer**:

/cc @freehan @jingax10 @caseydavenport @dcbw 

/sig network
/sig node

**Release note**:

```release-note
Support traffic shaping for CNI network driver
```
2018-07-08 23:54:25 -07:00
liangwei 34d848eb1a add cni bandwidth test 2018-07-09 09:51:33 +08:00
m1093782566 8038a0dfa6 add traffic shaping support for CNI network driver 2018-07-08 22:22:25 +08:00
wojtekt 6e50f39dbd Avoid allocations when parsing iptables 2018-07-08 10:55:19 +02:00
Kubernetes Submit Queue 097f300a4d
Merge pull request #65707 from dims/remove-deprecated-cadvisor-port
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove --cadvisor-port - has been deprecated since v1.10

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56523

**Special notes for your reviewer**:
- Deprecated in https://github.com/kubernetes/kubernetes/pull/59827 (v1.10)
- Disabled in https://github.com/kubernetes/kubernetes/pull/63881 (v1.11)

**Release note**:

```release-note
[action required] The formerly publicly-available cAdvisor web UI that the kubelet started using `--cadvisor-port` is now entirely removed in 1.12. The recommended way to run cAdvisor if you still need it, is via a DaemonSet.
```
2018-07-07 05:28:13 -07:00
Lantao Liu 3193a4a469 Fix RunAsGroup. 2018-07-06 15:42:26 -07:00
choury 8e4b62a74b
Remove duplicate check line
There is a same [line](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/policy_static.go#L81).
2018-07-05 11:07:56 +08:00
Jordan Liggitt b7b4b84afe
Add healthz check to ensure logging is not blocked 2018-07-03 22:27:23 -04:00
Jan Chaloupka 9d9fb4de29 Put all the node address cloud provider retrival complex logic into cloudResourceSyncManager 2018-07-03 20:11:35 +02:00
Davanum Srinivas 5feab86329
Remove --cadvisor-port - has been deprecated since v1.10
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2018-07-02 08:54:14 -04:00
wojtekt e50c0b904f Speed up cluster startup in GCE 2018-07-02 10:22:32 +02:00
Kubernetes Submit Queue b265f7c682
Merge pull request #65582 from dtaniwaki/fix-test-failure-of-truncated-time
Automatic merge from submit-queue (batch tested with PRs 65582, 65480, 65310, 65644, 65645). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix test failure of truncated time

**What this PR does / why we need it**:

The test of `TestFsStoreAssignedModified` in `pkg/kubelet/kubeletconfig/checkpoint/store` fails in my environment like below.

```
$ make test WHAT=./pkg/kubelet/kubeletconfig/checkpoint/store/
Running tests for APIVersion: v1,admissionregistration.k8s.io/v1alpha1,admissionregistration.k8s.io/v1beta1,admission.k8s.io/v1beta1,apps/v1beta1,apps/v1beta2,apps/v1,authentication.k8s.io/v1,authentication.k8s.io/v1beta1,authorization.k8s.io/v1,authorization.k8s.io/v1beta1,autoscaling/v1,autoscaling/v2beta1,batch/v1,batch/v1beta1,batch/v2alpha1,certificates.k8s.io/v1beta1,coordination.k8s.io/v1beta1,extensions/v1beta1,events.k8s.io/v1beta1,imagepolicy.k8s.io/v1alpha1,networking.k8s.io/v1,policy/v1beta1,rbac.authorization.k8s.io/v1,rbac.authorization.k8s.io/v1beta1,rbac.authorization.k8s.io/v1alpha1,scheduling.k8s.io/v1alpha1,scheduling.k8s.io/v1beta1,settings.k8s.io/v1alpha1,storage.k8s.io/v1beta1,storage.k8s.io/v1,storage.k8s.io/v1alpha1,
+++ [0628 22:53:39] Running tests without code coverage
--- FAIL: TestFsStoreAssignedModified (0.00s)
        fsstore_test.go:316: expect "2018-06-28T22:53:43+09:00" but got "2018-06-28T22:53:43+09:00"
FAIL
FAIL    k8s.io/kubernetes/pkg/kubelet/kubeletconfig/checkpoint/store    0.236s
make: *** [test] Error 1
```

My environment is
OS: macOS Sierra Version 10.12.6
File System: Journaled HFS+

The error message confused me because the comparing times looked the same in the error log. If we know certain systems truncate times, I think we can just compare less precise times to avoid confusions in tests.

**Special notes for your reviewer**:
N/A

**Release note**:

```release-note
NONE
```
2018-06-29 20:14:06 -07:00
Daisuke Taniwaki 7d4c85b02c
Fix test failure of truncated time 2018-06-30 01:14:44 +09:00
Kubernetes Submit Queue 93f3249e3c
Merge pull request #65595 from sjenning/feature-gate-lsi-capacity
Automatic merge from submit-queue (batch tested with PRs 60150, 65467, 65487, 65595, 65374). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubelet: feature gate LSI capacity calculation

Currently if `cm.cadvisorInterface.RootFsInfo()` fails, the whole kubelet bails.  If `/var/lib/kubelet` is on a tmpfs or bindmount, this can happen (this is the case for some of our CI envs https://github.com/openshift/origin/issues/19948).

We would be able to workaround this, in the short term, by disabling the LSI feature gate if the capacity calculate was protected by the gate, but currently it isn't.

This PR adds the gate check around setting the ephemeral storage capacity.

@liggitt @derekwaynecarr @dashpole 

It might be a different discussion about whether or not this should be fatal.  If it isn't fatal, seems that it would just prevent pods that had a ephemeral storage request from being scheduled.

/sig node
2018-06-28 19:15:15 -07:00
Kubernetes Submit Queue c57cdc1d35
Merge pull request #65587 from liggitt/node-csr-addresses-1
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert "certs: only append locally discovered addresses when we got none from the cloudprovider"

This reverts commit 7354bbe5ac.

https://github.com/kubernetes/kubernetes/pull/61869 caused a mismatch between the requested CSR and the addresses in node status.

Instead of computing addresses in two places, the cert manager should derive its CSR request from the addresses in node status. This would enable the kubelet to react to address changes, as well as be driven by an external cloud provider.

/cc @mikedanese

```release-note
NONE
```
2018-06-28 17:36:45 -07:00
Kubernetes Submit Queue 44073e6f43
Merge pull request #64660 from figo/master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add support for plugin directory hierarchy

**What this PR does / why we need it**:

Add hierarchy support for plugin directory, it traverses and 
watch plugin directory and its sub directory recursively.

plugin socket file only need be unique within one directory,
``` 
 plugin socket directory  
    |  
    ---->sub directory 1
    |              |  
    |              ----->  socket1,  socket2 ...
    ----->sub directory 2
                  |
                  ------> socket1, socket2 ...  
```
the design itself allow sub directory be anything,
but in practical, each plugin type could just use one sub directory.

**Which issue(s) this PR fixes**:
Fixes #64003

**Special notes for your reviewer**:

twos bonus changes added as below

1) propose to let pluginWatcher bookkeeping registered plugins,
to make sure plugin name is unique within one plugin type.  
arguably, we could let each handler do the same work, but it requires
every handler repeat the same thing.    
 
2) extract example handler out from test, it is easier to read the code with the
seperation.  


**Release note**:

```release-note
N/A
```

/sig node
/cc @vikaschoudhary16  @jiayingz @RenaudWasTaken @vishh @derekwaynecarr  @saad-ali @vladimirvivien @dchen1107 @yujuhong @tallclair @Random-Liu @anfernee @akutz
2018-06-28 14:53:44 -07:00
Seth Jennings 3234b0fa5b feature gate LSI capacity calculation 2018-06-28 14:01:08 -05:00
Jordan Liggitt f1adf74b4e
Revert "certs: only append locally discovered addresses when we got none from the cloudprovider"
This reverts commit 7354bbe5ac.
2018-06-28 12:36:24 -04:00
Kubernetes Submit Queue 270b675c61
Merge pull request #65513 from tallclair/test-cleanup2
Automatic merge from submit-queue (batch tested with PRs 65453, 65523, 65513, 65560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Cleanup verbose cAdvisor mocking in Kubelet unit tests

These tests had a lot of duplicate code to set up the cAdvisor mock, but weren't really depending on the mock functionality. By moving the tests to use the fake cAdvisor, most of the setup can be cleaned up.

/kind cleanup
/sig node

```release-note
NONE
```
2018-06-27 22:30:12 -07:00
Tim Allclair 5955b839ff
Cleanup verbose cAdvisor mocking in Kubelet unit tests 2018-06-27 11:53:41 -07:00
stewart-yu d5513c6d14 fix wrong output messages about EnforceNodeAllocatable 2018-06-27 15:31:32 +08:00
Kubernetes Submit Queue 991a84758f
Merge pull request #59214 from kdembler/cpumanager-checkpointing
Automatic merge from submit-queue (batch tested with PRs 59214, 65330). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Migrate cpumanager to use checkpointing manager

**What this PR does / why we need it**:
This PR migrates `cpumanager` to use new kubelet level node checkpointing feature (#56040) to decrease code redundancy and improve consistency.

**Which issue(s) this PR fixes**:
Fixes #58339

**Notes**:
At point of submitting PR the most straightforward approach was used - `state_checkpoint` implementation of `State` interface was added. However, with checkpointing implementation there might be no point to keep `State` interface and just use single implementation with checkpoint backend and in case of different backend than filestore needed just supply `cpumanager` with custom `CheckpointManager` implementation.

/kind feature
/sig node
cc @flyingcougar @ConnorDoyle
2018-06-25 18:19:00 -07:00
hui luo d04f596829 Add hierarchy support for plugin directory
it traverses and watch plugin directory and its sub directory recursively,
plugin socket file only need be unique within one directory,

- plugin socket directory
-    |
-    ---->sub directory 1
-    |              |
-    |              ----->  socket1,  socket2 ...
-    ----->sub directory 2
-                  |
-                  ------> socket1, socket2 ...

the design itself allow sub directory be anything,
but in practical, each plugin type could just use one sub directory.

four bonus changes added as below

1. extract example handler out from test, it is easier to read the code
with the seperation.

2. there are two variables here: "Watcher" and "watcher".
"Watcher" is the plugin watcher, and "watcher" is the fsnotify watcher.
so rename the "watcher" to "fsWatcher" to make code easier to
understand.

3. change RegisterCallbackFn() return value order, it is
conventional to return error last, after this change,
the pkg/volume/csi is compliance with golint, so remove it
from hack/.golint_failures

4. refactor errors handling at invokeRegistrationCallbackAtHandler()
to make error message more clear.
2018-06-25 17:32:18 -07:00
Jeff Grafton 23ceebac22 Run hack/update-bazel.sh 2018-06-22 16:22:57 -07:00
Jeff Grafton a725660640 Update to gazelle 0.12.0 and run hack/update-bazel.sh 2018-06-22 16:22:18 -07:00
Jeff Grafton 01f94051c8 Remove the go_default_library_protos filegroups using buildozer 2018-06-22 16:22:18 -07:00
Kubernetes Submit Queue f09a938bcd
Merge pull request #64675 from yue9944882/fix-data-race-cli-file-linux
Automatic merge from submit-queue (batch tested with PRs 61330, 64793, 64675, 65059, 65368). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes data races for pkg/kubelet/config/file_linux_test.go

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #64655

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-06-22 14:52:37 -07:00
Kubernetes Submit Queue 1ca851baec
Merge pull request #64860 from wgliang/master.kubelet-check-limit
Automatic merge from submit-queue (batch tested with PRs 65290, 65326, 65289, 65334, 64860). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

checkLimitsForResolvConf for the  pod create and update events instead of checking period

**What this PR does / why we need it**:

- Check for the same at pod create and update events instead of checking continuously for every 30 seconds.
- Increase the logging level to 4 or higher since the event is not catastrophic to cluster health .


**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #64849

**Special notes for your reviewer**:
@ravisantoshgudimetla 

**Release note**:

```release-note
checkLimitsForResolvConf for the  pod create and update events instead of checking period
```
2018-06-22 04:43:16 -07:00
Kubernetes Submit Queue 96c7f3a34a
Merge pull request #64752 from wojtek-t/default_to_watching_managers
Automatic merge from submit-queue (batch tested with PRs 65187, 65206, 65223, 64752, 65238). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Kubelet watches necessary secrets/configmaps instead of periodic polling
2018-06-21 19:48:14 -07:00
Kubernetes Submit Queue 02dba36128
Merge pull request #65019 from mirake/fix-typo-toto
Automatic merge from submit-queue (batch tested with PRs 65265, 64822, 65026, 65019, 65077). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Typo fix: toto -> to
2018-06-21 11:25:16 -07:00
Kubernetes Submit Queue d1f5cb2348
Merge pull request #65050 from sttts/sttts-deepcopy-update
Automatic merge from submit-queue (batch tested with PRs 64895, 64938, 63700, 65050, 64957). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump gengo to include uniform pointer deepcopy

This bumps k8s.io/gengo with uniform pointer support in deepcopy-gen.

Fixes https://github.com/kubernetes/code-generator/issues/45.
2018-06-21 04:15:16 -07:00
Kubernetes Submit Queue 332da0a943
Merge pull request #64491 from hzxuzhonghu/kubelet-node-schedule-event-record
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

move oldNodeUnschedulable pkg var to kubelet struct

**What this PR does / why we need it**:

move oldNodeUnschedulable pkg var to kubelet struct


**Release note**:

```release-note
NONE
```
2018-06-20 23:02:52 -07:00
Kubernetes Submit Queue ce09da5653
Merge pull request #64880 from dixudx/manifest_file_not_found
Automatic merge from submit-queue (batch tested with PRs 58690, 64773, 64880, 64915, 64831). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

ignore not found file error when watching manifests

**What this PR does / why we need it**:
An alternative of #63910.

When using vim to create a new file in manifest folder, a temporary file, with an arbitrary number (like 4913) as its name, will be created to check if a directory is writable and see the resulting ACL.

These temporary files will be deleted later, which should by ignored when watching the manifest folder.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #55928, #59009, #48219

**Special notes for your reviewer**:
/cc dims luxas yujuhong liggitt tallclair

**Release note**:

```release-note
ignore not found file error when watching manifests
```
2018-06-20 14:21:17 -07:00
Kubernetes Submit Queue aa25539ef6
Merge pull request #64451 from wgliang/master.remove-kubelet
Automatic merge from submit-queue (batch tested with PRs 64688, 64451, 64504, 64506, 56358). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

cleanup some dead kubelet code

**Release note**:

```release-note
NONE
```
2018-06-20 05:48:11 -07:00
Kubernetes Submit Queue a622f1404c
Merge pull request #64672 from mcluseau/wip-remote-grpc-message-size
Automatic merge from submit-queue (batch tested with PRs 65032, 63471, 64104, 64672, 64427). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

pkg: kubelet: remote: increase grpc client default size to 16MiB

**What this PR does / why we need it**:

Increase the gRPC max message size to 16MB in the remote container runtime. I've seen sizes over 8MB in clusters with big (256GB RAM) nodes.

**Release note**:
```release-note
Increase the gRPC max message size to 16MB in the remote container runtime.
```
2018-06-20 04:23:21 -07:00
Kubernetes Submit Queue 381b663b66
Merge pull request #63580 from dixudx/fix_cni_flag_binding
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

bind alpha feature network plugin flags correctly

**What this PR does / why we need it**:
When working #63542, I found the flags, like `--cni-conf-dir` and `cni-bin-dir`, were not correctly bound.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
/cc kubernetes/sig-node-pr-reviews

**Release note**:

```release-note
None
```
2018-06-20 01:26:52 -07:00
Kubernetes Submit Queue 148350d3c4
Merge pull request #64426 from cofyc/remove_unnecessary_fakemounters
Automatic merge from submit-queue (batch tested with PRs 64142, 64426, 62910, 63942, 64548). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Clean up fake mounters.

**What this PR does / why we need it**:

Fixes https://github.com/kubernetes/kubernetes/issues/61502

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

list of fake mounters:

- (keep) pkg/util/mount.FakeMounter
- (removed) pkg/kubelet/cm.fakeMountInterface:
- (inherit from mount.FakeMounter) pkg/util/mount.fakeMounter
- (inherit from mount.FakeMounter) pkg/util/removeall.fakeMounter
- (removed) pkg/volume/host_path.fakeFileTypeChecker

**Release note**:

```release-note
NONE
```
2018-06-20 00:05:10 -07:00
Kubernetes Submit Queue c399c306e2
Merge pull request #59174 from tianshapjq/todo-already-done
Automatic merge from submit-queue (batch tested with PRs 65230, 57355, 59174, 63698, 63659). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

TODO has already been implemented

**What this PR does / why we need it**:
TODO has already been implemented, remove the TODO tag.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note

```NONE
2018-06-19 20:19:17 -07:00
wojtekt 72a0f4d167 Enable watching secret and configmap manager 2018-06-19 22:13:18 +02:00
wojtekt ffb32472bb Kubelet manager configuration 2018-06-19 22:12:55 +02:00
vikaschoudhary16 e8119dc134 Start plugin watcher after initialization of all kubelet components 2018-06-14 01:03:37 -04:00
Andrew Lytvynov 2c0f043957 Re-use private key after failed CSR
If we create a new key on each CSR, if CSR fails the next attempt will
create a new one instead of reusing previous CSR.

If approver/signer don't handle CSRs as quickly as new nodes come up,
they can pile up and approver would keep handling old abandoned CSRs and
Nodes would keep timing out on startup.
2018-06-13 13:12:43 -07:00
Dr. Stefan Schimanski 1208437f84 Update generated files 2018-06-13 12:35:13 +02:00
Kubernetes Submit Queue bb7e14429d
Merge pull request #64922 from dcbw/dcbw-dockershim-network-approver
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

dockershim/network: add dcbw to OWNERS as an approver

I've been involved with the kubelet network code, including most
of this code, for a couple years and contributed a good number
of PRs for these directories. I've also been a SIG Network
co-lead for couple years.

I've also been on the CNI maintainers team for a couple years.

```release-note
NONE
```
@freehan @thockin @kubernetes/sig-network-pr-reviews
2018-06-12 13:31:15 -07:00
Kubernetes Submit Queue 67ebbc675a
Merge pull request #64862 from feiskyer/win-cni
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert #64189: Fix Windows CNI for the sandbox case

**What this PR does / why we need it**:

This reverts PR #64189, which breaks DNS for Windows containers.

Refer https://github.com/kubernetes/kubernetes/pull/64189#issuecomment-395248704

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #64861

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```

cc @madhanrm @PatrickLang @alinbalutoiu @dineshgovindasamy
2018-06-12 11:18:01 -07:00
ruicao 95c232ee07 Typo fix: toto -> to 2018-06-12 23:12:39 +08:00
Kubernetes Submit Queue 8e03228c1a
Merge pull request #64643 from dashpole/memcg_poll
Automatic merge from submit-queue (batch tested with PRs 64503, 64903, 64643, 64987). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use unix.EpollWait to determine when memcg events are available to be Read

**What this PR does / why we need it**:
This fixes a file descriptor leak introduced in https://github.com/kubernetes/kubernetes/pull/60531 when the `--experimental-kernel-memcg-notification` kubelet flag is enabled.  The root of the issue is that `unix.Read` blocks indefinitely when reading from an event file descriptor and there is nothing to read.  Since we refresh the memcg notifications, these reads accumulate until the memcg threshold is crossed, at which time all reads complete.  However, if the node never comes under memory pressure, the node can run out of file descriptors.

This PR changes the eviction manager to use `unix.EpollWait` to wait, with a 10 second timeout, for events to be available on the eventfd.  We only read from the eventfd when there is an event available to be read, preventing an accumulation of `unix.Read` threads, and allowing the event file descriptors to be reclaimed by the kernel.

This PR also breaks the creation, and updating of the memcg threshold into separate portions, and performs creation before starting the periodic synchronize calls.  It also moves the logic of configuring memory thresholds into memory_threshold_notifier into a separate file.

This also reverts https://github.com/kubernetes/kubernetes/pull/64582, as the underlying leak that caused us to disable it for testing is fixed here.

Fixes #62808

**Release note**:
```release-note
NONE
```

/sig node
/kind bug
/priority critical-urgent
2018-06-11 17:29:19 -07:00
David Ashpole b7deb6d9e0 fix eviction event formatting 2018-06-11 11:38:00 -07:00
David Ashpole 93b6d026d9 fix memcg fd leak 2018-06-11 11:37:50 -07:00
Dan Williams 37792076b4 dockershim/network: add dcbw to OWNERS as an approver
I've been involved with the kubelet network code, including most
of this code, for a couple years and contributed a good number
of PRs for these directories. I've also been a SIG Network
co-lead for couple years.

I've also been on the CNI maintainers team for a couple years.
2018-06-08 10:06:19 -05:00
yue9944882 d467b29c5c remove duplicated cleaning up func 2018-06-08 14:28:19 +08:00
WanLinghao 52140ea1d3 fix a bug of wrong parameters which could cause token projection failure 2018-06-08 12:00:58 +08:00
Kubernetes Submit Queue 38beee65d3
Merge pull request #63905 from feiskyer/win-dns
Automatic merge from submit-queue (batch tested with PRs 63905, 64855). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Setup dns servers and search domains for Windows Pods

**What this PR does / why we need it**:

Kubelet is depending on docker container's ResolvConfPath (e.g. /var/lib/docker/containers/439efe31d70fc17485fb6810730679404bb5a6d721b10035c3784157966c7e17/resolv.conf) to setup dns servers and search domains. While this is ok for Linux containers, ResolvConfPath is always an empty string for windows containers. So that the DNS setting for windows containers is always not set.

This PR setups DNS for Windows sandboxes. In this way, Windows Pods could also use kubernetes dns policies.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61579

**Special notes for your reviewer**:

Requires Docker EE version >= 17.10.0.

**Release note**:

```release-note
Setup dns servers and search domains for Windows Pods in dockershim. Docker EE version >= 17.10.0 is required for propagating DNS to containers.
```

/cc @PatrickLang @taylorb-microsoft @michmike @JiangtianLi
2018-06-07 11:40:11 -07:00
Di Xu 6d14771fd8 ignore not found file error when watching manifests 2018-06-07 22:02:53 +08:00
Klaudiusz Dembler a9df2acc4b Typo fix 2018-06-07 12:08:48 +02:00
Pengfei Ni d0cd1d17ae Add clarification for Windows DNS setup flow 2018-06-07 16:26:13 +08:00
yue9944882 a221218681 fixes data races 2018-06-07 11:24:35 +08:00
Guoliang Wang 4f9d2047dd checkLimitsForResolvConf for the pod create and update events instead of checking period 2018-06-07 10:14:22 +08:00
Pengfei Ni 10b6f405e1 Revert "Fix Windows CNI for the sandbox case"
This reverts commit 49e762ab3a.
2018-06-07 09:56:13 +08:00
Kubernetes Submit Queue 8013bdb180
Merge pull request #64749 from Random-Liu/fix-standalone-dockershim
Automatic merge from submit-queue (batch tested with PRs 64749, 64797). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix standalone dockershim.

Ref https://github.com/kubernetes-incubator/cri-tools/pull/320#issuecomment-394554484.

This PR fixes a bug that standalone dockershim exits immediately.

This PR:
1) Changes standalone dockershim to wait on `stopCh`, so that it won't exit immediately.
2) Removes `stopCh` from dockershim internal. It doesn't help much for graceful stop, because kubelet will exit immediately anyway. https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/server.go#L748

@kubernetes/sig-node-pr-reviews @yujuhong @feiskyer 

**Release note**:

```release-note
none
```
2018-06-06 10:08:12 -07:00
Kubernetes Submit Queue f54593b740
Merge pull request #64795 from mikedanese/fixgke
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

auth: standalone kubelets shouldn't start a token manager

fixes https://github.com/kubernetes/kubernetes/issues/64789
2018-06-06 06:58:28 -07:00
Kubernetes Submit Queue f4668d281c
Merge pull request #64800 from dashpole/cadvisor_godep
Automatic merge from submit-queue (batch tested with PRs 63717, 64646, 64792, 64784, 64800). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update cadvisor godeps to v0.30.0

**What this PR does / why we need it**:
cAdvisor godep update corresponding to 1.11

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #63204

**Release note**:
```release-note
Use IONice to reduce IO priority of du and find
cAdvisor ContainerReference no longer contains Labels. Use ContainerSpec instead.
Fix a bug where cadvisor failed to discover a sub-cgroup that was created soon after the parent cgroup.
```

/sig node
/kind bug
/priority critical-urgent

/assign @dchen1107
2018-06-06 01:24:26 -07:00
Kubernetes Submit Queue a32e5b6a59
Merge pull request #64784 from jiayingz/status-ready
Automatic merge from submit-queue (batch tested with PRs 63717, 64646, 64792, 64784, 64800). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Reconcile extended resource capacity after kubelet restart.

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/64632

**Special notes for your reviewer**:

**Release note**:

```release-note
Kubelet will set extended resource capacity to zero after it restarts. If the extended resource is exported by a device plugin, its capacity will change to a valid value after the device plugin re-connects with the Kubelet. If the extended resource is exported by an external component through direct node status capacity patching, the component should repatch the field after kubelet becomes ready again. During the time gap, pods previously assigned with such resources may fail kubelet admission but their controller should create new pods in response to such failures.
```
2018-06-06 01:24:21 -07:00
Kubernetes Submit Queue 0b8394a1f4
Merge pull request #64646 from freehan/pod-ready-plus2-new
Automatic merge from submit-queue (batch tested with PRs 63717, 64646, 64792, 64784, 64800). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add ContainersReady condition into Pod Status

**Last 3 commits are new**

Follow up PR of: https://github.com/kubernetes/kubernetes/pull/64057 and https://github.com/kubernetes/kubernetes/pull/64344

Have a single PR for adding ContainersReady per https://github.com/kubernetes/kubernetes/pull/64344#issuecomment-394038384

```release-note
Introduce ContainersReady condition in Pod Status
```


/assign yujuhong for review
/assign thockin for the tiny API change
2018-06-06 01:24:14 -07:00
Kubernetes Submit Queue b6f75ac30e
Merge pull request #63717 from ingvagabund/promote-sysctl-annotations-to-fields
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Promote sysctl annotations to fields

#


**What this PR does / why we need it**:

Promoting experimental sysctl feature from annotations to API fields.

**Special notes for your reviewer**:

Following sysctl KEP: https://github.com/kubernetes/community/pull/2093

**Release note**:

```release-note
The Sysctls experimental feature has been promoted to beta (enabled by default via the `Sysctls` feature flag). PodSecurityPolicy and Pod objects now have fields for specifying and controlling sysctls. Alpha sysctl annotations will be ignored by 1.11+ kubelets. All alpha sysctl annotations in existing deployments must be converted to API fields to be effective.
```

**TODO**:

* [x] - Promote sysctl annotation in Pod spec
* [x] - Promote sysctl annotation in PodSecuritySpec spec
* [x] - Feature gate the sysctl
* [x] - Promote from alpha to beta
* [x] - docs PR - https://github.com/kubernetes/website/pull/8804
2018-06-06 00:47:36 -07:00
Kubernetes Submit Queue 0e44d8c40b
Merge pull request #64354 from mtaufen/dkcfg-safe-fields
Automatic merge from submit-queue (batch tested with PRs 64009, 64780, 64354, 64727, 63650). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

per-field dynamic config advice

Dynamic Kubelet config gives cluster admins and k8s-as-a-service providers a lot of flexibility around reconfiguring the Kubelet in live environments. With great power comes great responsibility. These comments intend to provide more nuanced guidance around using dynamic Kubelet config by adding items to consider when changing various fields and pointing out where cluster admins and k8s-as-service providers should maintain extra caution.

@kubernetes/sig-node-pr-reviews PLEASE provide feedback and help fill in the blanks here, I don't have domain expertise in all of these features.

https://github.com/kubernetes/features/issues/281

```release-note
NONE
```
2018-06-05 22:24:46 -07:00
Kubernetes Submit Queue 999b2da440
Merge pull request #64009 from feiskyer/windows-security-context
Automatic merge from submit-queue (batch tested with PRs 64009, 64780, 64354, 64727, 63650). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Kubelet: Add security context for Windows containers

**What this PR does / why we need it**:

This PR adds windows containers to Kubelet CRI and also implements security context setting for docker containers.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

RunAsUser from Kubernetes API only accept int64 today, which is not supported on Windows. It should be changed to intstr for working with both Windows and Linux containers in a separate PR.

**Release note**:

```release-note
Kubelet: Add security context for Windows containers
```

/cc @PatrickLang @taylorb-microsoft @michmike @JiangtianLi @yujuhong @dchen1107
2018-06-05 22:24:38 -07:00
Mike Danese 90ba15ee74 auth: standalone kubelets shouldn't start a token manager 2018-06-05 17:31:26 -07:00
David Ashpole 4afcccd225 disable process scheduler metrics 2018-06-05 17:12:56 -07:00
Seth Jennings 6729add11c sysctls: create feature gate to track promotion 2018-06-06 00:23:11 +02:00