Commit Graph

172 Commits (d5e7059bf3210db88858e587ec1edd9072caf16c)

Author SHA1 Message Date
Kubernetes Submit Queue 8e2a444b6d
Merge pull request #66593 from stewart-yu/stewart-kubelet-commentclean
Automatic merge from submit-queue (batch tested with PRs 66593, 66727, 66558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

remove the outdate comments in tryRegisterWithAPIServer

**What this PR does / why we need it**:
some judgement about ExternalID removed in #61877, so remove the outdate comments in tryRegisterWithAPIServer


**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-07-27 18:05:00 -07:00
stewart-yu ffbd7b22b3 remove the unnecessary comments in tryRegisterWithAPIServer for externalID removed in PR#61877 2018-07-25 11:23:56 +08:00
Da K. Ma aac9f1cbaa Taint node when initializing node.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-07-23 12:52:05 +08:00
Kubernetes Submit Queue 53ee0c8652
Merge pull request #65660 from mtaufen/incremental-refactor-kubelet-node-status
Automatic merge from submit-queue (batch tested with PRs 66152, 66406, 66218, 66278, 65660). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Refactor kubelet node status setters, add test coverage

This internal refactor moves the node status setters to a new package, explicitly injects dependencies to facilitate unit testing, and adds individual unit tests for the setters.

I gave each setter a distinct commit to facilitate review.

Non-goals:
- I intentionally excluded the class of setters that return a "modified" boolean, as I want to think more carefully about how to cleanly handle the behavior, and this PR is already rather large.
- I would like to clean up the status update control loops as well, but that belongs in a separate PR.

```release-note
NONE
```
2018-07-20 12:12:24 -07:00
Shimin Guo e8cd28ae57 fix a panic due to assignment to nil map 2018-07-17 12:34:20 -07:00
Michael Taufen 0f8976bf84 eliminate unnecessary helper 2018-07-16 09:09:48 -07:00
Michael Taufen 0170826542 port setVolumeLimits to Setter abstraction, add test 2018-07-16 09:09:48 -07:00
Michael Taufen c5a5e21639 port setNodeStatusGoRuntime to Setter abstraction 2018-07-16 09:09:48 -07:00
Michael Taufen 8e217f7102 port setNodeStatusImages to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen b7ec333f01 port setNodeStatusDaemonEndpoints to Setter abstraction 2018-07-16 09:09:47 -07:00
Michael Taufen 59bb21051e port setNodeStatusVersionInfo to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen 596fa89af0 port setNodeStatusMachineInfo to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen aa94a3ba4e lift node-info setters into defaultNodeStatusFuncs
Instead of hiding these behind a helper, we just register them in a
uniform way. We are careful to keep the call-order of the setters the
same, though we can consider re-ordering in a future PR to achieve
fewer appends.
2018-07-16 09:09:47 -07:00
Michael Taufen 2df7e1ad5c port setNodeVolumesInUseStatus to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen 3e03e0611e port setNodeReadyCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen e0b6ae219f port setNodePIDPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen b26e4dfa7f port setNodeDiskPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen f057c9a4ae port setNodeMemoryPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen c33f321acd port setNodeOODCondition to Setter abstraction 2018-07-16 09:09:47 -07:00
Michael Taufen 15b03b8c0c port setNodeAddress to Setter abstraction, port test
also put cloud_request_manager.go in its own package
2018-07-16 09:09:47 -07:00
Michael Taufen d245e72bae remove incorrect comment referencing removed functionality
The cbr0 configuration behavior this comment references was removed in #34906
2018-07-16 09:03:13 -07:00
Jordan Liggitt 7828e5d0f9
Make cloud provider authoritative for node status address reporting 2018-07-10 14:33:48 -04:00
Jordan Liggitt db9d3c2d10
Derive kubelet serving certificate CSR template from node status addresses 2018-07-10 14:33:48 -04:00
Jan Chaloupka 9d9fb4de29 Put all the node address cloud provider retrival complex logic into cloudResourceSyncManager 2018-07-03 20:11:35 +02:00
Kubernetes Submit Queue 332da0a943
Merge pull request #64491 from hzxuzhonghu/kubelet-node-schedule-event-record
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

move oldNodeUnschedulable pkg var to kubelet struct

**What this PR does / why we need it**:

move oldNodeUnschedulable pkg var to kubelet struct


**Release note**:

```release-note
NONE
```
2018-06-20 23:02:52 -07:00
Jiaying Zhang 35efc4f96a Reconcile extended resource capacity after kubelet restart. 2018-06-05 14:38:49 -07:00
Hemant Kumar 1f9404dfc0 Implement kubelet side changes for writing volume limit to node
Add tests for checking node limits
2018-06-01 19:17:30 -04:00
Michael Taufen 0539086ff3 add a flag to control the cap on images reported in node status
While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch.
2018-05-30 12:54:30 -07:00
xuzhonghu 9492cf368e move oldNodeUnschedulable pkg var to kubelet struct 2018-05-30 14:09:13 +08:00
Kubernetes Submit Queue 792832bafc
Merge pull request #62242 from feiskyer/pod-cidr
Automatic merge from submit-queue (batch tested with PRs 63314, 63884, 63799, 63521, 62242). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Check CIDR before updating node status

**What this PR does / why we need it**:

Check CIDR before updating node status.  See #62164.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #62164

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-05-15 19:55:19 -07:00
Kubernetes Submit Queue 8220171d8a
Merge pull request #63492 from liggitt/node-heartbeat-close-connections
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

track/close kubelet->API connections on heartbeat failure

xref #48638
xref https://github.com/kubernetes-incubator/kube-aws/issues/598

we're already typically tracking kubelet -> API connections and have the ability to force close them as part of client cert rotation. if we do that tracking unconditionally, we gain the ability to also force close connections on heartbeat failure as well. it's a big hammer (means reestablishing pod watches, etc), but so is having all your pods evicted because you didn't heartbeat.

this intentionally does minimal refactoring/extraction of the cert connection tracking transport in case we want to backport this

* first commit unconditionally sets up the connection-tracking dialer, and moves all the cert management logic inside an if-block that gets skipped if no certificate manager is provided (view with whitespace ignored to see what actually changed)
* second commit plumbs the connection-closing function to the heartbeat loop and calls it on repeated failures

follow-ups:
* consider backporting this to 1.10, 1.9, 1.8
* refactor the connection managing dialer to not be so tightly bound to the client certificate management

/sig node
/sig api-machinery

```release-note
kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server
```
2018-05-14 16:56:35 -07:00
Jordan Liggitt 814b065928
Close all kubelet->API connections on heartbeat failure 2018-05-07 15:06:31 -04:00
Micah Hausler 1a218aaee2 Report node DNS info with --node-ip
```release-note
Report node DNS info with --node-ip flag
```
2018-04-27 13:18:40 -07:00
Pengfei Ni 335d70a6d1 Check CIDR before updating node status 2018-04-27 11:07:48 +08:00
Kubernetes Submit Queue 5b77996433
Merge pull request #62543 from ingvagabund/timeout-on-cloud-provider-request
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Timeout on instances.NodeAddresses cloud provider request

**What this PR does / why we need it**:

In cases the cloud provider does not respond before the node gets evicted.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:


**Release note**:
```release-note
stop kubelet to cloud provider integration potentially wedging kubelet sync loop
```
2018-04-23 09:12:42 -07:00
Jan Chaloupka 61efc29394 Timeout on instances.NodeAddresses cloud provider request 2018-04-23 13:28:43 +02:00
Kubernetes Submit Queue 48243a9c24
Merge pull request #62780 from RenaudWasTaken/master
Automatic merge from submit-queue (batch tested with PRs 62780, 62886). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Change Capacity log verbosity in status update

*What this PR does / why we need it:*

While in production we noticed that the log verbosity for the Capacity field in the node status was to high.
This log message is called for every device plugin resource at every update.

A proposed solution is to tune it down from V(2) to V(5). In a normal setting you'll be able to see the effect by looking at the node status.

Release note:
```
NONE
```

/sig node
/area hw-accelerators
/assign @vikaschoudhary16 @jiayingz @vishh
2018-04-20 20:06:10 -07:00
Renaud Gaubert 7297dd33bb Change Capacity log verbosity in node status update 2018-04-20 16:11:02 +02:00
Mike Danese d02cf10123 remove last usage of external ID 2018-04-18 09:54:56 -07:00
Kubernetes Submit Queue 09ec7bf548
Merge pull request #60692 from adnavare/bug/60466
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Cleanup the use of ExternalID as it is deprecated

The patch removes ExternalID usage from node_controller
and node_lifecycle_oontroller. The code instead uses InstanceID
which returns the cloud provider ID as well.

fixes #60466
2018-04-09 11:58:12 -07:00
Kubernetes Submit Queue 1d030799e3
Merge pull request #61183 from ingvagabund/node-status-be-more-verbose
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Node status be more verbose

**What this PR does / why we need it**:
Improve logging ability of node status so it is easier to debug update of a node status

```release-note
NONE
```
2018-04-06 19:25:19 -07:00
Rohit Agarwal 87dda3375b Delete in-tree support for NVIDIA GPUs.
This removes the alpha Accelerators feature gate which was deprecated in 1.10.
The alternative feature DevicePlugins went beta in 1.10.
2018-04-02 20:17:01 -07:00
Anup Navare 1335e6e2d4 Cleanup the use of ExternalID as it is deprecated
The patch removes ExternalID usage from node_controller
and node_lifecycle_oontroller. The code instead uses InstanceID
which returns the cloud provider ID as well.
2018-04-02 10:15:32 -07:00
Jan Chaloupka 6d820d5a66 Node status be more verbose 2018-03-14 17:02:28 +01:00
Jing Xu b2e744c620 Promote LocalStorageCapacityIsolation feature to beta
The LocalStorageCapacityIsolation feature added a new resource type
ResourceEphemeralStorage "ephemeral-storage" so that this resource can
be allocated, limited, and consumed as the same way as CPU/memory. All
the features related to resource management (resource request/limit, quota, limitrange) are avaiable for local ephemeral storage.

This local ephemeral storage represents the storage for root file system, which will be consumed by containers' writtable layer and logs. Some volumes such as emptyDir might also consume this storage.
2018-03-02 15:10:08 -08:00
Yang Guo 8d880506fe Support cluster-level extended resources in kubelet and kube-scheduler
Co-authored-by: Yang Guo <ygg@google.com>
Co-authored-by: Chun Chen <chenchun.feed@gmail.com>
2018-02-27 17:25:30 -08:00
wackxu f737ad62ed update import 2018-02-27 20:23:35 +08:00
Kubernetes Submit Queue 244549f02a
Merge pull request #59769 from dashpole/capacity_ephemeral_storage
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Collect ephemeral storage capacity on initialization

**What this PR does / why we need it**:
We have had some node e2e flakes where a pod can be rejected if it requests ephemeral storage.  This is because we don't set capacity and allocatable for ephemeral storage on initialization.
This PR causes cAdvisor to do one round of stats collection during initialization, which will allow it to get the disk capacity when it first sets the node status.
It also sets the node to NotReady if capacities have not been initialized yet.

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
/assign @jingxu97 @Random-Liu 

/sig node
/kind bug
/priority important-soon
2018-02-16 11:17:02 -08:00
Kubernetes Submit Queue eac5bc0035
Merge pull request #57136 from k82cn/k8s_54313
Automatic merge from submit-queue (batch tested with PRs 57136, 59920). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Updated PID pressure node condition.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #54313 

**Release note**:

```release-note
Updated PID pressure node condition
```
2018-02-16 10:35:33 -08:00
David Ashpole b259543985 collect ephemeral storage capacity on initialization 2018-02-15 17:33:22 -08:00