125 Commits (8ad1c6655bde4ca8d61c3574980cffce5ee937a5)

Author SHA1 Message Date
Kubernetes Submit Queue 244549f02a
Merge pull request #59769 from dashpole/capacity_ephemeral_storage
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Collect ephemeral storage capacity on initialization

**What this PR does / why we need it**:
We have had some node e2e flakes where a pod can be rejected if it requests ephemeral storage.  This is because we don't set capacity and allocatable for ephemeral storage on initialization.
This PR causes cAdvisor to do one round of stats collection during initialization, which will allow it to get the disk capacity when it first sets the node status.
It also sets the node to NotReady if capacities have not been initialized yet.

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
/assign @jingxu97 @Random-Liu 

/sig node
/kind bug
/priority important-soon
2018-02-16 11:17:02 -08:00
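The readiness rule described in this PR (report NotReady until capacities are initialized) can be sketched roughly as follows. This is a minimal illustration, not the kubelet's code: `requiredResources`, `missingCapacities`, and `readyCondition` are hypothetical names, and treating "not initialized" as "key absent from the capacity map" is an assumption.

```go
package main

import "fmt"

// requiredResources is a hypothetical list of resources whose capacity
// must be known before the node is reported Ready.
var requiredResources = []string{"cpu", "memory", "ephemeral-storage"}

// missingCapacities returns the required resources that have no entry
// in the capacity map yet.
func missingCapacities(capacity map[string]int64) []string {
	var missing []string
	for _, name := range requiredResources {
		if _, ok := capacity[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

// readyCondition reports not-ready, with a reason, while any required
// capacity is still missing.
func readyCondition(capacity map[string]int64) (bool, string) {
	if missing := missingCapacities(capacity); len(missing) > 0 {
		return false, fmt.Sprintf("missing node capacity for resources: %v", missing)
	}
	return true, "capacities initialized"
}

func main() {
	// Before the first stats collection, ephemeral storage is unknown.
	ready, reason := readyCondition(map[string]int64{"cpu": 4, "memory": 8 << 30})
	fmt.Println(ready, reason)
}
```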
Kubernetes Submit Queue eac5bc0035
Merge pull request #57136 from k82cn/k8s_54313
Automatic merge from submit-queue (batch tested with PRs 57136, 59920). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Updated PID pressure node condition.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #54313 

**Release note**:

```release-note
Updated PID pressure node condition
```
2018-02-16 10:35:33 -08:00
David Ashpole b259543985 collect ephemeral storage capacity on initialization 2018-02-15 17:33:22 -08:00
Walter Fender e18e8ec3c0 Add context to all relevant cloud APIs
This adds context to all the relevant cloud provider interface signatures.
Callers of those APIs are currently satisfied using context.TODO().
There will be follow on PRs to push the context through the stack.
For an idea of the full scope of this change please look at PR #58532.
2018-02-06 12:49:17 -08:00
Da K. Ma 9a78753144 Updated PID pressure node condition.
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
2018-01-14 18:26:00 +08:00
Kubernetes Submit Queue f2e46a2147
Merge pull request #57266 from vikaschoudhary16/unhealthy_device
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Handle Unhealthy devices

Update node capacity with the sum of both healthy and unhealthy devices.
Node allocatable reflects only healthy devices.



**What this PR does / why we need it**:
Currently node capacity reflects only healthy devices; unhealthy devices are ignored entirely when updating node status. This PR accounts for unhealthy devices when updating node status.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #57241

**Special notes for your reviewer**:

**Release note**:
```release-note
Handle Unhealthy devices
```
/cc @tengqm @ConnorDoyle @jiayingz @vishh @jeremyeder @sjenning @resouer @ScorpioCPH @lichuqiang @RenaudWasTaken @balajismaniam 

/sig node
2018-01-12 19:55:54 -08:00
vikaschoudhary16 e9cf3f1ac4 Handle Unhealthy devices
Update node capacity with the sum of both healthy and unhealthy devices.
Node allocatable reflects only healthy devices.
2018-01-09 11:38:48 -05:00
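The capacity/allocatable split described in this commit can be illustrated with a short sketch. The `device` type and `capacityAndAllocatable` function are hypothetical stand-ins, not the kubelet's device plugin types; the only rule taken from the commit message is that capacity counts every device while allocatable counts healthy ones only.

```go
package main

import "fmt"

// device is a hypothetical stand-in for a device reported by a device plugin.
type device struct {
	ID      string
	Healthy bool
}

// capacityAndAllocatable counts every device toward capacity and only
// healthy devices toward allocatable.
func capacityAndAllocatable(devs []device) (capacity, allocatable int) {
	for _, d := range devs {
		capacity++
		if d.Healthy {
			allocatable++
		}
	}
	return capacity, allocatable
}

func main() {
	devs := []device{
		{ID: "gpu-0", Healthy: true},
		{ID: "gpu-1", Healthy: false},
		{ID: "gpu-2", Healthy: true},
	}
	c, a := capacityAndAllocatable(devs)
	fmt.Printf("capacity=%d allocatable=%d\n", c, a) // capacity=3 allocatable=2
}
```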
Jonathan Basseri 30b89d830b Move scheduler code out of plugin directory.
This moves plugin/pkg/scheduler to pkg/scheduler and
plugin/cmd/kube-scheduler to cmd/kube-scheduler.

Bulk of the work was done with gomvpkg, except for kube-scheduler main
package.
2018-01-05 15:05:01 -08:00
Kubernetes Submit Queue 27d2ffb32f
Merge pull request #49856 from dixudx/polish_UpdateNodeStatus
Automatic merge from submit-queue (batch tested with PRs 49856, 56257, 57027, 57695, 57432). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Change to pkg/util/node.UpdateNodeStatus

**What this PR does / why we need it**:

> // TODO: Change to pkg/util/node.UpdateNodeStatus.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
/cc @brendandburns @dchen1107 @lavalamp 

**Release note**:

```release-note
None
```
2018-01-02 13:15:42 -08:00
stewart-yu 50520be649 completely remove the option to use auto-detect 2017-11-28 09:54:28 +08:00
Jiaying Zhang 1eb4e79453 Extends deviceplugin to gracefully handle full device plugin lifecycle.
- Instead of using cm.capacity field to communicate device plugin resource
capacity, this PR changes to use an explicit cm.GetDevicePluginResourceCapacity()
function that returns device plugin resource capacity as well as any inactive
device plugin resource. Kubelet syncNodeStatus calls this function during its
periodic run to update node status capacity and allocatable. After this call,
device plugin can remove the inactive device plugin resource from its allDevices
field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removal even across kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
2017-11-20 23:40:14 -08:00
Michael Taufen 523c68ff65 Move ungated 'alpha' KubeletConfiguration fields and self-registration fields to KubeletFlags 2017-11-15 17:47:10 -08:00
Daneyon Hansen 7ac6fe9c5d Adds Support for Node Resource IPv6 Addressing
Adds support for the following:

1. A node resource to be assigned an IPv6 address.
2. Expands IPv4/v6 address validation checks.

Which issue this PR fixes:
fixes #44848 in combination with PR #45116

Special notes for your reviewer:

Release note:
With this PR, nodes can be assigned an IPv6 address. An IPv4 address is
preferred over an IPv6 address. IP address validation has been expanded
to check for multicast, link-local and unspecified addresses.
2017-11-10 15:13:53 -08:00
Dr. Stefan Schimanski 012b085ac8 pkg/apis/core: mechanical import fixes in dependencies 2017-11-09 12:14:08 +01:00
Dr. Stefan Schimanski d13b936a2a pkg/apis/core: fixup conversion func names in dependencies 2017-11-09 12:14:07 +01:00
Di Xu 13a355c837 refactor method to pkg/util/node 2017-11-06 09:51:09 +08:00
fisherxu 04b876e63c fix panic in kubelet 2017-11-01 17:06:17 +08:00
Kevin 4c8539cece use core client with explicit version globally 2017-10-27 15:48:32 +08:00
Jordan Liggitt 9df1f7ef11
Do not remove kubelet labels during startup 2017-10-17 11:49:02 -04:00
Michael Taufen 8180536bed Mulligan: Remove deprecated and experimental fields from KubeletConfiguration
Revert "Merge pull request #51857 from kubernetes/revert-51307-kc-type-refactor"

This reverts commit 9d27d92420, reversing
changes made to 2e69d4e625.

See original: #51307

We punted this from 1.8 so it could go through an API review. The point
of this PR is that we are trying to stabilize the kubeletconfig API so
that we can move it out of alpha, and unblock features like Dynamic
Kubelet Config, Kubelet loading its initial config from a file instead
of flags, kubeadm and other install tools having a versioned API to rely
on, etc.

We shouldn't rev the version without both removing all the deprecated
junk from the KubeletConfiguration struct, and without (at least
temporarily) removing all of the fields that have "Experimental" in
their names. It wouldn't make sense to lock in to deprecated fields.
"Experimental" fields can be audited on a 1-by-1 basis after this PR,
and if found to be stable (or sufficiently alpha-gated), can be restored
to the KubeletConfiguration without the "Experimental" prefix.
2017-10-11 09:52:39 -07:00
Dr. Stefan Schimanski ed586da147 apimachinery: remove Scheme.DeepCopy 2017-10-06 14:59:17 +02:00
Jiaying Zhang 6fecd04924 Fixes a regression introduced by PR 52290 in which extended resource
capacity could temporarily drop to zero after kubelet restarts, so
pods restarted during that time window could fail to be scheduled.
2017-10-03 10:26:53 -07:00
Kubernetes Submit Queue 2c5413b379 Merge pull request #50422 from karataliu/apid
Automatic merge from submit-queue (batch tested with PRs 50294, 50422, 51757, 52379, 52014). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Fix AnnotationProvidedIPAddr annotation for externalCloudProvider

**What this PR does / why we need it**:
#44258 introduced `AnnotationProvidedIPAddr`. When the kubelet has the 'node-ip' parameter set and the cloud provider is not set, this annotation is populated and then validated by cloud-controller-manager:
https://github.com/kubernetes/kubernetes/pull/44258/files#diff-6b0808bd1afb15f9f77986f4459601c2R465

Later, #47152 added a check for externalCloudProvider that makes the function return before the annotation is set, so in that case the annotation is never populated.

This fix moves the annotation assignment to the proper location.

Please correct me if I have any misunderstanding.
@wlan0 @ublubu 

**Which issue this PR fixes**

**Special notes for your reviewer**:

**Release note**:
2017-09-23 11:40:47 -07:00
Kubernetes Submit Queue 3277de69b4 Merge pull request #52176 from liggitt/heartbeat-timeout
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Eliminate hangs/throttling of node heartbeat

Fixes https://github.com/kubernetes/kubernetes/issues/48638
Fixes #50304

Stops the kubelet from wedging when updating node status if it is unable to establish a TCP connection.

Note that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out, so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them.

```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
2017-09-16 09:45:29 -07:00
Jiaying Zhang 5cac9fc984 Fixes device plugin re-registration handling logic to make sure:
- If a device plugin exits, its exported resource will be removed.
- No capacity change if a new device plugin instance comes up to replace the old instance.
2017-09-14 15:24:46 -07:00
Jordan Liggitt f8f57d8959
Use separate client for node status loop 2017-09-14 15:56:22 -04:00
Kubernetes Submit Queue a51eb2ac4e Merge pull request #49202 from cbonte/node-addresses
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)

Fix setNodeAddress when a node IP and a cloud provider are set

**What this PR does / why we need it**:
When a node IP is set and a cloud provider returns the same address with
several types, only the first address was accepted. With the changes made
in PR #45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

**Which issue this PR fixes**: fixes #48760

**Special notes for your reviewer**:
* I'm not a golang expert; is it possible to mock `kubelet.validateNodeIP()` to avoid needing real host interface addresses in the test?
* It would be great to have it backported for the next 1.6.8 release.

**Release note**:
```release-note
NONE
```
2017-09-06 20:01:00 -07:00
Derek Carr 1ec2a69d9a Kubelet changes to support hugepages 2017-09-05 09:46:08 -04:00
Jiaying Zhang 02001af752 Kubelet side extension to support device allocation 2017-09-01 11:56:35 -07:00
Renaud Gaubert c4a1c97329 Device Plugin Kubelet integration 2017-09-01 11:47:09 -07:00
Cyril Bonté 2b2a5c6500 Fix setNodeAddress when a node IP and a cloud provider are set
When a node IP is set and a cloud provider returns the same address with
several types, only the first address was accepted. With the changes made
in PR #45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

Issue #48760
2017-08-29 17:09:25 +02:00
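The behaviour change described in this commit (return every address type for the specified node IP rather than only the first match) can be sketched as follows. The `nodeAddress` type and `addressesForNodeIP` function are hypothetical simplifications and do not mirror the kubelet's actual `setNodeAddress` code.

```go
package main

import "fmt"

// nodeAddress is a hypothetical, simplified address record.
type nodeAddress struct {
	Type    string
	Address string
}

// addressesForNodeIP returns every cloud-provided address that matches
// nodeIP, keeping all types instead of stopping at the first match.
func addressesForNodeIP(cloud []nodeAddress, nodeIP string) []nodeAddress {
	var matched []nodeAddress
	for _, a := range cloud {
		if a.Address == nodeIP {
			matched = append(matched, a)
		}
	}
	return matched
}

func main() {
	// The provider reports the same IP under two types, ExternalIP first.
	cloud := []nodeAddress{
		{Type: "ExternalIP", Address: "10.1.1.1"},
		{Type: "InternalIP", Address: "10.1.1.1"},
		{Type: "InternalIP", Address: "10.2.2.2"},
	}
	for _, a := range addressesForNodeIP(cloud, "10.1.1.1") {
		fmt.Println(a.Type, a.Address)
	}
}
```

With a first-match-only rule, the example input would yield just the ExternalIP entry, which is the "node without any InternalIP" situation the commit message describes.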
Kubernetes Submit Queue c17d70c240 Merge pull request #47044 from kubermatic/kubelet-update-default-labels
Automatic merge from submit-queue

Always check if default labels on node need to be updated in kubelet

**What this PR does / why we need it**:
A node may rejoin after its OS, architecture, or instance type has changed.
In that case the kubelet needs to check whether the default labels are still correct and update them if they are not.

```release-note
Kubelet updates default labels if those are deprecated
```
2017-08-28 08:20:19 -07:00
NickrenREN 27901ad5df Change eviction policy to manage one single local storage resource 2017-08-26 05:14:49 +08:00
Henrik Schmidt 80156474cf Always check if default labels on node need to be updated in kubelet 2017-08-22 12:54:07 +02:00
Connor Doyle 630af5422b OIR predicate includes namespaced resources. 2017-08-16 15:29:24 -07:00
Dong Liu c52bdc8e74 Fix AnnotationProvidedIPAddr for externalCloudProvider 2017-08-10 10:49:55 +08:00
Kubernetes Submit Queue 58819b0204 Merge pull request #47416 from allencloud/simplify-if-else
Automatic merge from submit-queue

simplify if and else for code

Signed-off-by: allencloud <allen.sun@daocloud.io>

**What this PR does / why we need it**:
This PR simplifies some if/else code to make it a little cleaner.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE

**Special notes for your reviewer**:
NONE

**Release note**:

```release-note
NONE
```
2017-08-05 03:10:10 -07:00
David Ashpole 177d64213c fix outofdisk condition not reported 2017-08-03 13:44:31 -07:00
David Ashpole 8a518099ca set nodeOODCondition 2017-07-31 11:38:20 -07:00
allencloud 6300361961 simplify if and else for code
Signed-off-by: allencloud <allen.sun@daocloud.io>
2017-07-26 10:41:23 +08:00
David Ashpole 7a23f8b018 remove deprecated flags LowDiskSpaceThresholdMB and OutOfDiskTransitionFrequency 2017-07-20 13:23:13 -07:00
Klaus Ma 63b78a37e0 Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
Tim Allclair a2f2e1d491 Name change: s/timstclair/tallclair/ 2017-07-10 14:05:46 -07:00
Vishnu kannan 82f7820066 Kubelet:
Centralize Capacity discovery of standard resources in Container manager.
Have storage derive node capacity from container manager.
Move certain cAdvisor interfaces to the cAdvisor package in the process.

This patch fixes a bug in container manager where it was writing to a map without synchronization.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-06-27 18:45:02 -07:00
Kubernetes Submit Queue 7800b3ffef Merge pull request #47152 from ublubu/cloud-addresses
Automatic merge from submit-queue

kubelet should let cloud-controller-manager set the node addresses

*Before this change:*

1. cloud-controller-manager sets all the addresses for a node.
2. kubelet on that node replaces these addresses with an incomplete set. (i.e. replace InternalIP and Hostname and delete all other addresses--ExternalIP, etc.)

*After this change:*

kubelet doesn't touch its node's addresses when there is an external cloudprovider.

Fixes #47155

```release-note
NONE
```
2017-06-24 09:31:15 -07:00
Chao Xu f4989a45a5 run root-rewrite-v1-..., compile 2017-06-22 10:25:57 -07:00
Cheng Xing de3bf36b61 Fixing node statuses related to local storage capacity isolation.
- Wrapping all node statuses from local storage capacity isolation under an alpha feature check. Currently there should not be any storage statuses.
- Replaced all "storage" statuses with "storage.kubernetes.io/scratch". "storage" should never be exposed as a status.
2017-06-20 17:34:59 -07:00
ublubu 46465c0a5a Kubelet doesn't override addrs from Cloud provider 2017-06-07 22:27:18 -04:00
Kubernetes Submit Queue b8c9ee8abb Merge pull request #46456 from jingxu97/May/allocatable
Automatic merge from submit-queue

Add local storage (scratch space) allocatable support

This PR adds the support for allocatable local storage (scratch space).
This feature applies only to the root file system, which is shared by Kubernetes
components, users' containers and/or images. Users can use the
--kube-reserved flag to reserve storage for kube system components.
If the allocatable storage for users' pods is used up, some pods will be
evicted to free the storage resource.

This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306

**Release note**:

```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable 
```
2017-06-03 00:24:29 -07:00
Jing Xu dd67e96c01 Add local storage (scratch space) allocatable support
This PR adds the support for allocatable local storage (scratch space).
This feature applies only to the root file system, which is shared by Kubernetes
components, users' containers and/or images. Users can use the
--kube-reserved flag to reserve storage for kube system components.
If the allocatable storage for users' pods is used up, some pods will be
evicted to free the storage resource.
2017-06-01 15:57:50 -07:00