Commit Graph

103 Commits (45ed7ae05179ee4c5f5646bea9a748d07892fe3c)

Author SHA1 Message Date
Kubernetes Prow Robot 3e748958dc
Merge pull request #73333 from yujuhong/os-arch-labels
kubelet: promote OS & arch labels to GA
2019-02-15 16:45:57 -08:00
Yu-Ju Hong 5fd27c38da Move beta OS/Arch labels back to the kubelet package
These labels are being deprecated
2019-02-13 18:09:49 -08:00
Kubernetes Prow Robot a684bd5eb1
Merge pull request #73556 from msau42/triage-72931
Mark volume as in use even when node status didn't change
2019-02-12 17:29:05 -08:00
Michelle Au 80a2698a02 Add unit tests for VolumesInUse during node status update 2019-02-12 13:46:30 -08:00
Yu-Ju Hong 04575f01b0 kubelet: promote OS & arch labels to GA
kubelet now applies both the beta and the GA labels to ensure backward
compatibility.
2019-02-11 11:24:58 -08:00
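
The GA promotion works by dual-labeling: the kubelet sets both the legacy beta labels and the new GA labels on its Node object, so selectors written against either set keep matching during the deprecation window. A minimal sketch of that label set (the well-known label keys are real; the helper itself is illustrative, not the kubelet's actual code):

```go
package main

import (
	"fmt"
	"runtime"
)

// osArchLabels returns both the deprecated beta labels and their GA
// replacements, mirroring the backward-compatibility behavior described
// in the commit above. (Illustrative sketch, not the kubelet's code.)
func osArchLabels() map[string]string {
	return map[string]string{
		"beta.kubernetes.io/os":   runtime.GOOS, // deprecated, kept for compatibility
		"beta.kubernetes.io/arch": runtime.GOARCH,
		"kubernetes.io/os":        runtime.GOOS, // GA labels
		"kubernetes.io/arch":      runtime.GOARCH,
	}
}

func main() {
	fmt.Println(osArchLabels())
}
```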
Davanum Srinivas b975573385
move pkg/kubelet/apis/well_known_labels.go to staging/src/k8s.io/api/core/v1/
Co-Authored-By: Weibin Lin <linweibin1@huawei.com>

Change-Id: I163b2f2833e6b8767f72e2c815dcacd0f4e504ea
2019-02-05 13:39:07 -05:00
Yu-Ju Hong 00d93f0cc3 kubelet: promote OS & arch labels to GA
kubelet now applies both the beta and the GA labels to ensure backward
compatibility.
2019-01-17 14:57:43 -08:00
Zhen Wang 98fc4a107a Update kubelet node status report logic with node lease feature
When the node lease feature is enabled, the kubelet reports node status to the API server
only if there is some change or it has not reported within the last report interval.
2018-11-07 11:59:42 -08:00
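
The reporting rule described above boils down to "send a status update only when something changed, or when the report interval has elapsed anyway". A sketch of that gating logic, with hypothetical names and a simplified status type:

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// shouldReportStatus mirrors the gating described above: report only when
// the status changed or the last report is older than the report interval.
// Names and types are hypothetical, not the kubelet's actual code.
func shouldReportStatus(prev, cur map[string]string, lastReport time.Time, interval time.Duration) bool {
	if !reflect.DeepEqual(prev, cur) {
		return true // status changed since the last report
	}
	return time.Since(lastReport) >= interval // periodic fallback report
}

func main() {
	last := time.Now().Add(-2 * time.Minute)
	fmt.Println(shouldReportStatus(
		map[string]string{"Ready": "True"},
		map[string]string{"Ready": "True"},
		last, time.Minute)) // true: nothing changed, but the interval elapsed
}
```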
k8s-ci-robot 77fd16e0e7
Merge pull request #69266 from SataQiu/fix-20180930
Remove the redundant space
2018-11-02 08:38:37 -07:00
SataQiu c9bc625428 Remove the redundant space 2018-11-02 15:45:11 +08:00
Pingan2017 36997bae77 remove OutOfDisk condition in kubelet 2018-10-23 11:01:26 +08:00
tanshanshan b7c7966b9f Move pkg/scheduler/algorithm/well_known_labels.go out 2018-10-13 09:10:00 +08:00
Michael Taufen 1b7d06e025 Kubelet creates and manages node leases
This extends the Kubelet to create and periodically update leases in a
new kube-node-lease namespace. Based on [KEP-0009](https://github.com/kubernetes/community/blob/master/keps/sig-node/0009-node-heartbeat.md),
these leases can be used as a node health signal, and will allow us to
reduce the load caused by over-frequent node status reporting.

- add NodeLease feature gate
- add kube-node-lease system namespace for node leases
- add Kubelet option for lease duration
- add Kubelet-internal lease controller to create and update lease
- add e2e test for NodeLease feature
- modify node authorizer and node restriction admission controller
to allow Kubelets access to corresponding leases
2018-08-26 16:03:36 -07:00
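
Per KEP-0009, the heartbeat itself becomes a small Lease object that is cheap to renew. A minimal sketch of the create-or-renew step, written against today's coordination/v1 client-go API (the real kubelet controller also handles conflicts, backoff, and copying forward prior lease fields):

```go
package nodelease

import (
	"context"
	"time"

	coordv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/utils/pointer"
)

// renewNodeLease creates or renews the per-node Lease in kube-node-lease,
// which controllers can watch as a cheap node health signal.
func renewNodeLease(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	leases := client.CoordinationV1().Leases("kube-node-lease")
	now := metav1.NewMicroTime(time.Now())

	lease, err := leases.Get(ctx, nodeName, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// First heartbeat after startup: create a fresh lease held by this node.
		lease = &coordv1.Lease{
			ObjectMeta: metav1.ObjectMeta{Name: nodeName, Namespace: "kube-node-lease"},
			Spec: coordv1.LeaseSpec{
				HolderIdentity:       pointer.String(nodeName),
				LeaseDurationSeconds: pointer.Int32(40), // kubelet's default lease duration
				RenewTime:            &now,
			},
		}
		_, err = leases.Create(ctx, lease, metav1.CreateOptions{})
		return err
	}
	if err != nil {
		return err
	}

	// Steady state: just bump the renew time.
	lease.Spec.RenewTime = &now
	_, err = leases.Update(ctx, lease, metav1.UpdateOptions{})
	return err
}
```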
Da K. Ma aac9f1cbaa Taint node when initializing node.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-07-23 12:52:05 +08:00
Kubernetes Submit Queue 53ee0c8652
Merge pull request #65660 from mtaufen/incremental-refactor-kubelet-node-status
Automatic merge from submit-queue (batch tested with PRs 66152, 66406, 66218, 66278, 65660). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Refactor kubelet node status setters, add test coverage

This internal refactor moves the node status setters to a new package, explicitly injects dependencies to facilitate unit testing, and adds individual unit tests for the setters.

I gave each setter a distinct commit to facilitate review.

Non-goals:
- I intentionally excluded the class of setters that return a "modified" boolean, as I want to think more carefully about how to cleanly handle the behavior, and this PR is already rather large.
- I would like to clean up the status update control loops as well, but that belongs in a separate PR.

```release-note
NONE
```
2018-07-20 12:12:24 -07:00
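
The refactor described above centers on a small function type: each status concern becomes an independent setter with injected dependencies, and the update loop just applies them in order. A sketch of the shape of that abstraction (the Setter type matches the PR's description; the example setter and its parameters are simplified):

```go
package nodestatus

import v1 "k8s.io/api/core/v1"

// Setter modifies one aspect of a Node's status. Each setter receives its
// dependencies at construction time, which is what makes it unit-testable
// in isolation.
type Setter func(node *v1.Node) error

// NodeAddress returns a Setter that fills in node addresses from an
// injected source. (Illustrative: the real setter takes more parameters,
// e.g. a node IP and a cloud-provider hook.)
func NodeAddress(addressesFunc func() ([]v1.NodeAddress, error)) Setter {
	return func(node *v1.Node) error {
		addrs, err := addressesFunc()
		if err != nil {
			return err
		}
		node.Status.Addresses = addrs
		return nil
	}
}

// setNodeStatus applies each setter in order, as the kubelet's status
// update loop does.
func setNodeStatus(node *v1.Node, setters []Setter) error {
	for _, s := range setters {
		if err := s(node); err != nil {
			return err
		}
	}
	return nil
}
```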
Shimin Guo e8cd28ae57 fix a panic due to assignment to nil map 2018-07-17 12:34:20 -07:00
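
For context, writing into a map that was never initialized is a runtime panic in Go; this is the general class of bug fixed above (shown here on a generic map, not the kubelet's actual data structure):

```go
package main

import "fmt"

func main() {
	var labels map[string]string // nil: reads are fine, writes panic
	// labels["kubernetes.io/os"] = "linux" // panic: assignment to entry in nil map

	// The fix: initialize the map before writing to it.
	if labels == nil {
		labels = make(map[string]string)
	}
	labels["kubernetes.io/os"] = "linux"
	fmt.Println(labels)
}
```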
Michael Taufen 0170826542 port setVolumeLimits to Setter abstraction, add test 2018-07-16 09:09:48 -07:00
Michael Taufen 8e217f7102 port setNodeStatusImages to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen 596fa89af0 port setNodeStatusMachineInfo to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen 15b03b8c0c port setNodeAddress to Setter abstraction, port test
also put cloud_request_manager.go in its own package
2018-07-16 09:09:47 -07:00
Kubernetes Submit Queue f70410959d
Merge pull request #65226 from ingvagabund/store-cloud-provider-latest-node-addresses
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Store the latest cloud provider node addresses

**What this PR does / why we need it**:
Buffer the most recently retrieved node addresses so they can be used as soon as the next node status update runs.


**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #65814

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
2018-07-09 10:47:07 -07:00
Jan Chaloupka 9d9fb4de29 Put all the complex node address cloud provider retrieval logic into cloudResourceSyncManager 2018-07-03 20:11:35 +02:00
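
The idea behind cloudResourceSyncManager is to decouple the (possibly slow) cloud-provider address fetch from the status-update loop: a background poller writes into a buffer, and status updates read whatever is latest. A hypothetical sketch of that buffer, not the actual implementation:

```go
package cloudsync

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// addressCache buffers the most recently retrieved cloud-provider node
// addresses so the status-update loop can consume them immediately instead
// of blocking on a cloud API call.
type addressCache struct {
	mu    sync.RWMutex
	addrs []v1.NodeAddress
}

// set is called by a background poller after each cloud-provider fetch.
func (c *addressCache) set(addrs []v1.NodeAddress) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.addrs = addrs
}

// get returns the latest buffered addresses for the next status update.
func (c *addressCache) get() []v1.NodeAddress {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.addrs
}
```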
Tim Allclair 5955b839ff
Cleanup verbose cAdvisor mocking in Kubelet unit tests 2018-06-27 11:53:41 -07:00
Kubernetes Submit Queue a32e5b6a59
Merge pull request #64784 from jiayingz/status-ready
Automatic merge from submit-queue (batch tested with PRs 63717, 64646, 64792, 64784, 64800). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Reconcile extended resource capacity after kubelet restart.

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/64632

**Special notes for your reviewer**:

**Release note**:

```release-note
Kubelet will set extended resource capacity to zero after it restarts. If the extended resource is exported by a device plugin, its capacity will change to a valid value after the device plugin re-connects with the Kubelet. If the extended resource is exported by an external component through direct node status capacity patching, the component should repatch the field after kubelet becomes ready again. During the time gap, pods previously assigned with such resources may fail kubelet admission but their controller should create new pods in response to such failures.
```
2018-06-06 01:24:21 -07:00
Jiaying Zhang 35efc4f96a Reconcile extended resource capacity after kubelet restart. 2018-06-05 14:38:49 -07:00
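
Per the release note above, the kubelet zeroes out extended resource capacities on restart until device plugins re-register. A simplified sketch of that reconciliation (the prefix check stands in for the real IsExtendedResourceName helper and ignores cases like hugepages):

```go
package main

import (
	"fmt"
	"strings"
)

// zeroExtendedResources resets every non-standard ("extended") resource to
// zero, which is the restart behavior the release note describes: the value
// becomes valid again once the device plugin or external component
// re-registers and re-patches it.
func zeroExtendedResources(capacity map[string]int64) {
	for name := range capacity {
		if !strings.HasPrefix(name, "kubernetes.io/") && !isStandard(name) {
			capacity[name] = 0
		}
	}
}

func isStandard(name string) bool {
	switch name {
	case "cpu", "memory", "ephemeral-storage", "pods":
		return true
	}
	return false
}

func main() {
	capacity := map[string]int64{"cpu": 8, "example.com/gpu": 2}
	zeroExtendedResources(capacity)
	fmt.Println(capacity) // map[cpu:8 example.com/gpu:0]
}
```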
Hemant Kumar 32b69193c6 Fix panic caused by no cloudprovider in test
We should not panic when no cloudprovider is present
2018-06-04 14:50:18 -04:00
Hemant Kumar 1f9404dfc0 Implement kubelet side changes for writing volume limit to node
Add tests for checking node limits
2018-06-01 19:17:30 -04:00
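
Volume limits surface in the node status as capacity entries keyed by volume plugin, using the attachable-volumes- prefix. A sketch of the kubelet-side write (the limit value here is made up; real limits depend on the cloud and instance type):

```go
package main

import "fmt"

// setVolumeLimits records the per-node attachable volume limit in the
// capacity map, roughly what the kubelet-side change above does via the
// volume plugins.
func setVolumeLimits(capacity map[string]int64, pluginName string, limit int64) {
	capacity["attachable-volumes-"+pluginName] = limit
}

func main() {
	capacity := map[string]int64{}
	setVolumeLimits(capacity, "aws-ebs", 39) // hypothetical instance-type limit
	fmt.Println(capacity)
}
```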
Kubernetes Submit Queue e978c47f5e
Merge pull request #64170 from mtaufen/cap-node-num-images
Automatic merge from submit-queue (batch tested with PRs 61803, 64305, 64170, 64361, 64339). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

add a flag to control the cap on images reported in node status

While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch here.

```release-note
NONE
```
2018-05-30 17:34:18 -07:00
Michael Taufen 0539086ff3 add a flag to control the cap on images reported in node status
While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch.
2018-05-30 12:54:30 -07:00
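
The flag landed as --node-status-max-images (default 50; a negative value disables the cap), and the mechanism is simply truncation of the reported image list. A trivial sketch:

```go
package main

import "fmt"

// capImages truncates the image list reported in node status to at most
// max entries; a negative max means "no cap".
func capImages(images []string, max int) []string {
	if max < 0 || len(images) <= max {
		return images
	}
	return images[:max]
}

func main() {
	images := []string{"nginx:1.15", "busybox:1.29", "redis:4"}
	fmt.Println(capImages(images, 2)) // [nginx:1.15 busybox:1.29]
}
```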
Minhan Xia cb9ac04777 fix unit tests using Patch in fake client 2018-05-30 11:33:55 -07:00
Kubernetes Submit Queue 8220171d8a
Merge pull request #63492 from liggitt/node-heartbeat-close-connections
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

track/close kubelet->API connections on heartbeat failure

xref #48638
xref https://github.com/kubernetes-incubator/kube-aws/issues/598

we're already typically tracking kubelet -> API connections and have the ability to force close them as part of client cert rotation. if we do that tracking unconditionally, we gain the ability to also force close connections on heartbeat failure as well. it's a big hammer (means reestablishing pod watches, etc), but so is having all your pods evicted because you didn't heartbeat.

this intentionally does minimal refactoring/extraction of the cert connection tracking transport in case we want to backport this

* first commit unconditionally sets up the connection-tracking dialer, and moves all the cert management logic inside an if-block that gets skipped if no certificate manager is provided (view with whitespace ignored to see what actually changed)
* second commit plumbs the connection-closing function to the heartbeat loop and calls it on repeated failures

follow-ups:
* consider backporting this to 1.10, 1.9, 1.8
* refactor the connection managing dialer to not be so tightly bound to the client certificate management

/sig node
/sig api-machinery

```release-note
kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server
```
2018-05-14 16:56:35 -07:00
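
The core of the approach is a dialer that remembers every connection it opens so the heartbeat loop can slam them all shut after repeated failures, forcing re-resolution and reconnection. A generic, self-contained sketch of such a tracking dialer (client-go grew a similar utility, connrotation; this simplified version does not untrack connections that close on their own):

```go
package conntrack

import (
	"context"
	"net"
	"sync"
)

// Dialer wraps net.Dialer and tracks every connection it opens.
type Dialer struct {
	mu    sync.Mutex
	conns map[net.Conn]struct{}
}

func New() *Dialer { return &Dialer{conns: map[net.Conn]struct{}{}} }

// DialContext opens a connection and remembers it for later forced closure.
func (d *Dialer) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
	conn, err := (&net.Dialer{}).DialContext(ctx, network, addr)
	if err != nil {
		return nil, err
	}
	d.mu.Lock()
	d.conns[conn] = struct{}{}
	d.mu.Unlock()
	return conn, nil
}

// CloseAll is called on repeated heartbeat failure: drop every tracked
// connection so the client must reestablish them (watches included).
func (d *Dialer) CloseAll() {
	d.mu.Lock()
	defer d.mu.Unlock()
	for conn := range d.conns {
		conn.Close()
		delete(d.conns, conn)
	}
}
```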
Kubernetes Submit Queue 6017f6daef
Merge pull request #63170 from micahhausler/node-ip-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Report node DNS info with --node-ip

**What this PR does / why we need it**:
This PR adds `ExternalDNS`, `InternalDNS`, and `ExternalIP` info for kubelets with the `--node-ip` flag enabled.

**Which issue(s) this PR fixes** 
Fixes #63158

**Special notes for your reviewer**:

I added a field to the Kubelet to make IP validation more testable (`validateNodeIP` relies on the `net` package and the IP address of the host that is executing the test.) I also converted the test to use a table so new cases could be added more easily.

**Release Notes**
```release-note
Report node DNS info with --node-ip flag
```

@andrewsykim
@nckturner 

/sig node
/sig network
2018-05-11 15:46:35 -07:00
Jordan Liggitt 814b065928
Close all kubelet->API connections on heartbeat failure 2018-05-07 15:06:31 -04:00
Kubernetes Submit Queue bc1c2de163
Merge pull request #62914 from sjenning/kubelet-unit-flake
Automatic merge from submit-queue (batch tested with PRs 62914, 63431). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubelet: fix flake in TestUpdateExistingNodeStatusTimeout

xref https://github.com/openshift/origin/issues/19443

There are cases where some process, outside the test, attempts to connect to the port we are using for the test, leading to an attempt count greater than what we expect.

To deal with this, just ensure that we have seen *at least* the number of connection attempts we expect.

@liggitt 

```release-note
NONE
```
2018-05-07 10:44:05 -07:00
Micah Hausler 1a218aaee2 Report node DNS info with --node-ip
```release-note
Report node DNS info with --node-ip flag
```
2018-04-27 13:18:40 -07:00
Jan Chaloupka 61efc29394 Timeout on instances.NodeAddresses cloud provider request 2018-04-23 13:28:43 +02:00
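
Bounding the cloud call with a context deadline keeps a hung cloud API from stalling the kubelet's status loop indefinitely, which is the idea behind the commit above. A sketch of that pattern with a hypothetical fetch function:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchAddresses wraps a cloud-provider call with a deadline.
func fetchAddresses(fetch func(context.Context) ([]string, error), timeout time.Duration) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return fetch(ctx)
}

func main() {
	// A deliberately slow fake NodeAddresses call.
	slow := func(ctx context.Context) ([]string, error) {
		select {
		case <-time.After(5 * time.Second):
			return []string{"10.0.0.1"}, nil
		case <-ctx.Done():
			return nil, errors.New("NodeAddresses timed out: " + ctx.Err().Error())
		}
	}
	_, err := fetchAddresses(slow, 100*time.Millisecond)
	fmt.Println(err)
}
```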
Seth Jennings dfb8870000 kubelet: fix flake in TestUpdateExistingNodeStatusTimeout 2018-04-20 11:48:51 -05:00
Mike Danese f427531179 boring 2018-04-18 09:55:57 -07:00
Guangya Liu 8d92814bd0 Extract validateNodeIP test to node status test file.
The `validateNodeIP` function belongs to kubelet_node_status,
so its unit test should live in the node status test file.
2018-04-09 12:02:06 +08:00
Jing Xu b2e744c620 Promote LocalStorageCapacityIsolation feature to beta
The LocalStorageCapacityIsolation feature added a new resource type,
ResourceEphemeralStorage ("ephemeral-storage"), so that this resource can
be allocated, limited, and consumed in the same way as CPU/memory. All
the features related to resource management (resource request/limit, quota, limitrange) are available for local ephemeral storage.

This local ephemeral storage represents the storage for the root filesystem, which will be consumed by containers' writable layers and logs. Some volumes, such as emptyDir, might also consume this storage.
2018-03-02 15:10:08 -08:00
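
With the feature in beta, ephemeral-storage participates in requests and limits like any other resource. A minimal sketch using the standard resource name from k8s.io/api (the quantities are examples):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Request/limit ephemeral-storage for a container exactly as one would
	// CPU or memory.
	res := v1.ResourceRequirements{
		Requests: v1.ResourceList{
			v1.ResourceEphemeralStorage: resource.MustParse("1Gi"),
		},
		Limits: v1.ResourceList{
			v1.ResourceEphemeralStorage: resource.MustParse("2Gi"),
		},
	}
	fmt.Println(res.Limits[v1.ResourceEphemeralStorage])
}
```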
wackxu f737ad62ed update import 2018-02-27 20:23:35 +08:00
Kubernetes Submit Queue 244549f02a
Merge pull request #59769 from dashpole/capacity_ephemeral_storage
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Collect ephemeral storage capacity on initialization

**What this PR does / why we need it**:
We have had some node e2e flakes where a pod can be rejected if it requests ephemeral storage.  This is because we don't set capacity and allocatable for ephemeral storage on initialization.
This PR causes cAdvisor to do one round of stats collection during initialization, which will allow it to get the disk capacity when it first sets the node status.
It also sets the node to NotReady if capacities have not been initialized yet.

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
/assign @jingxu97 @Random-Liu 

/sig node
/kind bug
/priority important-soon
2018-02-16 11:17:02 -08:00
Kubernetes Submit Queue eac5bc0035
Merge pull request #57136 from k82cn/k8s_54313
Automatic merge from submit-queue (batch tested with PRs 57136, 59920). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Updated PID pressure node condition.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #54313 

**Release note**:

```release-note
Updated PID pressure node condition
```
2018-02-16 10:35:33 -08:00
David Ashpole b259543985 collect ephemeral storage capacity on initialization 2018-02-15 17:33:22 -08:00
Tim Hockin 3586986416 Switch to k8s.gcr.io vanity domain
This is the 2nd attempt.  The previous was reverted while we figured out
the regional mirrors (oops).

New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest.  To publish
an image, push to k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today).  For now the staging is an alias to
gcr.io/google_containers (the legacy URL).

When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.

We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it.  Nice and
visible, easy to keep track of.
2018-02-07 21:14:19 -08:00
Da K. Ma 9a78753144 Updated PID pressure node condition.
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
2018-01-14 18:26:00 +08:00
Tim Hockin e9dd8a68f6 Revert k8s.gcr.io vanity domain
This reverts commit eba5b6092a.

Fixes https://github.com/kubernetes/kubernetes/issues/57526
2017-12-22 14:36:16 -08:00
Tim Hockin eba5b6092a Use k8s.gcr.io vanity domain for container images 2017-12-18 09:18:34 -08:00
Daneyon Hansen 7ac6fe9c5d Adds Support for Node Resource IPv6 Addressing
Adds support for the following:

1. Assigning an IPv6 address to a node resource.
2. Expanded IPv4/v6 address validation checks.

Which issue this PR fixes:
fixes #44848 in combination with PR #45116

Special notes for your reviewer:

Release note:
With this PR, nodes can be assigned an IPv6 address. An IPv4 address is
preferred over an IPv6 address. IP address validation has been expanded
to check for multicast, link-local and unspecified addresses.
2017-11-10 15:13:53 -08:00
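
The expanded validation described above maps directly onto checks available in Go's standard net package. A sketch of such checks (illustrative, not the kubelet's validateNodeIP itself):

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// validateIP applies the kinds of checks the commit above describes:
// reject multicast, link-local, and unspecified addresses.
func validateIP(s string) error {
	ip := net.ParseIP(s)
	if ip == nil {
		return errors.New("not a valid IP address")
	}
	switch {
	case ip.IsUnspecified():
		return errors.New("node IP must not be unspecified (0.0.0.0 or ::)")
	case ip.IsMulticast():
		return errors.New("node IP must not be a multicast address")
	case ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast():
		return errors.New("node IP must not be a link-local address")
	}
	return nil
}

func main() {
	for _, s := range []string{"10.0.0.1", "2001:db8::1", "ff02::1", "169.254.1.1"} {
		fmt.Println(s, "->", validateIP(s))
	}
}
```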
Jordan Liggitt 9df1f7ef11
Do not remove kubelet labels during startup 2017-10-17 11:49:02 -04:00