Commit Graph

281 Commits (f3512cbbb93bba43773da64f09f4f9296fa0936c)

Author SHA1 Message Date
Kubernetes Submit Queue c817765b0e
Merge pull request #58445 from hanxiaoshuai/typo
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix some typos in comments

**What this PR does / why we need it**:

Fixes # fix some typos in comments
2018-01-30 19:44:44 -08:00
Kubernetes Submit Queue bf111161b7
Merge pull request #57973 from dims/set-pids-limit-at-pod-level
Automatic merge from submit-queue (batch tested with PRs 57973, 57990). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Set pids limit at pod level

**What this PR does / why we need it**:

Add a new Alpha Feature to set a maximum number of pids per Pod.
This is to allow the use case where cluster administrators wish
to limit the pids consumed per pod (example when running a CI system).

By default, we do not set any maximum limit, If an administrator wants
to enable this, they should enable `SupportPodPidsLimit=true` in the
`--feature-gates=` parameter to kubelet and specify the limit using the
`--pod-max-pids` parameter.

The limit set is the total count of all processes running in all
containers in the pod.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #43783

**Special notes for your reviewer**:

**Release note**:

```release-note
New alpha feature to limit the number of processes running in a pod. Cluster administrators will be able to place limits by using the new kubelet command line parameter --pod-max-pids. Note that since this is a alpha feature they will need to enable the "SupportPodPidsLimit" feature.
```
2018-01-25 18:29:31 -08:00
Connor Doyle e5667cf426 Rename package deviceplugin => devicemanager. 2018-01-24 22:32:43 -08:00
Kubernetes Submit Queue 62616d79ad
Merge pull request #58053 from tianshapjq/nit-errUnsupportedVersion
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

typo of errUnsuportedVersion

**What this PR does / why we need it**:
typo of errUnsuportedVersion in pkg/kubelet/cm/deviceplugin/types.go

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note

```NONE
2018-01-19 03:26:34 -08:00
Kubernetes Submit Queue 44d0ba29d3
Merge pull request #56960 from islinwb/remove_unused_code_ut_pkg
Automatic merge from submit-queue (batch tested with PRs 53631, 56960). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove unused code in UT files in pkg/

**What this PR does / why we need it**:
Remove unused code in UT files in pkg/ .

**Release note**:

```release-note
NONE
```
2018-01-18 02:41:29 -08:00
hangaoshuai 005f8c4926 fix some typos in comments 2018-01-18 17:07:51 +08:00
vikaschoudhary16 9c847fc4d6 Call Dial in blocking mode 2018-01-16 10:50:17 -05:00
linweibin fa8afc1d39 Remove unused code in UT files in pkg/ 2018-01-15 16:02:35 +08:00
Kubernetes Submit Queue 9007df35b9
Merge pull request #55921 from ScorpioCPH/fix-endpoint-ut
Automatic merge from submit-queue (batch tested with PRs 58216, 58193, 53033, 58219, 55921). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix device plugin endpoint UT

**What this PR does / why we need it**:
Fix some issues in device plugin endpoint UT.

**Which issue(s) this PR fixes**:
Fixes #55920

**Special notes for your reviewer**:

@jiayingz @RenaudWasTaken @lichuqiang PTAL.

/sig node

**Release note**:

```release-note
None
```
2018-01-13 03:34:57 -08:00
Kubernetes Submit Queue f2e46a2147
Merge pull request #57266 from vikaschoudhary16/unhealthy_device
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Handle Unhealthy devices

Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflect only healthy devices.



**What this PR does / why we need it**:
Currently node capacity only reflects healthy devices. Unhealthy devices are ignored totally while updating node status. This PR handles unhealthy devices while updating node status. 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #57241

**Special notes for your reviewer**:

**Release note**:
<!--  Write your release note:
Handle Unhealthy devices

```release-note
Handle Unhealthy devices
```
/cc @tengqm @ConnorDoyle @jiayingz @vishh @jeremyeder @sjenning @resouer @ScorpioCPH @lichuqiang @RenaudWasTaken @balajismaniam 

/sig node
2018-01-12 19:55:54 -08:00
Penghao Cen b96c383ef7 Check grpc server ready properly 2018-01-13 05:47:49 +08:00
Penghao Cen 90bc1265cf Fix endpoint not work issue 2018-01-12 20:09:07 +08:00
Davanum Srinivas ecd6361ff0 Set pids limit at pod level
Add a new Alpha Feature to set a maximum number of pids per Pod.
This is to allow the use case where cluster administrators wish
to limit the pids consumed per pod (example when running a CI system).

By default, we do not set any maximum limit, If an administrator wants
to enable this, they should enable `SupportPodPidsLimit=true` in the
`--feature-gates=` parameter to kubelet and specify the limit using the
`--pod-max-pids` parameter.

The limit set is the total count of all processes running in all
containers in the pod.
2018-01-11 21:22:38 -05:00
Penghao Cen 671c4eb2b7 Add e2e test logic for device plugin 2018-01-11 14:41:45 +08:00
Penghao Cen dc5384a139 Don't rewrite device health 2018-01-11 14:18:13 +08:00
tianshapjq e8005face7 typo of errUnsuportedVersion 2018-01-10 15:47:11 +08:00
vikaschoudhary16 e9cf3f1ac4 Handle Unhealthy devices
Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflect only healthy devices.
2018-01-09 11:38:48 -05:00
Jonathan Basseri 85c5862552 Fix scheduler refs in BUILD files.
Update references to moved scheduler code.
2018-01-05 15:05:01 -08:00
Jonathan Basseri 30b89d830b Move scheduler code out of plugin directory.
This moves plugin/pkg/scheduler to pkg/scheduler and
plugin/cmd/kube-scheduler to cmd/kube-scheduler.

Bulk of the work was done with gomvpkg, except for kube-scheduler main
package.
2018-01-05 15:05:01 -08:00
Kubernetes Submit Queue 4d215fd235
Merge pull request #56611 from tianshapjq/testcase-cgroup_manager_linux.go
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

new testcase to cgroup_manager_linux.go

a new test case to adaptName(), for testing "cgroupManagerType != libcontainerSystemd"
2017-12-28 11:11:47 -08:00
Kubernetes Submit Queue a4eb2f96d0
Merge pull request #57610 from vikaschoudhary16/remove-redundant-sleep
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove redundant sleep from ReRegistration unit test case

/kind cleanup
/sig node

**What this PR does / why we need it**:
Once upon a time, there was a race in the device plugin registration logic.  At that time, [list()](5cac9fc984/pkg/kubelet/deviceplugin/manager.go (L206)) and [listAndWatch()](5cac9fc984/pkg/kubelet/deviceplugin/manager.go (L224)) used to be separate functions. Race was there for taking manager.mutex lock from two places. [One, from within the m.addEndpoint()](5cac9fc984/pkg/kubelet/deviceplugin/manager.go (L214)) and the [second, from within m.Devices()](5cac9fc984/pkg/kubelet/deviceplugin/manager.go (L137)).  This race was making `TestDevicePluginReRegistration` flaky as explained below.
 	
```
1.     p1.Register(socketName, testResourceName)
2.  	// Wait for the first callback to be issued.
3.  	<-callbackChan
4.        devices := m.Devices()  
```
* L#1 leads to eventually **asynchronous** invocation of m.addEndpoint(), let say **thread1**.
* L#3 holds the test case execution till the [callback gets invoked](5cac9fc984/pkg/kubelet/deviceplugin/endpoint.go (L108)). This means test case execution waits on channel till the **thread1**  reaches the point where [e.list() call completes in the addEndpoint.](5cac9fc984/pkg/kubelet/deviceplugin/manager.go (L206)) 
* L#4 triggers a new thread. thread1 and this new thread are both racing for m.mutex.Lock(). Former, in the addEndpoint() and later one in the m.Devices(). If m.Devices wins the race, result is the test case failure because endpoint gets added in the manager only after taking mutex.Lock() in the addEndpoint().

To deal with this flake, we added `Sleep` between L#3 and L#4.  `Sleep` was getting some extra time to addEndpoint() and thus making thread1 win the race each time.

Above explained race scenario got fixed and merged sometime back in this PR:
[Deviceplugin refactoring: merge func list and listwatch in endpoint into one](https://github.com/kubernetes/kubernetes/pull/52149)
With the above PR, callback function is invoked from e.run() which makes sure that test case waits on channel till the endpoint is added and devices are updated
Above explained race scenario does not exist now, therefore removing redundant sleeps from the test case.

Tested:
go test -race -count 500 k8s.io/kubernetes/pkg/kubelet/cm/deviceplugin -run TestDevicePluginReRegistration  -timeout 5h

Related #52616 #56026 

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
/cc @vishh @derekwaynecarr @jiayingz @RenaudWasTaken @lichuqiang @ScorpioCPH @tengqm @mindprince @ConnorDoyle @jeremyeder
2017-12-27 14:53:21 -08:00
vikaschoudhary16 5d10dcd983 Remove redundant sleep from ReRegistration unit test case 2017-12-27 03:02:21 -05:00
Kubernetes Submit Queue e67294105a
Merge pull request #57274 from vikaschoudhary16/reviewr
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add vikaschoudhary16 as reviewer in pkg/kubelet/cm/deviceplugin

**What this PR does / why we need it**:
Add github user vikaschoudhary16 (me) to the reviewers list for pkg/kubelet/cm/deviceplugin

**Special notes for your reviewer**:
I would like to help with the review load in this package.

```release-note
None
```
/sig node
/cc @vishh @jiayingz @derekwaynecarr @mindprince @RenaudWasTaken @ConnorDoyle
2017-12-25 08:43:10 -08:00
Kubernetes Submit Queue 7dd82519da
Merge pull request #57369 from vikaschoudhary16/revert-to-limits
Automatic merge from submit-queue (batch tested with PRs 57591, 57369). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert back #57278

**What this PR does / why we need it**:
This PR reverts back to behavior of scanning Limits.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Related #
#57276
#57170
**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
/sig node

/cc @vishh @ConnorDoyle @jiayingz
2017-12-24 23:37:37 -08:00
Kubernetes Submit Queue 92e1028ac7
Merge pull request #57591 from vikaschoudhary16/fix-race
Automatic merge from submit-queue (batch tested with PRs 57591, 57369). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix a race in the endpoint.go

**What this PR does / why we need it**:
This PR fixes a race in the endpoint.go

Fixes #56026


-->
```release-note
None
```

/sig node
/cc @RenaudWasTaken @ConnorDoyle @jiayingz @mindprince @ScorpioCPH @resouer @tengqm @vishh
2017-12-24 23:37:34 -08:00
Jeff Grafton efee0704c6 Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
vikaschoudhary16 cc4d2cbe9d Fix a race in the endpoint.go 2017-12-23 03:02:33 -05:00
vikaschoudhary16 8749c5c989 Revert back #57278 2017-12-22 18:55:53 -05:00
Kubernetes Submit Queue eddb00e7c6
Merge pull request #57247 from dixudx/cm_return_err
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

cpumanager: Propagate error up instead panic

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #57239

**Special notes for your reviewer**:
/assign @sjenning 
**Release note**:

```release-note
None
```
2017-12-18 10:18:09 -08:00
Kubernetes Submit Queue 0a55f4105c
Merge pull request #57278 from vikaschoudhary16/limit
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix device manager to scan resources.Requests

**What this PR does / why we need it**:
This PR makes device manager to scan resources.Requests from the container spec. Currently
it scans resources.Limits. For extended resources, it is not mandatory for resources.Limits to be present in the container spec and if Limits are present, validation logic ensures that Limits will always be equal to Requests. 

Fixes #57276 

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
/sig node

/cc @ConnorDoyle @vishh @jiayingz @RenaudWasTaken @tengqm @resouer @mindprince
2017-12-17 23:43:59 -08:00
Di Xu d474b86e05 Propagate error up instead panic 2017-12-18 14:05:06 +08:00
vikaschoudhary16 bf1fb46347 Look for requested resources in the Requests 2017-12-17 22:56:45 -05:00
tianshapjq 7a43f736c4 correct the annotations in container_manager.go 2017-12-18 09:01:36 +08:00
vikaschoudhary16 8c51d235d6 Refactor TestPodContainerDeviceAllocation to make it readable and extensible 2017-12-16 20:32:08 -05:00
Kubernetes Submit Queue a902959544
Merge pull request #56911 from WanLinghao/projected_test_fix
Automatic merge from submit-queue (batch tested with PRs 56650, 55813, 56911, 56921, 56871). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix deviceplugin test file create leak file problem

When execute make test, this test file will create a file named "kubelet_internal_checkpoint" in k8s directory and not delete it.

This patch fix this error




**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56365

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2017-12-16 12:10:48 -08:00
Kubernetes Submit Queue 8415e0c608
Merge pull request #56661 from xiangpengzhao/move-kubelet-constants
Automatic merge from submit-queue (batch tested with PRs 56410, 56707, 56661, 54998, 56722). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Move some kubelet constants to a common place

**What this PR does / why we need it**:
More context, see: https://github.com/kubernetes/kubernetes/issues/56516
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56516
[thanks @ixdy for verifying this!]

**Special notes for your reviewer**:
@ixdy how can I verify #56516 against this locally?

/cc @ixdy @mtaufen 

**Release note**:

```release-note
NONE
```
2017-12-16 05:46:35 -08:00
vikaschoudhary16 a71d1680d4 Add vikaschoudhary16 as reviewer in pkg/kubelet/cm/deviceplugin 2017-12-16 08:19:36 -05:00
Kubernetes Submit Queue fa0a1a3d7a
Merge pull request #56337 from mindprince/container-manager-cleanup
Automatic merge from submit-queue (batch tested with PRs 56337, 56546, 56550, 56633, 56635). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove redundant code in container manager.

- Reuse stub implementations from unsupported implementations.
- Delete test file that didn't contain any tests.

**Release note**:
```release-note
NONE
```

/kind cleanup
/sig node
2017-12-16 01:53:42 -08:00
Kubernetes Submit Queue 578f3db8d5
Merge pull request #55382 from vikaschoudhary16/checkpoint
Automatic merge from submit-queue (batch tested with PRs 57172, 55382, 56147, 56146, 56158). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use file store utility for device plugin checkpointing

Partially address issue #54088
cc @sjenning @jeremyeder @jiayingz @vishh 

/sig node
2017-12-14 12:38:13 -08:00
Kubernetes Submit Queue 7908e96539
Merge pull request #56191 from ConnorDoyle/cpu-manager-panic-state-init-error
Automatic merge from submit-queue (batch tested with PRs 54410, 56184, 56199, 56191, 56231). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

CPU Manager panics on state initialization error.

**What this PR does / why we need it**:

- CPU Manager panics on state initialization error.
- Update unit tests accordingly.
- Minor related cleanup in `state_file.go`.

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```

/kind bug
/sig node
/priority important-soon
Blocks #52031
/assign @balajismaniam 
cc @flyingcougar
2017-12-14 05:33:17 -08:00
Kubernetes Submit Queue e9a9da8aa3
Merge pull request #54410 from intelsdi-x/cpu-reconcile-state
Automatic merge from submit-queue (batch tested with PRs 54410, 56184, 56199, 56191, 56231). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Cpu manager reconcile loop - restore state

**What this PR does / why we need it**:
Cpu manager reconcile loop can add orphaned containers to `State` calling `policy.AddContainer()`
Previous PR: #54409 
e2e tests PR: #53378

Blocked by #56191
2017-12-14 05:33:08 -08:00
Kubernetes Submit Queue 65a7ecf147
Merge pull request #57045 from ConnorDoyle/add-connor-containermanager-owners
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add ConnorDoyle as approver in /pkg/kubelet/cm.

**What this PR does / why we need it**:
- Add github user `ConnorDoyle` (me) to the approvers list for `pkg/kubelet/cm`.

**Special notes for your reviewer**:
I would like to help with the review load in this package. I believe I have demonstrated good stewardship of sub-packages in this part of the code base.

```release-note
NONE
```

/sig node
/kind cleanup
/assign @derekwaynecarr
2017-12-13 19:32:53 -08:00
WanLinghao 3e7e4ab397 old test file will create a leak file in current directory.
this patch fix this.
	modified:   pkg/kubelet/cm/deviceplugin/manager_test.go
2017-12-07 11:57:17 +08:00
Connor Doyle 4207b4fd2c Add ConnorDoyle as approver in /pkg/kubelet/cm. 2017-12-06 09:05:59 -06:00
Jiaying Zhang d4244f3ded Re-uses device plugin resources allocated to init containers.
Implements option 2 mentioned in
https://github.com/kubernetes/kubernetes/issues/56022#issuecomment-348286184
2017-12-04 22:01:28 -08:00
xiangpengzhao 8048823d0e Auto generated BUILD files. 2017-12-01 11:24:41 +08:00
xiangpengzhao 1f2262e6b0 Move some kubelet constants to a common place. 2017-12-01 11:24:04 +08:00
tianshapjq 0cc6a4d937 new testcase to cgroup_manager_linux.go 2017-11-30 14:14:59 +08:00
Szymon Scharmach 552e4d3a9d Cpu manager reconclie loop can restore state 2017-11-27 11:22:21 +01:00
vikaschoudhary16 de358fb21f Use file store utility for device plugin check-pointing 2017-11-24 08:41:11 -05:00