Commit Graph

121 Commits (8592098e607e7e8f8a595219ee50cc69801b2cbe)

Author SHA1 Message Date
Robert Krawitz 2597a1d97e Implement SupportNodePidsLimit, hand-tested 2019-02-13 14:56:17 -05:00
Andrew Kim 84191eb99b replace pkg/util/file with k8s.io/utils/path 2019-01-29 15:20:13 -05:00
yuexiao-wang f3353c358d [scheduler cleanup phase 2]: Rename to
Signed-off-by: yuexiao-wang <wang.yuexiao@zte.com.cn>
2018-12-11 11:21:12 +08:00
David Ashpole 630cb53f82 add kubelet grpc server for pod-resources service 2018-11-15 09:43:20 -08:00
Davanum Srinivas 954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
Benjamin Elder 8b56eb8588 hack/update-gofmt.sh 2018-09-24 12:21:29 -07:00
Benjamin Elder 088cf3c37b find & replace version import 2018-09-24 12:03:24 -07:00
Renaud Gaubert 8dd1d27c03 Updated the device manager pluginwatcher handler 2018-09-06 15:34:46 +02:00
Sandor Szücs 588d2808b7
fix #51135 make CFS quota period configurable, adds a cli flag and config option to kubelet to be able to set cpu.cfs_period and defaults to 100ms as before.
It requires to enable feature gate CustomCPUCFSQuotaPeriod.

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
2018-09-01 20:19:59 +02:00
Kubernetes Submit Queue c2536e2b0d
Merge pull request #61159 from linyouchong/linyouchong-20180314
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Skip checking when failSwapOn=false

**What this PR does / why we need it**:
Skip checking when failSwapOn=false

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
NONE
**Release note**:
```
NONE
```
2018-08-02 14:09:39 -07:00
vikaschoudhary16 a5842503eb Use probe based plugin discovery mechanism in device manager 2018-07-17 04:02:31 -04:00
Seth Jennings 3234b0fa5b feature gate LSI capacity calculation 2018-06-28 14:01:08 -05:00
Guoliang Wang 761cf41427 Move pkg/scheduler/schedulercache -> pkg/scheduler/cache 2018-05-31 22:55:34 +08:00
Kubernetes Submit Queue 1929e0d86d
Merge pull request #63298 from dims/kubelet-remove-unused-code
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubelet - Remove unused code

**What this PR does / why we need it**:

Looks like we have a bunch of unused methods. Let's clean them up

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-05-04 04:20:06 -07:00
Filipe Brandenburger b230fb8ac4 Use a []string for CgroupName, which is a more accurate internal representation
The slice of strings more precisely captures the hierarchic nature of
the cgroup paths we use to represent pods and their groupings.

It also ensures we're reducing the chances of passing an incorrect path
format to a cgroup driver that requires a different path naming, since
now explicit conversions are always needed.

The new constructor NewCgroupName starts from an existing CgroupName,
which enforces a hierarchy where a root is always needed. It also
performs checking on the component names to ensure invalid characters
("/" and "_") are not in use.

A RootCgroupName for the top of the cgroup hierarchy tree is introduced.

This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.

There's a small TODO in a helper updateSystemdCgroupInfo that was
introduced to make this commit possible. That logic really belongs in
libcontainer, I'm planning to send a PR there to include it there.
(The API already takes a field with that information, only that field is
only processed in cgroupfs and not systemd driver, we should fix that.)

Tested by running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs
driver) and CentOS 7 (with systemd driver.)
2018-05-01 08:29:06 -07:00
Davanum Srinivas 4bacd77321 Remove unused code 2018-04-30 14:57:26 -04:00
Derek Carr f68f3ff783 Fix cpu cfs quota flag with pod cgroups 2018-03-16 15:27:11 -04:00
linyouchong 32c265f60c Skip checking when failSwapOn=false 2018-03-14 14:36:30 +08:00
vikaschoudhary16 defcab81d5 Invoke PreStart RPC call before container start, if desired by plugin
Signed-off-by: vikaschoudhary16 <vichoudh@redhat.com>
2018-02-21 01:25:24 -05:00
David Ashpole 960856f4e8 collect metrics on the /kubepods cgroup on-demand 2018-02-17 12:32:40 -08:00
David Ashpole b259543985 collect ephemeral storage capacity on initialization 2018-02-15 17:33:22 -08:00
Kubernetes Submit Queue c02b784b76
Merge pull request #58172 from NVIDIA/annotations
Automatic merge from submit-queue (batch tested with PRs 58184, 59307, 58172). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add annotations to the device plugin API

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** : Related to #56649 but does not fix it

This adds the ability for the device plugins to annotate containers.
Product wise, this allows the NVIDIA device plugin to support CRI-O (which allows hooks through container annotations).

**Special notes for your reviewer**:
/area hw-accelerators
/cc @vishh @jiayingz @vikaschoudhary16 

I'm wondering if it would make sense to fire a blank call to `newContainerAnnotations` at the start of the deviceplugin to get Annotations that are forbidden.
Current behavior is that any Annotations that conflicts with Kubelet will be overwritten by Kubelet.

**Release note**:
```release-note
NONE
```
2018-02-05 13:50:35 -08:00
Renaud Gaubert db537e5954 Add Annotations from the deviceplugin to the runtime 2018-02-03 19:53:20 +01:00
Kubernetes Submit Queue bf111161b7
Merge pull request #57973 from dims/set-pids-limit-at-pod-level
Automatic merge from submit-queue (batch tested with PRs 57973, 57990). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Set pids limit at pod level

**What this PR does / why we need it**:

Add a new Alpha Feature to set a maximum number of pids per Pod.
This is to allow the use case where cluster administrators wish
to limit the pids consumed per pod (example when running a CI system).

By default, we do not set any maximum limit, If an administrator wants
to enable this, they should enable `SupportPodPidsLimit=true` in the
`--feature-gates=` parameter to kubelet and specify the limit using the
`--pod-max-pids` parameter.

The limit set is the total count of all processes running in all
containers in the pod.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #43783

**Special notes for your reviewer**:

**Release note**:

```release-note
New alpha feature to limit the number of processes running in a pod. Cluster administrators will be able to place limits by using the new kubelet command line parameter --pod-max-pids. Note that since this is a alpha feature they will need to enable the "SupportPodPidsLimit" feature.
```
2018-01-25 18:29:31 -08:00
Connor Doyle e5667cf426 Rename package deviceplugin => devicemanager. 2018-01-24 22:32:43 -08:00
Kubernetes Submit Queue f2e46a2147
Merge pull request #57266 from vikaschoudhary16/unhealthy_device
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Handle Unhealthy devices

Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflect only healthy devices.



**What this PR does / why we need it**:
Currently node capacity only reflects healthy devices. Unhealthy devices are ignored totally while updating node status. This PR handles unhealthy devices while updating node status. 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #57241

**Special notes for your reviewer**:

**Release note**:
<!--  Write your release note:
Handle Unhealthy devices

```release-note
Handle Unhealthy devices
```
/cc @tengqm @ConnorDoyle @jiayingz @vishh @jeremyeder @sjenning @resouer @ScorpioCPH @lichuqiang @RenaudWasTaken @balajismaniam 

/sig node
2018-01-12 19:55:54 -08:00
Davanum Srinivas ecd6361ff0 Set pids limit at pod level
Add a new Alpha Feature to set a maximum number of pids per Pod.
This is to allow the use case where cluster administrators wish
to limit the pids consumed per pod (example when running a CI system).

By default, we do not set any maximum limit, If an administrator wants
to enable this, they should enable `SupportPodPidsLimit=true` in the
`--feature-gates=` parameter to kubelet and specify the limit using the
`--pod-max-pids` parameter.

The limit set is the total count of all processes running in all
containers in the pod.
2018-01-11 21:22:38 -05:00
vikaschoudhary16 e9cf3f1ac4 Handle Unhealthy devices
Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflect only healthy devices.
2018-01-09 11:38:48 -05:00
Jonathan Basseri 30b89d830b Move scheduler code out of plugin directory.
This moves plugin/pkg/scheduler to pkg/scheduler and
plugin/cmd/kube-scheduler to cmd/kube-scheduler.

Bulk of the work was done with gomvpkg, except for kube-scheduler main
package.
2018-01-05 15:05:01 -08:00
Jiaying Zhang 1eb4e79453 Extends deviceplugin to gracefully handle full device plugin lifecycle.
- Instead of using cm.capacity field to communicate device plugin resource
capacity, this PR changes to use an explicit cm.GetDevicePluginResourceCapacity()
function that returns device plugin resource capacity as well as any inactive
device plugin resource. Kubelet syncNodeStatus call this function during its
periodic run to update node status capacity and allocatable. After this call,
device plugin can remove the inactive device plugin resource from its allDevices
field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removing even upon kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
2017-11-20 23:40:14 -08:00
Niklas Q. Nielsen b16bfc768d Merging handler into manager API 2017-11-20 21:37:46 +00:00
Kubernetes Submit Queue c60b35bcd3
Merge pull request #52977 from yanxuean/improvecgroup
Automatic merge from submit-queue (batch tested with PRs 54837, 55970, 55912, 55898, 52977). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Improve kubelet cgroup

**What this PR does / why we need it**:
1.Use arg cgroupRoot,not nodeConfig.CgroupRoot
    Using both arg cgroupRoot and nodeConfig.CgroupRoot is confused in function NewQOSContainerManager
2.improve cgroupmanager in qosContainerManager
3. improve arg "cgroupRoot" type in NewQOSContainerManager

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-11-18 13:13:28 -08:00
Szymon Scharmach 7e7301ffaf Enable file state in static policy 2017-11-14 18:25:58 +01:00
lichuqiang ebd445eb8c add admission handler for device resources allocation 2017-11-02 09:17:48 +08:00
Jiaying Zhang ff4e8d429e Device plugin code refactoring to cope with file move.
While moving device_plugin_handler_test.go from pkg/kubelet/cm/ to
pkg/kubelet/cm/deviceplugin/, we can no longer uses cm in its tests
because that would cause a cycle dependency. To solve this problem,
I moved the main cm GetResources functionality as well as part of the
current device plugin handler Allocate functionality into a new device
plugin handler function, GetDeviceRunContainerOptions(). This
refactoring is also needed by another PR 51895 that moves device
allocation into admission phase. Now device plugin handler Allocate()
first checks whether there is cached device runtime state and only
issues Allocate grpc call if there is no cached state available.
The new GetDeviceRunContainerOptions() function simply returns device
runtime config from the cached state. To support this change, extended the
podDevices struct and checkpoint data structure with device runtime state.
2017-10-24 14:38:15 -07:00
Vishnu kannan 18eee1eaa0 Make AllocateResponse artifacts global across all devices per container in device plugin API
There is no use case known for passing artifacts per device as it currently exists. The current API is also
complex to use for simple clients. Hence this PR creates a flat namespace where artifacts like environment variables
and mount points apply globally to all devices returned as part of AllocateResponse proto.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-10-19 10:34:00 -07:00
Kubernetes Submit Queue 1d8f1e268f Merge pull request #47699 from supereagle/fix-typos
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix typos: remove duplicated word in comments

**What this PR does / why we need it**: Remove the duplicated word `the` in comments

**Which issue this PR fixes** : fixes #

**Special notes for your reviewer**:

```release-note
NONE
```
2017-10-17 02:35:52 -07:00
yanxuean 5d5fee8cab capitalize the first letter
capitalize the first letter for the field comment of containerManagerImpl

Signed-off-by: yanxuean <yan.xuean@zte.com.cn>
2017-10-13 14:54:06 +08:00
Euan Kemp 7aa88b5103 kubelet/cm: remove unneeded fork of 'cat'
Reading a file in Go is perfectly possible without invoking cat.

I also removed an outdated comment.
2017-10-10 21:53:35 -07:00
yanxuean f011c044d4 improve cgroupmanager in qosContainerManager
improve arg "cgroupRoot" type in NewQOSContainerManager

Signed-off-by: yanxuean <yan.xuean@zte.com.cn>
2017-09-25 16:59:15 +08:00
supereagle 87c29a08e1 fix typos: remove duplicated word in comments 2017-09-16 14:38:10 +08:00
Connor Doyle 81ccd396d7 Fixed nil InternalContainerLifecycle in cm stubs. 2017-09-04 07:24:59 -07:00
Connor Doyle ec706216e6 Un-revert "CPU manager wiring and `none` policy"
This reverts commit 8d2832021a.
2017-09-04 07:24:59 -07:00
Jiaying Zhang 02001af752 Kubelet side extension to support device allocation 2017-09-01 11:56:35 -07:00
Renaud Gaubert c4a1c97329 Device Plugin Kubelet integration 2017-09-01 11:47:09 -07:00
Shyam JVS 8d2832021a Revert "CPU manager wiring and `none` policy" 2017-09-01 18:17:36 +02:00
Connor Doyle 50674ec614 Added cpu-manager-reconcile-period config.
- Defaults to sync-frequency.
2017-08-30 23:42:32 -07:00
Connor Doyle 7c6e31617d CPU Manager initialization and lifecycle calls. 2017-08-30 08:50:41 -07:00
NickrenREN 27901ad5df Change eviction policy to manage one single local storage resource 2017-08-26 05:14:49 +08:00
NickrenREN eadb7ca8c0 Fix node allocatable resource validation
GetNodeAllocatableReservation gets all the reserved resource, and we need to compare it with capacity
2017-08-14 10:20:40 +08:00