This commit addresses the issue described here
https://github.com/kubernetes-incubator/cri-containerd/issues/341
The changes include using cadvisor stats in addition to CRI stats
for CRI runtimes. As described in the issue above , the CRI stats
currently doesnt provide all the necessary stats for the kubelet.
This commit addreses the need to extract stats from cadvisor which
is not available as CRI stats.
Signed-off-by: abhi <abhi@docker.com>
Automatic merge from submit-queue (batch tested with PRs 55952, 49112, 55450, 56178, 56151). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix the wrong localhost seccomp path of CRI
**What this PR does / why we need it**:
Fix the wrong seccomp path comment.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55359
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix CRI localhost seccomp path in format localhost//profileRoot/profileName.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Expose single annotation/label via downward API
**What this PR does / why we need it**:
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/annotations-downward-api.md
Support exposing single annotation via both env and volume downward API using the following syntax:
```
metadata.annotations['key']
metadata.labels['key']
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
#31218
**Special notes for your reviewer**:
This PR takes over the work in https://github.com/kubernetes/kubernetes/pull/41648.
**Release note**:
```
A single value in metadata.annotations/metadata.labels can be passed into the containers via Downward API
```
/assign @thockin @vishh
This PR adds the pod-level metrics for CPU and memory stats. cAdvisor
can get all pod cgroup information so we can add this pod-level CPU and
memory stats information from the corresponding pod cgroup
Automatic merge from submit-queue (batch tested with PRs 56115, 55143, 56179). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use GetVersion() API instead of ver command
**What this PR does / why we need it**:
Should use GetVersion vs Shelling out to ver.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55083
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Block volumes Support: CRI, volumemanager and operationexecutor changes
**What this PR does / why we need it**:
This PR contains following items to enable block volumes support feature.
- container runtime interface change
- volumemanager changes
- operationexecuto changes
**Which issue this PR fixes**:
Based on this proposal (kubernetes/community#805) and this feature issue: kubernetes/features#351
**Special notes for your reviewer**:
There are another PRs related to this functionality.
(#50457) API Change
(#53385) VolumeMode PV-PVC Binding change
(#51494) Container runtime interface change, volumemanager changes, operationexecutor changes
(#55112) Block volume: Command line printer update
Plugins
(#51493) Block volumes Support: FC plugin update
(#54752) Block volumes Support: iSCSI plugin update
**Release note**:
```
Adds alpha support for block volume, which allows the users to attach raw block volume to their pod without filesystem on top of the volume.
```
/cc @msau42 @liggitt @jsafrane @saad-ali @erinboyd @screeley44
Automatic merge from submit-queue (batch tested with PRs 55340, 55329, 56168, 56170, 56105). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adds device plugin allocation latency metric.
For #53497
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 55812, 55752, 55447, 55848, 50984). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Initial basic bootstrap-checkpoint support
**What this PR does / why we need it**:
Adds initial support for Pod checkpointing to allow for controlled recovery of the control plane during self host failure conditions.
fixes#49236
xref https://github.com/kubernetes/features/issues/378
**Special notes for your reviewer**:
Proposal is here: https://docs.google.com/document/d/1hhrCa_nv0Sg4O_zJYOnelE8a5ClieyewEsQM6c7-5-o/edit?ts=5988fba8#
1. Controlled tests work, but I have not tested the self hosted api-server recovery, that requires validation and logs. /cc @luxas
2. In adding hooks for checkpoint manager much of the tests around basicpodmanager appears to be stub'd. This has become an anti-pattern in the code and should be avoided.
3. I need a node-e2e to ensure consistency of behavior.
**Release note**:
```
Add basic bootstrap checkpointing support to the kubelet for control plane recovery
```
/cc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 55812, 55752, 55447, 55848, 50984). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add Custom Pod DNS to Kubernetes API
**What this PR does / why we need it**:
Ref:
- Feature issue: https://github.com/kubernetes/features/issues/504
- Proposal: https://github.com/kubernetes/community/pull/1276
This PR adds the relevant APIs, validation check and the underlying kubelet changes.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #NONE
**Special notes for your reviewer**:
/sig network
@kubernetes/sig-network-api-reviews
/assign @bowei @thockin
**Release note**:
```release-note
Add DNSConfig field to PodSpec and support "None" mode for DNSPolicy (Alpha).
```
Automatic merge from submit-queue (batch tested with PRs 55812, 55752, 55447, 55848, 50984). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add Pod-level local ephemeral storage metric in Summary API
This PR adds pod-level ephemeral storage metric into Summary API.
Pod-level ephemeral storage usage is the sum of all containers and local
ephemeral volume including EmptyDir (if not backed up by memory or
hugepages), configueMap, and downwardAPI.
Address issue #55978
**Release note**:
```release-note
Add pod-level local ephemeral storage metric in Summary API. Pod-level ephemeral storage reports the total filesystem usage for the containers and emptyDir volumes in the measured Pod.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Take disk requests into account during evictions
fixes#54314
This PR is part of the local storage feature, and it makes the eviction manager take disk requests into account during disk evictions.
This uses the same eviction strategy as we do for memory.
Disk requests are only considered when the LocalStorageCapacityIsolation feature gate is enabled. This is enforced by adding a check for the feature gate in getRequests().
I have added unit testing to ensure that previous behavior is preserved when the feature gate is disabled.
Most of the changes are testing. Reviewers should focus on changes in **eviction/helpers.go**
/sig node
/assign @jingxu97 @vishh
Automatic merge from submit-queue (batch tested with PRs 56021, 55843, 55088, 56117, 55859). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Extends deviceplugin to gracefully handle full device plugin lifecycle.
**What this PR does / why we need it**:
- Instead of using cm.capacity field to communicate device plugin resource capacity,
this PR changes to use an explicit cm.GetDevicePluginResourceCapacity() function
that returns device plugin resource capacity as well as any inactive device plugin resource.
Kubelet syncNodeStatus call this function during its periodic run to update node status
capacity and allocatable. After this call, device plugin can remove the inactive device
plugin resource from its allDevices field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removing even upon kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
- Extends gpu_device_plugin e2e_node test to verify that scheduled pods
can continue to run even after device plugin deletion and kubelet
restarts.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Together with https://github.com/kubernetes/kubernetes/pull/54488, fixes https://github.com/kubernetes/kubernetes/issues/53395
**Special notes for your reviewer**:
**Release note**:
```release-note
Extends deviceplugin to gracefully handle full device plugin lifecycle.
```
Automatic merge from submit-queue (batch tested with PRs 55938, 56055, 53385, 55796, 55922). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add partial CRI container log support.
For https://github.com/kubernetes/kubernetes/issues/44976.
New CRI log format:
```
TIMESTAMP STREAM TAG CONTENT
2016-10-06T00:17:09.669794202Z stdout P log content 1
2016-10-06T00:17:09.669794203Z stdout P log content 2
```
Although unlikely, if in the future we need more metadata in each line, we could extend TAG into multiple tags splitted by `:`.
@yujuhong @feiskyer @crassirostris @mrunalp @abhi @mikebrow
/cc @kubernetes/sig-node-api-reviews @kubernetes/sig-instrumentation-api-reviews
**Release note**:
```release-note
A new field is added to CRI container log format to support splitting a long log line into multiple lines.
```
- Instead of using cm.capacity field to communicate device plugin resource
capacity, this PR changes to use an explicit cm.GetDevicePluginResourceCapacity()
function that returns device plugin resource capacity as well as any inactive
device plugin resource. Kubelet syncNodeStatus call this function during its
periodic run to update node status capacity and allocatable. After this call,
device plugin can remove the inactive device plugin resource from its allDevices
field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removing even upon kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
Automatic merge from submit-queue (batch tested with PRs 54824, 55911, 55730, 55979, 55961). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add kubeletconfig round trip test
I noticed we were missing one of these.
fixes#55959
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 54824, 55911, 55730, 55979, 55961). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Device plugin API merge of handler and manager
**What this PR does / why we need it**: We are trying different approaches to make the device plugin implementation simpler and more robust. One option is to merge the notion of the `device_plugin_handler` into the `device_manager`. This is for several reasons:
1) Some calls go directly from handler to manager, adding little value.
2) The separation of concern is not clear between the two components. They have a 1:1 relationship.
3) The separation and abstractions needed are at a different level. Code that can be refactored will most likely live in abstractions which hide details around lock acquisition and check pointing.
In this PR, we will **just** merge the two interfaces. After this, there is several opportunities for simplifying and cleaning up the device plugin.
Fixes#55180
**Special notes for your reviewer**: This is a WIP. May very well get dropped, but keeping up for the sake of early sharing and showing the progress of the code move.
**Release note**:
```release-note
NONE
```
This PR adds pod-level ephemeral storage metric into Summary API.
Pod-level ephemeral storage usage is the sum of all containers and local
ephemeral volume including EmptyDir (if not backed up by memory or
hugepages), configueMap, and downwardAPI.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix cadvisor.New signature for cross build
**What this PR does / why we need it**: fixes the `pkg/kubelet/cadvisor.New` signature on non-linux platforms to match the new one on linux. This should fix the cross build
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56002
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/sig release
Automatic merge from submit-queue (batch tested with PRs 55839, 54495, 55884, 55983, 56069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
seccomp is an alpha feature and not feature gated
Move SeccompProfileRoot to KubeletFlags and document flag as alpha.
wrt https://github.com/kubernetes/kubernetes/pull/53833#issuecomment-345396575, seccomp is an alpha feature, but this isn't clearly documented anywhere (the annotation just has the word "alpha" in it, and that's your signal that it's alpha).
Since seccomp was around before feature gates, it doesn't have one.
Thus SeccompProfileRoot should not be part of KubeletConfiguration, and this PR moves it to KubeletFlags, and amends the help text to note the alpha state of the feature.
fixes: #56087
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 55839, 54495, 55884, 55983, 56069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
deviceplugin: fix race when multiple plugins are registered
**What this PR does / why we need it**:
When registering multiple device plugins to Kubelet concurrently, there exists a race that crashes the Kubelet.
Consider two plugins: D1 and D2. The call order method is roughly
D1 -> manager.go:register -> endpoint.go:listAndWatch -> device_plugin_handler.go:(*D1).callback
D2 -> manager.go:register -> endpoint.go:listAndWatch -> device_plugin_handler.go:(*D2).callback
The callback function accesses HandlerImpl's allDevices map that maps (resourceName -> DeviceID). If both plugins reach these accesses at the same time, Kubelet crashes with "fatal error: concurrent map read and map write".
This can be solved by making sure handler is locked when allDevices are being updated. The functionality is needed to avoid Kubelet crashes when multiple device plugins are trying to register with Kubelet at the same moment. Occurs frequently when single binary tries to register itself as multiple plugins.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 55841, 55948, 55945). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
CPU Manager: file state for all policies
**What this PR does / why we need it**:
Before this change, the new file-backed state was only enabled for the static CPU manager policy. This patch enables persistent state for all policies.
This PR fixes#55736 and the potential CPU resource leak described in that issue.
**Release note**:
```release-note
NONE
```
/kind bug
/sig node
/assign @balajismaniam
Automatic merge from submit-queue (batch tested with PRs 54837, 55970, 55912, 55898, 52977). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve kubelet cgroup
**What this PR does / why we need it**:
1.Use arg cgroupRoot,not nodeConfig.CgroupRoot
Using both arg cgroupRoot and nodeConfig.CgroupRoot is confused in function NewQOSContainerManager
2.improve cgroupmanager in qosContainerManager
3. improve arg "cgroupRoot" type in NewQOSContainerManager
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 50457, 55558, 53483, 55731, 52842). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
improve the logic setting cgroupparent in RunPodSandbox
Signed-off-by: yanxuean <yan.xuean@zte.com.cn>
**What this PR does / why we need it**:
The setting of cgroupparent is too confused!
The old logic is:
1. set CgroupParent correctly
2. reset CgroupParent incorrectly
3. set CgroupParent again (refer to #42055 )
The login is too confused, and It is sure that there are many people who drop in trap.
We only need to set it in one place.
kubernetes/pkg/kubelet/dockershim/docker_sandbox.go
```
func (ds *dockerService) makeSandboxDockerConfig(c *runtimeapi.PodSandboxConfig, image string) (*dockertypes.ContainerCreateConfig, error) {
....
// Apply linux-specific options.
if lc := c.GetLinux(); lc != nil {
if err := ds.applySandboxLinuxOptions(hc, lc, createConfig, image, securityOptSep); err != nil {
return nil, err
}
}
// Apply resource options.
setSandboxResources(hc) **<-- reset the CgroupParent incorrectly**
// Apply cgroupsParent derived from the sandbox config.
if lc := c.GetLinux(); lc != nil {
// Apply Cgroup options.
cgroupParent, err := ds.GenerateExpectedCgroupParent(lc.CgroupParent)
if err != nil {
return nil, fmt.Errorf("failed to generate cgroup parent in expected syntax for container %q: %v", c.Metadata.Name, err)
}
hc.CgroupParent = cgroupParent
}
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 55642, 55897, 55835, 55496, 55313). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix issue #55031 to remove dependence on package lxn/win
**What this PR does / why we need it**:
This PR fixes issue #55031 where kubelet.exe crashes on Windows Server Core. The root cause is that kubelet.exe depends on package lxn/win pdh and kernel32 wrapper for node metrics. However, opengl32.dll is not available in Server Core and lxn/win requires the presence of all win32 DLLs.
This PR uses a slim win32 package JeffAshton/win_pdh since most win32 APIs needed are PDH API. Also this PR makes own implementation of GetPhysicallyInstalledSystemMemory until golang Windows syscall has it or lxn/win fixes opengl32 issue. Also this PR modifies the way to get Windows version.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55031
**Special notes for your reviewer**:
**Release note**:
```release-note
```
/sig windows
/sig node
Automatic merge from submit-queue (batch tested with PRs 55642, 55897, 55835, 55496, 55313). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable container disk metrics when using the CRI stats integration
Issue: https://github.com/kubernetes/kubernetes/issues/51798
As explained in the issue, runtimes which make use of the CRI Stats API still have the performance overhead of collecting those same stats through cAdvisor.
The CRI Stats API has metrics for CPU, Memory, and Disk. This PR significantly reduces the added overhead due to collecting these stats in both cAdvisor and in the runtime.
This PR disables container disk metrics, which are very expensive to collect.
This PR does not disable node-level disk stats, as the "Raw" container handler does not currently respect ignoring DiskUsageMetrics.
This PR factors out the logic for determining whether or not to use the CRI stats provider into a helper function, as cAdvisor is instantiated before it is passed to the kubelet as a dependency.
cc @kubernetes/sig-node-pr-reviews @derekwaynecarr
/kind feature
/sig node
/assign @Random-Liu @derekwaynecarr
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Filter out duplicated container stats
**What this PR does / why we need it**:
**Which issue this PR fixes** *
fixes#53514
**Special notes for your reviewer**:
/cc @Random-Liu
Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>