Automatic merge from submit-queue (batch tested with PRs 54656, 54552, 54389, 53634, 54408). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add file backed state to cpu manager
**What this PR does / why we need it**:
Adds a file-backed `State` implementation to the CPU manager, with tests.
Reads from `State` are served from memory, while each write triggers a state save to a file.
Any failure in reading the state file results in an empty state.
Next PR: #54409
Automatic merge from submit-queue (batch tested with PRs 54593, 54607, 54539, 54105).
Removed containers are not always waiting
fixes #54499
The issue was that a container that is removed (during pod deletion, for example) is assumed to be in a "waiting" state.
Instead, we should use the previous container state.
Fetching the most recent status is required to ensure that we accurately reflect the previous state. The status attached to a pod object is often stale.
I verified this by looking through the kubelet logs during a deletion, and verifying that the status updates do not transition from terminated -> pending.
cc @kubernetes/sig-node-bugs @sjenning @smarterclayton @derekwaynecarr @dchen1107
```release-note
Fix an issue where pods were briefly transitioned to a "Pending" state during the deletion process.
```
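As a rough illustration of the idea (not the kubelet's actual code), the fallback order could be sketched like this. `containerState`, `stateForContainer` and the two maps are hypothetical stand-ins for the real status types.

```go
package main

import "fmt"

// containerState is a simplified, hypothetical stand-in for a container state.
type containerState string

const (
	stateWaiting    containerState = "waiting"
	stateRunning    containerState = "running"
	stateTerminated containerState = "terminated"
)

// stateForContainer picks the state to report for a container.
// observed holds what the runtime reports now (a removed container is absent),
// previous holds the most recently recorded states.
func stateForContainer(name string, observed, previous map[string]containerState) containerState {
	if s, ok := observed[name]; ok {
		return s
	}
	// The container is gone from the runtime: reuse the previous state
	// instead of assuming "waiting".
	if s, ok := previous[name]; ok {
		return s
	}
	return stateWaiting
}

func main() {
	previous := map[string]containerState{"app": stateTerminated}
	observed := map[string]containerState{} // container already removed
	fmt.Println(stateForContainer("app", observed, previous))
}
```

The sketch only works if `previous` is fresh, which is why the description insists on fetching the most recent status rather than trusting the one attached to the pod object.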
Automatic merge from submit-queue (batch tested with PRs 54597, 54593, 54081, 54271, 54600).
kubelet: check for illegal container state transition
supersedes https://github.com/kubernetes/kubernetes/pull/54530
Puts a state transition check in the kubelet status manager to detect and block illegal transitions; namely from terminated to non-terminated.
@smarterclayton @derekwaynecarr @dashpole @joelsmith @frobware
I confirmed that the reproducer in #54499 does not work with this check in place. The erroneous kubelet status update is rejected:
```
status_manager.go:301] Status update on pod default/test aborted: terminated container test-container attempted illegal transition to non-terminated state
```
After fix https://github.com/kubernetes/kubernetes/pull/54593, I do not see the message with the above mentioned reproducer.
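A minimal sketch of such a check, assuming a much simplified status type, is shown below. `containerStatus` and `checkTransition` are hypothetical names for this example; the real check in the status manager may consider more than this (for instance the pod's restart policy).

```go
package main

import "fmt"

// containerStatus is a simplified, hypothetical view of a container status.
type containerStatus struct {
	Name       string
	Terminated bool
}

// checkTransition returns an error if a container that was terminated in
// oldStatuses is reported as non-terminated in newStatuses.
func checkTransition(oldStatuses, newStatuses []containerStatus) error {
	for _, o := range oldStatuses {
		if !o.Terminated {
			continue // only terminated containers are restricted
		}
		for _, n := range newStatuses {
			if n.Name == o.Name && !n.Terminated {
				return fmt.Errorf("terminated container %s attempted illegal transition to non-terminated state", o.Name)
			}
		}
	}
	return nil
}

func main() {
	oldStatuses := []containerStatus{{Name: "test-container", Terminated: true}}
	newStatuses := []containerStatus{{Name: "test-container", Terminated: false}}
	fmt.Println(checkTransition(oldStatuses, newStatuses))
}
```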
While moving device_plugin_handler_test.go from pkg/kubelet/cm/ to
pkg/kubelet/cm/deviceplugin/, we can no longer use cm in its tests
because that would cause a cyclic dependency. To solve this problem,
I moved the main cm GetResources functionality as well as part of the
current device plugin handler Allocate functionality into a new device
plugin handler function, GetDeviceRunContainerOptions(). This
refactoring is also needed by another PR 51895 that moves device
allocation into admission phase. Now device plugin handler Allocate()
first checks whether there is cached device runtime state and only
issues Allocate grpc call if there is no cached state available.
The new GetDeviceRunContainerOptions() function simply returns device
runtime config from the cached state. To support this change, I extended the
podDevices struct and checkpoint data structure with device runtime state.
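The cache-first flow described above can be sketched as follows. This is an
illustration under assumptions, not the handler from this change: `handler`,
`runOptions`, the string cache key and the `allocate` callback (standing in
for the Allocate grpc call) are hypothetical.

```go
package main

import "fmt"

// runOptions is a hypothetical stand-in for cached device runtime state.
type runOptions struct {
	Envs map[string]string
}

// handler caches runtime options per container key.
type handler struct {
	cache    map[string]runOptions
	allocate func(devices []string) (runOptions, error) // stands in for the grpc call
}

// Allocate issues the remote call only when no cached state exists.
func (h *handler) Allocate(key string, devices []string) error {
	if _, ok := h.cache[key]; ok {
		return nil // cached state available, nothing to do
	}
	opts, err := h.allocate(devices)
	if err != nil {
		return err
	}
	h.cache[key] = opts
	return nil
}

// GetDeviceRunContainerOptions only reads the cached state.
func (h *handler) GetDeviceRunContainerOptions(key string) (runOptions, bool) {
	opts, ok := h.cache[key]
	return opts, ok
}

func main() {
	calls := 0
	h := &handler{
		cache: map[string]runOptions{},
		allocate: func(devices []string) (runOptions, error) {
			calls++
			return runOptions{Envs: map[string]string{"DEVICES": fmt.Sprint(devices)}}, nil
		},
	}
	_ = h.Allocate("pod1/ctr1", []string{"dev0"})
	_ = h.Allocate("pod1/ctr1", []string{"dev0"}) // served from the cache
	opts, ok := h.GetDeviceRunContainerOptions("pod1/ctr1")
	fmt.Println(calls, ok, opts.Envs["DEVICES"])
}
```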
Automatic merge from submit-queue (batch tested with PRs 53743, 53564).
kubelet: remove the --network-plugin-dir flag
**What this PR does / why we need it**:
This flag has been replaced with `--cni-bin-dir`, and has been deprecated in Kubernetes 1.7.
It is safe to remove in Kubernetes 1.9 according to the deprecation policy.
**Which issue this PR fixes**: fixes #46410
**Special notes for your reviewer**:
/assign @mtaufen @freehan @dchen1107
**Release note**:
```release-note
Remove the --network-plugin-dir flag.
```
Automatic merge from submit-queue (batch tested with PRs 52747, 54329).
Device plugin endpoints correctly close the client connection
**What this PR does / why we need it**:
Endpoints in the device plugin may end unexpectedly. Currently, the connection is not properly closed when that happens.
This commit aims to fix this.
Related issues #51993
**Special notes for your reviewer**: @jiayingz @mindprince @vishh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52147, 54309).
Deviceplugin refactoring: cleanup some unnecessary functions
**What this PR does / why we need it**:
Clean up some unnecessary functions in deviceplugin to improve code readability.
**Which issue this PR fixes**
fixes #51993
Part1
**Special notes for your reviewer**:
Currently, it seems that func `IsResourceNameValid` is not used outside and could be changed to internal.
But as Renaud commented, it is designed for potential usage from users outside.
And I could not find a better position to place the func, except for file `utils.go`. So, shall we just retain the `utils.go` file, and maybe for possible expansion in the future?
/cc @jiayingz @RenaudWasTaken @vishh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue.
Avoid having the kubelet convert and validate pods multiple times
**What this PR does / why we need it**:
**Which issue this PR fixes**: fixes #53171
**Special notes for your reviewer**:
/assign @yujuhong @caesarxuchao
**Release note**:
```release-note
None
```
Automatic merge from submit-queue.
Fix dockershim panic when listing images
**What this PR does / why we need it**:
dockershim panics when listing images because `opts.Filters` is not initialized:
505ccb88da/pkg/kubelet/dockershim/docker_image.go (L35-L39)
Also, when `imgSpec.Image` is an empty string, dockershim returns an empty image list, which is not expected. (We should not set `opts.Filters` in this case.)
**Which issue this PR fixes**: fixes #54122
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
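To illustrate the two problems above, here is a minimal sketch using a plain Go map in place of the Docker filter type. `listOptions` and `buildListOptions` are hypothetical; the point is only that a filter container must be initialized before use, and that no filter should be set for an empty image name.

```go
package main

import "fmt"

// listOptions is a hypothetical, simplified stand-in for the image list
// options; Filters is nil until explicitly initialized.
type listOptions struct {
	Filters map[string][]string
}

// buildListOptions sets a "reference" filter only for a non-empty image
// name. Writing to a nil map would panic, so the map is created first.
func buildListOptions(image string) listOptions {
	opts := listOptions{}
	if image == "" {
		return opts // no filter: list all images
	}
	opts.Filters = map[string][]string{}
	opts.Filters["reference"] = append(opts.Filters["reference"], image)
	return opts
}

func main() {
	fmt.Println(buildListOptions("").Filters == nil)
	fmt.Println(buildListOptions("busybox").Filters["reference"])
}
```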
Automatic merge from submit-queue.
Make AllocateResponse artifacts global across all devices per container in device plugin API
The current version of the device plugin API returns artifacts (env vars, mounts, devices) per device, per container. This is not necessary and results in complex merging issues on the kubelet side.
There can still be a conflict if the artifacts returned by the device plugin conflict with the pod spec. In that case, I'd recommend failing pods in the kubelet. This is yet to be addressed.
The go package name for the device plugin APIs is updated from `pkg/kubelet/apis/deviceplugin/v1alpha1` to `pkg/kubelet/apis/deviceplugin/v1alpha` (sub-version dropped) because we expect the alpha version to change until it graduates to beta, and changing the go package every time the actual alpha version changes is too tedious.
```release-note
Device plugin Alpha API no longer supports returning artifacts per device as part of AllocateResponse.
```
TODO:
- [x] Bump kubelet side API version
- [x] Post an updated device plugin image that works with the new API version
- [ ] Stabilize e2es (This PR needs to be merged since there is a dependency on the plugin side for vendoring)
There is no use case known for passing artifacts per device as it currently exists. The current API is also
complex to use for simple clients. Hence this PR creates a flat namespace where artifacts like environment variables
and mount points apply globally to all devices returned as part of AllocateResponse proto.
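A rough sketch of why per-device artifacts are awkward to consume: the caller
has to merge them and decide what to do on conflicts, whereas a flat response
can be used directly. The types and `mergePerDevice` below are hypothetical
simplifications, not the actual proto messages.

```go
package main

import "fmt"

// perDeviceResponse mimics the old shape: one set of artifacts per device.
type perDeviceResponse struct {
	Envs map[string]string
}

// flatResponse mimics the new shape: artifacts apply to all devices.
type flatResponse struct {
	Envs map[string]string
}

// mergePerDevice shows the merging a consumer of the old shape needs,
// including a decision on conflicting values.
func mergePerDevice(responses []perDeviceResponse) (map[string]string, error) {
	merged := map[string]string{}
	for _, r := range responses {
		for k, v := range r.Envs {
			if old, ok := merged[k]; ok && old != v {
				return nil, fmt.Errorf("conflicting values for env %s: %q vs %q", k, old, v)
			}
			merged[k] = v
		}
	}
	return merged, nil
}

func main() {
	_, err := mergePerDevice([]perDeviceResponse{
		{Envs: map[string]string{"VISIBLE": "dev0"}},
		{Envs: map[string]string{"VISIBLE": "dev1"}},
	})
	fmt.Println(err)

	flat := flatResponse{Envs: map[string]string{"VISIBLE": "dev0,dev1"}}
	fmt.Println(flat.Envs["VISIBLE"]) // no merge step needed
}
```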
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue (batch tested with PRs 43661, 54062).
Fix typo in function name.
Also remove a superfluous comment.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43661, 54062).
Fix #43583 (kubenet: remove code forcing bridge MAC address)
**What this PR does / why we need it**:
*kubenet: remove code forcing bridge MAC address*
**Which issue this PR fixes**: fixes #43583
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @dcbw @freehan
Automatic merge from submit-queue (batch tested with PRs 47717, 53896).
Delete the redundant parameter flag
What this PR does / why we need it:
Delete the redundant parameter flag (format verb); otherwise the log looks like:
Warning: path "/var/lib/kubelet/pods/3c6c4869-4d02-11e7-9685-fa163eeda0fa/volumes" does not exist: %!q(MISSING)
thank you!
Automatic merge from submit-queue.
pkg/api: extract Scheme/Registry/Codecs into pkg/api/legacyscheme
This serves as
- a preparation for the pkg/api->pkg/apis/core move
- and makes the dependency on the scheme explicit when visualizing
the remaining dependencies.
The latter helps with our efforts to split up the monolithic repo
into self-contained sub-repos, e.g. for kubectl, controller-manager
and kube-apiserver in the future.
Automatic merge from submit-queue.
CRI: Add extra information in status functions in CRI.
Fixes https://github.com/kubernetes/kubernetes/issues/53757.
@yujuhong @feiskyer @mrunalp
/cc @kubernetes/sig-node-api-reviews
```release-note
Verbose option is added to each status function in CRI. Container runtime could return extra information in status response for debugging.
```
Automatic merge from submit-queue (batch tested with PRs 53696, 54059).
Fix lint warnings for useless err checks.
**What this PR does / why we need it**:
This check was recently added to golint.
**Which issue this PR fixes**
Related to #37254
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue.
Increases test coverage for kubelet/kuberuntime
What this PR does / why we need it:
Increases test coverage for kubelet/kuberuntime
#46123
Which issue this PR fixes:
#46123
/assign @feiskyer
Automatic merge from submit-queue (batch tested with PRs 54040, 52503).
Get fallback termination msg from docker when using journald log driver
**What this PR does / why we need it**:
When using the legacy Docker container runtime, if a container has `terminationMessagePolicy=FallbackToLogsOnError` and Docker is configured with a log driver other than `json-file` (such as `journald`), the kubelet should not try to read the container's log from the JSON log file (it is not there) but should instead ask Docker for the logs.
**Which issue this PR fixes**: fixes #52502
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixed log fallback termination messages when using docker with journald log driver
```
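The decision described above can be sketched as a small function. This is only an illustration: `fallbackLogSource`, `logSource` and its parameters are hypothetical, and the conditions for falling back (non-zero exit code, empty termination message) follow the documented meaning of `FallbackToLogsOnError` rather than the code in this PR.

```go
package main

import "fmt"

// logSource names where the fallback termination message is read from.
type logSource string

const (
	sourceNone      logSource = "none"
	sourceJSONFile  logSource = "json-file-on-disk"
	sourceDockerAPI logSource = "docker-api"
)

// fallbackLogSource decides where to read fallback logs from.
func fallbackLogSource(policy, logDriver string, exitCode int, message string) logSource {
	if policy != "FallbackToLogsOnError" || exitCode == 0 || message != "" {
		return sourceNone // no fallback needed
	}
	if logDriver == "json-file" {
		return sourceJSONFile // the log file exists on disk
	}
	return sourceDockerAPI // e.g. journald: ask Docker for the logs
}

func main() {
	fmt.Println(fallbackLogSource("FallbackToLogsOnError", "journald", 1, ""))
	fmt.Println(fallbackLogSource("FallbackToLogsOnError", "json-file", 1, ""))
}
```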
Automatic merge from submit-queue (batch tested with PRs 54040, 52503).
Fall back to parsing the Docker runtime version as generic if it is not semver
**What this PR does / why we need it**:
**Which issue this PR fixes**: fixes #54039
**Special notes for your reviewer**:
/assign @tallclair @vishh
**Release note**:
```release-note
The kubelet now falls back to parsing the Docker runtime version as a generic version if it is not valid semver.
```
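A self-contained sketch of the fallback idea follows. The parsers here are hand-written and hypothetical (`parseSemantic`, `parseGeneric`, `parseRuntimeVersion`), not the helpers used by the kubelet; the strict parser rejects leading zeros, as the semver specification requires, and the generic one accepts any dot-separated numbers. The version string `17.03.0-ce` is an illustrative input.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version holds numeric components and which parser accepted the string.
type version struct {
	components []uint64
	semver     bool
}

// parseNumbers parses the dot-separated numeric core, dropping any
// "-pre" or "+build" suffix. In strict mode leading zeros are rejected.
func parseNumbers(s string, strict bool) ([]uint64, error) {
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	var out []uint64
	for _, p := range strings.Split(s, ".") {
		if strict && len(p) > 1 && p[0] == '0' {
			return nil, fmt.Errorf("leading zero in component %q", p)
		}
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return nil, err
		}
		out = append(out, n)
	}
	return out, nil
}

// parseSemantic accepts only MAJOR.MINOR.PATCH without leading zeros.
func parseSemantic(s string) (version, error) {
	nums, err := parseNumbers(s, true)
	if err != nil {
		return version{}, err
	}
	if len(nums) != 3 {
		return version{}, fmt.Errorf("expected 3 components, got %d", len(nums))
	}
	return version{components: nums, semver: true}, nil
}

// parseGeneric accepts any non-empty list of numeric components.
func parseGeneric(s string) (version, error) {
	nums, err := parseNumbers(s, false)
	if err != nil {
		return version{}, err
	}
	return version{components: nums}, nil
}

// parseRuntimeVersion tries semver first and falls back to generic.
func parseRuntimeVersion(s string) (version, error) {
	if v, err := parseSemantic(s); err == nil {
		return v, nil
	}
	return parseGeneric(s)
}

func main() {
	for _, s := range []string{"1.13.1", "17.03.0-ce"} {
		v, err := parseRuntimeVersion(s)
		fmt.Println(s, v.components, v.semver, err)
	}
}
```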
Automatic merge from submit-queue.
Do not remove kubelet labels during startup
Fixes #54070
```release-note
kubelet: prevent removal of default labels from Node API objects on startup
```
Automatic merge from submit-queue.
Clean up kubelet secret and configmap unit tests
**What this PR does / why we need it**:
These changes are clean-up items to fix confusing code encountered while investigating #52043. No actual bugs are fixed here (except, maybe, correcting unit tests that had actual/expected swapped).
A summary of the changes, as listed in the commit:
* Expected value comes before actual value in assert.Equal()
* Use `assert.Equal()` instead of `assert.True()` when possible
* Add a unit test that verifies no-op pod updates to the `secret_manager` and the `configmap_manager`
* Add a clarifying comment about why it's good to seemingly delete a secret on updates.
* Fix (for now, non-buggy) variable shadowing issue
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue.
fix typos: remove duplicated word in comments
**What this PR does / why we need it**: Remove the duplicated word `the` in comments
**Which issue this PR fixes** : fixes #
**Special notes for your reviewer**:
```release-note
NONE
```
Prevent a Kubelet from shutting down when the server isn't responding to
us but we cannot get a new certificate. This allows a cluster to coast
if the master is unresponsive or a node is partitioned and its client
cert expires.