Automatic merge from submit-queue (batch tested with PRs 49488, 50407, 46105, 50456, 50258)
Add UpdateContainerResources method to CRI
This is first step toward support for opinionated cpu pinning for certain guaranteed pods.
In order to do this, the kubelet needs to be able to dynamically update the cpuset at the container level, which is managed by the container runtime. Thus the kubelet needs a method to communicate over the CRI so the runtime can then modify the container cgroup.
This is used in the situation where a core is added or removed from the shared pool to become a exclusive core for a new G pod. The cpuset for all containers in the shared pool will need to be updated to add or remove that core.
Opening this up now so we can start discussion. The need for a change to the CRI might be unexpected.
@derekwaynecarr @vishh @ConnorDoyle
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50103, 49677, 49449, 43586, 48969)
Do not try to run preStopHook when the gracePeriod is 0
**What this PR does / why we need it**:
1. Sometimes when the user force deletes a POD with no gracePeriod, its possible that kubelet attempts to execute the preStopHook which will certainly fail. This PR prevents this inavitable PreStopHook failure.
```
kubectl delete --force --grace-period=0 po/<pod-name>
```
2. This also adds UT for LifeCycle Hooks
```
time go test --cover -v --run "Hook" ./pkg/kubelet/kuberuntime/
.
.
.
--- PASS: TestLifeCycleHook (0.00s)
--- PASS: TestLifeCycleHook/PreStop-CMDExec (0.00s)
--- PASS: TestLifeCycleHook/PreStop-HTTPGet (0.00s)
--- PASS: TestLifeCycleHook/PreStop-NoTimeToRun (0.00s)
--- PASS: TestLifeCycleHook/PostStart-CmdExe (0.00s)
PASS
coverage: 15.3% of statements
ok k8s.io/kubernetes/pkg/kubelet/kuberuntime 0.429s
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
Do not try to run preStopHook when the gracePeriod is 0
```
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
Add support for `no_new_privs` via AllowPrivilegeEscalation
**What this PR does / why we need it**:
Implements kubernetes/community#639
Fixes#38417
Adds `AllowPrivilegeEscalation` and `DefaultAllowPrivilegeEscalation` to `PodSecurityPolicy`.
Adds `AllowPrivilegeEscalation` to container `SecurityContext`.
Adds the proposed behavior to `kuberuntime`, `dockershim`, and `rkt`. Adds a bunch of unit tests to ensure the desired default behavior and that when `DefaultAllowPrivilegeEscalation` is explicitly set.
Tests pass locally with docker and rkt runtimes. There are also a few integration tests with a `setuid` binary for sanity.
**Release note**:
```release-note
Adds AllowPrivilegeEscalation to control whether a process can gain more privileges than it's parent process
```
Automatic merge from submit-queue (batch tested with PRs 47357, 49514, 49271, 49572, 49476)
Using only the exit code to decide when to fall back on logs
We expect the exit code to be non-zero if the the container process was
OOM killed. Remove the check that uses the "Reason" field.
Automatic merge from submit-queue (batch tested with PRs 49444, 47864, 48584, 49395, 49118)
Move event type
Change SandboxChanged to a constant and move to the event package below.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Revert workaround in PR 46246 as APIs have been consistent
**What this PR does / why we need it**:
No need to convert v1.ObjectReference as APIs have been consistent in `k8s.io/api/core/v1`.
**Which issue this PR fixes** : fixes#48668
**Special notes for your reviewer**:
/assign @derekwaynecarr @caesarxuchao
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47327, 48194)
Checked container spec when killing container.
**What this PR does / why we need it**:
Checked container spec when getting container, return error if failed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48173
**Release note**:
```release-note-none
```
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)
don't pass CRI error through to waiting state reason
Raw gRPC errors are getting into the `Reason` field of the container status `State`, causing it to be output inline on a `kubectl get pod`
xref https://bugzilla.redhat.com/show_bug.cgi?id=1449820
Basically the issue is that the err and msg are reversed in `startContainer()`. The msg is short and the err is long. It should be the other way around.
This PR changes `startContainer()` to return a short error that becomes the Reason and the extracted gPRC error description that becomes the Message.
@derekwaynecarr @smarterclayton @eparis
The verification function is fixed to check the value of RunAsNonRoot,
not just the existence of it. Also adds unit tests to verify the correct
behavior.
Automatic merge from submit-queue
Delete all dead containers and sandboxes when under disk pressure.
This PR modifies the eviction manager to add dead container and sandbox garbage collection as a resource reclaim function for disk. It also modifies the container GC logic to allow pods that are terminated, but not deleted to be removed.
It still does not delete containers that are less than the minGcAge. This should prevent nodes from entering a permanently bad state if the entire disk is occupied by pods that are terminated (in the state failed, or succeeded), but not deleted.
There are two improvements we should consider making in the future:
- Track the disk space and inodes reclaimed by deleting containers. We currently do not track this, and it prevents us from determining if deleting containers resolves disk pressure. So we may still evict a pod even if we are able to free disk space by deleting dead containers.
- Once we can track disk space and inodes reclaimed, we should consider only deleting the containers we need to in order to relieve disk pressure. This should help avoid a scenario where we try and delete a massive number of containers all at once, and overwhelm the runtime.
/assign @vishh
cc @derekwaynecarr
```release-note
Disk Pressure triggers the deletion of terminated containers on the node.
```
Automatic merge from submit-queue (batch tested with PRs 45860, 45119, 44525, 45625, 44403)
Make a log line more clear in kuberuntime_manager.go.
Make a log in `podSandboxChanged` more clear.
@yujuhong @feiskyer
Automatic merge from submit-queue
Fix AssertCalls usage for kubelet fake runtimes unit tests
Despite its name, AssertCalls() does not assert anything. It returns an error that should be checked. This was causing false negatives for a handful of unit tests, which are also fixed here.
Tests for the image manager needed to be rearranged in order to accommodate a potentially different sequence of calls each tick because the image puller changes behavior based on prior errors.
**What this PR does / why we need it**: Fixes broken unit tests
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)
Migrate the docker client code from dockertools to dockershim
Move docker client code from dockertools to dockershim/libdocker. This includes
DockerInterface (renamed to Interface), FakeDockerClient, etc.
This is part of #43234
Despite its name, AssertCalls() does not assert anything. It returns an
error that must be checked. This was causing false negatives for
a handful of unit tests.