Automatic merge from submit-queue
[kubelet] rename --cgroups-per-qos to --experimental-cgroups-per-qos
This reflects the true nature of "cgroups per qos" feature.
```release-note
* Rename `--cgroups-per-qos` to `--experimental-cgroups-per-qos` in Kubelet
```
Automatic merge from submit-queue
fix issue in reconstruct volume data when kubelet restarts
During state reconstruction when kubelet restarts, outerVolueSpecName
cannot be recovered by scanning the disk directories. But this
information is used by volume manager to check whether pod's volume is
mounted or not. There are two possible cases:
1. pod is not deleted during kubelet restarts so that desired state
should have the information. reconciler.updateState() will use this
inforamtion to update.
2. pod is deleted during this period, reconciler has to use
InnerVolumeSpecName, but it should be ok since this information will not
be used for volume cleanup (umount)
Automatic merge from submit-queue
CRI: general grammar/spelling/consistency cleanup
No semantic changes, but a lot of shuffling of docstrings to make things
more consistent. In particular, standardise on the zeroth-article (i.e.
prefer `// Version` to `// The version`) and ending all docstrings with
periods.
(This knowingly conflicts with #36446 and intentionally omits changing the
Annotations field - I'll rebase this or that respectively as necessary.)
During state reconstruction when kubelet restarts, outerVolueSpecName
cannot be recovered by scanning the disk directories. But this
information is used by volume manager to check whether pod's volume is
mounted or not. There are two possible cases:
1. pod is not deleted during kubelet restarts so that desired state
should have the information. reconciler.updateState() will use this
inforamtion to update.
2. pod is deleted during this period, reconciler has to use
InnerVolumeSpecName, but it should be ok since this information will not
be used for volume cleanup (umount)
Automatic merge from submit-queue
Fix getting cgroup pids
Fixes https://github.com/kubernetes/kubernetes/issues/35214, https://github.com/kubernetes/kubernetes/issues/33232
Verified manually, but I didn't have time to run all the e2e's yet (will check it in the morning).
This should be cherry-picked into 1.4, and merged into 1.5 (/cc @saad-ali )
```release-note
Fix fetching pids running in a cgroup, which caused problems with OOM score adjustments & setting the /system cgroup ("misc" in the summary API).
```
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add e2e node test for log path
fixes#34661
A node e2e test to check if container logs files are properly created with right content.
Since the log files under `/var/log/containers` are actually symbolic of docker containers log files, we can not use a pod to mount them in and do check (symbolic doesn't supported by docker volume).
cc @Random-Liu
Automatic merge from submit-queue
Use indirect streaming path for remote CRI shim
Last step for https://github.com/kubernetes/kubernetes/issues/29579
- Wire through the remote indirect streaming methods in the docker remote shim
- Add the docker streaming server as a handler at `<node>:10250/cri/{exec,attach,portforward}`
- Disable legacy streaming for dockershim
Note: This requires PR https://github.com/kubernetes/kubernetes/pull/34987 to work.
Tested manually on an E2E cluster.
/cc @euank @feiskyer @kubernetes/sig-node
Automatic merge from submit-queue
Cleanup kubelet eviction manager tests
It cleans up kubelet eviction manager tests
Extracted parts of tests that were similar to each other to functions
No semantic changes, but a lot of shuffling of docstrings to make things
more consistent. In particular, standardise on the zeroth-article (i.e.
prefer `// Version` to `// The version`) and ending all docstrings with
periods.
Provides an opt-in flag, --experimental-fail-swap-on (and corresponding
KubeletConfiguration value, ExperimentalFailSwapOn), which is false by default.
Automatic merge from submit-queue
CRI: Add security context for sandbox/container
Part of #29478. This PR
- adds security context for sandbox and fixes#33139
- encaps container security context to `SecurityContext` and adds missing features
- Note that capability is not fully accomplished in this PR because it is under discussion at #33614.
cc/ @yujuhong @yifan-gu @Random-Liu @kubernetes/sig-node
This allows us to interrupt/kill the executed command if it exceeds the
timeout (not implemented by this commit).
Set timeout in Exec probes. HTTPGet and TCPSocket probes respect the
timeout, while Exec probes used to ignore it.
Add e2e test for exec probe with timeout. However, the test is skipped
while the default exec handler doesn't support timeouts.
Automatic merge from submit-queue
kubelet bootstrap: start hostNetwork pods before we have PodCIDR
Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried. Move the check to the pod start phase.
Issue #35409
Issue #35521
Automatic merge from submit-queue
CRI: rearrange kubelet rutnime initialization
Consolidate the code used by docker+cri and remote+cri for consistency, and to
prevent changing one without the other. Enforce that
`--experimental-runtime-integration-type` has to be set in order for kubelet
use the CRI interface, *even for out-of-process shims`. This simplifies the
temporary `if` logic in kubelet while CRI still co-exists with older logic.
Automatic merge from submit-queue
CRI: Add Status into CRI.
For https://github.com/kubernetes/kubernetes/issues/35701.
Fixes https://github.com/kubernetes/kubernetes/issues/35701.
This PR added a `Status` call in CRI, and the `RuntimeStatus` is defined as following:
``` protobuf
message RuntimeCondition {
// Type of runtime condition.
optional string type = 1;
// Status of the condition, one of true/false.
optional bool status = 2;
// Brief reason for the condition's last transition.
optional string reason = 3;
// Human readable message indicating details about last transition.
optional string message = 4;
}
message RuntimeStatus {
// Conditions is an array of current observed runtime conditions.
repeated RuntimeCondition conditions = 1;
}
```
Currently, only `conditions` is included in `RuntimeStatus`, and the definition is almost the same with `NodeCondition` and `PodCondition` in K8s api.
@yujuhong @feiskyer @bprashanth If this makes sense, I'll send a follow up PR to let dockershim return `RuntimeStatus` and let kubelet make use of it.
@yifan-gu @euank Does this make sense to rkt?
/cc @kubernetes/sig-node
Automatic merge from submit-queue
We only report diskpressure to users, and no longer report inodepressure
See #36180 for more information on why #33218 was reverted.
Automatic merge from submit-queue
CRI: stop sandbox before removing it
Stopping a sandbox includes reclaiming the network resources. By always
stopping the sandbox before removing it, we reduce the possibility of leaking
resources in some corner cases.
Automatic merge from submit-queue
Remove GetRootContext method from VolumeHost interface
Remove the `GetRootContext` call from the `VolumeHost` interface, since Kubernetes no longer needs to know the SELinux context of the Kubelet directory.
Per #33951 and #35127.
Depends on #33663; only the last commit is relevant to this PR.
Automatic merge from submit-queue
Initial work on running windows containers on Kubernetes
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
This is the first stab at getting the Kubelet running on Windows (fixes#30279), and getting it to deploy network-accessible pods that consist of Windows containers. Thanks @csrwng, @jbhurat for helping out.
The main challenge with Windows containers at this point is that container networking is not supported. In other words, each container in the pod will get it's own IP address. For this reason, we had to make a couple of changes to the kubelet when it comes to setting the pod's IP in the Pod Status. Instead of using the infra-container's IP, we use the IP address of the first container.
Other approaches we investigated involved "disabling" the infra container, either conditionally on `runtime.GOOS` or having a separate windows-docker container runtime that re-implemented some of the methods (would require some refactoring to avoid maintainability nightmare).
Other changes:
- The default docker endpoint was removed. This results in the docker client using the default for the specific underlying OS.
More detailed documentation on how to setup the Windows kubelet can be found at https://docs.google.com/document/d/1IjwqpwuRdwcuWXuPSxP-uIz0eoJNfAJ9MWwfY20uH3Q.
cc: @ikester @brendandburns @jstarks
Automatic merge from submit-queue
Don't add duplicate Hostname address
If the cloudprovider returned an address of type Hostname, we shouldn't
add a duplicate one.
Fixes#36234
Automatic merge from submit-queue
Per Volume Inode Accounting
Collects volume inode stats using the same find command as cadvisor. The command is "find _path_ -xdev -printf '.' | wc -c". The output is passed to the summary api, and will be consumed by the eviction manager.
This cannot be merged yet, as it depends on changes adding the InodesUsed field to the summary api, and the eviction manager consuming this. Expect tests to fail until this happens.
DEPENDS ON #35137
Automatic merge from submit-queue
[AppArmor] Hold bad AppArmor pods in pending rather than rejecting
Fixes https://github.com/kubernetes/kubernetes/issues/32837
Overview of the fix:
If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then it might contiinuously try to run the pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted, but not allowed to run (and therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) is handled.
A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.
``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```
@kubernetes/sig-node @timothysc @rrati @davidopp
Consolidate the code used by docker+cri and remote+cri for consistency, and to
prevent changing one without the other. Enforce that
`--experimental-runtime-integration-type` has to be set in order for kubelet
use the CRI interface, *even for out-of-process shims`. This simplifies the
temporary `if` logic in kubelet while CRI still co-exists with older logic.
Automatic merge from submit-queue
Separate Direct and Indirect streaming paths, implement indirect path for CRI
This PR refactors the `pkg/kubelet/container.Runtime` interface to remove the `ExecInContainer`, `PortForward` and `AttachContainer` methods. Instead, those methods are part of the `DirectStreamingRuntime` interface which all "legacy" runtimes implement. I also added an `IndirectStreamingRuntime` which handles the redirect path and is implemented by CRI runtimes. To control the size of this PR, I did not fully setup the indirect streaming path for the dockershim, so I left legacy path behind.
Most of this PR is moving & renaming associated with the refactoring. To understand the functional changes, I suggest tracing the code from `getExec` in `pkg/kubelet/server/server.go`, which calls `GetExec` in `pkg/kubelet/kubelet_pods.go` to determine whether to follow the direct or indirect path.
For https://github.com/kubernetes/kubernetes/issues/29579
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Add devices to ContainerConfig
This PR adds devices to ContainerConfig and adds experimental GPU support.
cc/ @yujuhong @Hui-Zhi @vishh @kubernetes/sig-node
Stopping a sandbox includes reclaiming the network resources. By always
stopping the sandbox before removing it, we reduce the possibility of leaking
resources in some corner cases.
Automatic merge from submit-queue
Populate Node.Status.Addresses with Hostname
This PR is supposed to address #22063
Currently `NodeName` has to be a resolvable dns address on the master to allow apiserver -> kubelet communication (exec, log, port-forward operations on a pod). In some situations this is unfortunate (see the discussions on the issue).
The PR aims to do the following:
- Populate the `Type: Hostname` in the `Node.Status.Addresses` array, the type is already defined, but was not used so far.
- Add logic to resolve a Node's Hostname when the apiserver initiates communication with the Kubelet, instead of using the Nodename string as Hostname.
```release-note
The hostname of the node (as autodetected by the kubelet, specified via --hostname-override, or determined by the cloudprovider) is now recorded as an address of type "Hostname" in the status of the Node API object. The hostname is expected to be resolveable from the apiserver.
```
Automatic merge from submit-queue
pod and qos level cgroup support
```release-note
[Kubelet] Add alpha support for `--cgroups-per-qos` using the configured `--cgroup-driver`. Disabled by default.
```
Automatic merge from submit-queue
CRI: Handle empty container name in dockershim.
Fixes https://github.com/kubernetes/kubernetes/issues/35924.
Dead container may have no name, we should handle this properly.
@yujuhong @bprashanth
Automatic merge from submit-queue
CRI: Add kuberuntime container logs
Based on https://github.com/kubernetes/kubernetes/pull/34858.
The first 2 commits are from #34858. And the last 2 commits are new.
This PR added kuberuntime container logs support and add unit test for it.
I've tested all the functions manually, and I'll send another PR to write a node e2e test for container log.
**_Notice: current implementation doesn't support log rotation**_, which means that:
- It will not retrieve logs in rotated log file.
- If log rotation happens when following the log:
- If the rotation is using create mode, we'll still follow the old file.
- If the rotation is using copytruncate, we'll be reading at the original position and get nothing.
To solve these issues, kubelet needs to rotate the log itself, or at least kubelet should be able to control the the behavior of log rotator. These are doable but out of the scope of 1.5 and will be addressed in future release.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Rename container/sandbox states
The enum constants are not namespaced. The shorter, unspecifc names are likely
to cause naming conflicts in the future.
Also replace "SandBox" with "Sandbox" in the API for consistency.
/cc @kubernetes/sig-node
The enum constants are not namespaced. The shorter, unspecifc names are likely
to cause naming conflicts in the future.
Also replace "SandBox" with "Sandbox" in the API.
This change add a container manager inside the dockershim to move docker daemon
and associated processes to a specified cgroup. The original kubelet container
manager will continue checking the name of the cgroup, so that kubelet know how
to report runtime stats.
Automatic merge from submit-queue
Eviction manager evicts based on inode consumption
Fixes: #32526 Integrate Cadvisor per-container inode stats into the summary api. Make the eviction manager act based on inode consumption to evict pods using the most inodes.
This PR is pending on a cadvisor godeps update which will be included in PR #35136
Automatic merge from submit-queue
Only set sysctls for infra containers
We did set the sysctls for each container in a pod. This opens up a way to set un-whitelisted sysctls during upgrade from v1.3:
- set annotation in v1.3 with an un-whitelisted sysctl. Set restartPolicy=Always
- upgrade cluster to v1.4
- kill container process
- un-whitelisted sysctl is set on restart of the killed container.
Automatic merge from submit-queue
SELinux Overhaul
Overhauls handling of SELinux in Kubernetes. TLDR: Kubelet dir no longer has to be labeled `svirt_sandbox_file_t`.
Fixes#33351 and #33510. Implements #33951.
Automatic merge from submit-queue
Implement streaming CRI methods in dockershim
*NOTE: Temporarily includes commit from https://github.com/kubernetes/kubernetes/pull/35330 - only review the second commit.*
Builds on https://github.com/kubernetes/kubernetes/pull/35330, using the library to implement the streaming methods in various CRI shims.
This does not actually wire up the new streaming methods in the kubelet (that will be my next PR). Once the new methods are wired up, I will delete the `Legacy{Exec,Attach,PortForward}` methods.
/cc @kubernetes/sig-node @feiskyer
Automatic merge from submit-queue
Simplify negotiation in server in preparation for multi version support
This is a pre-factor for #33900 to simplify runtime.NegotiatedSerializer, tighten up a few abstractions that may break when clients can request different client versions, and pave the way for better negotiation.
View this as pure simplification.
Automatic merge from submit-queue
Fix cadvisor_unsupported and the crossbuild
Resolves a bug in the `cadvisor_unsupported.go` code.
Fixes https://github.com/kubernetes/kubernetes/issues/35735
Introduced by: https://github.com/kubernetes/kubernetes/pull/35136
We should consider to cherrypick this as #35136 also was cherrypicked
cc @kubernetes/sig-testing @vishh @dashpole @jessfraz
```release-note
Fix cadvisor_unsupported and the crossbuild
```
Automatic merge from submit-queue
[PHASE 1] Opaque integer resource accounting.
## [PHASE 1] Opaque integer resource accounting.
This change provides a simple way to advertise some amount of arbitrary countable resource for a node in a Kubernetes cluster. Users can consume these resources by including them in pod specs, and the scheduler takes them into account when placing pods on nodes. See the example at the bottom of the PR description for more info.
Summary of changes:
- Defines opaque integer resources as any resource with prefix `pod.alpha.kubernetes.io/opaque-int-resource-`.
- Prevent kubelet from overwriting capacity.
- Handle opaque resources in scheduler.
- Validate integer-ness of opaque int quantities in API server.
- Tests for above.
Feature issue: https://github.com/kubernetes/features/issues/76
Design: http://goo.gl/IoKYP1
Issues:
kubernetes/kubernetes#28312kubernetes/kubernetes#19082
Related:
kubernetes/kubernetes#19080
CC @davidopp @timothysc @balajismaniam
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Added support for accounting opaque integer resources.
Allows cluster operators to advertise new node-level resources that would be
otherwise unknown to Kubernetes. Users can consume these resources in pod
specs just like CPU and memory. The scheduler takes care of the resource
accounting so that no more than the available amount is simultaneously
allocated to pods.
```
## Usage example
```sh
$ echo '[{"op": "add", "path": "pod.alpha.kubernetes.io~1opaque-int-resource-bananas", "value": "555"}]' | \
> http PATCH http://localhost:8080/api/v1/nodes/localhost.localdomain/status \
> Content-Type:application/json-patch+json
```
```http
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 11 Aug 2016 16:44:55 GMT
Transfer-Encoding: chunked
{
"apiVersion": "v1",
"kind": "Node",
"metadata": {
"annotations": {
"volumes.kubernetes.io/controller-managed-attach-detach": "true"
},
"creationTimestamp": "2016-07-12T04:07:43Z",
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/hostname": "localhost.localdomain"
},
"name": "localhost.localdomain",
"resourceVersion": "12837",
"selfLink": "/api/v1/nodes/localhost.localdomain/status",
"uid": "2ee9ea1c-47e6-11e6-9fb4-525400659b2e"
},
"spec": {
"externalID": "localhost.localdomain"
},
"status": {
"addresses": [
{
"address": "10.0.2.15",
"type": "LegacyHostIP"
},
{
"address": "10.0.2.15",
"type": "InternalIP"
}
],
"allocatable": {
"alpha.kubernetes.io/nvidia-gpu": "0",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
},
"capacity": {
"alpha.kubernetes.io/nvidia-gpu": "0",
"pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
},
"conditions": [
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-07-12T04:07:43Z",
"message": "kubelet has sufficient disk space available",
"reason": "KubeletHasSufficientDisk",
"status": "False",
"type": "OutOfDisk"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-07-12T04:07:43Z",
"message": "kubelet has sufficient memory available",
"reason": "KubeletHasSufficientMemory",
"status": "False",
"type": "MemoryPressure"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-08-10T06:27:11Z",
"message": "kubelet is posting ready status",
"reason": "KubeletReady",
"status": "True",
"type": "Ready"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-08-10T06:27:01Z",
"message": "kubelet has no disk pressure",
"reason": "KubeletHasNoDiskPressure",
"status": "False",
"type": "DiskPressure"
}
],
"daemonEndpoints": {
"kubeletEndpoint": {
"Port": 10250
}
},
"images": [],
"nodeInfo": {
"architecture": "amd64",
"bootID": "1f7e95ca-a4c2-490e-8ca2-6621ae1eb5f0",
"containerRuntimeVersion": "docker://1.10.3",
"kernelVersion": "4.5.7-202.fc23.x86_64",
"kubeProxyVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
"kubeletVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
"machineID": "cac4063395254bc89d06af5d05322453",
"operatingSystem": "linux",
"osImage": "Fedora 23 (Cloud Edition)",
"systemUUID": "D6EE0782-5DEB-4465-B35D-E54190C5EE96"
}
}
}
```
After patching, the kubelet's next sync fills in allocatable:
```
$ kubectl get node localhost.localdomain -o json | jq .status.allocatable
```
```json
{
"alpha.kubernetes.io/nvidia-gpu": "0",
"pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
}
```
Create two pods, one that needs a single banana and another that needs a truck load:
```
$ kubectl create -f chimp.yaml
$ kubectl create -f superchimp.yaml
```
Inspect the scheduler result and pod status:
```
$ kubectl describe pods chimp
Name: chimp
Namespace: default
Node: localhost.localdomain/10.0.2.15
Start Time: Thu, 11 Aug 2016 19:58:46 +0000
Labels: <none>
Status: Running
IP: 172.17.0.2
Controllers: <none>
Containers:
nginx:
Container ID: docker://46ff268f2f9217c59cc49f97cc4f0f085d5ac0e251f508cc08938601117c0cec
Image: nginx:1.10
Image ID: docker://sha256:82e97a2b0390a20107ab1310dea17f539ff6034438099384998fd91fc540b128
Port: 80/TCP
Limits:
cpu: 500m
memory: 64Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 3
Requests:
cpu: 250m
memory: 32Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 1
State: Running
Started: Thu, 11 Aug 2016 19:58:51 +0000
Ready: True
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
No volumes.
QoS Class: Burstable
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
9m 9m 1 {default-scheduler } Normal Scheduled Successfully assigned chimp to localhost.localdomain
9m 9m 2 {kubelet localhost.localdomain} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Pulled Container image "nginx:1.10" already present on machine
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Created Created container with docker id 46ff268f2f92
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Started Started container with docker id 46ff268f2f92
```
```
$ kubectl describe pods superchimp
Name: superchimp
Namespace: default
Node: /
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
nginx:
Image: nginx:1.10
Port: 80/TCP
Requests:
cpu: 250m
memory: 32Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 10Ki
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
PodScheduled False
No volumes.
QoS Class: Burstable
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3m 1s 15 {default-scheduler } Warning FailedScheduling pod (superchimp) failed to fit in any node
fit failure on node (localhost.localdomain): Insufficient pod.alpha.kubernetes.io/opaque-int-resource-bananas
```
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
- Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
- Ensures supplied opaque int quantities in node capacity,
node allocatable, pod request and pod limit are integers.
- Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
Alter how runtime.SerializeInfo is represented to simplify negotiation
and reduce the need to allocate during negotiation. Simplify the dynamic
client's logic around negotiating type. Add more tests for media type
handling where necessary.
Automatic merge from submit-queue
First pass at CRI stream server library implementation
This is a first pass at implementing a library for serving attach/exec/portforward calls from a CRI shim process as discussed in [CRI Streaming Requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit#).
Remaining library work:
- implement authn/z
- implement `stayUp=false`, a.k.a. auto-stop the server once all connections are closed
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add sysctls for dockershim
This PR adds sysctls support for dockershim. All sysctls e2e tests are passed in my local settings.
Note that sysctls runtimeAdmit is not included in this PR, it is addressed in #32803.
cc/ @yujuhong @Random-Liu