Automatic merge from submit-queue
Remove GetRootContext method from VolumeHost interface
Remove the `GetRootContext` call from the `VolumeHost` interface, since Kubernetes no longer needs to know the SELinux context of the Kubelet directory.
Per #33951 and #35127.
Depends on #33663; only the last commit is relevant to this PR.
Automatic merge from submit-queue
Initial work on running windows containers on Kubernetes
This is the first stab at getting the Kubelet running on Windows (fixes #30279), and getting it to deploy network-accessible pods that consist of Windows containers. Thanks @csrwng, @jbhurat for helping out.
The main challenge with Windows containers at this point is that container networking is not supported. In other words, each container in the pod will get its own IP address. For this reason, we had to make a couple of changes to the kubelet when it comes to setting the pod's IP in the Pod Status. Instead of using the infra container's IP, we use the IP address of the first container.
Other approaches we investigated involved "disabling" the infra container, either conditionally on `runtime.GOOS` or having a separate windows-docker container runtime that re-implemented some of the methods (would require some refactoring to avoid maintainability nightmare).
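For illustration, a minimal sketch of the IP-selection idea described above (the helper and types here are hypothetical, not the actual kubelet code):
```go
package sketch

import "runtime"

// containerStatus is a simplified stand-in for the kubelet's container status type.
type containerStatus struct {
	Name string
	IP   string
}

// determinePodIP sketches the approach described above: on Windows, where the
// containers do not share the infra container's network namespace, fall back to
// the first container's IP; elsewhere, keep using the infra container's IP.
func determinePodIP(infraIP string, containers []containerStatus) string {
	if runtime.GOOS == "windows" && len(containers) > 0 && containers[0].IP != "" {
		return containers[0].IP
	}
	return infraIP
}
```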
Other changes:
- The default docker endpoint was removed. This results in the docker client using the default for the specific underlying OS.
More detailed documentation on how to setup the Windows kubelet can be found at https://docs.google.com/document/d/1IjwqpwuRdwcuWXuPSxP-uIz0eoJNfAJ9MWwfY20uH3Q.
cc: @ikester @brendandburns @jstarks
Automatic merge from submit-queue
Don't add duplicate Hostname address
If the cloudprovider returned an address of type Hostname, we shouldn't
add a duplicate one.
Fixes #36234
Automatic merge from submit-queue
Per Volume Inode Accounting
Collects volume inode stats using the same find command as cadvisor. The command is `find <path> -xdev -printf '.' | wc -c`. The output is passed to the summary api, and will be consumed by the eviction manager.
This cannot be merged yet, as it depends on changes adding the InodesUsed field to the summary api, and the eviction manager consuming this. Expect tests to fail until this happens.
DEPENDS ON #35137
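As a standalone sketch of the find-based counting described above (hypothetical helper names; the real code plugs into the kubelet's volume stats collection):
```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// countInodes mirrors the shell pipeline above: print one '.' per filesystem
// entry under path (without crossing filesystem boundaries) and count the
// bytes, which equals the number of inodes used.
func countInodes(path string) (int, error) {
	var out bytes.Buffer
	cmd := exec.Command("find", path, "-xdev", "-printf", ".")
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		return 0, err
	}
	return out.Len(), nil
}

func main() {
	n, err := countInodes("/tmp")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("inodes used:", n)
}
```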
Automatic merge from submit-queue
[AppArmor] Hold bad AppArmor pods in pending rather than rejecting
Fixes https://github.com/kubernetes/kubernetes/issues/32837
Overview of the fix:
If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then it might continuously try to run the pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted but not allowed to run (and therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) are handled.
A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.
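A rough sketch of the "soft rejection" idea, with illustrative types (the real kubelet admission types differ, and the `Start` field here is purely for illustration):
```go
package sketch

// PodAdmitResult is a simplified stand-in for the kubelet's admission result.
type PodAdmitResult struct {
	Admit   bool   // admit the pod to the kubelet's pod manager
	Start   bool   // actually allow the pod's containers to run
	Reason  string
	Message string
}

// admitAppArmorPod sketches "soft rejection": the pod is admitted (so it is not
// bounced back and retried elsewhere) but is not started, leaving it Pending
// until the required AppArmor profile is loaded on the node.
func admitAppArmorPod(profileLoaded bool, profile string) PodAdmitResult {
	if profileLoaded {
		return PodAdmitResult{Admit: true, Start: true}
	}
	return PodAdmitResult{
		Admit:   true,
		Start:   false,
		Reason:  "AppArmor",
		Message: "required AppArmor profile " + profile + " is not loaded",
	}
}
```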
``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```
@kubernetes/sig-node @timothysc @rrati @davidopp
Consolidate the code used by docker+cri and remote+cri for consistency, and to prevent changing one without the other. Enforce that `--experimental-runtime-integration-type` has to be set in order for the kubelet to use the CRI interface, *even for out-of-process shims*. This simplifies the temporary `if` logic in the kubelet while CRI still co-exists with the older logic.
Automatic merge from submit-queue
Separate Direct and Indirect streaming paths, implement indirect path for CRI
This PR refactors the `pkg/kubelet/container.Runtime` interface to remove the `ExecInContainer`, `PortForward` and `AttachContainer` methods. Instead, those methods are part of the `DirectStreamingRuntime` interface, which all "legacy" runtimes implement. I also added an `IndirectStreamingRuntime` interface which handles the redirect path and is implemented by CRI runtimes. To control the size of this PR, I did not fully set up the indirect streaming path for the dockershim, so the legacy path is left in place for now.
Most of this PR is moving & renaming associated with the refactoring. To understand the functional changes, I suggest tracing the code from `getExec` in `pkg/kubelet/server/server.go`, which calls `GetExec` in `pkg/kubelet/kubelet_pods.go` to determine whether to follow the direct or indirect path.
For https://github.com/kubernetes/kubernetes/issues/29579
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Add devices to ContainerConfig
This PR adds devices to ContainerConfig and adds experimental GPU support.
cc/ @yujuhong @Hui-Zhi @vishh @kubernetes/sig-node
Stopping a sandbox includes reclaiming the network resources. By always
stopping the sandbox before removing it, we reduce the possibility of leaking
resources in some corner cases.
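A minimal sketch of the resulting removal sequence (the interface here is a trimmed-down stand-in for the CRI runtime service):
```go
package sketch

// sandboxService is a trimmed-down view of the CRI runtime service used here.
type sandboxService interface {
	StopPodSandbox(podSandboxID string) error
	RemovePodSandbox(podSandboxID string) error
}

// removeSandbox always stops the sandbox first so that its network resources
// are reclaimed, and only then removes it; skipping the stop can leak
// resources if the sandbox was not fully torn down.
func removeSandbox(rs sandboxService, id string) error {
	if err := rs.StopPodSandbox(id); err != nil {
		return err
	}
	return rs.RemovePodSandbox(id)
}
```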
Automatic merge from submit-queue
Populate Node.Status.Addresses with Hostname
This PR is supposed to address #22063
Currently `NodeName` has to be a resolvable dns address on the master to allow apiserver -> kubelet communication (exec, log, port-forward operations on a pod). In some situations this is unfortunate (see the discussions on the issue).
The PR aims to do the following:
- Populate the `Type: Hostname` in the `Node.Status.Addresses` array, the type is already defined, but was not used so far.
- Add logic to resolve a Node's Hostname when the apiserver initiates communication with the Kubelet, instead of using the NodeName string as the Hostname (see the sketch below).
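A minimal sketch of that lookup (hypothetical helper; the real code works on the Node API types):
```go
package sketch

// nodeAddress mirrors the shape of an entry in Node.Status.Addresses.
type nodeAddress struct {
	Type    string // e.g. "Hostname", "InternalIP", "ExternalIP"
	Address string
}

// nodeHostname prefers an address of type "Hostname" from the node status and
// falls back to the node name only if none is recorded.
func nodeHostname(nodeName string, addresses []nodeAddress) string {
	for _, addr := range addresses {
		if addr.Type == "Hostname" && addr.Address != "" {
			return addr.Address
		}
	}
	return nodeName
}
```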
```release-note
The hostname of the node (as autodetected by the kubelet, specified via --hostname-override, or determined by the cloudprovider) is now recorded as an address of type "Hostname" in the status of the Node API object. The hostname is expected to be resolvable from the apiserver.
```
Automatic merge from submit-queue
pod and qos level cgroup support
```release-note
[Kubelet] Add alpha support for `--cgroups-per-qos` using the configured `--cgroup-driver`. Disabled by default.
```
Automatic merge from submit-queue
CRI: Handle empty container name in dockershim.
Fixes https://github.com/kubernetes/kubernetes/issues/35924.
A dead container may have no name; we should handle this properly.
@yujuhong @bprashanth
Automatic merge from submit-queue
CRI: Add kuberuntime container logs
Based on https://github.com/kubernetes/kubernetes/pull/34858.
The first 2 commits are from #34858. And the last 2 commits are new.
This PR adds kuberuntime container logs support and adds unit tests for it.
I've tested all the functions manually, and I'll send another PR to write a node e2e test for container log.
**Notice: current implementation doesn't support log rotation**, which means that:
- It will not retrieve logs in rotated log file.
- If log rotation happens when following the log:
- If the rotation is using create mode, we'll still follow the old file.
- If the rotation is using copytruncate, we'll be reading at the original position and get nothing.
To solve these issues, kubelet needs to rotate the log itself, or at least kubelet should be able to control the behavior of the log rotator. These are doable but out of the scope of 1.5 and will be addressed in a future release.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Rename container/sandbox states
The enum constants are not namespaced. The shorter, unspecific names are likely
to cause naming conflicts in the future.
Also replace "SandBox" with "Sandbox" in the API for consistency.
/cc @kubernetes/sig-node
This change adds a container manager inside the dockershim to move the docker daemon
and associated processes to a specified cgroup. The original kubelet container
manager will continue checking the name of the cgroup, so that the kubelet knows how
to report runtime stats.
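A rough sketch of the underlying mechanic, assuming a cgroup v1 layout (paths and helper name are illustrative):
```go
package sketch

import (
	"fmt"
	"os"
	"path/filepath"
)

// moveToCgroup sketches moving a process (e.g. the docker daemon) into a named
// cgroup by appending its PID to the cgroup's cgroup.procs file.
func moveToCgroup(cgroupRoot, cgroupName string, pid int) error {
	dir := filepath.Join(cgroupRoot, cgroupName)
	if err := os.MkdirAll(dir, 0755); err != nil {
		return err
	}
	f, err := os.OpenFile(filepath.Join(dir, "cgroup.procs"), os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintf(f, "%d\n", pid)
	return err
}
```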
Automatic merge from submit-queue
Eviction manager evicts based on inode consumption
Fixes: #32526. Integrates cadvisor per-container inode stats into the summary api. Makes the eviction manager act based on inode consumption to evict the pods using the most inodes.
This PR is pending on a cadvisor godeps update, which will be included in PR #35136.
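A simplified sketch of the inode-based decision and ranking (names are illustrative; the real eviction manager uses its signal/threshold machinery):
```go
package sketch

import "sort"

// podInodeUsage pairs a pod with the number of inodes it consumes, as reported
// by the summary API (field names are illustrative).
type podInodeUsage struct {
	PodName    string
	InodesUsed uint64
}

// underInodePressure sketches the trigger: the node is under inode pressure
// when free inodes drop below the configured eviction threshold.
func underInodePressure(inodesFree, threshold uint64) bool {
	return inodesFree < threshold
}

// rankByInodes orders pods so that the heaviest inode consumers are evicted first.
func rankByInodes(pods []podInodeUsage) []podInodeUsage {
	out := append([]podInodeUsage(nil), pods...)
	sort.Slice(out, func(i, j int) bool { return out[i].InodesUsed > out[j].InodesUsed })
	return out
}
```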
Automatic merge from submit-queue
Only set sysctls for infra containers
Previously, we set the sysctls for each container in a pod. This opens up a way to set un-whitelisted sysctls during an upgrade from v1.3:
- set annotation in v1.3 with an un-whitelisted sysctl. Set restartPolicy=Always
- upgrade cluster to v1.4
- kill container process
- un-whitelisted sysctl is set on restart of the killed container.
Automatic merge from submit-queue
SELinux Overhaul
Overhauls handling of SELinux in Kubernetes. TLDR: Kubelet dir no longer has to be labeled `svirt_sandbox_file_t`.
Fixes #33351 and #33510. Implements #33951.
Automatic merge from submit-queue
Implement streaming CRI methods in dockershim
*NOTE: Temporarily includes commit from https://github.com/kubernetes/kubernetes/pull/35330 - only review the second commit.*
Builds on https://github.com/kubernetes/kubernetes/pull/35330, using the library to implement the streaming methods in various CRI shims.
This does not actually wire up the new streaming methods in the kubelet (that will be my next PR). Once the new methods are wired up, I will delete the `Legacy{Exec,Attach,PortForward}` methods.
/cc @kubernetes/sig-node @feiskyer
Automatic merge from submit-queue
Simplify negotiation in server in preparation for multi version support
This is a pre-factor for #33900 to simplify runtime.NegotiatedSerializer, tighten up a few abstractions that may break when clients can request different client versions, and pave the way for better negotiation.
View this as pure simplification.
Automatic merge from submit-queue
Fix cadvisor_unsupported and the crossbuild
Resolves a bug in the `cadvisor_unsupported.go` code.
Fixes https://github.com/kubernetes/kubernetes/issues/35735
Introduced by: https://github.com/kubernetes/kubernetes/pull/35136
We should consider cherry-picking this, as #35136 was also cherry-picked
cc @kubernetes/sig-testing @vishh @dashpole @jessfraz
```release-note
Fix cadvisor_unsupported and the crossbuild
```
Automatic merge from submit-queue
[PHASE 1] Opaque integer resource accounting.
## [PHASE 1] Opaque integer resource accounting.
This change provides a simple way to advertise some amount of arbitrary countable resource for a node in a Kubernetes cluster. Users can consume these resources by including them in pod specs, and the scheduler takes them into account when placing pods on nodes. See the example at the bottom of the PR description for more info.
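A minimal sketch of how such resources can be recognized and checked for fit (hypothetical helpers; the real scheduler predicate works on resource.Quantity values):
```go
package sketch

import "strings"

// opaqueIntResourcePrefix is the prefix that marks an opaque integer resource.
const opaqueIntResourcePrefix = "pod.alpha.kubernetes.io/opaque-int-resource-"

// isOpaqueIntResource reports whether a resource name denotes an opaque integer resource.
func isOpaqueIntResource(name string) bool {
	return strings.HasPrefix(name, opaqueIntResourcePrefix)
}

// fitsOpaqueResources sketches the scheduler-side check: for each opaque
// resource a pod requests, the node must have enough unclaimed capacity left.
func fitsOpaqueResources(podRequests, nodeAllocatable, alreadyRequested map[string]int64) bool {
	for name, qty := range podRequests {
		if !isOpaqueIntResource(name) {
			continue
		}
		if alreadyRequested[name]+qty > nodeAllocatable[name] {
			return false
		}
	}
	return true
}
```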
Summary of changes:
- Defines opaque integer resources as any resource with prefix `pod.alpha.kubernetes.io/opaque-int-resource-`.
- Prevent kubelet from overwriting capacity.
- Handle opaque resources in scheduler.
- Validate integer-ness of opaque int quantities in API server.
- Tests for above.
Feature issue: https://github.com/kubernetes/features/issues/76
Design: http://goo.gl/IoKYP1
Issues:
kubernetes/kubernetes#28312, kubernetes/kubernetes#19082
Related:
kubernetes/kubernetes#19080
CC @davidopp @timothysc @balajismaniam
**Release note**:
```release-note
Added support for accounting opaque integer resources.
Allows cluster operators to advertise new node-level resources that would be
otherwise unknown to Kubernetes. Users can consume these resources in pod
specs just like CPU and memory. The scheduler takes care of the resource
accounting so that no more than the available amount is simultaneously
allocated to pods.
```
## Usage example
```sh
$ echo '[{"op": "add", "path": "pod.alpha.kubernetes.io~1opaque-int-resource-bananas", "value": "555"}]' | \
> http PATCH http://localhost:8080/api/v1/nodes/localhost.localdomain/status \
> Content-Type:application/json-patch+json
```
```http
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 11 Aug 2016 16:44:55 GMT
Transfer-Encoding: chunked
{
"apiVersion": "v1",
"kind": "Node",
"metadata": {
"annotations": {
"volumes.kubernetes.io/controller-managed-attach-detach": "true"
},
"creationTimestamp": "2016-07-12T04:07:43Z",
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/hostname": "localhost.localdomain"
},
"name": "localhost.localdomain",
"resourceVersion": "12837",
"selfLink": "/api/v1/nodes/localhost.localdomain/status",
"uid": "2ee9ea1c-47e6-11e6-9fb4-525400659b2e"
},
"spec": {
"externalID": "localhost.localdomain"
},
"status": {
"addresses": [
{
"address": "10.0.2.15",
"type": "LegacyHostIP"
},
{
"address": "10.0.2.15",
"type": "InternalIP"
}
],
"allocatable": {
"alpha.kubernetes.io/nvidia-gpu": "0",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
},
"capacity": {
"alpha.kubernetes.io/nvidia-gpu": "0",
"pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
},
"conditions": [
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-07-12T04:07:43Z",
"message": "kubelet has sufficient disk space available",
"reason": "KubeletHasSufficientDisk",
"status": "False",
"type": "OutOfDisk"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-07-12T04:07:43Z",
"message": "kubelet has sufficient memory available",
"reason": "KubeletHasSufficientMemory",
"status": "False",
"type": "MemoryPressure"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-08-10T06:27:11Z",
"message": "kubelet is posting ready status",
"reason": "KubeletReady",
"status": "True",
"type": "Ready"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-08-10T06:27:01Z",
"message": "kubelet has no disk pressure",
"reason": "KubeletHasNoDiskPressure",
"status": "False",
"type": "DiskPressure"
}
],
"daemonEndpoints": {
"kubeletEndpoint": {
"Port": 10250
}
},
"images": [],
"nodeInfo": {
"architecture": "amd64",
"bootID": "1f7e95ca-a4c2-490e-8ca2-6621ae1eb5f0",
"containerRuntimeVersion": "docker://1.10.3",
"kernelVersion": "4.5.7-202.fc23.x86_64",
"kubeProxyVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
"kubeletVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
"machineID": "cac4063395254bc89d06af5d05322453",
"operatingSystem": "linux",
"osImage": "Fedora 23 (Cloud Edition)",
"systemUUID": "D6EE0782-5DEB-4465-B35D-E54190C5EE96"
}
}
}
```
After patching, the kubelet's next sync fills in allocatable:
```
$ kubectl get node localhost.localdomain -o json | jq .status.allocatable
```
```json
{
"alpha.kubernetes.io/nvidia-gpu": "0",
"pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
}
```
Create two pods, one that needs a single banana and another that needs a truck load:
```
$ kubectl create -f chimp.yaml
$ kubectl create -f superchimp.yaml
```
Inspect the scheduler result and pod status:
```
$ kubectl describe pods chimp
Name: chimp
Namespace: default
Node: localhost.localdomain/10.0.2.15
Start Time: Thu, 11 Aug 2016 19:58:46 +0000
Labels: <none>
Status: Running
IP: 172.17.0.2
Controllers: <none>
Containers:
nginx:
Container ID: docker://46ff268f2f9217c59cc49f97cc4f0f085d5ac0e251f508cc08938601117c0cec
Image: nginx:1.10
Image ID: docker://sha256:82e97a2b0390a20107ab1310dea17f539ff6034438099384998fd91fc540b128
Port: 80/TCP
Limits:
cpu: 500m
memory: 64Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 3
Requests:
cpu: 250m
memory: 32Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 1
State: Running
Started: Thu, 11 Aug 2016 19:58:51 +0000
Ready: True
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
No volumes.
QoS Class: Burstable
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
9m 9m 1 {default-scheduler } Normal Scheduled Successfully assigned chimp to localhost.localdomain
9m 9m 2 {kubelet localhost.localdomain} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Pulled Container image "nginx:1.10" already present on machine
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Created Created container with docker id 46ff268f2f92
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Started Started container with docker id 46ff268f2f92
```
```
$ kubectl describe pods superchimp
Name: superchimp
Namespace: default
Node: /
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
nginx:
Image: nginx:1.10
Port: 80/TCP
Requests:
cpu: 250m
memory: 32Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 10Ki
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
PodScheduled False
No volumes.
QoS Class: Burstable
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3m 1s 15 {default-scheduler } Warning FailedScheduling pod (superchimp) failed to fit in any node
fit failure on node (localhost.localdomain): Insufficient pod.alpha.kubernetes.io/opaque-int-resource-bananas
```
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
- Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
- Ensures supplied opaque int quantities in node capacity,
node allocatable, pod request and pod limit are integers.
- Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
Alter how runtime.SerializeInfo is represented to simplify negotiation
and reduce the need to allocate during negotiation. Simplify the dynamic
client's logic around negotiating type. Add more tests for media type
handling where necessary.
Automatic merge from submit-queue
First pass at CRI stream server library implementation
This is a first pass at implementing a library for serving attach/exec/portforward calls from a CRI shim process as discussed in [CRI Streaming Requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit#).
Remaining library work:
- implement authn/z
- implement `stayUp=false`, a.k.a. auto-stop the server once all connections are closed
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add sysctls for dockershim
This PR adds sysctls support for dockershim. All sysctls e2e tests pass in my local setup.
Note that sysctls runtimeAdmit is not included in this PR; it is addressed in #32803.
cc/ @yujuhong @Random-Liu
Automatic merge from submit-queue
Fix devices information struct in container
So far, nothing uses the `Devices` field in `RunContainerOptions`. But when I wanted to use it, I found it would be better to change it first, because `Devices` in a container looks like:
```json
"Devices": [
{
"PathOnHost": "/dev/nvidiactl",
"PathInContainer": "/dev/nvidiactl",
"CgroupPermissions": "mrw"
},
{
"PathOnHost": "/dev/nvidia-uvm",
"PathInContainer": "/dev/nvidia-uvm",
"CgroupPermissions": "mrw"
},
{
"PathOnHost": "/dev/nvidia0",
"PathInContainer": "/dev/nvidia0",
"CgroupPermissions": "mrw"
}
],
```
Automatic merge from submit-queue
CRI: Instrumented cri service
For https://github.com/kubernetes/kubernetes/issues/29478.
This PR added instrumented CRI service. Because we are adding the instrumented wrapper inside kuberuntime, it should work for both grpc and non-grpc integration.
This will be useful to compare the latency difference between grpc and non-grpc integration, although there shouldn't be too much difference.
@yujuhong @feiskyer
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Refactor PortForward server methods into the portforward package
Refactor PortForward code into its own package so it can be reused in the CRI streaming library without pulling in lots of extra dependencies.
This is a straightforward move. Nothing is changed other than a few references to the package.
Automatic merge from submit-queue
Fix volume states out of sync problem after kubelet restarts
When the kubelet restarts, all the information about the volumes is gone from the actual/desired states. When the node status is then updated with mounted volumes, the volume list might be empty even though volumes are still mounted, which in turn causes the master to detach those volumes since they are not in the mounted-volumes list. This fix makes sure the mounted-volumes list is only updated after the reconciler starts its sync-states process. This sync-states process scans the existing volume directories and reconstructs the actual state if entries are missing.
This PR also fixes a problem with orphaned pods' directories: if a pod directory has been unmounted but not yet deleted (e.g., the kubelet was restarted in between), the clean-up routine now deletes the directory so that the pod directory can be cleaned up (this is safe since it is no longer mounted).
The third issue this PR fixes is that when reconstructing a volume in the actual state, the mounter must not be nil, since it is required for creating container.VolumeMap; if it is nil, it can cause a nil pointer exception in the kubelet.
Detailed design proposal is #33203
Automatic merge from submit-queue
CRI: Add dockershim grpc server.
This PR adds an in-process grpc server for dockershim.
Flags change:
1. `container-runtime` will not be automatically set to remote when `container-runtime-endpoint` is set. @feiskyer
2. set kubelet flag `--experimental-runtime-integration-type=remote --container-runtime-endpoint=UNIX_SOCKET_FILE_PATH` to enable the in-process dockershim grpc server.
3. set node e2e test flag `--runtime-integration-type=remote --container-runtime-endpoint=UNIX_SOCKET_FILE_PATH` to run the node e2e tests against the in-process dockershim grpc server.
I've run the node e2e tests against the remote CRI integration; tests which don't rely on the stream and log functions pass.
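For reference, the rough shape of an in-process gRPC server listening on a unix socket (socket path is illustrative; registration of the generated CRI services is elided):
```go
package main

import (
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
)

func main() {
	const socketPath = "/var/run/dockershim.sock" // illustrative path
	_ = os.Remove(socketPath)                     // clean up a stale socket from a previous run

	lis, err := net.Listen("unix", socketPath)
	if err != nil {
		log.Fatalf("listen on %s: %v", socketPath, err)
	}

	server := grpc.NewServer()
	// runtimeapi.RegisterRuntimeServiceServer(server, dockerService) // elided
	// runtimeapi.RegisterImageServiceServer(server, dockerService)   // elided

	if err := server.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}
```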
This unblocks the following work:
1) CRI conformance test.
2) Performance comparison between in-process integration and in-process grpc integration.
@yujuhong @feiskyer
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Do not log stack trace for the error http.StatusBadRequest (400).
**What this PR does / why we need it**:
This PR fixes an issue where a stack trace is logged in the kubelet when an http.StatusBadRequest (400) occurs.
**Release note**:
```release-note
```
Automatic merge from submit-queue
Use the rawTerminal setting from the container itself
**What this PR does / why we need it**:
Checks whether the container is set for rawTerminal connection and uses the appropriate connection.
Prevents the output `Error from server: Unrecognized input header` when doing `kubectl run`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
helps with case 1 in #28695, resolves #30159
**Special notes for your reviewer**:
**Release note**:
```
release-note-none
```
Automatic merge from submit-queue
CRI: Refactor kuberuntime unit test
Based on https://github.com/kubernetes/kubernetes/pull/34858
This PR:
1) Refactors the fake runtime service and some kuberuntime unit tests.
2) Adds better garbage collection unit tests.
3) Fixes the init container unit test, which wasn't testing correctly. Some other unit tests may also need to be fixed.
4) Adds a pod log directory garbage collection unit test.
@feiskyer @yujuhong
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Kubelet getting node from apiserver cache before update.
This is blocked on #35218 (however it's ready for review).
It seems to visibly reduce the apiserver metrics (and I didn't observe a higher number of conflicts even in 2000-node kubemark).
Automatic merge from submit-queue
Create restclient interface
Refactoring of code to allow replacing *restclient.RESTClient with any RESTClient implementation that implements the restclient.RESTClientInterface interface.
Automatic merge from submit-queue
CRI: Handle container/sandbox restarts for pod with RestartPolicy == …
If the sandbox and all containers in a pod are dead, and the restart policy is
"Never", the kubelet should not try to recreate them.
Automatic merge from submit-queue
Return an empty network namespace path for exited infra containers
If the infra container has already terminated, `docker inspect` will report
pid 0. The path constructed using the pid to check the network namespace of
the process will be invalid. This commit changes docker to report an empty
path to stop kubenet from erroring out whenever TearDown is called on an
exited infra container.
This is not a fix for all the plugins, as some plugins may require the actual
network namespace to tear down properly.
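The fix boils down to something like the following sketch (illustrative helper; the real code lives in the docker runtime's namespace handling):
```go
package sketch

import "fmt"

// netNSPath returns an empty namespace path when `docker inspect` reports
// pid 0 for an exited infra container, instead of a bogus /proc/0/ns/net,
// so kubenet's TearDown does not error out.
func netNSPath(pid int) string {
	if pid == 0 {
		return ""
	}
	return fmt.Sprintf("/proc/%d/ns/net", pid)
}
```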
Automatic merge from submit-queue
rkt: Convert image name to be a valid ACIdentifier
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Fix a bug in the rkt runtime whereby images could not be fetched from registries with ports.
```
This fixes a bug whereby an image reference that included a port was not
recognized after being downloaded, and so could not be run.
This is the quick-and-simple fix. In the longer term, we'll want to refactor image logic a bit more to handle the many special cases that the current code does not, mostly related to library images on dockerhub.
/cc @yifan-gu @kubernetes/sig-rktnetes
Automatic merge from submit-queue
WIP: Remove the legacy networking mode
**What this PR does / why we need it**:
Removes the deprecated configure-cbr0 flag and networking mode to avoid having untested and maybe unstable code in kubelet, see: #33789
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes #30589, fixes #31937
**Special notes for your reviewer**: There are a lot of deployments that rely on this networking mode. Not sure how we deal with that: force a switch to kubenet, or just delete the old deployment?
But please review the code changes first (the first commit)
**Release note**:
```release-note
Removed the deprecated kubelet --configure-cbr0 flag, and with that the "classic" networking mode as well
```
PTAL @kubernetes/sig-network @kubernetes/sig-node @mikedanese
Automatic merge from submit-queue
Remove static kubelet client, refactor ConnectionInfoGetter
Follow up to https://github.com/kubernetes/kubernetes/pull/33718
* Collapses the multi-valued return to a `ConnectionInfo` struct
* Removes the "raw" connection info method and interface, since it was only used in a single non-test location (by the "real" connection info method)
* Disentangles the node REST object from being a ConnectionInfoProvider itself by extracting an implementation of ConnectionInfoProvider that takes a node (using a provided NodeGetter) and determines ConnectionInfo
* Plumbs the KubeletClientConfig to the point where we construct the helper object that combines the config and the node lookup. I anticipate adding a preference order for choosing an address type in https://github.com/kubernetes/kubernetes/pull/34259
Automatic merge from submit-queue
kubelet: storage: don't hang kubelet on unresponsive nfs
Fixes #31272
Currently, due to the nature of nfs, an unresponsive nfs volume in a pod can wedge the kubelet such that additional pods can not be run.
The discussion thus far surrounding this issue was to wrap the `lstat`, the syscall that ends up hanging in uninterruptible sleep, in a goroutine and limiting the number of goroutines that hang to one per-pod per-volume.
However, in my investigation, I found that the callsites that request a listing of the volumes from a particular volume plugin directory don't care anything about the properties provided by the `lstat` call. They only care about whether or not a directory exists.
Given that constraint, this PR just avoids the `lstat` call by using `Readdirnames()` instead of `ReadDir()` or `ReadDirNoExit()`.
### More detail for reviewers
Consider the pod mounted nfs volume at `/var/lib/kubelet/pods/881341b5-9551-11e6-af4c-fa163e815edd/volumes/kubernetes.io~nfs/myvol`. The kubelet wedges because when we do a `ReadDir()` or `ReadDirNoExit()` it calls `syscall.Lstat` on `myvol` which requires communication with the nfs server. If the nfs server is unreachable, this call hangs forever.
However, for our code, we only care about the names of the files/directories contained in the `kubernetes.io~nfs` directory, not any of the more detailed information the `Lstat` call provides. Getting the names can be done with `Readdirnames()`, which doesn't need to involve the nfs server.
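A minimal sketch of the Readdirnames-based listing (hypothetical helper name):
```go
package sketch

import "os"

// listVolumeNames returns only the entry names under dir, using Readdirnames
// so that no lstat is issued on the entries themselves; with an unresponsive
// NFS mount beneath dir, this avoids hanging in an uninterruptible lstat.
func listVolumeNames(dir string) ([]string, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.Readdirnames(-1) // -1: read all names
}
```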
@pmorie @eparis @ncdc @derekwaynecarr @saad-ali @thockin @vishh @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Fix edge case in qos evaluation
If a pod has containers C1 and C2, where sum(C1.requests, C2.requests) equals (C1.Limits), the code was reporting that the pod had "Guaranteed" QoS, when it should have been Burstable.
/cc @vishh @dchen1107
Automatic merge from submit-queue
Log more information on pod status updates
Also bump the logging level to V2 so that we can see them in a non-test
cluster.
Automatic merge from submit-queue
add UpdateRuntimeConfig interface
Expose an UpdateRuntimeConfig interface in RuntimeService for the kubelet to pass a set of configurations to the runtime. Currently it only takes PodCIDR.
The use case is for the kubelet to pass configs to the runtime. The kubelet holds some config/information which the runtime does not have, such as PodCIDR. I expect some kubelet configurations will gradually move to the runtime, but I believe cases like PodCIDR, which is dynamically assigned by the k8s master, need to stay for a while.
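A trimmed-down sketch of the new interface and its intended use (types and names here are illustrative stand-ins for the CRI messages):
```go
package sketch

// runtimeConfig is a trimmed-down stand-in for the CRI RuntimeConfig message;
// today it only carries network configuration (the pod CIDR).
type runtimeConfig struct {
	PodCIDR string
}

// runtimeConfigUpdater is the kubelet-facing side of the new interface.
type runtimeConfigUpdater interface {
	UpdateRuntimeConfig(cfg runtimeConfig) error
}

// syncPodCIDR sketches the intended use: when the master (re)assigns the
// node's pod CIDR, the kubelet pushes it down to the runtime.
func syncPodCIDR(rt runtimeConfigUpdater, cidr string) error {
	return rt.UpdateRuntimeConfig(runtimeConfig{PodCIDR: cidr})
}
```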
Automatic merge from submit-queue
Allow kuberuntime to get network namespace for not ready sandboxes
Kubelet calls TearDownPod to clean up the network resources for a pod sandbox.
TearDownPod relies on GetNetNS to retrieve network namespace, and the current
implementation makes this impossible for not-ready sandboxes. This change
removes the unnecessary filter to fix this issue.
Automatic merge from submit-queue
CRI: Image pullable support in dockershim
For #33189.
The new test `ImageID should be set to the manifest digest (from RepoDigests) when available` introduced in #33014 is failing, because:
1) `docker-pullable://` conversion is not supported in dockershim;
2) `kuberuntime` and `dockershim` are using `ListImages` with an image name filter to check whether an image is present. However, `ListImages` doesn't support filtering by `digest`.
This PR:
1) Change `kuberuntime.IsImagePresent` to use `runtime.ImageStatus` and `dockershim.InspectImage` instead. ***Notice an API change: `ImageStatus` should return `(nil, nil)` for non-existing image.***
2) Add `docker-pullable://` support.
3) Fix `RemoveImage` in dockershim https://github.com/kubernetes/kubernetes/pull/29316.
I've tried myself, the test can pass now.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add version cache for cri APIVersion
ref https://github.com/kubernetes/kubernetes/issues/29478
1. Added a version cache for `APIVersion()` using an object cache, with a TTL of 1 minute (sketched below).
2. Left `Version()` as it is today.
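A simplified sketch of such a TTL cache (illustrative; the actual change reuses the kubelet's object cache):
```go
package sketch

import (
	"sync"
	"time"
)

// versionCache memoizes an expensive Version() call for a fixed TTL.
type versionCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetch   func() (string, error)
	value   string
	expires time.Time
}

func newVersionCache(ttl time.Duration, fetch func() (string, error)) *versionCache {
	return &versionCache{ttl: ttl, fetch: fetch}
}

// APIVersion returns the cached value while it is fresh, refetching otherwise.
func (c *versionCache) APIVersion() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Now().Before(c.expires) {
		return c.value, nil
	}
	v, err := c.fetch()
	if err != nil {
		return "", err
	}
	c.value, c.expires = v, time.Now().Add(c.ttl)
	return v, nil
}
```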
Automatic merge from submit-queue
Update godeps for libcontainer+cadvisor
Needed to unblock more progress on pod cgroup.
/cc @vishh @dchen1107 @timstclair
Automatic merge from submit-queue
Kubelet: Use RepoDigest for ImageID when available
```release-note
Use manifest digest (as `docker-pullable://`) as ImageID when available (exposes a canonical, pullable image ID for containers).
```
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`).
Related to #32159
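The resulting rule, as a small illustrative sketch (hypothetical helper name):
```go
package sketch

// imageID prefers the manifest digest from RepoDigests (exposed with a
// docker-pullable:// prefix) and falls back to the config digest (docker://)
// when no repo digest is available.
func imageID(configDigest string, repoDigests []string) string {
	if len(repoDigests) > 0 {
		return "docker-pullable://" + repoDigests[0]
	}
	return "docker://" + configDigest
}
```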
Previously, the `InspectImage` method of the Docker interface expected a
"pullable" image ref (name, tag, or manifest digest). If you tried to
inspect an image by its ID (config digest), the inspect would fail to
validate the image against the input identifier. This commit changes
the original method to be named `InspectImageByRef`, and introduces a
new method called `InspectImageByID` which validates that the input
identifier was an image ID.
Automatic merge from submit-queue
Use nodeutil.GetHostIP consistently when talking to nodes
Most of our communications from apiserver -> nodes used
nodeutil.GetNodeHostIP, but a few places didn't - and this meant that the
node name needed to be resolvable _and_ we needed to populate valid IP
addresses.
```release-note
The apiserver now uses addresses reported by the kubelet in the Node object's status for apiserver->kubelet communications, rather than the name of the Node object. The address type used defaults to `InternalIP`, `ExternalIP`, and `LegacyHostIP` address types, in that order.
```
Automatic merge from submit-queue
Add sandbox gc minage
Fixes https://github.com/kubernetes/kubernetes/issues/34272.
Fixes https://github.com/kubernetes/kubernetes/issues/33984.
This PR:
1) Change `GetPodStatus` to get the statuses of all containers in a pod instead of only containers belonging to existing sandboxes. This is because a sandbox may be removed by GC or by users, and the kubelet should be able to deal with this case.
2) Change the CRI comment to clarify the timestamp unit (nanoseconds).
3) Add MinAge to the sandbox GC policy (see the sketch below).
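A minimal sketch of the MinAge check (illustrative types; CRI reports creation timestamps in nanoseconds, converted to time.Time here):
```go
package sketch

import "time"

// sandboxGCPolicy mirrors the new knob: sandboxes younger than MinAge are
// never garbage-collected, which protects sandboxes that were just created.
type sandboxGCPolicy struct {
	MinAge time.Duration
}

// evictable reports whether a dead sandbox created at createdAt is old enough
// to be removed under the policy.
func evictable(p sandboxGCPolicy, createdAt, now time.Time) bool {
	return now.Sub(createdAt) >= p.MinAge
}
```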
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
remove testapi.Default.GroupVersion
I'm going to try to take this as a series of mechanicals. This removes `testapi.Default.GroupVersion()` and replaces it with `registered.GroupOrDie(api.GroupName).GroupVersion`.
@caesarxuchao I'm trying to see how much of `pkg/api/testapi` I can remove.
Automatic merge from submit-queue
Revert "Add kubelet awareness to taint tolerant match caculator."
Reverts kubernetes/kubernetes#26501
Original PR was not fully reviewed by @kubernetes/sig-node
cc/ @timothysc @resouer
Automatic merge from submit-queue
Kubelet: Use RepoDigest for ImageID when available
**Release note**:
```release-note
Use manifest digest (as `docker-pullable://`) as ImageID when available (exposes a canonical, pullable image ID for containers).
```