Automatic merge from submit-queue (batch tested with PRs 40873, 40948, 39580, 41065, 40815)
[CRI] Enable Hostport Feature for Dockershim
Commits:
1. Refactor common hostport util logics and add more tests
2. Add HostportManager which can ADD/DEL hostports instead of a complete sync.
3. Add Interface for retreiving portMappings information of a pod in Network Host interface.
Implement GetPodPortMappings interface in dockerService.
4. Teach kubenet to use HostportManager
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)
Let ReadLogs return when there is a read error.
Fixes a bug in kuberuntime log.
Today, @yujuhong found that once we cancel `kubectl logs -f` with `Ctrl+C`, kuberuntime will keep complaining:
```
27939 kuberuntime_logs.go:192] Failed with err write tcp 10.240.0.4:10250->10.240.0.2:53913: write: broken pipe when writing log for log file "/var/log/pods/5bb76510-ed71-11e6-ad02-42010af00002/busybox_0.log": &{timestamp:{sec:63622095387 nsec:625309193 loc:0x484c440} stream:stdout log:[84 117 101 32 70 101 98 32 32 55 32 50 48 58 49 54 58 50 55 32 85 84 67 32 50 48 49 55 10]}
```
This is because kuberuntime keeps writing to the connection even though it is already closed. Actually, kuberuntime should return and report error whenever there is a writing error.
Ref the [docker code](3a4ae1f661/pkg/stdcopy/stdcopy.go (L159-L167))
I'm still creating the cluster and verifying this fix. Will post the result here after that.
/cc @yujuhong @kubernetes/sig-node-bugs
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)
kubelet/network-cni-plugin: modify the log's info
**What this PR does / why we need it**:
Checking the startup logs of kubelet, i can always find a error like this:
"E1215 10:19:24.891724 2752 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d"
It will appears, neither i use cni network-plugin or not.
After analysis codes, i thought it should be a warn log, because it will not produce any actions like as exit or abort, and just ignored when not any valid plugins exit.
thank you!
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)
Use Clientset interface in KubeletDeps
**What this PR does / why we need it**:
This replaces the Clientset struct with the equivalent interface for the KubeClient injected via KubeletDeps. This is useful for testing and for accessing the Node and Pod status event stream without an API server.
**Special notes for your reviewer**:
Follow up to #4907
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)
dockershim: set security option separators based on the docker version
Also add a version cache to avoid hitting the docker daemon frequently.
This is part of #38164
Automatic merge from submit-queue (batch tested with PRs 40971, 41027, 40709, 40903, 39369)
Set docker opt separator correctly for SELinux options
This is based on @pmorie's commit from #40179
Automatic merge from submit-queue (batch tested with PRs 40930, 40951)
Fix CRI port forwarding
Websocket support was introduced #33684, which broke the CRI
implementation. This change fixes it.
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
Rename experimental-cgroups-per-pod flag
**What this PR does / why we need it**:
1. Rename `experimental-cgroups-per-qos` to `cgroups-per-qos`
1. Update hack/local-up-cluster to match `CGROUP_DRIVER` with docker runtime if used.
**Special notes for your reviewer**:
We plan to roll this feature out in the upcoming release. Previous node e2e runs were running with this feature on by default. We will default this feature on for all e2es next week.
**Release note**:
```release-note
Rename --experiemental-cgroups-per-qos to --cgroups-per-qos
```
Automatic merge from submit-queue
CRI: Handle cri in-place upgrade
Fixes https://github.com/kubernetes/kubernetes/issues/40051.
## How does this PR restart/remove legacy containers/sandboxes?
With this PR, dockershim will convert and return legacy containers and infra containers as regular containers/sandboxes. Then we can rely on the SyncPod logic to stop the legacy containers/sandboxes, and the garbage collector to remove the legacy containers/sandboxes.
To forcibly trigger restart:
* For infra containers, we manually set `hostNetwork` to opposite value to trigger a restart (See [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L389))
* For application containers, they will be restarted with the infra container.
## How does this PR avoid extra overhead when there is no legacy container/sandbox?
For the lack of some labels, listing legacy containers needs extra `docker ps`. We should not introduce constant performance regression for legacy container cleanup. So we added the `legacyCleanupFlag`:
* In `ListContainers` and `ListPodSandbox`, only do extra `ListLegacyContainers` and `ListLegacyPodSandbox` when `legacyCleanupFlag` is `NotDone`.
* When dockershim starts, it will check whether there are legacy containers/sandboxes.
* If there are none, it will mark `legacyCleanupFlag` as `Done`.
* If there are any, it will leave `legacyCleanupFlag` as `NotDone`, and start a goroutine periodically check whether legacy cleanup is done.
This makes sure that there is overhead only when there are legacy containers/sandboxes not cleaned up yet.
## Caveats
* In-place upgrade will cause kubelet to restart all running containers.
* RestartNever container will not be restarted.
* Garbage collector sometimes keep the legacy containers for a long time if there aren't too many containers on the node. In that case, dockershim will keep performing extra `docker ps` which introduces overhead.
* Manually remove all legacy containers will fix this.
* Should we garbage collect legacy containers/sandboxes in dockershim by ourselves? /cc @yujuhong
* Host port will not be reclaimed for the lack of checkpoint for legacy sandboxes. https://github.com/kubernetes/kubernetes/pull/39903 /cc @freehan
/cc @yujuhong @feiskyer @dchen1107 @kubernetes/sig-node-api-reviews
**Release note**:
```release-note
We should mention the caveats of in-place upgrade in release note.
```
Automatic merge from submit-queue
Optionally avoid evicting critical pods in kubelet
For #40573
```release-note
When feature gate "ExperimentalCriticalPodAnnotation" is set, Kubelet will avoid evicting pods in "kube-system" namespace that contains a special annotation - `scheduler.alpha.kubernetes.io/critical-pod`
This feature should be used in conjunction with the rescheduler to guarantee availability for critical system pods - https://kubernetes.io/docs/admin/rescheduler/
```
Automatic merge from submit-queue (batch tested with PRs 40795, 40863)
Use caching secret manager in kubelet
I just found that this is in my local branch I'm using for testing, but not in master :)
Automatic merge from submit-queue
Add websocket support for port forwarding
#32880
**Release note**:
```release-note
Port forwarding can forward over websockets or SPDY.
```
- adjust ports to int32
- CRI flows the websocket ports as query params
- Do not validate ports since the protocol is unknown
SPDY flows the ports as headers and websockets uses query params
- Only flow query params if there is at least one port query param
Automatic merge from submit-queue
securitycontext: move docker-specific logic into kubelet/dockertools
This change moves the code specific to docker to kubelet/dockertools,
while leaving the common utility functions at its current package
(pkg/securitycontext).
When we deprecate dockertools in the future, the code will be moved to
pkg/kubelet/dockershim instead.
Depending on an exact cluster setup multiple dns may make sense.
Comma-seperated lists of DNS server are quite common as DNS servers
are always plain IPs.
- split out port forwarding into its own package
Allow multiple port forwarding ports
- Make it easy to determine which port is tied to which channel
- odd channels are for data
- even channels are for errors
- allow comma separated ports to specify multiple ports
Add portfowardtester 1.2 to whitelist
Automatic merge from submit-queue (batch tested with PRs 40638, 40742, 40710, 40718, 40763)
move client/record
An attempt at moving client/record to client-go. It's proving very stubborn and needs a lot manual intervention and near as I can tell, no one actually gets any benefit from the sink and source complexity it adds.
@sttts @caesarchaoxu
Automatic merge from submit-queue
kuberuntime: remove the kubernetesManagedLabel label
The CRI shim should be responsible for returning only those
containers/sandboxes created through CRI. Remove this label in kubelet.
Automatic merge from submit-queue (batch tested with PRs 40527, 40738, 39366, 40609, 40748)
pkg/kubelet/dockertools/docker_manager.go: removing unused stuff
This PR removes unused constants and variables. I checked that neither kubernetes nor openshift code aren't using them.
Automatic merge from submit-queue (batch tested with PRs 40392, 39242, 40579, 40628, 40713)
optimize podSandboxChanged() function and fix some function notes
Automatic merge from submit-queue (batch tested with PRs 38443, 40145, 40701, 40682)
fix GetVolumeInUse() function
Since we just want to get volume name info, each volume name just need to added once. desiredStateOfWorld.GetVolumesToMount() will return volume and pod binding info,
if one volume is mounted to several pods, the volume name will be return several times. That is not what we want in this function.
We can add a new function to only get the volume name info or judge whether the volume name is added to the desiredVolumesMap array.
This change moves the code specific to docker to kubelet/dockertools,
while leaving the common utility functions at its current package
(pkg/securitycontext).
When we deprecate dockertools in the future, the code will be moved to
pkg/kubelet/dockershim instead.
Automatic merge from submit-queue (batch tested with PRs 40126, 40565, 38777, 40564, 40572)
docker-CRI: Remove legacy code for non-grpc integration
A minor cleanup to remove the code that is no longer in use to simplify the logic.
Automatic merge from submit-queue (batch tested with PRs 40126, 40565, 38777, 40564, 40572)
Add IsContainerNotFound in kube_docker_client
This PR added `IsContainerNotFound` function in kube_docker_client and changed dockershim to use it.
@yujuhong @freehan
Automatic merge from submit-queue (batch tested with PRs 40046, 40073, 40547, 40534, 40249)
fix a typo in cni log
**What this PR does / why we need it**:
fixes a typo s/unintialized/uninitialized in pkg/kubelet/network/cni/cni.go
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
CRI: Work around container create conflict.
Fixes https://github.com/kubernetes/kubernetes/issues/40443.
This PR added a random suffix in the container name when we:
* Failed to create the container because of "Conflict".
* And failed to remove the container because of "No such container".
@yujuhong @feiskyer
/cc @kubernetes/sig-node-bugs
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
move the discovery and dynamic clients
Moved the dynamic client, discovery client, testing/core, and testing/cache to `client-go`. Dependencies on api groups we don't have generated clients for have dropped out, so federation, kubeadm, and imagepolicy.
@caesarxuchao @sttts
approved based on https://github.com/kubernetes/kubernetes/issues/40363
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
CRI: use more gogoprotobuf plugins
Generate marshaler/unmarshaler code should help improve the performance.
This addresses #40098
Automatic merge from submit-queue
Delay deletion of pod from the API server until volumes are deleted
Depends on #37228, and will not pass tests until that PR is merged, and this is rebased.
Keeps all kubelet behavior the same, except the kubelet will not make the "Delete" call (kubeClient.Core().Pods(pod.Namespace).Delete(pod.Name, deleteOptions)) until the volumes associated with that pod are removed. I will perform some performance testing so that we better understand the latency impact of this change.
Is kubelet_pods.go the correct file to include the "when can I delete this pod" logic?
cc: @vishh @sjenning @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 38739, 40480, 40495, 40172, 40393)
Use fnv hash in the CRI implementation
fnv is more stable than adler. This PR changes CRI implementation to
use fnv for generating container hashes, but leaving the old
implementation (dockertools/rkt). This is because hash is what kubelet
uses to identify a container -- changes to the hash will cause kubelet
to restart existing containers. This is ok for CRI implementation (which
requires a disruptive upgrade already), but not for older implementations.
#40140
Automatic merge from submit-queue (batch tested with PRs 39538, 40188, 40357, 38214, 40195)
Use SecretManager when getting secrets for EnvFrom
Merges crossed in the night which missed this needed change.
Leave the old implementation (dockertools/rkt) untouched so that
containers will not be restarted during kubelet upgrade. For CRI
implementation (kuberuntime), container restart is required for kubelet
upgrade.
Automatic merge from submit-queue
Move remaining *Options to metav1
Primarily delete options, but will remove all internal references to non-metav1 options (except ListOptions).
Still working through it @sttts @deads2k
Automatic merge from submit-queue (batch tested with PRs 39275, 40327, 37264)
dockertools: remove some dead code
Remove `dockerRoot` that's not used anywhere.
Automatic merge from submit-queue
Fix bad time values in kubelet FakeRuntimeService
These values don't affect tests but they can be confusing
for developers looking at the code for reference.
Automatic merge from submit-queue
Optional configmaps and secrets
Allow configmaps and secrets for environment variables and volume sources to be optional
Implements approved proposal c9f881b7bb
Release note:
```release-note
Volumes and environment variables populated from ConfigMap and Secret objects can now tolerate the named source object or specific keys being missing, by adding `optional: true` to the volume or environment variable source specifications.
```
Enforce the following limits:
12kb for total message length in container status
4kb for the termination message path file
2kb or 80 lines (whichever is shorter) from the log on error
Fallback to log output if the user requests it.
Automatic merge from submit-queue
Remove TODOs to refactor kubelet labels
To address #39650 completely.
Remove label refactoring TODOs, we don't need them since CRI rollout is on the way.
Automatic merge from submit-queue (batch tested with PRs 40250, 40134, 40210)
Typo fix: Change logging function to formatting version
**What this PR does / why we need it**:
Slightly broken logging message:
```
I0120 10:56:08.555712 7575 kubelet_node_status.go:135] Deleted old node object %qkubernetes-cit-kubernetes-cr0-0
```
Automatic merge from submit-queue (batch tested with PRs 40232, 40235, 40237, 40240)
move listers out of cache to reduce import tree
Moving the listers from `pkg/client/cache` snips links to all the different API groups from `pkg/storage`, but the dreaded `ListOptions` remains.
@sttts
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)
Cleanup temp dirs
So funny story my /tmp ran out of space running the unit tests so I am cleaning up all the temp dirs we create.
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)
kubelet: storage: teardown terminated pod volumes
This is a continuation of the work done in https://github.com/kubernetes/kubernetes/pull/36779
There really is no reason to keep volumes for terminated pods attached on the node. This PR extends the removal of volumes on the node from memory-backed (the current policy) to all volumes.
@pmorie raised a concern an impact debugging volume related issues if terminated pod volumes are removed. To address this issue, the PR adds a `--keep-terminated-pod-volumes` flag the kubelet and sets it for `hack/local-up-cluster.sh`.
For consideration in 1.6.
Fixes#35406
@derekwaynecarr @vishh @dashpole
```release-note
kubelet tears down pod volumes on pod termination rather than pod deletion
```
Automatic merge from submit-queue (batch tested with PRs 40011, 40159)
dockertools/nsenterexec: fix err shadow
The shadow of err meant the combination of `exec-handler=nsenter` +
`tty` + a non-zero exit code meant that the exit code would be LOST
FOREVER 👻
This isn't all that important since no one really used the nsenter exec
handler as I understand it
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)
make client-go authoritative for pkg/client/restclient
Moves client/restclient to client-go and a util/certs, util/testing as transitives.
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)
dockershim: add support for the 'nsenter' exec handler
This change simply plumbs the kubelet configuration
(--docker-exec-handler) to DockerService.
This fixes#35747.
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)
CRI: upgrade protobuf to v3
For #38854, this PR upgrades CRI protobuf version to v3, and also updated related packages for confirming to new api.
**Release note**:
```
CRI: upgrade protobuf version to v3.
```
The shadow of err meant the combination of `exec-handler=nsenter` +
`tty` + a non-zero exit code meant that the exit code would be LOST
FOREVER 👻
This isn't all that important since no one really used the nsenter exec
handler as I understand it
Automatic merge from submit-queue (batch tested with PRs 39446, 40023, 36853)
Create environment variables from secrets
Allow environment variables to be populated from entire secrets.
**Release note**:
```release-note
Populate environment variables from a secrets.
```