Ensure that kubelet marks VolumeInUse before checking if it is Attached.
Also ensure that the attach/detach controller always fetches a fresh
copy of the node object before detach (instead of relying on the node
informer cache).
Automatic merge from submit-queue
Add support for basic QoS and pod level cgroup management
This PR is a WIP and is tied to this upstream issue #27204
It adds support for creation, deletion and updates of cgroups in Kubernetes.
@vishh PTAL
Please note that the first commit is part of this PR: #27749
cc @kubernetes/sig-node
Signed-off-by: Buddha Prakash <buddhap@google.com>
Automatic merge from submit-queue
Refactor func canRunPod
After refactoring, we only need to check `if pod.Spec.SecurityContext == nil` once. The logic is a bit clearer.
Automatic merge from submit-queue
Fix pkg/kubelet unit tests fail on OSX
Use runtime.GOOS for the OperatingSystem instead of hardcoding it to linux.
Fixes #27730
Automatic merge from submit-queue
[Refactor] QoS: add a QOSClass type for QoS classes
This PR adds a QOSClass type and defines QOSClass constants for the three QoS classes.
It would be good to use this in all future QoS-related features.
This would also be useful for the [Pod level cgroups isolation proposal](https://github.com/kubernetes/kubernetes/pull/26751) that I am working on.
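A sketch of what a named type buys: the three classes become typed constants instead of loose strings. The `resources` struct and `qosClassOf` function below are hypothetical and deliberately simplified (one container, requests equal to limits for CPU and memory means Guaranteed, nothing set means BestEffort, anything else is Burstable); the real classification also handles multiple containers and defaulting.

```go
package main

import "fmt"

// QOSClass names a pod's Quality of Service class.
type QOSClass string

const (
	Guaranteed QOSClass = "Guaranteed"
	Burstable  QOSClass = "Burstable"
	BestEffort QOSClass = "BestEffort"
)

// resources is a hypothetical single-container view of requests and limits;
// zero means "not set".
type resources struct {
	cpuRequest, cpuLimit, memRequest, memLimit int64
}

// qosClassOf is a simplified classifier for illustration only.
func qosClassOf(r resources) QOSClass {
	if r == (resources{}) {
		return BestEffort // nothing requested or limited
	}
	if r.cpuLimit > 0 && r.memLimit > 0 && r.cpuRequest == r.cpuLimit && r.memRequest == r.memLimit {
		return Guaranteed // requests equal limits for both resources
	}
	return Burstable
}

func main() {
	fmt.Println(qosClassOf(resources{}))
	fmt.Println(qosClassOf(resources{cpuRequest: 100, cpuLimit: 100, memRequest: 64, memLimit: 64}))
	fmt.Println(qosClassOf(resources{cpuRequest: 100}))
}
```

Because `QOSClass` is its own type, a function that takes a `QOSClass` will not silently accept an arbitrary untyped-variable string.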
@vishh PTAL
Signed-off-by: Buddha Prakash <buddhap@google.com>
Automatic merge from submit-queue
Kubelet can retrieve host IP even when apiserver has not been contacted
fixes https://github.com/kubernetes/kubernetes/issues/26590, fixes https://github.com/kubernetes/kubernetes/issues/6558
Right now the kubelet expects to get the hostIP from its local nodeInfo cache. However, this cache is empty if there is no API server (or the API server has not yet been contacted).
In the case of static pods, this change means the downward api can now be used to populate hostIP.
Automatic merge from submit-queue
kubelet: Remove a couple lines of dead code
Presumably that code was added for debugging reasons and never removed. Hopefully.
If it's actually important and there's a good reason to do what looks like a no-op to get pause-the-world behaviour or whatever, I'd hope there'd be a comment.
cc @pwittrock
Automatic merge from submit-queue
Image GC logic should compensate for reserved blocks
Calculate disk usage from available bytes instead of used bytes, to account for reserved blocks in image GC.
#27169
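A worked sketch of why this matters, using a hypothetical `fsStats` struct: filesystems such as ext4 reserve blocks for root, so `capacity - available` is larger than the bytes actually used by files. Computing usage from available bytes counts the reserved blocks as unavailable, which is what an unprivileged writer experiences.

```go
package main

import "fmt"

// fsStats is a hypothetical stand-in for filesystem statistics.
type fsStats struct {
	CapacityBytes  uint64 // total filesystem size
	AvailableBytes uint64 // free bytes usable by non-root (excludes reserved blocks)
	UsedBytes      uint64 // bytes occupied by files
}

// usagePercent computes usage as (capacity - available) / capacity, so that
// reserved blocks count toward usage when comparing with GC thresholds.
func usagePercent(s fsStats) uint64 {
	if s.CapacityBytes == 0 || s.AvailableBytes > s.CapacityBytes {
		return 0 // guard against division by zero and unsigned underflow
	}
	return (s.CapacityBytes - s.AvailableBytes) * 100 / s.CapacityBytes
}

func main() {
	// 100 GiB disk: 85 GiB used by files, 5 GiB reserved, 10 GiB available.
	s := fsStats{CapacityBytes: 100 << 30, AvailableBytes: 10 << 30, UsedBytes: 85 << 30}
	fmt.Println("usage from used bytes (%):     ", s.UsedBytes*100/s.CapacityBytes)
	fmt.Println("usage from available bytes (%):", usagePercent(s))
}
```

In this example, a GC threshold of, say, 90% would not trigger when computed from used bytes (85%) but would trigger when computed from available bytes (90%).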
Automatic merge from submit-queue
rkt: Fix the 'privileged' check when stage1 annotation is provided.
Previously, when the stage1 annotation was provided, we only checked whether
the kubelet allows privileged containers, which is not useful because that is
a global setting.
Instead, we should check if the pod has explicitly set the privileged
security context to 'true'.
cc @kubernetes/sig-rktnetes @kubernetes/sig-node
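A sketch of the per-pod check, using hypothetical minimal types (not the real Kubernetes API structs): the pod counts as privileged only if some container explicitly sets `privileged: true`; a nil security context or a nil flag means "not requested".

```go
package main

import "fmt"

// Hypothetical, trimmed-down types for illustration.
type SecurityContext struct{ Privileged *bool }

type Container struct {
	Name            string
	SecurityContext *SecurityContext
}

type Pod struct{ Containers []Container }

// podRequestsPrivileged reports whether any container explicitly sets
// Privileged to true.
func podRequestsPrivileged(p Pod) bool {
	for _, c := range p.Containers {
		sc := c.SecurityContext
		if sc != nil && sc.Privileged != nil && *sc.Privileged {
			return true
		}
	}
	return false
}

func main() {
	privileged := true
	p := Pod{Containers: []Container{
		{Name: "app"},
		{Name: "sidecar", SecurityContext: &SecurityContext{Privileged: &privileged}},
	}}
	fmt.Println(podRequestsPrivileged(p))
}
```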
Automatic merge from submit-queue
Bump minimum API version for docker to 1.21
The corresponding docker version is 1.9.x. Dropping support for docker 1.8.
/cc @kubernetes/sig-node
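A sketch of enforcing a minimum API version, assuming a hypothetical `compareAPIVersions` helper. Versions must be compared component by component as integers: a plain string comparison would wrongly rank `"1.9"` above `"1.21"`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Minimum docker API version from this change (docker 1.9.x).
const minimumDockerAPIVersion = "1.21"

// compareAPIVersions compares dotted numeric versions such as "1.20" and
// "1.21", returning -1, 0 or 1. Missing components are treated as 0.
func compareAPIVersions(a, b string) (int, error) {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		var err error
		if i < len(as) {
			if x, err = strconv.Atoi(as[i]); err != nil {
				return 0, fmt.Errorf("bad version %q: %v", a, err)
			}
		}
		if i < len(bs) {
			if y, err = strconv.Atoi(bs[i]); err != nil {
				return 0, fmt.Errorf("bad version %q: %v", b, err)
			}
		}
		switch {
		case x < y:
			return -1, nil
		case x > y:
			return 1, nil
		}
	}
	return 0, nil
}

func main() {
	for _, v := range []string{"1.20", "1.21", "1.22"} {
		c, err := compareAPIVersions(v, minimumDockerAPIVersion)
		fmt.Printf("API %s supported: %v (err=%v)\n", v, err == nil && c >= 0, err)
	}
}
```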
Automatic merge from submit-queue
kubenet: Fix host port for rktnetes.
Because the rkt pod starts after plugin.SetUpPod() is called,
getRunningPods() does not return the newly created pod, which
causes the hostport iptables rules for this new pod to be missing.
cc @dcbw @freehan
A follow up fix for https://github.com/kubernetes/kubernetes/pull/27878#issuecomment-227898936
Automatic merge from submit-queue
rkt: Refactor grace termination period.
Add `TimeoutStopSec` service option to support grace termination.
We found that grace-period termination can be improved by adding a systemd service option.
cc @kubernetes/sig-rktnetes
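A sketch of rendering a pod's termination grace period as the systemd `TimeoutStopSec` option, under stated assumptions: the helper name is hypothetical, and the clamp to at least one second is our own choice, because systemd has historically treated a zero timeout as "no timeout", the opposite of stopping immediately.

```go
package main

import "fmt"

// timeoutStopSecOption renders a grace period (in seconds) as a systemd
// service option. Hypothetical helper for illustration.
func timeoutStopSecOption(graceSeconds int64) string {
	// Assumption: clamp to >= 1s so a zero grace period is not read by
	// systemd as "disable the stop timeout".
	if graceSeconds < 1 {
		graceSeconds = 1
	}
	return fmt.Sprintf("TimeoutStopSec=%ds", graceSeconds)
}

func main() {
	fmt.Println(timeoutStopSecOption(30))
	fmt.Println(timeoutStopSecOption(0))
}
```

After the timeout expires, systemd escalates from the stop signal to SIGKILL, which is what gives the grace period its meaning.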
Use the generic runtime method to get the netns path. Also
move reading the container IP address into CNI (based on kubenet)
instead of keeping it in the Docker manager code. Both the old and new
methods use nsenter and /sbin/ip and should be functionally
equivalent.
Automatic merge from submit-queue
Remove pod mutation for volumes annotated with supplemental groups
Removes the pod mutation added in #20490 -- partially resolves #27197 from the standpoint of making the feature inactive in 1.3. Our plan is to make this work correctly in 1.4.
@kubernetes/sig-storage
Automatic merge from submit-queue
Kubelet Volume Manager Wait For Attach Detach Controller and Backoff on Error
* Closes https://github.com/kubernetes/kubernetes/issues/27483
* Modified Attach/Detach controller to report `Node.Status.AttachedVolumes` on successful attach (unique volume name along with device path).
* Modified the Kubelet Volume Manager to wait for the Attach/Detach controller to report success before proceeding with attach.
* Closes https://github.com/kubernetes/kubernetes/issues/27492
* Implemented an exponential backoff mechanism for the volume manager and attach/detach controller to prevent operations (attach/detach/mount/unmount/wait for controller attach/etc.) from executing back to back unchecked.
* Closes https://github.com/kubernetes/kubernetes/issues/26679
* Modified volume `Attacher.WaitForAttach()` methods to use the device path reported by the Attach/Detach controller in `Node.Status.AttachedVolumes` instead of calling out to cloud providers.
Modify attach/detach controller to keep track of volumes to report
attached in Node VolumeToAttach status.
Modify kubelet volume manager to wait for volume to show up in Node
VolumeToAttach status.
Implement exponential backoff for errors in the volume manager and the
attach/detach controller.
This enables rkt to use a cached stage1 image instead of unpacking the
stage1 image for every pod.
After this change, users need to preload the stage1 images so that
rkt can find the stage1 image with the name specified by this flag.
Automatic merge from submit-queue
Logging for OutOfDisk when file system info is not available
#26566
1. Add logging when file system info is not available.
2. Report OutOfDisk when file system info is not available.
Automatic merge from submit-queue
Filter seccomp profile path from malicious .. and /
Without this patch, using `localhost/<some-relative-path>` as the seccomp profile lets one load any file on the host, e.g. `localhost/../../../../dev/mem`, which is not healthy for the kubelet.
/cc @jfrazelle
Unit tests depend on https://github.com/kubernetes/kubernetes/pull/26710.
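A sketch of the filtering, assuming a hypothetical `resolveLocalhostProfile` function and profile root directory (Unix paths): strip the `localhost/` prefix, reject absolute names, clean the path, and reject anything that still climbs out of the root with `..`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveLocalhostProfile maps "localhost/<name>" to a file under profileRoot,
// rejecting absolute names and names that escape profileRoot via "..".
func resolveLocalhostProfile(profileRoot, profile string) (string, error) {
	name := strings.TrimPrefix(profile, "localhost/")
	if name == profile {
		return "", fmt.Errorf("not a localhost profile: %q", profile)
	}
	if name == "" || filepath.IsAbs(name) {
		return "", fmt.Errorf("invalid profile name %q", name)
	}
	clean := filepath.Clean(name) // e.g. "a/../b.json" -> "b.json"
	if clean == "." || clean == ".." || strings.HasPrefix(clean, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("profile %q escapes %s", name, profileRoot)
	}
	return filepath.Join(profileRoot, clean), nil
}

func main() {
	root := "/var/lib/kubelet/seccomp" // hypothetical profile root
	for _, p := range []string{"localhost/profile.json", "localhost/../../../../dev/mem", "localhost//etc/passwd"} {
		path, err := resolveLocalhostProfile(root, p)
		fmt.Printf("%-32s -> %q err=%v\n", p, path, err)
	}
}
```

Cleaning before checking matters: it normalizes inputs like `a/../../x` to `../x`, so one prefix check catches every escape.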
Automatic merge from submit-queue
kubelet/kubenet: split hostport handling into separate module
This pulls the hostport functionality of kubenet out into a separate module so that it can be more easily tested and potentially used from other code (maybe CNI, maybe downstream consumers like OpenShift, etc). Couldn't find a mock iptables so I wrote one, but I didn't look very hard.
@freehan @thockin @bprashanth
Automatic merge from submit-queue
Revert revert of downward api node defaults
Reverts the revert of https://github.com/kubernetes/kubernetes/pull/27439
Fixes #27062
@dchen1107 - who at Google can help debug why this caused issues with GKE infrastructure but not GCE merge queue?
/cc @wojtek-t @piosz @fgrzadkowski @eparis @pmorie
If the mount operation exceeds the timeout, it will return an error and the
pod worker will retry in the next sync (10s or less). Compared with the
original value (i.e., 10 minutes), this frees the pod worker sooner to process
pod updates, if there are any.
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).
This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
Automatic merge from submit-queue
Let kubelet log the DeletionTimestamp if it's not nil in update
This helps determine whether the kubelet is to blame when a pod is not deleted.
Example output:
```
SyncLoop (UPDATE, "api"): "redis-master_default(c6782276-2dd4-11e6-b874-64510650ab1c):DeletionTimestamp=2016-06-08T23:58:12Z"
```
ref #26290
cc @Random-Liu
Automatic merge from submit-queue
Update reason_cache.go: Get on the LRU cache is not thread-safe
The reason_cache wraps an LRU cache, and the LRU cache modifies its linked list even on a Get, so both reads and writes should take the write lock.
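A minimal LRU sketch (hypothetical `lruCache` type built on `container/list`, not the library reason_cache wraps) showing why a read lock is not enough: `Get` calls `MoveToFront`, which rewrites list pointers, so two concurrent `Get`s under an `RWMutex.RLock` would race. The sketch uses a plain `sync.Mutex` for every method.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type entry struct {
	key, value string
}

// lruCache guards all access, including Get, with one exclusive lock.
type lruCache struct {
	mu    sync.Mutex
	max   int
	ll    *list.List
	items map[string]*list.Element
}

func newLRU(max int) *lruCache {
	return &lruCache{max: max, ll: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Add(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[key]; ok {
		e.Value.(*entry).value = value
		c.ll.MoveToFront(e)
		return
	}
	c.items[key] = c.ll.PushFront(&entry{key, value})
	if c.ll.Len() > c.max { // evict the least recently used entry
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func (c *lruCache) Get(key string) (string, bool) {
	c.mu.Lock() // exclusive: MoveToFront below mutates the list
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.ll.MoveToFront(e)
	return e.Value.(*entry).value, true
}

func main() {
	c := newLRU(128)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				key := fmt.Sprintf("pod-%d", j%200)
				c.Add(key, fmt.Sprintf("reason-%d", i))
				c.Get(key)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("entries:", c.ll.Len())
}
```

Running the `main` above with `go run -race` is a quick way to confirm there are no data races; swapping `Get` to an `RLock` on an `RWMutex` would be expected to trip the detector.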
Automatic merge from submit-queue
Fix docker api version in kubelet
There are two variables `dockerv110APIVersion` and `dockerV110APIVersion` with
the same purpose, but different values. Remove the incorrect one and fix usage
in the file.
/cc @dchen1107 @Random-Liu
Automatic merge from submit-queue
Sets IgnoreUnknown=1 in CNI_ARGS
```release-note
release-note-none
```
Kubernetes uses CNI_ARGS to pass the pod namespace, name and infra container
ID to the CNI network plugin. The CNI logic throws an error
if these args are not known to it, unless the user specifies
IgnoreUnknown as part of CNI_ARGS. This PR sets IgnoreUnknown=1
to prevent the CNI logic from erroring and blocking pod setup.
https://github.com/appc/cni/pull/158
https://github.com/appc/cni/issues/126
Since appc requires gid to be non-empty today (https://github.com/appc/spec/issues/623),
we have to error out when gid is empty instead of using the root gid.
Automatic merge from submit-queue
rkt: Replace 'journalctl' with rkt's GetLogs() API.
This replaces the `journalctl` shell-out with rkt's GetLogs() API.
Fixes #26997
To make this fully work, rkt needs this patch: https://github.com/coreos/rkt/pull/2763
cc @kubernetes/sig-node @euank @alban @iaguis @jonboulle
Automatic merge from submit-queue
rkt: Do not run rkt pod inside a pre-created netns when network plugin is no-op
This fixed a panic where the returned pod network status is nil (fixes #26540).
Also, this makes the lkvm stage1 able to run inside a user-defined network, where the network name needs to be 'rkt.kubernetes.io'. This is a temporary solution to the network issue for the lkvm stage1.
I also fixed minor issues, such as passing the wrong pod UID when cleaning up the netns file.
/cc @euank @pskrzyns @jellonek @kubernetes/sig-node
I tested with no network plugin locally, and it works fine.
As a reminder, we need to document this in the release: https://github.com/kubernetes/kubernetes/issues/26201
This fixed a panic where the returned pod network status is nil.
Also, this makes the lkvm stage1 able to run inside a user-defined
network, where the network name needs to be 'rkt.kubernetes.io'.
Also fixed minor issues such as passing the wrong pod UID, ignoring
logging errors.
Automatic merge from submit-queue
rkt: Fix incomplete selinux context string when the option is partial.
Fix "EmptyDir" e2e test failures caused by https://github.com/kubernetes/kubernetes/pull/24901
As mentioned in https://github.com/kubernetes/kubernetes/pull/24901#discussion_r61372312
We should apply the selinux context of the rkt data directory (/var/lib/rkt) when users do not specify all the selinux options.
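The rule described here, falling back to the rkt data directory's context for any SELinux field the user left empty, can be sketched as follows. `seLinuxOptions` and `completeSELinuxContext` are hypothetical names, and the default context string in `main` is only an example. The level is split off last with `SplitN`, because MLS levels such as `s0:c1,c2` contain colons themselves.

```go
package main

import (
	"fmt"
	"strings"
)

// seLinuxOptions holds the four SELinux context fields; empty means unset.
type seLinuxOptions struct {
	User, Role, Type, Level string
}

// completeSELinuxContext fills every empty field from defaultCtx, given as
// "user:role:type:level", and returns the full context string.
func completeSELinuxContext(opts seLinuxOptions, defaultCtx string) (string, error) {
	parts := strings.SplitN(defaultCtx, ":", 4) // level may itself contain ':'
	if len(parts) != 4 {
		return "", fmt.Errorf("malformed default context %q", defaultCtx)
	}
	fields := []string{opts.User, opts.Role, opts.Type, opts.Level}
	for i, f := range fields {
		if f == "" {
			fields[i] = parts[i]
		}
	}
	return strings.Join(fields, ":"), nil
}

func main() {
	// Example default, standing in for the context of /var/lib/rkt.
	def := "system_u:object_r:svirt_sandbox_file_t:s0"
	ctx, err := completeSELinuxContext(seLinuxOptions{Level: "s0:c1,c2"}, def)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ctx)
}
```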
Through my mistake, this change was lost during a rebase, which caused the regression.
After applying this PR, the e2e tests passed.
```
$ go run hack/e2e.go -v -test --test_args="--ginkgo.dryRun=false --ginkgo.focus=EmptyDir"
...
Ran 19 of 313 Specs in 199.319 seconds
SUCCESS! -- 19 Passed | 0 Failed | 0 Pending | 294 Skipped PASS
```
BTW, the test was removed because the `--no-overlay=true` flag is only present on non-CoreOS distros.
cc @euank @kubernetes/sig-node