Automatic merge from submit-queue
Fixed TODO: move predicate check into a pod admitter
refractoring AdmitPod func to move predicate check into a pod admitter
Automatic merge from submit-queue
Redundant code process for container_mananger start
1. need not sum the total numEnsureStateFuncs
2. numEnsureStateFuncs should > 0, otherwise, calculate numEnsureStateFuncs would be not neccessary
Automatic merge from submit-queue
Move CSR helper for nodes out of kubelet
**What this PR does / why we need it**:
Including `cmd/kubelet/app` in kubeadm causes flag leakage.
Namelly, the problem is with `pkg/credentialprovider/gcp`, which
leaks `--google-json-key` and changing the behaviour of `init()`
doesn't sound reasonable, given kubelet is the only one who uses
this packages and obviously the flag is part of the functionality.
The helper is already generic enough, it has already been exported
and works well for kubeadm, so moving it should be fine.
**Special notes for your reviewer**: cc @mikedanese @yifan-gu @gtank
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Kubelet: add KillPod for new runtime API
This PR adds implements of KillPod for new runtime API.
CC @yujuhong @Random-Liu @kubernetes/sig-node @kubernetes/sig-rktnetes
Automatic merge from submit-queue
Kubelet: rename CreatePodSandbox to RunPodSandbox in CRI
As @yifan-gu pointed out in #31847, the name `CreatePodSandbox` doesn't reflect that the sandbox is running after the API succeeds. This PR renames it to `RunPodSandbox` to make this clear.
CC @yujuhong @yifan-gu @kubernetes/sig-node
Automatic merge from submit-queue
Kubelet: implement GetPodContainerID for new runtime API
Add implements of `GetPodContainerID` interface for new runtime API.
CC @yujuhong @kubernetes/sig-node @kubernetes/sig-rktnetes
Automatic merge from submit-queue
correct imagefs inodes value in kubelet summary stats
Fix https://github.com/kubernetes/kubernetes/issues/31501
Correct get imagefs inodes value from imageFsInfo.Inodes in kubelet summary stats api.
@derekwaynecarr
Automatic merge from submit-queue
Avoid unnecessary status update when there is no corresponding mirror pod
Fixes https://github.com/kubernetes/kubernetes/issues/32191.
This PR changes status manager to skip update when there is no mirror pod for a static pod.
We need this because:
1) When static pod terminates and mirror pod is deleted, this will avoid extra `syncPod`.
2) During mirror pod creation and recreation, this will avoid unnecessary `syncPod`.
Mark P1 to match the original issue.
@wojtek-t @yujuhong
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Instruct PLEG to detect pod sandbox state changes
This PR adds a Sandboxes list in `kubecontainer.Pod`, so that PLEG can check
sandbox changes using `GetPods()` . The sandboxes are treated as regular
containers (type `kubecontainer.Container`) for now to avoid additional
changes in PLEG.
/cc @feiskyer @yifan-gu @euank
Including `cmd/kubelet/app` in kubeadm causes flag leakage.
Namelly, the problem is with `pkg/credentialprovider/gcp`, which
leaks `--google-json-key` and changing the behaviour of `init()`
doesn't sound reasonable, given kubelet is the only one who uses
this packages and obviously the flag is part of the functionality.
The helper is already generic enough, it has already been exported
and works well for kubeadm, so moving it should be fine.
Automatic merge from submit-queue
Rename ConnectToDockerOrDie to CreateDockerClientOrDie
This function does not actually attempt to connect to the docker daemon, it just creates a client object that can be used to do so later. The old name was confusing, as it implied that a failure to touch the docker daemon could cause program termination (rather than just a failure to create the client).
Automatic merge from submit-queue
Log an event when container runtime exceeds grace-period during eviction
While debugging flakes in eviction, I encountered scenarios where the container run-time did not evict a pod within the allowed grace period. This could result in situations where a BE pod would not get killed fast enough and therefore a Bu pod was killed next (assuming there were no other BE pods)
/cc @mtaufen @vishh
Automatic merge from submit-queue
Kubelet: implement GetPodStatus for new runtime API
Implement `GetPodStatus()` for new runtime API. Part of #28789 .
CC @yujuhong @Random-Liu @dchen1107
Automatic merge from submit-queue
rkt: Update kube-up rkt version to v1.14.0
cc @kubernetes/sig-rktnetes
This should have been included in #31286 (whoops).
This is a bugfix that I propose for v1.4 inclusion.
Automatic merge from submit-queue
Remove net.ipv4.tcp_max_syn_backlog from sysctl whitelist
Remove `net.ipv4.tcp_max_syn_backlog` from sysctl whitelist. This is not namespaced in today's kernels, but must be set on node-level.
Having this on the whitelist, wouldn't harm because the kernel only offers namespaced `net.*` sysctls in the `/proc/sys` tree. But having a sysctl on the whitelist, which cannot be used, doesn't make sense either.
#### 1.4 justification:
- Risk: the whitelist is a published API. We shouldn't have sysctls on there which do not work.
- Rollback: nothing should depend on this behavior.
- Cost: the cost of this is relatively low, as no pod with this sysctl will launch.
Automatic merge from submit-queue
kubelet_test.go: use assertions
Switch most of the tests in this file to using the assert library
(`github.com/stretchr/testify/assert`) in the tests for better readability and
less code in general.
Automatic merge from submit-queue
Update node status instead of node in kubelet
#31730 added code for the Kubelet to reconcile the existing and new nodes in order to annotate existing nodes with the annotation for controller-managed attach-detach. However, it used `Update` instead of `UpdateStatus`, which changes the operations the node's token needs to be permitted to use. Using `UpdateStatus` is functionally equivalent and maintains the same set of permissions nodes need to have today.
I'm adding this with the 1.4 milestone because it is a follow-on to a 1.4 PR and fixes a downstream bug (which won't surface to Kube).
Switch most of the tests in this file to using the assert/require library
(in `github.com/stretchr/testify`) in the tests for better readability and
less code in general.
Automatic merge from submit-queue
Make it possible to enable controller-managed attach-detach on existing nodes
Fixes#31673. Now, if a node already exists with the given name on Kubelet startup, the Kubelet will reconcile the value of the controller-managed-attach-detach annotation so that existing nodes can have this feature turned on and off by changing the Kubelet configuration.
cc @kubernetes/sig-storage @kubernetes/rh-cluster-infra
PLEG will treat them as if they are regular containers and detect changes the
same manner. Note that this makes an assumption that container IDs will not
collide with the podsandbox IDs.
Automatic merge from submit-queue
Include security options in the container created event
New container creation events look like:
```
Created container with docker id /k8s_bar2.a4; Security:[seccomp=sub/subtest(md5:07c9bcb4db631f7ca191d6e0bca49f76)]
Created container with docker id /k8s_bar2.a4; Security:[seccomp=unconfined apparmor=foo-profile]
```
The goal is to provide enough information to confirm that the requseted security constraints were honored.
For https://github.com/kubernetes/kubernetes/issues/31284
/cc @dchen1107 @thockin @jfrazelle @pweil- @pmorie
---
Justification for v1.4:
- Risk: low. This appends some additional information to a human readable message. A bug here would probably not break any functionality
- Roll-back: I don't anticipate any more changes to this area of the code. No functionality depends on this change.
- Cost of not including: Users don't get any (positive) confirmation that the AppArmor or Seccomp profile they requested were actually enabled.
Automatic merge from submit-queue
Add log message in Kubelet when controller attach/detach is enabled
Adds a message to the Kubelet log indicating whether controller attach/detach is enabled for a node.
cc @kubernetes/sig-storage
Automatic merge from submit-queue
Set imagefs rank and reclaim functions when nodefs+imagefs share comm…
Fixes#31192
I decided that the behavior should match the current output of the kubelet summary API. With no dedicated imagefs, the ranking and reclaim functions will match the nodefs ranking and reclaim functions.
/cc @ronnielai @vishh
This function does not actually attempt to connect to the docker daemon,
it just creates a client object that can be used to do so later. The old
name was confusing, as it implied that a failure to touch the docker daemon
could cause program termination (rather than just a failure to create the
client).
Automatic merge from submit-queue
Fix hang/websocket timeout when streaming container log with no content
When streaming and following a container log, no response headers are sent from the kubelet `containerLogs` endpoint until the first byte of content is written to the log. This propagates back to the API server, which also will not send response headers until it gets response headers from the kubelet. That includes upgrade headers, which means a websocket connection upgrade is not performed and can time out.
To recreate, create a busybox pod that runs `/bin/sh -c 'sleep 30 && echo foo && sleep 10'`
As soon as the pod starts, query the kubelet API:
```
curl -N -k -v 'https://<node>:10250/containerLogs/<ns>/<pod>/<container>?follow=true&limitBytes=100'
```
or the master API:
```
curl -N -k -v 'http://<master>:8080/api/v1/<ns>/pods/<pod>/log?follow=true&limitBytes=100'
```
In both cases, notice that the response headers are not sent until the first byte of log content is available.
This PR:
* does a 0-byte write prior to handing off to the container runtime stream copy. That commits the response header, even if the subsequent copy blocks waiting for the first byte of content from the log.
* fixes a bug with the "ping" frame sent to websocket streams, which was not respecting the requested protocol (it was sending a binary frame to a websocket that requested a base64 text protocol)
* fixes a bug in the limitwriter, which was not propagating 0-length writes, even before the writer's limit was reached
Automatic merge from submit-queue
rkt: Force `rkt fetch` to fetch from remote to conform the image pull policy.
Fix https://github.com/kubernetes/kubernetes/issues/27646
Use `--no-store` option for `rkt fetch` to force it to fetch from remote.
However, `--no-store` will fetch the remote image regardless of whether the content of the image has changed or not.
This causes performance downgrade when the image tag is ':latest' and the image pull policy is 'always'.
The issue is tracked in https://github.com/coreos/rkt/issues/2937.
Automatic merge from submit-queue
Filter internal Kubernetes labels from Prometheus metrics
**What this PR does / why we need it**:
Kubernetes uses Docker labels as storage for some internal labels. The
majority of these labels are not meaningful metric labels and a few of
them are even harmful as they're not static and cause wrong aggregation
results.
This change provides a custom labels func to only attach meaningful
labels to cAdvisor exported metrics.
**Which issue this PR fixes**
google/cadvisor#1312
**Special notes for your reviewer**:
Depends on google/cadvisor#1429. Once that is merged, I'll update the vendor update commit.
**Release note**:
```release-note
Remove environment variables and internal Kubernetes Docker labels from cAdvisor Prometheus metric labels.
Old behavior:
- environment variables explicitly whitelisted via --docker-env-metadata-whitelist were exported as `container_env_*=*`. Default is zero so by default non were exported
- all docker labels were exported as `container_label_*=*`
New behavior:
- Only `container_name`, `pod_name`, `namespace`, `id`, `image`, and `name` labels are exposed
- no environment variables will be exposed ever via /metrics, even if whitelisted
```
---
Given that we have full control over the exported label set, I shortened the pod_name, pod_namespace and container_name label names. Below an example of the change (reformatted for readability).
```
# BEFORE
container_cpu_cfs_periods_total{
container_label_io_kubernetes_container_hash="5af8c3b4",
container_label_io_kubernetes_container_name="sync",
container_label_io_kubernetes_container_restartCount="1",
container_label_io_kubernetes_container_terminationMessagePath="/dev/termination-log",
container_label_io_kubernetes_pod_name="popularsearches-web-3165456836-2bfey",
container_label_io_kubernetes_pod_namespace="popularsearches",
container_label_io_kubernetes_pod_terminationGracePeriod="30",
container_label_io_kubernetes_pod_uid="6a291e48-47c4-11e6-84a4-c81f66bdf8bd",
id="/docker/68e1f15353921f4d6d4d998fa7293306c4ac828d04d1284e410ddaa75cf8cf25",
image="redacted.com/popularsearches:42-16-ba6bd88",
name="k8s_sync.5af8c3b4_popularsearches-web-3165456836-2bfey_popularsearches_6a291e48-47c4-11e6-84a4-c81f66bdf8bd_c02d3775"
} 72819
# AFTER
container_cpu_cfs_periods_total{
container_name="sync",
pod_name="popularsearches-web-3165456836-2bfey",
namespace="popularsearches",
id="/docker/68e1f15353921f4d6d4d998fa7293306c4ac828d04d1284e410ddaa75cf8cf25",
image="redacted.com/popularsearches:42-16-ba6bd88",
name="k8s_sync.5af8c3b4_popularsearches-web-3165456836-2bfey_popularsearches_6a291e48-47c4-11e6-84a4-c81f66bdf8bd_c02d3775"
} 72819
```
Feedback requested on:
* Label names. Other suggestions? Should we keep these very long ones?
* Do we need to export io.kubernetes.pod.uid? It makes working with the metrics a bit more complicated and the pod name is already unique at any time (but not over time). The UID is aslo part of `name`.
As discussed with @timstclair, this should be added to v1.4 as the current labels are harmful.
PTAL @jimmidyson @fabxc @vishh
This refactor removes the legacy KubeletConfig object and adds a new
KubeletDeps object, which contains injected runtime objects and
separates them from static config. It also reduces NewMainKubelet to two
arguments: a KubeletConfiguration and a KubeletDeps.
Some mesos and kubemark code was affected by this change, and has been
modified accordingly.
And a few final notes:
KubeletDeps:
KubeletDeps will be a temporary bin for things we might consider
"injected dependencies", until we have a better dependency injection
story for the Kubelet. We will have to discuss this eventually.
RunOnce:
We will likely not pull new KubeletConfiguration from the API server
when in runonce mode, so it doesn't make sense to make this something
that can be configured centrally. We will leave it as a flag-only option
for now. Additionally, it is increasingly looking like nobody actually uses the
Kubelet's runonce mode anymore, so it may be a candidate for deprecation
and removal.
Automatic merge from submit-queue
rkt: Improve support for privileged pod (pod whose all containers are privileged)
Fix https://github.com/kubernetes/kubernetes/issues/31100
This takes advantage of https://github.com/coreos/rkt/pull/2983 . By appending the new `--all-run` insecure-options to `rkt run-prepared` command when all the containers are privileged. The pod now gets more privileged power.
Automatic merge from submit-queue
Add sysctl support
Implementation of proposal https://github.com/kubernetes/kubernetes/pull/26057, feature https://github.com/kubernetes/features/issues/34
TODO:
- [x] change types.go
- [x] implement docker and rkt support
- [x] add e2e tests
- [x] decide whether we want apiserver validation
- ~~[ ] add documentation~~: api docs exist. Existing PodSecurityContext docs is very light and links back to the api docs anyway: 6684555ed9/docs/user-guide/security-context.md
- [x] change PodSecurityPolicy in types.go
- [x] write admission controller support for PodSecurityPolicy
- [x] write e2e test for PodSecurityPolicy
- [x] make sure we are compatible in the sense of https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md
- [x] test e2e with rkt: it only works with kubenet, not with no-op network plugin. The later has no sysctl support.
- ~~[ ] add RunC implementation~~ (~~if that is already in kube,~~ it isn't)
- [x] update whitelist
- [x] switch PSC fields to annotations
- [x] switch PSP fields to annotations
- [x] decide about `--experimental-whitelist-sysctl` flag to be additive or absolute
- [x] decide whether to add a sysctl node whitelist annotation
### Release notes:
```release-note
The pod annotation `security.alpha.kubernetes.io/sysctls` now allows customization of namespaced and well isolated kernel parameters (sysctls), starting with `kernel.shm_rmid_forced`, `net.ipv4.ip_local_port_range`, `net.ipv4.tcp_max_syn_backlog` and `net.ipv4.tcp_syncookies` for Kubernetes 1.4.
The pod annotation `security.alpha.kubernetes.io/unsafeSysctls` allows customization of namespaced sysctls where isolation is unclear. Unsafe sysctls must be enabled at-your-own-risk on the kubelet with the `--experimental-allowed-unsafe-sysctls` flag. Future versions will improve on resource isolation and more sysctls will be considered safe.
```
Automatic merge from submit-queue
Increase request timeout based on termination grace period
When terminationGracePeriodSeconds is set to > 2 minutes (which is
the default request timeout), ContainerStop() times out at 2 minutes.
We should check the timeout being passed in and bump up the
request timeout if needed.
Fixes#31219
Automatic merge from submit-queue
Kubelet code move: volume / util
Addresses some odds and ends that I apparently missed earlier. Preparation for kubelet code-move ENDGAME.
cc @kubernetes/sig-node
Automatic merge from submit-queue
Kubelet: implement GetNetNS for new runtime api
Kubelet: implement GetNetNS for new runtime api.
CC @yujuhong @thockin @kubernetes/sig-node @kubernetes/sig-rktnetes