Automatic merge from submit-queue
Kubelet: add --container-runtime-endpoint and --image-service-endpoint
Flag `--container-runtime-endpoint` (overrides `--container-runtime`) is introduced to identify the unix socket file of the remote runtime service. And flag `--image-service-endpoint` is introduced to identify the unix socket file of the image service.
This PR is part of #28789 Milestone 0.
CC @yujuhong @Random-Liu
Automatic merge from submit-queue
Implement TLS bootstrap for kubelet using `--experimental-bootstrap-kubeconfig` (2nd take)
Ref kubernetes/features#43 (comment)
cc @gtank @philips @mikedanese @aaronlevy @liggitt @deads2k @errordeveloper @justinsb
Continue on the older PR https://github.com/kubernetes/kubernetes/pull/30094 as there are too many comments on that one and it's not loadable now.
Automatic merge from submit-queue
Unblock iterative development on pod-level cgroups
In order to allow forward progress on this feature, it takes the commits from #28017#29049 and then it globally disables the flag that allows these features to be exercised in the kubelet. The flag can be re-added to the kubelet when its actually ready.
/cc @vishh @dubstack @kubernetes/rh-cluster-infra
Add --bootstrap-kubeconfig flag to kubelet. If the flag is non-empty
and --kubeconfig doesn't exist, then the kubelet will use the bootstrap
kubeconfig to create rest client and generate certificate signing request
to request a client cert from API server.
Once succeeds, the result cert will be written down to
--cert-dir/kubelet-client.crt, and the kubeconfig will be populated with
certfile, keyfile path pointing to the result certificate file, key file.
(The key file is generated before creating the CSR).
New flag --container-runtime-endpoint (overrides --container-runtime)
is introduced to kubelet which identifies the unix socket file of
the remote runtime service. And new flag --image-service-endpoint is
introduced to kubelet which identifies the unix socket file of the
image service.
Automatic merge from submit-queue
Revert "Revert "syncNetworkUtil in kubelet and fix loadbalancerSourceRange on GCE
Reverts kubernetes/kubernetes#30729
Automatic merge from submit-queue
Add note: kubelet manages only k8s containers.
Kubelet wrote log when accesing container which was not created in k8s, what could confuse users. That's why we added note about it in documentation and lowered log level of the message to 5.
Here is example of the message:
```
> Apr 19 11:50:32 openshift-114.lab.sjc.redhat.com atomic-openshift-node[9551]:
I0419 11:50:32.194020 9600 docker.go:363]
Docker Container: /tiny_babbage is not managed by kubelet.
```
bug 1328441
Bugzilla link https://bugzilla.redhat.com/show_bug.cgi?id=1328441
Automatic merge from submit-queue
syncNetworkUtil in kubelet and fix loadbalancerSourceRange on GCE
fixes: #29997#29039
@yujuhong Can you take a look at the kubelet part?
@girishkalele KUBE-MARK-DROP is the chain for dropping connections. Marked connection will be drop in INPUT/OUTPUT chain of filter table. Let me know if this is good enough for your use case.
Automatic merge from submit-queue
Add volume reconstruct/cleanup logic in kubelet volume manager
Currently kubelet volume management works on the concept of desired
and actual world of states. The volume manager periodically compares the
two worlds and perform volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the cache of those two worlds are
gone. Although desired world can be recovered through apiserver, actual
world can not be recovered which may cause some volumes cannot be cleaned
up if their information is deleted by apiserver. This change adds the
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both desired
world and actual world if it cannot be found in either world. The rest
logic would be as same as before, desired world populator may clean up
the volume entry if it is no longer in apiserver, and then volume
manager should invoke unmount to clean it up.
Fixes https://github.com/kubernetes/kubernetes/issues/27653
Currently kubelet volume management works on the concept of desired
and actual world of states. The volume manager periodically compares the
two worlds and perform volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the cache of those two worlds are
gone. Although desired world can be recovered through apiserver, actual
world can not be recovered which may cause some volumes cannot be cleaned
up if their information is deleted by apiserver. This change adds the
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both desired
world and actual world if it cannot be found in either world. The rest
logic would be as same as before, desired world populator may clean up
the volume entry if it is no longer in apiserver, and then volume
manager should invoke unmount to clean it up.
Automatic merge from submit-queue
[kubelet] Introduce --protect-kernel-defaults flag to make the tunable behaviour configurable
Let's make the default behaviour of kernel tuning configurable. The default behaviour is kept modify as has been so far.
Automatic merge from submit-queue
Check for CAP_SYS_ADMIN in Kubelet
Comment from nsenter_mount.go header:
The Kubelet process must have CAP_SYS_ADMIN (required by nsenter); at
the present, this effectively means that the kubelet is running in a
privileged container.
Related to #26093
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/30176)
<!-- Reviewable:end -->
Also provide a new --pod-manifest-path flag and deprecate the old
--config one.
This field holds the location of a manifest file or directory of manifest
files for pods the Kubelet is supposed to run. The name of the field
should reflect that purpose.
The Kubelet process must have CAP_SYS_ADMIN, which implies that
the kubelet process must be either running as root or in a privileged
container. Make this check early in the startup sequence and bail out
if necessary.
Related to #26093
Automatic merge from submit-queue
Refactoring runner resource container linedelimiter to it's own pkg
Continuing my work ref #15634
Anyone is ok to review this fix.
Automatic merge from submit-queue
optimise some code style in server.go
The PR modified some code style for authPathClientConfig and parseReservation function in server.go.
Automatic merge from submit-queue
Judge the cloud isn't nil before use it in server.go
The PR add a judgement for the cloud before use it, because cloudprovider.InitCloudProvider maybe return nil for the cloud.
Move SystemReserved and KubeReserved into KubeletConfiguration struct
Convert int64 to int32 for some external type fields so they match internal ones
tLS* to tls* for JSON field names
Fix dependency on removed options.AutoDetectCloudProvider
Change floats in KubeletConfiguration API to ints
Update external KubeletConfiguration type
Add defaults for new KubeletConfiguration fields
Modify some defaults to match upstream settings
Add/rename some conversion functions
Updated codegen
Fixed typos
Mike Danese caught that s.NodeLabels wasn't allocated, fix on line 118
of cmd/kubelet/app/options/options.go.
Provide list of valid sources in comment for HostNetworkSources field
Automatic merge from submit-queue
Enable memory eviction by default
```release-note
Enable memory based pod evictions by default on the kubelet.
Trigger pod eviction when available memory falls below 100Mi.
```
See: https://github.com/kubernetes/kubernetes/issues/28552
/cc @kubernetes/rh-cluster-infra @kubernetes/sig-node
Automatic merge from submit-queue
[kubelet] Allow opting out of automatic cloud provider detection in kubelet. By default kubelet will auto-detect cloud providers
fixes#28231
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
enable control by controller. Default enabled.
* It removes all references "SafeToDetach" annoation from controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
get reset every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9
opened to fix that.
Automatic merge from submit-queue
Setting TLS1.2 minimum because TLS1.0 and TLS1.1 are vulnerable
TLS1.0 is known as vulnerable since it can be downgraded to SSL
https://blog.varonis.com/ssl-and-tls-1-0-no-longer-acceptable-for-pci-compliance/
TLS1.1 can be vulnerable if cipher RC4-SHA is used, and in Kubernetes it is, you can check it with
`
openssl s_client -cipher RC4-SHA -connect apiserver.k8s.example.com:443
`
https://www.globalsign.com/en/blog/poodle-vulnerability-expands-beyond-sslv3-to-tls/
Test suites like Qualys are reporting this Kubernetes issue as a level 3 vulnerability, they recommend to upgrade to TLS1.2 that is not affected, quoting Qualys:
`
RC4 should not be used where possible. One reason that RC4 was still being used was BEAST and Lucky13 attacks against CBC mode ciphers in
SSL and
TLS. However, TLSv 1.2 or later address these issues.
`
Automatic merge from submit-queue
Use pause image depending on the server's platform when testing
Removed all pause image constant strings, now the pause image is chosen by arch. Part of the effort of making e2e arch-agnostic.
The pause image name and version is also now only in two places, and it's documented to bump both
Also removed "amd64" constants in the code. Such constants should be replaced by `runtime.GOARCH` or by looking up the server platform
Fixes: #22876 and #15140
Makes it easier for: #25730
Related: #17981
This is for `v1.3`
@ixdy @thockin @vishh @kubernetes/sig-testing @andyzheng0831 @pensu
Automatic merge from submit-queue
vSphere Volume Plugin Implementation
This PR implements vSphere Volume plugin support in Kubernetes (ref. issue #23932).
Automatic merge from submit-queue
kubelet/cadvisor: Refactor cadvisor disk stat/usage interfaces.
basically
1) cadvisor struct will know what runtime the kubelet is, passed in via additional argument to New()
2) rename cadvisor wrapper function to DockerImagesFsInfo() to ImagesFsInfo() and have linux implementation choose a label based on the runtime inside the cadvisor struct
2a) mock/fake/unsupported modified to take the same additional argument in New()
3) kubelet's wrapper for the cadvisor wrapper is renamed in parallel
4) make all tests use new interface
Automatic merge from submit-queue
Add support for limiting grace period during soft eviction
Adds eviction manager support in kubelet for max pod graceful termination period when a soft eviction is met.
```release-note
Kubelet evicts pods when available memory falls below configured eviction thresholds
```
/cc @vishh
Automatic merge from submit-queue
Use protobufs by default to communicate with apiserver (still store JSONs in etcd)
@lavalamp @kubernetes/sig-api-machinery
Automatic merge from submit-queue
Add support for PersistentVolumeClaim in Attacher/Detacher interface
The attach detach interface does not support volumes which are referenced through PVCs. This PR adds that support
This patch adds the --exit-on-lock-contention flag, which must be used
in conjunction with the --lock-file flag. When provided, it causes the
kubelet to wait for inotify events for that lock file. When an 'open'
event is received, the kubelet will exit.
Automatic merge from submit-queue
WIP v0 NVIDIA GPU support
```release-note
* Alpha support for scheduling pods on machines with NVIDIA GPUs whose kubelets use the `--experimental-nvidia-gpus` flag, using the alpha.kubernetes.io/nvidia-gpu resource
```
Implements part of #24071 for #23587
I am not familiar with the scheduler enough to know what to do with the scores. Mostly punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and docs
cc @erictune @davidopp @dchen1107 @vishh @Hui-Zhi @gopinatht
Implements part of #24071
I am not familiar with the scheduler enough to know what to do with the scores. Punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and user docs
Automatic merge from submit-queue
Kubelet eviction flag parsers and tests
The first two commits are from https://github.com/kubernetes/kubernetes/pull/24559 that have achieved LGTM.
The last commit is only part that is interesting, it adds the parsing logic to handle the flags, and reserves `pkg/kubelet/eviction` for eviction manager logic.
Automatic merge from submit-queue
Unit test for negative value of allocatable resources
Introduce unit test for checking resource quantities for kubelet's allocatable resources.
Covered values:
* negative quantity value: error expected
* invalid quantity unit: error expected
* valid quantity: error not expected
Running go test with -v, returned error are logged as well for more information:
```shell
=== RUN TestValueOfAllocatableResources
--- PASS: TestValueOfAllocatableResources (0.00s)
server_test.go:47: Returned err: "resource quantity for \"memory\" cannot be negative: -150G"
server_test.go:47: Returned err: "unable to parse quantity's suffix"
PASS
ok k8s.io/kubernetes/cmd/kubelet/app 0.020s
```
Automatic merge from submit-queue
Move typed clients into clientset folder
Move typed clients from `pkg/client/typed/` to `pkg/client/clientset_generated/${clientset_name}/typed`.
The first commit changes the client-gen, the last commit updates the doc, other commits are just moving things around.
@lavalamp @krousey
Covered values:
- negative quantity value: error expected
- invalid quantity unit: error expected
- valid quantity: error not expected
Running go test with -v, returned error are logged as well for more information:
=== RUN TestValueOfAllocatableResources
--- PASS: TestValueOfAllocatableResources (0.00s)
server_test.go:47: negative quantity value: resource quantity for \"memory\" cannot be negative: -150G
server_test.go:47: invalid quantity unit: unable to parse quantity's suffix
PASS
ok k8s.io/kubernetes/cmd/kubelet/app 0.020s
Automatic merge from submit-queue
rkt: bump rkt version to 1.2.1
Upon bumping the rkt version, `--hostname` is supported. Also we now gets the configs from the rkt api service, so `stage1-image` is deprecated.
cc @yujuhong @Random-Liu
Automatic merge from submit-queue
Make kubelet use an arch-specific pause image depending on GOARCH
Related to: #22876, #22683 and #15140
@ixdy @pwittrock @brendandburns @mikedanese @yujuhong @thockin @zmerlynn
When setting kube/system-resources for a node, negative quantities can result in
node's allocatable being higher then node's capacity.
Let's check the quantity and return error if it is negative.
`--kubelet-cgroups` and `--system-cgroups` respectively.
Updated `--runtime-container` to `--runtime-cgroups`.
Cleaned up most of the kubelet code that consumes these flags to match
the flag name changes.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
* Metrics will not be expose until they are hooked up to a handler
* Metrics are not cached and expose a dos vector, this must be fixed before release or the stats should not be exposed through an api endpoint
This is part of migrating kubelet configuration to the componentconfig api
group and is preliminary to retrofitting client configuration and
implementing full fledged API group mechinary.
Signed-off-by: Mike Danese <mikedanese@google.com>
Add `kube-reserved` and `system-reserved` flags for configuration
reserved resources for usage outside of kubernetes pods. Allocatable is
provided by the Kubelet according to the formula:
```
Allocatable = Capacity - KubeReserved - SystemReserved
```
Also provides a method for estimating a reasonable default for
`KubeReserved`, but the current implementation probably is low and needs
more tuning.
Implement a flag that defines the frequency at which a node's out of
disk condition can change its status. Use this flag to suspend out of
disk status changes in the time period specified by the flag, after
the status is changed once.
Set the flag to 0 in e2e tests so that we can predictably test out of
disk node condition.
Also, use util.Clock interface for all time related functionality in
the kubelet. Calling time functions in unversioned package or time
package such as unversioned.Now() or time.Now() makes it really hard
to test such code. It also makes the tests flaky and sometimes
unnecessarily slow due to time.Sleep() calls used to simulate the
time elapsed. So use util.Clock interface instead which can be faked
in the tests.
Refactor Kubelet's server functionality into a server package. Most
notably, move pkg/kubelet/server.go into
pkg/kubelet/server/server.go. This will lead to better separation of
concerns and a more readable code hierarchy.