Automatic merge from submit-queue (batch tested with PRs 52047, 52063, 51528)
Improve dynamic kubelet config e2e node test and fix bugs
Rather than just changing the config once to see if dynamic kubelet
config at-least-sort-of-works, this extends the test to check that the
Kubelet reports the expected Node condition and the expected configuration
values after several possible state transitions.
Additionally, this adds a stress test that changes the configuration 100
times. It is possible for resource leaks across Kubelet restarts to
eventually prevent the Kubelet from restarting. For example, this test
revealed that cAdvisor's leaking journalctl processes (see:
https://github.com/google/cadvisor/issues/1725) could break dynamic
kubelet config. This test will help reveal these problems earlier.
This commit also makes better use of const strings and fixes a few bugs
that the new testing turned up.
Related issue: #50217
I had been sitting on this until the cAdvisor fix merged in #51751, as these tests fail without that fix.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Added large topology tests for static policy in CPU Manager.
**What this PR does / why we need it**: This PR adds a very large topology test case for the CPU Manager feature.
Related to #51180.
CC @ConnorDoyle
Rather than just changing the config once to see if dynamic kubelet
config at-least-sort-of-works, this extends the test to check that the
Kubelet reports the expected Node condition and the expected configuration
values after several possible state transitions.
Additionally, this adds a stress test that changes the configuration 100
times. It is possible for resource leaks across Kubelet restarts to
eventually prevent the Kubelet from restarting. For example, this test
revealed that cAdvisor's leaking journalctl processes (see:
https://github.com/google/cadvisor/issues/1725) could break dynamic
kubelet config. This test will help reveal these problems earlier.
This commit also makes better use of const strings and fixes a few bugs
that the new testing turned up.
Related issue: #50217
Automatic merge from submit-queue (batch tested with PRs 51239, 51644, 52076)
do not update init containers status if terminated
fixes#29972#41580
This fixes an issue where, if a completed init container is removed while the pod or subsequent init containers are still running, the status for that init container will be reset to `Waiting` with `PodInitializing`.
This can manifest in a number of ways.
If the init container is removed why the main pod containers are running, the status will be reset with no functional problem but the status will be reported incorrectly in `kubectl get pod` for example
If the init container is removed why a subsequent init container is running, the init container will be **re-executed** leading to all manner of badness.
@derekwaynecarr @bparees
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)
Fix setNodeAddress when a node IP and a cloud provider are set
**What this PR does / why we need it**:
When a node IP is set and a cloud provider returns the same address with
several types, only the first address was accepted. With the changes made
in PR #45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.
The behaviour is modified to return all the address types for the
specified node IP.
**Which issue this PR fixes**: fixes#48760
**Special notes for your reviewer**:
* I'm not a golang expert, is it possible to mock `kubelet.validateNodeIP()` to avoid the need of real host interface addresses in the test ?
* It would be great to have it backported for a next 1.6.8 release.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)
Enable CRI-O stats from cAdvisor
**What this PR does / why we need it**:
cAdvisor may support multiple container runtimes (docker, rkt, cri-o, systemd, etc.)
As long as the kubelet continues to run cAdvisor, runtimes with native cAdvisor support may not want to run multiple monitoring agents to avoid performance regression in production. Pending kubelet running a more light-weight monitoring solution, this PR allows remote runtimes to have their stats pulled from cAdvisor when cAdvisor is registered stats provider by introspection of the runtime endpoint.
See issue https://github.com/kubernetes/kubernetes/issues/51798
**Special notes for your reviewer**:
cAdvisor will be bumped to pick up https://github.com/google/cadvisor/pull/1741
At that time, CRI-O will support fetching stats from cAdvisor.
**Release note**:
```release-note
NONE
```
* Expected value comes before actual value in assert.Equal()
* Use assert.Equal() instead of assert.True() when possible
* Add a unit test that verifies no-op pod updates to the
secret_manager and the configmap_manager
* Add a clarifying comment about why it's good to seemingly
delete a secret on updates.
* Fix (for now, non-buggy) variable shadowing issue
Automatic merge from submit-queue
Make *fakeMountInterface in container_manager_unsupported_test.go implement mount.Interface again.
This was broken in #45724
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
/cc @jsafrane, @vishh
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Set up DNS server in containerized mounter path
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
**Release note**:
```release-note
Allow DNS resolution of service name for COS using containerized mounter. It fixed the issue with DNS resolution of NFS and Gluster services.
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Update Cadvisor Dependency
Fixes: https://github.com/kubernetes/kubernetes/issues/51832
This is the worst dependency update ever...
The root of the problem is the [name change of Sirupsen -> sirupsen](https://github.com/sirupsen/logrus/issues/570#issuecomment-313933276). This means that in order to update cadvisor, which venders the lowercase, we need to update all dependencies to use the lower-cased version. With that being said, this PR updates the following packages:
`github.com/docker/docker`
- `github.com/docker/distribution`
- `github.com/opencontainers/go-digest`
- `github.com/opencontainers/image-spec`
- `github.com/opencontainers/runtime-spec`
- `github.com/opencontainers/selinux`
- `github.com/opencontainers/runc`
- `github.com/mrunalp/fileutils`
- `golang.org/x/crypto`
- `golang.org/x/sys`
- `github.com/docker/go-connections`
- `github.com/docker/go-units`
- `github.com/docker/libnetwork`
- `github.com/docker/libtrust`
- `github.com/sirupsen/logrus`
- `github.com/vishvananda/netlink`
`github.com/google/cadvisor`
- `github.com/euank/go-kmsg-parser`
`github.com/json-iterator/go`
Fixed https://github.com/kubernetes/kubernetes/issues/51832
```release-note
Fix journalctl leak on kubelet restart
Fix container memory rss
Add hugepages monitoring support
Fix incorrect CPU usage metrics with 4.7 kernel
Add tmpfs monitoring support
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Wait for container cleanup before deletion
We should wait to delete pod API objects until the pod's containers have been cleaned up. See issue: #50268 for background.
This changes the kubelet container gc, which deletes containers belonging to pods considered "deleted".
It adds two conditions under which a pod is considered "deleted", allowing containers to be deleted:
Pods where deletionTimestamp is set, and containers are not running
Pods that are evicted
This PR also changes the function PodResourcesAreReclaimed by making it return false if containers still exist.
The eviction manager will wait for containers of previous evicted pod to be deleted before evicting another pod.
The status manager will wait for containers to be deleted before removing the pod API object.
/assign @vishh
Automatic merge from submit-queue (batch tested with PRs 50072, 51744)
Deviceplugin checkpoint
**What this PR does / why we need it**:
Extends on top of PR 51209 to checkpoint device to pod allocation information on Kubelet to recover from Kubelet restarts.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
Modifies the VolumeZonePredicate to handle a PV that belongs to more
then one zone or region. This is indicated by the zone or region label
value containing a comma separated list.
Introduce feature gate for expanding PVs
Add a field to SC
Add new Conditions and feature tag pvc update
Add tests for size update via feature gate
register the resize admission plugin
Update golint failures
Automatic merge from submit-queue
Remove deprecated init-container in annotations
fixes#50655fixes#51816closes#41004fixes#51816
Builds on #50654 and drops the initContainer annotations on conversion to prevent bypassing API server validation/security and targeting version-skewed kubelets that still honor the annotations
```release-note
The deprecated alpha and beta initContainer annotations are no longer supported. Init containers must be specified using the initContainers field in the pod spec.
```
Automatic merge from submit-queue (batch tested with PRs 51805, 51725, 50925, 51474, 51638)
Flexvolume dynamic plugin discovery: Prober unit tests and basic e2e test.
**What this PR does / why we need it**: Tests for changes introduced in PR #50031 .
As part of the prober unit test, I mocked filesystem, filesystem watch, and Flexvolume plugin initialization.
Moved the filesystem event goroutine to watcher implementation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51147
**Special notes for your reviewer**:
First commit contains added functionality of the mock filesystem.
Second commit is the refactor for moving mock filesystem into a common util directory.
Third commit is the unit and e2e tests.
**Release note**:
```release-note
NONE
```
/release-note-none
/sig storage
/assign @saad-ali @liggitt
/cc @mtaufen @chakri-nelluri @wongma7
Automatic merge from submit-queue (batch tested with PRs 51666, 49829, 51058, 51004, 50938)
Fix threshold notifier build tags
**What this PR does / why we need it**:
Cross building from darwin is currently broken on the following error:
```
# k8s.io/kubernetes/pkg/kubelet/eviction
pkg/kubelet/eviction/threshold_notifier_unsupported.go:25: NewMemCGThresholdNotifier redeclared in this block
previous declaration at pkg/kubelet/eviction/threshold_notifier_linux.go:38
```
It looks like #49300 broke the build tags introduced in #38630 and #37384. This fixes the build tag on `threshold_notifier_unsupported.go` as the cgo requirement was removed from `threshold_notifier_linux.go`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50935
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45724, 48051, 46444, 51056, 51605)
Mount propagation in kubelet
Together with #45724 it implements mount propagation as proposed in https://github.com/kubernetes/community/pull/659
There is:
- New alpha annotation that allows user to explicitly set propagation mode for each `VolumeMount` in pod containers (to be replaced with real `VolumeMount.Propagation` field during beta) + validation + tests. "Private" is the default one (= no change to existing pods).
I know about proposal for real API fields for alpha feature in https://docs.google.com/document/d/1wuoSqHkeT51mQQ7dIFhUKrdi3-1wbKrNWeIL4cKb9zU/edit, but it seems it's not implemented yet. It would save me quite lot of code and ugly annotation.
- Updated CRI API to transport chosen propagation to Docker.
- New `kubelet --experimental-mount-propagation` option to enable the previous bullet without modifying types.go (worked around with changing `KubeletDeps`... not nice, but it's better than adding a parameter to `NewMainKubelet` and removing it in the next release...)
```release-note
kubelet has alpha support for mount propagation. It is disabled by default and it is there for testing only. This feature may be redesigned or even removed in a future release.
```
@derekwaynecarr @dchen1107 @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue
Make /var/lib/kubelet as shared during startup
This is part of ~~https://github.com/kubernetes/community/pull/589~~https://github.com/kubernetes/community/pull/659
We'd like kubelet to be able to consume mounts from containers in the future, therefore kubelet should make sure that `/var/lib/kubelet` has shared mount propagation to be able to see these mounts.
On most distros, root directory is already mounted with shared mount propagation and this code will not do anything. On older distros such as Debian Wheezy, this code detects that `/var/lib/kubelet` is a directory on `/` which has private mount propagation and kubelet bind-mounts `/var/lib/kubelet` as rshared.
Both "regular" linux mounter and `NsenterMounter` are updated here.
@kubernetes/sig-storage-pr-reviews @kubernetes/sig-node-pr-reviews
@vishh
Release note:
```release-note
Kubelet re-binds /var/lib/kubelet directory with rshared mount propagation during startup if it is not shared yet.
```
Automatic merge from submit-queue (batch tested with PRs 51590, 48217, 51209, 51575, 48627)
Skip system container cgroup stats if undefined
**What this PR does / why we need it**:
the kubelet /stats/summary endpoint tried to look up cgroup stats for containers that are not required. this polluted logs with messages about not finding stats for "" container. this pr skips cgroup stats if the cgroup name is not specified (they are optional anyway)
**Special notes for your reviewer**:
i think this was a regression from recent refactor.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51590, 48217, 51209, 51575, 48627)
Deviceplugin jiayingz
**What this PR does / why we need it**:
This PR implements the kubelet Device Plugin Manager.
It includes four commits implemented by @RenaudWasTaken and a commit that supports allocation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Design document: kubernetes/community#695
PR tracking: kubernetes/features#368
**Special notes for your reviewer**:
**Release note**:
Extending Kubelet to support device plugin
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 50381, 51307, 49645, 50995, 51523)
Remove deprecated and experimental fields from KubeletConfiguration
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP (maybe not v1 in 1.8, given how close we are, but definitely in 1.9).
It makes far more sense to focus on a clean API in kubeletconfig v2,
than to try and further clean up the existing "API" that everyone
already depends on.
fixes: #51657
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50381, 51307, 49645, 50995, 51523)
Bugfix: Use local JSON log buffer in parseDockerJSONLog.
**What this PR does / why we need it**:
The issue described in #47800 is due to a race condition in `ReadLogs`: Because the JSON log buffer (`dockerJSONLog`) is package-scoped, any two goroutines modifying the buffer could race and overwrite the other's changes. In particular, one goroutine could unmarshal a JSON log line into the buffer, then another goroutine could `Reset()` the buffer, and the resulting `Stream` would be empty (`""`). This empty `Stream` is caught in a `case` block and raises an `unexpected stream type` error.
This PR creates a new buffer for each execution of `parseDockerJSONLog`, so each goroutine is guaranteed to have a local instance of the buffer.
**Which issue this PR fixes**: fixes#47800
**Release note**:
```release-note
Fixed an issue (#47800) where `kubectl logs -f` failed with `unexpected stream type ""`.
```
It is possible to have multiple signals that point to the same type of
resource, e.g., both SignalNodeFsAvailable and
SignalAllocatableNodeFsAvailable refer to the same resource NodeFs.
Change the map from map[v1.ResourceName]evictionapi.Signal to
map[v1.ResourceName][]evictionapi.Signal
Automatic merge from submit-queue (batch tested with PRs 49971, 51357, 51616, 51649, 51372)
Separate feature gates for dynamic kubelet config vs loading from a file
This makes it so these two features can be turned on independently, rather than bundling both under dynamic kubelet config.
fixes: #51664
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49971, 51357, 51616, 51649, 51372)
CPU manager wiring and `none` policy
Blocker for CPU manager #49186 (4 of 6)
* Previous PR in this series: #51140
* Next PR in this series: #51180
cc @balajismaniam @derekwaynecarr @sjenning
**Release note**:
```release-note
NONE
```
TODO:
- [X] In-memory CPU manager state
- [x] Kubelet config value
- [x] Feature gate
- [X] None policy
- [X] Unit tests
- [X] CPU manager instantiation
- [x] Calls into CPU manager from Kubelet container runtime
Automatic merge from submit-queue (batch tested with PRs 51628, 51637, 51490, 51279, 51302)
Fix pod local ephemeral storage usage calculation
We use podDiskUsage to calculate pod local ephemeral storage which is not correct, because podDiskUsage also contains HostPath volume which is considered as persistent storage
This pr fixes it
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51489
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @jingxu97 @vishh
cc @ddysher
Automatic merge from submit-queue (batch tested with PRs 51513, 51515, 50570, 51482, 51448)
Add PVCRef to VolumeStats
**What this PR does / why we need it**:
For pod volumes that reference a PVC, add a PVCRef to the corresponding
volume stat. This allows metrics to be indexed/queried by PVC name
which is more user-friendly than Pod reference
**Which issue this PR fixes** : [#363](https://github.com/kubernetes/features/issues/363)
**Special notes for your reviewer**:
**Release note**:
```
`VolumeStats` reported by the kubelet stats summary API
(http://<node>:10255/stats/summary) now include a PVCRef
field describing the PVC referenced by the volume (if any).
```
Automatic merge from submit-queue (batch tested with PRs 51513, 51515, 50570, 51482, 51448)
fix typo about volumes
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50719, 51216, 50212, 51408, 51381)
Use constants instead of magic string for runtime names
**What this PR does / why we need it**:
Use constants instead of magic string for runtime names.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51678
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP.
It makes far more sense to focus on a clean API in kubeletconfig v2,
than to try and further clean up the existing "API" that everyone
already depends on.
Kubelet makes sure that /var/lib/kubelet is rshared when it starts.
If not, it bind-mounts it with rshared propagation to containers
that mount volumes to /var/lib/kubelet can benefit from mount propagation.
For pod volumes that reference a PVC, add a PVCRef to the corresponding
volume stat. This allows metrics to be indexed/queried by PVC name
which is more user-friendly than Pod reference
Automatic merge from submit-queue
Implement stop function in streaming server.
Implement streaming server stop, so that we could properly stop streaming server.
We need this to properly stop cri-containerd.
Automatic merge from submit-queue (batch tested with PRs 49961, 50005, 50738, 51045, 49927)
adding validations on kubelet starting configurations
**What this PR does / why we need it**:
I found some validations of kubelet starting options were missing when I was creating a custom cluster from scratch. The kubelet does not check invalid configurations on `--cadvisor-port`, `--event-burst`, `--image-gc-high-threshold`, etc. I have added some validations in kubelet like validations in `cmd/kube-apiserver/app/options/validation.go`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Adds additional validation for kubelet in `pkg/kubelet/apis/kubeletconfig/validation`.
```
When a node IP is set and a cloud provider returns the same address with
several types, on the first address was accepted. With the changes made
in PR #45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.
The behaviour is modified to return all the address types for the
specified node IP.
Issue #48760
Automatic merge from submit-queue (batch tested with PRs 51425, 51404, 51459, 51504, 51488)
Admit NoNewPrivs for remote and rkt runtimes
**What this PR does / why we need it**:
#51347 is aiming to admit NoNewPrivis for remote container runtime, but it didn't actually solve the problem. See @miaoyq 's comments [here](https://github.com/kubernetes/kubernetes/pull/51347#discussion_r135379446).
This PR always admit NoNewPrivs for runtimes except docker, which should fix the problem.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Fixes#51319.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51471, 50561, 50435, 51473, 51436)
Fix inconsistent Prometheus cAdvisor metrics
**What this PR does / why we need it**:
We need this because otherwise kubelet is exposing different sets of Prometheus metrics that randomly include or do not include containers.
See also https://github.com/google/cadvisor/issues/1704; quoting here:
Prometheus requires that all metrics in the same family have the same labels, so we arrange to supply blank strings for missing labels
The function `containerPrometheusLabels()` conditionally adds various metric labels from container labels - pod name, image, etc. However, when it receives the metrics, Prometheus [checks](https://github.com/prometheus/client_golang/blob/master/prometheus/registry.go#L665) that all metrics in the same family have the same label set, and [rejects](https://github.com/prometheus/client_golang/blob/master/prometheus/registry.go#L497) those that do not.
Since containers are collected in (somewhat) random order, depending on which kind is seen first you get one set of metrics or the other.
Changing the container labels function to always add the same set of labels, adding `""` when it doesn't have a real value, eliminates the issue in my testing.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#50151
**Special notes for your reviewer**:
I have made the same fix in two places. I am 98% sure the one in `cadvisor_linux.go` isn't used and indeed cannot be used, but have not gone fully down that rabbit-hole.
**Release note**:
```release-note
Fix inconsistent Prometheus cAdvisor metrics
```
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
Deprecation warnings for auto detecting cloud providers
**What this PR does / why we need it**:
Adds deprecation warnings for auto detecting cloud providers. As part of the initiative for out-of-tree cloud providers, this feature is conflicting since we're shifting the dependency of kubernetes core into cAdvisor. In the future kubelets should be using `--cloud-provider=external` or no cloud provider at all.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50986
**Special notes for your reviewer**:
NOTE: I still have to coordinate with sig-node and kubernetes-dev to get approval for this deprecation, I'm only opening this PR since we're close to code freeze and it's something presentable.
**Release note**:
```release-note
Deprecate auto detecting cloud providers in kubelet. Auto detecting cloud providers go against the initiative for out-of-tree cloud providers as we'll now depend on cAdvisor integrations with cloud providers instead of the core repo. In the near future, `--cloud-provider` for kubelet will either be an empty string or `external`.
```
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
Implement StatsProvider interface using cadvisor
Ref: https://github.com/kubernetes/kubernetes/issues/46984
- This PR changes the `StatsProvider` interface in `pkg/kubelet/server/stats` so that it can provide container stats from either cadvisor or CRI, and the summary API can consume the stats without knowing how they are provided.
- The `StatsProvider` struct in the newly added package `pkg/kubelet/stats` implements part of the `StatsProvider` interface in `pkg/kubelet/server/stats`.
- In `pkg/kubelet/stats`,
- `stats_provider.go`: implements the node level stats and provides the entry point for this package.
- `cadvisor_stats_provider.go`: implements the container level stats using cadvisor.
- `cri_stats_provider.go`: implements the container level stats using CRI.
- `helper.go`: utility functions shared by the above three components.
- There should be no user visible behaviors change in this PR.
- A follow up PR will implement the StatsProvider interface using CRI.
**Release note**:
```
None
```
/assign @yujuhong
/assign @WIZARD-CXY
Prometheus requires that all metrics in the same family have the same
labels, so we arrange to supply blank strings for missing labels
See https://github.com/google/cadvisor/issues/1704
Automatic merge from submit-queue
When faild create pod sandbox record event.
I created pods because of the failure to create a sandbox, but there was no clear message telling me what was the failure, so I wanted to record an event when the sandbox was created.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Always check if default labels on node need to be updated in kubelet
**What this PR does / why we need it**:
Nodes join again but maybe OS/Arch/Instance-Type has changed in the meantime.
In this case the kubelet needs to check if the default labels are still correct and if not it needs to update them.
```release-note
Kubelet updates default labels if those are deprecated
```
Automatic merge from submit-queue (batch tested with PRs 49861, 50933, 51380, 50688, 51305)
Test loading Kubelet config from a file
**What this PR does / why we need it**:
Adds a test for loading kubelet config from a file, part of improving https://github.com/kubernetes/kubernetes/issues/50217
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49849, 50334, 51414)
make volumesInUse sorted in node status updates
**What this PR does / why we need it**:
`volumesInUse` is not sent in a stable sort order. This will make node status patch requests larger than needed, and makes debugging nodes harder than necessary.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#49731
**Special notes for your reviewer**:
/cc @derekwaynecarr @jboyd01
**Release note**:
```release-note
make volumesInUse sorted in node status updates
```
Automatic merge from submit-queue
Remove null -> [] slice hack
Closes#44593
When 1.6 added protobuf storage, the storage layer lost the ability to persist slice fields with empty but non-null values.
As a workaround, we tried to convert empty slice fields to `[]`, rather than `null`. Compressing `null` -> `[]` was just as much of an API breakage as `[]` -> `null`, but was hoped to cause fewer problems in clients that don't do null checks.
Because of conversion optimizations around converting lists of objects, the `null` -> `[]` hack was discovered to only apply to individual get requests, not to a list of objects. 1.6 and 1.7 was released with this behavior, and the world didn't explode. 1.7 documented the breaking API change that `null` and `[]` should be considered equivalent, unless otherwise noted on a particular field.
This PR:
* Reverts the earlier attempt (https://github.com/kubernetes/kubernetes/pull/43422) at ensuring non-null json slice output in conversion
* Makes results of `get` consistent with the results of `list` (which helps naive clients that do deepequal comparisons of objects obtained via list/watch and get), and allows empty slice fields to be returned as `null`
```release-note
Protobuf serialization does not distinguish between `[]` and `null`.
API fields previously capable of storing and returning either `[]` and `null` via JSON API requests (for example, the Endpoints `subsets` field) can now store only `null` when created using the protobuf content-type or stored in etcd using protobuf serialization (the default in 1.6+). JSON API clients should tolerate `null` values for such fields, and treat `null` and `[]` as equivalent in meaning unless specifically documented otherwise for a particular field.
```
Automatic merge from submit-queue (batch tested with PRs 51391, 51338, 51340, 50773, 49599)
Remove duplicate code
This PR cleans up Kubelet test code. Adds a function enabling the removal of duplicate code for Mock chaining. Also adds a function to check the pod status, again enabling removal of duplicate code.
Fixes#22470
**Special notes for your reviewer**:
This is my first PR for the Kubernetes project. Keeping it simple.
Automatic merge from submit-queue (batch tested with PRs 51391, 51338, 51340, 50773, 49599)
Delete "hugetlb" from whitelistControllers
**What this PR does / why we need it**:
Delete "hugetlb" from whitelistControllers
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50770
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51054, 51101, 50031, 51296, 51173)
Dynamic Flexvolume plugin discovery, probing with filesystem watch.
**What this PR does / why we need it**: Enables dynamic Flexvolume plugin discovery. This model uses a filesystem watch (fsnotify library), which notifies the system that a probe is necessary only if something changes in the Flexvolume plugin directory.
This PR uses the dependency injection model in https://github.com/kubernetes/kubernetes/pull/49668.
**Release Note**:
```release-note
Dynamic Flexvolume plugin discovery. Flexvolume plugins can now be discovered on the fly rather than only at system initialization time.
```
/sig-storage
/assign @jsafrane @saad-ali
/cc @bassam @chakri-nelluri @kokhang @liggitt @thockin
Automatic merge from submit-queue (batch tested with PRs 51054, 51101, 50031, 51296, 51173)
Refactor kuberuntime test case with sets.String
**What this PR does / why we need it**:
change to make got and want use sets.String instead, since that is both safe and more clearly shows the intent.
ref: https://github.com/kubernetes/kubernetes/pull/50554
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/51396
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50889, 51347, 50582, 51297, 51264)
Fix NoNewPrivs and also allow remote runtime to provide the support.
Fixes https://github.com/kubernetes/kubernetes/issues/51319.
This PR:
1) Let kubelet admit remote runtime for `NoNewPrivis` container runtime.
2) Fix a `NoNewPrivis` bug which checks wrong runtime type.
/cc @kubernetes/sig-node-bugs @jessfraz
Automatic merge from submit-queue (batch tested with PRs 50889, 51347, 50582, 51297, 51264)
Change eviction manager to manage one single local storage resource
**What this PR does / why we need it**:
We decided to manage one single resource name, eviction policy should be modified too.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #50818
**Special notes for your reviewer**:
**Release note**:
```release-note
Change eviction manager to manage one single local ephemeral storage resource
```
/assign @jingxu97
The ReadOnlyPort defaulting prevented passing 0 to diable via
the KubeletConfiguraiton struct.
The HealthzPort defaulting prevented passing 0 to disable via the
KubeletConfiguration struct. The documentation also failed to mention
this, but the check is performed in code.
The CAdvisorPort documentation failed to mention that you can pass 0 to
disable.
fsGroup check will be enforcing that if a volume has already been
mounted by one pod and another pod wants to mount it but has a different
fsGroup value, this mount operation will not be allowed.
Automatic merge from submit-queue (batch tested with PRs 47115, 51196, 51204, 51208, 51206)
Delete redundant err definition
**What this PR does / why we need it**:
Delete reduandant err definition
Line 307 has err definition and initialization.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove crash loop "detection" from the dynamic kubelet config feature
**What this PR does / why we need it**:
The subfeature was a cool idea, but in the end it is very complex to
separate Kubelet restarts into crash-loops caused by config vs.
crash-loops caused by other phenomena, like admin-triggered node restarts,
kernel panics, and and process babysitter behavior. Dynamic kubelet config
will be better off without the potential for false positives here.
Removing this subfeature also simplifies dynamic configuration by
reducing persistent state:
- we no longer need to track bad config in a file
- we no longer need to track kubelet startups in a file
**Which issue this PR fixes**: fixes#50216
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)
implement proposal 34058: hostPath volume type
**What this PR does / why we need it**:
implement proposal #34058
**Which issue this PR fixes** : fixes#46549
**Special notes for your reviewer**:
cc @thockin @luxas @euank PTAL
Automatic merge from submit-queue
Support HostAlias for HostNetwork Pods
**What this PR does / why we need it**: Currently, HostAlias does not support HostNetwork pods because historically, kubelet only manages hosts file for non-HostNetwork pods. With the recent change in https://github.com/kubernetes/kubernetes/pull/49140, kubelet now manages hosts file for all Pods, which enables HostAlias support also.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48398
**Special notes for your reviewer**: might be easier to review commit-by-commit
**Release note**:
```release-note
HostAlias is now supported for both non-HostNetwork Pods and HostNetwork Pods.
```
@yujuhong @hongchaodeng @thockin
Automatic merge from submit-queue (batch tested with PRs 50489, 51070, 51011, 51022, 51141)
Fixed code comments that were not updated
**What this PR does / why we need it**:
The comment of the args ‘KubeReserved’ is out of date and there is no consistent with command line messages
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50257, 50247, 50665, 50554, 51077)
Replace hard-code "cpu" and "memory" to consts
**What this PR does / why we need it**:
There are many places using hard coded "cpu" and "memory" as resource name. This PR replace them to consts.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
/kind cleanup
**Release note**:
```release-note
NONE
```
The subfeature was a cool idea, but in the end it is very complex to
separate Kubelet restarts into crash-loops caused by config vs.
crash-loops caused by other phenomena, like admin-triggered node restarts,
kernel panics, and and process babysitter behavior. Dynamic kubelet config
will be better off without the potential for false positives here.
Removing this subfeature also simplifies dynamic configuration by
reducing persistent state:
- we no longer need to track bad config in a file
- we no longer need to track kubelet startups in a file
Automatic merge from submit-queue (batch tested with PRs 51102, 50712, 51037, 51044, 51059)
Create the directory for cadvisor if needed
**What this PR does / why we need it**:
In 6c7245d464, code was added to
bail out if the directory that cadvisor monitored did not exist.
However, this breaks the earlier assumption that kubelet created
directories when needed in pkg/kubelet/kubelet.go's setupDataDirs()
method. setupDataDirs() happens much later, so basically kubelet
exits now.
So since cadvisor really needs this directory, let us just create
it
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#50709
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50967, 50505, 50706, 51033, 51028)
Revert "Merge pull request #51008 from kubernetes/revert-50789-fix-scheme"
I'm spinning up a cluster right now to test this fix, but I'm pretty sure this was the problem.
There doesn't seem to be a way to confirm from logs, because AFAICT the logs from the hollow kubelet containers are not collected as part of the kubemark test.
**What this PR does / why we need it**:
This reverts commit f4afdecef8, reversing
changes made to e633a1604f.
This also fixes a bug where Kubemark was still using the core api scheme
to manipulate the Kubelet's types, which was the cause of the initial
revert.
**Which issue this PR fixes**: fixes#51007
**Release note**:
```release-note
NONE
```
/cc @shyamjvs @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 46458, 50934, 50766, 50970, 47698)
Prepare VolumeHost for running mount tools in containers
This is the first part of implementation of https://github.com/kubernetes/features/issues/278 - running mount utilities in containers.
It updates `VolumeHost` interface:
* `GetMounter()` now requires volume plugin name, as it is going to return different mounter to different volume plugings, because mount utilities for these plugins can be on different places.
* New `GetExec()` method that should volume plugins use to execute any utilities. This new `Exec` interface will execute them on proper place.
* `SafeFormatAndMount` is updated to the new `Exec` interface.
This is just a preparation, `GetExec` right now leads to simple `os.Exec` and mount utilities are executed on the same place as before. Also, the volume plugins will be updated in subsequent PRs (split into separate PRs, some plugins required lot of changes).
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
@rootfs @gnufied
Automatic merge from submit-queue (batch tested with PRs 50531, 50853, 49976, 50939, 50607)
cni: print better error when a CNI .configlist is put into a .config
If the admin mistakenly puts a CNI configlist into a "conf" file, that's not correct, but kubelet will still read the config file and then fail to start the pod because "type=". Be a bit smarter about that. Should also be fixed in CNI, which I'm doing a PR for as well.
@squeed @thockin @freehan
This reverts commit f4afdecef8, reversing
changes made to e633a1604f.
This also fixes a bug where Kubemark was still using the core api scheme
to manipulate the Kubelet's types, which was the cause of the initial
revert.
In 6c7245d464, code was added to
bail out if the directory that cadvisor monitored did not exist.
However, this breaks the earlier assumption that kubelet created
directories when needed in pkg/kubelet/kubelet.go's setupDataDirs()
method. setupDataDirs() happens much later, so basically kubelet
exits now.
So since cadvisor really needs this directory, let us just create
it
Fixes#50709
Automatic merge from submit-queue (batch tested with PRs 46512, 50146)
Make metav1.(Micro)?Time functions take pointers
Is there any reason for those functions not to be on pointers?
Automatic merge from submit-queue
Fix initial exec terminal dimensions
**What this PR does / why we need it**:
Delay attempting to send a terminal resize request to docker until after
the exec has started; otherwise, the initial resize request will fail.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47990
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
delete ineffectual assginment
**What this PR does / why we need it**:
delete ineffectual assignment
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 50536, 50809, 50220, 50399, 50176)
Set ExecSync timeout in liveness prober.
Although Dockershim doesn't actually support `ExecSync` timeout (see [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/exec.go#L137)), we should set the timeout, so that the other runtime which supports the timeout could work properly.
Fixes#50389.
/cc @yujuhong @timstclair @feiskyer
Automatic merge from submit-queue (batch tested with PRs 50563, 50698, 50796)
Disable Docker's health check until we officially support it
Ref: https://github.com/kubernetes/kubernetes/issues/50703
Tested locally.
Without this PR:
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
afdd796ddddc gcr.io/ygg-gke-dev/nginx-healthcheck "nginx -g 'daemon ..." 5 minutes ago Up 5 minutes (healthy) k8s_test-container_test-pod_default_8a1ad225-82bf-11e7-becb-480fcf514648_0
```
With this PR:
```
e3fb2437555f gcr.io/ygg-gke-dev/nginx-healthcheck "nginx -g 'daemon ..." 10 seconds ago Up 9 seconds k8s_test-container_test-pod_default_de82e83c-82c0-11e7-b3fc-480fcf514648_0
```
**Release note**:
```
Disable Docker's health check until we officially support it.
```
/assign @yujuhong
/assign @dchen1107
Automatic merge from submit-queue (batch tested with PRs 49869, 47987, 50211, 50804, 50583)
Make socket address parsing work on FreeBSD.
**What this PR does / why we need it**:
I am currently in the process of porting Kubernetes to work on FreeBSD. What is interesting is that I am not interested in using Kubernetes to run Docker containers in this case. I happen to be the author of CloudABI, a sandboxing framework that is available on FreeBSD (and other systems). I want to have a cluster management tool for scheduling these sandboxed processes.
Anyway, right now `kubelet` crashes on startup when passing in CRI command line flags, for the reason that it's not able to parse `unix:...` socket addresses. This change fixes this by making the respective Linux-only source file work on FreeBSD as well.
Automatic merge from submit-queue (batch tested with PRs 49342, 50581, 50777)
Device Plugin Protobuf API
**What this PR does / why we need it:**
This implements the Device Plugin API
- Design document: kubernetes/community#695
- PR tracking: [kubernetes/features#368](https://github.com/kubernetes/features/issues/368#issuecomment-321625420)
Special notes for your reviewer:
First proposal submitted to the community repo, please advise if something's not right with the format or procedure, etc.
@vishh @derekwaynecarr
**Release note:**
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46317, 48922, 50651, 50230, 47599)
Rerun init containers when the pod needs to be restarted
Whenever pod sandbox needs to be recreated, all containers associated
with it will be killed by kubelet. This change ensures that the init
containers will be rerun in such cases.
The change also refactors the compute logic so that the control flow of
init containers act is more aligned with the regular containers. Unit
tests are added to verify the logic.
This fixes#36485
Automatic merge from submit-queue (batch tested with PRs 46317, 48922, 50651, 50230, 47599)
Resources outside the `*kubernetes.io` namespace are integers and cannot be over-committed.
**What this PR does / why we need it**:
Fixes#50473
Rationale: since the scheduler handles all resources except CPU as integers, that could just be the default behavior for namespaced resources.
cc @RenaudWasTaken @vishh
**Release note**:
```release-note
Resources outside the `*kubernetes.io` namespace are integers and cannot be over-committed.
```
Whenever pod sandbox needs to be recreated, all containers associated
with it will be killed by kubelet. This change ensures that the init
containers will be rerun in such cases.
The change also refactors the compute logic so that the control flow of
init containers act is more aligned with the regular containers. Unit
tests are added to verify the logic.
Automatic merge from submit-queue (batch tested with PRs 50694, 50702)
Fix make cross build failure
**What this PR does / why we need it**:
had to fix the method getSecurityOpts in helpers_windows.go to
match the implementation in helpers_linux.go from commit:
bf01fa2f00
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#50675
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove the status of the terminated containers in the summary endpoint
Ref: https://github.com/kubernetes/kubernetes/issues/47853
- When building summary, a container is considered to be terminated if it has an older creation time and no CPU instantaneous or memory RSS usage.
- We remove the terminated containers in the summary by grouping the containers with the same name in the same pod, sorting them in each group by creation time, and skipping the oldest ones with no usage in each group. Let me know if there's simpler way.
**Release note**:
```
None
```
/assign @yujuhong
Automatic merge from submit-queue
Typed static/mirror pod UID translation
Fixes#36031 , partially.
TODO:
- [x] Add types ResolvedPodUID and MirrorPodUID.
- [x] Use the ResolvedPodUID type with minimal changes.
- [x] Use the MirrorPodUID type with minimal changes.
- [x] Clarify whether the new types should be used anywhere else; if so make the agreed upon changes.
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 50094, 48966, 49478, 50593, 49140)
Kubelet manage hosts file for HostNetwork Pods instead of Docker
**What this PR does / why we need it**: Currently, Docker manages the hosts file for containers inside Pods using hostNetwork. It creates discrepancy between how we treat hostNetwork and non-hostNetwork Pods. Kubelet should manage the file regardless of the network setup.
**Which issue this PR fixes**: fixes#48397 more context in https://github.com/kubernetes/kubernetes/issues/43632#issuecomment-304376441
**Special notes for your reviewer**: Because the new logic relies on reading the node filesystem, I'm not sure how to write a proper unit test. I was thinking about using a node e2e test to cover the case, but suggestions are greatly welcomed.
**Release note**:
```release-note
Kubelet now manages `/etc/hosts` file for both hostNetwork Pods and non-hostNetwork Pods.
```
/kind feature
/sig node
@yujuhong @hongchaodeng @thockin
@kubernetes/sig-network-feature-requests @kubernetes/sig-node-feature-requests
Automatic merge from submit-queue (batch tested with PRs 49488, 50407, 46105, 50456, 50258)
Add UpdateContainerResources method to CRI
This is first step toward support for opinionated cpu pinning for certain guaranteed pods.
In order to do this, the kubelet needs to be able to dynamically update the cpuset at the container level, which is managed by the container runtime. Thus the kubelet needs a method to communicate over the CRI so the runtime can then modify the container cgroup.
This is used in the situation where a core is added or removed from the shared pool to become a exclusive core for a new G pod. The cpuset for all containers in the shared pool will need to be updated to add or remove that core.
Opening this up now so we can start discussion. The need for a change to the CRI might be unexpected.
@derekwaynecarr @vishh @ConnorDoyle
```release-note
NONE
```
Automatic merge from submit-queue
Task 2: Added toleration to DaemonSet pods for node condition taints
**What this PR does / why we need it**:
If TaintByCondition was enabled, added toleration to DaemonSet pods for node condition taints.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 49725, 50367, 50391, 48857, 50181)
Use 'Infof' instead of 'Errorf' for a debug log
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
#50167
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49642, 50335, 50390, 49283, 46582)
Admit sysctls for other runtime.
Fixes https://github.com/kubernetes/kubernetes/issues/50343.
Admit sysctl for other runtimes.
/cc @mikebrow @yujuhong @feiskyer @sttts
Automatic merge from submit-queue (batch tested with PRs 50016, 49583, 49930, 46254, 50337)
Alpha Dynamic Kubelet Configuration
Feature: https://github.com/kubernetes/features/issues/281
This proposal contains the alpha implementation of the Dynamic Kubelet Configuration feature proposed in ~#29459~ [community/contributors/design-proposals/dynamic-kubelet-configuration.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/dynamic-kubelet-configuration.md).
Please note:
- ~The proposal doc is not yet up to date with this implementation, there are some subtle differences and some more significant ones. I will update the proposal doc to match by tomorrow afternoon.~
- ~This obviously needs more tests. I plan to write several O(soon). Since it's alpha and feature-gated, I'm decoupling this review from the review of the tests.~ I've beefed up the unit tests, though there is still plenty of testing to be done.
- ~I'm temporarily holding off on updating the generated docs, api specs, etc, for the sake of my reviewers 😄~ these files now live in a separate commit; the first commit is the one to review.
/cc @dchen1107 @vishh @bgrant0607 @thockin @derekwaynecarr
```release-note
Adds (alpha feature) the ability to dynamically configure Kubelets by enabling the DynamicKubeletConfig feature gate, posting a ConfigMap to the API server, and setting the spec.configSource field on Node objects. See the proposal at https://github.com/kubernetes/community/blob/master/contributors/design-proposals/dynamic-kubelet-configuration.md for details.
```
On FreeBSD, it is perfectly reasonable to make use of util_linux.go.
Rename util_linux.go to util_unix.go so that it may be used on non-Linux
UNIX-like systems. Add proper 'freebsd' build tags.
Automatic merge from submit-queue
Support exec/attach/portforward in `kubectl proxy`
Use the UpgradeAwareProxy shared code in kubectl proxy. Provide a separate transport for those requests that does not have HTTP/2 enabled. Refactor the code to be a bit cleaner in places and to better separate changes.
Fixes#32026
```release-note
`kubectl proxy` will now correctly handle the `exec`, `attach`, and `portforward` commands. You must pass `--disable-filter` to the command in order to allow these endpoints.
```
Automatic merge from submit-queue (batch tested with PRs 50208, 50259, 49702, 50267, 48986)
Relax restrictions on environment variable names.
Fixes#2707
The POSIX standard restricts environment variable names to uppercase letters, digits, and the underscore character in shell contexts only. For generic application usage, it is stated that all other characters shall be tolerated. (Reference [here](http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html), my prose reasoning [here](https://github.com/kubernetes/kubernetes/issues/2707#issuecomment-285309156).)
This change relaxes the rules to some degree. Namely, we stop requiring environment variable names to be strict `C_IDENTIFIERS` and start permitting lowercase, dot, and dash characters.
Public container images using environment variable names beyond the shell-only context can benefit from this relaxation. Elasticsearch is one popular example.
Automatic merge from submit-queue (batch tested with PRs 49885, 49751, 49441, 49952, 49945)
Ignore UDP metrics in kubelet
Updating cadvisor godeps to 0.26.0 for the 1.7 release (#46658) added udp metrics. However, they were not disabled in the kubelet.
This PR disables collection of UDP metrics in the kubelet.
This should be cherrypicked to the 1.7 branch.
cc @dchen1107
Automatic merge from submit-queue
simplify if and else for code
Signed-off-by: allencloud <allen.sun@daocloud.io>
**What this PR does / why we need it**:
This PR tries to simplify the code of if and else, and this could make code a little bit cleaner.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Timeout and Max-in-flight don't report non-resource URLs correctly.
Unify error reporting for 429 and 504 to be correct for timeout and max in flight and eviction. Add better messages to eviction (removing a todo). Return the correct body content for timeouts (reason and code should be correct).
This potentially increases cardinality of 429, but because non-api urls may be under the max-inflight budget we need to report them somewhere (if something breaks and starts fetching API versions endlessly).
```release-note
The 504 timeout error was returning a JSON error body that indicated it was a 500. The body contents now correctly report a 500 error.
```
Automatic merge from submit-queue
Update OWNERS to correct members' handles
**What this PR does / why we need it**:
Fix some typos of members' handles as per https://github.com/kubernetes/kubernetes/issues/50048#issuecomment-319831957.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Associated with: #50048
**Special notes for your reviewer**:
/cc @madhusudancs @sebgoa @liggitt @saad-ali
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)
Fail on swap enabled and deprecate experimental-fail-swap-on flag
**What this PR does / why we need it**:
* Deprecate the old experimental-fail-swap-on
* Add a new flag fail-swap-on and set it to true
Before this change, we would not fail when swap is on. With this
change we fail for everyone when swap is on, unless they explicitly
set --fail-swap-on to false.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#34726
**Special notes for your reviewer**:
**Release note**:
```release-note
Kubelet will by default fail with swap enabled from now on. The experimental flag "--experimental-fail-swap-on" has been deprecated, please set the new "--fail-swap-on" flag to false if you wish to run with /proc/swaps on.
```
Requires a separate transport that is guaranteed not to be HTTP/2 for
exec/attach/portforward, because otherwise the Go client attempts to
upgrade us to HTTP/2 first.
Automatic merge from submit-queue
kubelet: remove code for handling old pod/containers paths
**What this PR does / why we need it**:
This PR removes the code for handling the paths that has been deprecated for a long time.
**Release note**:
```release-note
NONE
```
CC @simo5
Automatic merge from submit-queue (batch tested with PRs 50103, 49677, 49449, 43586, 48969)
Do not try to run preStopHook when the gracePeriod is 0
**What this PR does / why we need it**:
1. Sometimes when the user force deletes a POD with no gracePeriod, its possible that kubelet attempts to execute the preStopHook which will certainly fail. This PR prevents this inavitable PreStopHook failure.
```
kubectl delete --force --grace-period=0 po/<pod-name>
```
2. This also adds UT for LifeCycle Hooks
```
time go test --cover -v --run "Hook" ./pkg/kubelet/kuberuntime/
.
.
.
--- PASS: TestLifeCycleHook (0.00s)
--- PASS: TestLifeCycleHook/PreStop-CMDExec (0.00s)
--- PASS: TestLifeCycleHook/PreStop-HTTPGet (0.00s)
--- PASS: TestLifeCycleHook/PreStop-NoTimeToRun (0.00s)
--- PASS: TestLifeCycleHook/PostStart-CmdExe (0.00s)
PASS
coverage: 15.3% of statements
ok k8s.io/kubernetes/pkg/kubelet/kuberuntime 0.429s
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
Do not try to run preStopHook when the gracePeriod is 0
```
Automatic merge from submit-queue
Fix incorrect owner in OWNERS
**What this PR does / why we need it**:
typo: yuyuhong --> yujuhong
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/50048#issuecomment-319846621
**Special notes for your reviewer**:
/assign @yujuhong
I don't know whether you can approve this PR or not in such case 😄
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Switch from package syscall to golang.org/x/sys/unix
**What this PR does / why we need it**:
The syscall package is locked down and the comment in https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24 advises to switch code to use the corresponding package from golang.org/x/sys. This PR does so and replaces usage of package syscall with package golang.org/x/sys/unix where applicable. This will also allow to get updates and fixes
without having to use a new go version.
In order to get the latest functionality, golang.org/x/sys/ is re-vendored. This also allows to use Eventfd() from this package instead of calling the eventfd() C function.
**Special notes for your reviewer**:
This follows previous works in other Go projects, see e.g. moby/moby#33399, cilium/cilium#588
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49237, 49656, 49980, 49841, 49899)
certificate manager: close existing client conns once cert rotates
After the kubelet rotates its client cert, it will keep connections to the API server open indefinitely, causing it to use its old credentials instead of the new certs. Because the API server authenticates client certs at the time of the request, and not the handshake, this could cause the kubelet to start hitting auth failures even if it rotated its certificate to a new, valid one.
When the kubelet rotates its cert, close down existing connections to force a new TLS handshake.
Ref https://github.com/kubernetes/features/issues/266
Updates https://github.com/kubernetes-incubator/bootkube/pull/663
```release-note
After a kubelet rotates its client cert, it now closes its connections to the API server to force a handshake using the new cert. Previously, the kubelet could keep its existing connection open, even if the cert used for that connection was expired and rejected by the API server.
```
/cc @kubernetes/sig-auth-bugs
/assign @jcbsmpsn @mikedanese
Automatic merge from submit-queue (batch tested with PRs 49237, 49656, 49980, 49841, 49899)
[Bug Fix] Set NodeOODCondition to false
fixes#49839, which was introduced by #48846
This PR makes the kubelet set NodeOODCondition to false, so that the scheduler and other controllers do not consider the node to be unschedulable.
/assign @vishh
/sig node
/release-note-none
* Deprecate the old experimental-fail-swap-on
* Add a new flag fail-swap-on and set it to true
Before this change, we would not fail when swap is on. With this
change we fail for everyone when swap is on, unless they explicitly
set --fail-swap-on to false.
Automatic merge from submit-queue
If error continue for loop
If err does not add continue, type conversion will be error.
If do not add continue, pod. (* V1.Pod) may cause panic to run.
**Release note**:
```release-note
NONE
```
After the kubelet rotates its client cert, it will keep connections
to the API server open indefinitely, causing it to use its old
credentials instead of the new certs
When the kubelet rotates its cert, close down existing connections
to force a new TLS handshake.
Automatic merge from submit-queue (batch tested with PRs 49284, 49555, 47639, 49526, 49724)
skip WaitForAttachAndMount for terminated pods in syncPod
Fixes https://github.com/kubernetes/kubernetes/issues/49663
I tried to tread lightly with a small localized change because this needs to be picked to 1.7 and 1.6 as well.
I suspect this has been as issue since we started unmounting volumes on pod termination https://github.com/kubernetes/kubernetes/pull/37228
xref openshift/origin#14383
@derekwaynecarr @eparis @smarterclayton @saad-ali @jwforres
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 49284, 49555, 47639, 49526, 49724)
Change pod config to manifest
**What this PR does / why we need it**:
As per https://github.com/kubernetes/kubernetes/pull/46494#discussion_r119675805, change `config` to `manifest` to avoid ambiguous.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
Since it's a minor fix, there is no issue here.
/cc @mtaufen
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
Add support for `no_new_privs` via AllowPrivilegeEscalation
**What this PR does / why we need it**:
Implements kubernetes/community#639
Fixes#38417
Adds `AllowPrivilegeEscalation` and `DefaultAllowPrivilegeEscalation` to `PodSecurityPolicy`.
Adds `AllowPrivilegeEscalation` to container `SecurityContext`.
Adds the proposed behavior to `kuberuntime`, `dockershim`, and `rkt`. Adds a bunch of unit tests to ensure the desired default behavior and that when `DefaultAllowPrivilegeEscalation` is explicitly set.
Tests pass locally with docker and rkt runtimes. There are also a few integration tests with a `setuid` binary for sanity.
**Release note**:
```release-note
Adds AllowPrivilegeEscalation to control whether a process can gain more privileges than it's parent process
```