Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Setup fsGroup for local volumes correctly
**What this PR does / why we need it**:
This pr fixes fsGroup check in local volume in containerized kubelet. Except this, it also fixes fsGroup check when volume source is a normal directory whether kubelet is running on the host or in a container.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61741
**Special notes for your reviewer**:
Bind mounts are detected in `/proc/mounts`, but it does not contain root of mount for bind mounts. So `mount.GetMountRefsByDev()` cannot get all references if source is a normal directory. e.g.
```
# mkdir /tmp/src /mnt/dst
# mount --bind /tmp/src /tmp/src # required by local-volume-provisioner, see https://github.com/kubernetes-incubator/external-storage/pull/499
# mount --bind /tmp/src /mnt/dst
# grep -P 'src|dst' /proc/mounts
tmpfs /tmp/src tmpfs rw,nosuid,nodev,noatime,size=4194304k 0 0
tmpfs /mnt/dst tmpfs rw,nosuid,nodev,noatime,size=4194304k 0 0
# grep -P 'src|dst' /proc/self/mountinfo
234 409 0:42 /src /tmp/src rw,nosuid,nodev,noatime shared:30 - tmpfs tmpfs rw,size=4194304k
235 24 0:42 /src /mnt/dst rw,nosuid,nodev,noatime shared:30 - tmpfs tmpfs rw,size=4194304k
```
We need to compare root of mount and device in this case.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64102, 63303, 64150, 63841). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
When creating ext3/ext4 volume, disable reserved blocks
**What this PR does / why we need it**:
When creating ext3/ext4 volume, `mkfs` defaults to reserving 5% of the volume for the super-user (root). This patch changes the `mkfs` to pass `-m0` to disable this setting.
Rationale: Reserving a percentage of the volume is generally a neither useful nor desirable feature for volumes that aren't used as root file systems for Linux distributions, since the reserved portion becomes unavailable for non-root users. For containers, the general case is to use the entire volume for data, without running as root. The case where one might want reserved blocks enabled is much rarer.
**Special notes for your reviewer**:
I also added some comments to describe the flags passed to `mkfs`.
**Release note**:
```release-note
Changes ext3/ext4 volume creation to not reserve any portion of the volume for the root user.
```
Automatic merge from submit-queue (batch tested with PRs 63914, 63887, 64116, 64026, 62933). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable SELinux relabeling in CSI volumes
**What this PR does / why we need it**:
CSI volume plugin should provide correct information in `GetAttributes` call so kubelet can ask container runtime to relabel the volume. Therefore CSI volume plugin needs to check if a random volume mounted by a CSI driver supports SELinux or not by checking for "seclabel" mount or superblock option.
**Which issue(s) this PR fixes**
Fixes#63965
**Release note**:
```release-note
NONE
```
@saad-ali @vladimirvivien @davidz627
@cofyc, FYI, I'm changing `struct mountInfo`.
Automatic merge from submit-queue (batch tested with PRs 63914, 63887, 64116, 64026, 62933). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubeadm: Write kubelet config file to disk and persist in-cluster
**What this PR does / why we need it**:
In order to make configuration flow from the cluster level to node level, we need a way for kubeadm to tell the kubelet what config to use. As of v1.10 (I think) the kubelet can read `--config` using the kubelet Beta ComponentConfiguration API, so now we have an interface to talk to the kubelet properly.
This PR:
- Writes the kubelet ComponentConfig to `/var/lib/kubelet/config.yaml` on init and join
- Writes an environment file to source in the kubelet systemd dropin `/var/lib/kubelet/kubeadm-flags.env`. This file contain runtime flags that should be passed to the kubelet.
- Uploads a ConfigMap with the name `kubelet-config-1.X`
- Patches the node object so that it starts using the ConfigMap with updates using Dynamic Kubelet Configuration, **only if the feature gate is set** (currently alpha and off by default, not intended to be switched on in v1.11)
- Updates the phase commands to reflect this new flow
The kubelet dropin file I used now looks like this:
```
# v1.11.x dropin as-is at HEAD
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
---
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile-=/var/lib/kubelet/kubeadm-flags.env
# Should default to 0 in v1.11: https://github.com/kubernetes/kubernetes/pull/63881, and hence not be here in the real v1.11 manifest
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
# Should be configurable via the config file: https://github.com/kubernetes/kubernetes/issues/63878, and hence be configured using the file in v1.11
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS
---
# v1.11.x dropin end goal
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
---
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile-=/var/lib/kubelet/kubeadm-flags.env
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
---
# Environment file dynamically created at runtime by "kubeadm init"
# /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS=--cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubeadm/issues/822
Fixes https://github.com/kubernetes/kubeadm/issues/571
**Special notes for your reviewer**:
**Release note**:
```release-note
"kubeadm init" now writes a structured and versioned kubelet ComponentConfiguration file to `/var/lib/kubelet/config.yaml` and an environment file with runtime flags (you can source this file in the systemd kubelet dropin) to `/var/lib/kubelet/kubeadm-flags.env`.
```
@kubernetes/sig-cluster-lifecycle-pr-reviews @mtaufen
super-user-reserved blocks, which otherwise defaults to 5% of the
entire disk.
Rationale: Reserving a percentage of the volume is generally a neither
useful nor desirable feature for volumes that aren't used as root file
systems for Linux distributions, since the reserved portion becomes
unavailable for non-root users. For containers, the general case is to
use the entire volume for data, without running as root. The case where
one might want reserved blocks enabled is much rarer.
Automatic merge from submit-queue (batch tested with PRs 63792, 63495, 63742, 63332, 63779). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add timeout for exec interface
This should get us away from situations like https://github.com/kubernetes/kubernetes/issues/63331.
A little bit more context, the `os/exec` package starts to accept `context.Context` in golang 1.7. We should leverage that so we can have a more predictable behavior, then.
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix mount unmount failure for a Windows pod
**What this PR does / why we need it**:
`IsLikelyNotMountPoint` func does not return correctly, for invalid symlink, it should return true(not a mount point), now it will always return false:
7711d88661/pkg/util/mount/mount_windows.go (L141-L148)7711d88661/pkg/volume/util/util.go (L147-L163)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63249
**Special notes for your reviewer**:
**Release note**:
```
fix mount unmount failure for a Windows pod
```
/sig windows
/assign @msau42
Automatic merge from submit-queue (batch tested with PRs 63297, 61883). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
set right Content-Type for configz
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
passthrough readOnly to subpath
**What this PR does / why we need it**:
If a volume is mounted as readonly, or subpath volumeMount is configured as readonly, then the subpath bind mount should be readonly.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62752
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes issue where subpath readOnly mounts failed
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use TempDir func in mount_windows_test.go
**What this PR does / why we need it**:
Use `c:\tmp` dir is not correct in windows test, this PR use `ioutil.TempDir("", xx)` to create temp dir instead.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```
none
```
/assign @jsafrane @msau42
Automatic merge from submit-queue (batch tested with PRs 62657, 63278, 62903, 63375). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add more volume types in e2e and fix part of them.
**What this PR does / why we need it**:
- Add dir-link/dir-bindmounted/dir-link-bindmounted/bockfs volume types for e2e tests.
- Fix fsGroup related e2e tests partially.
- Return error if we cannot resolve volume path.
- Because we should not fallback to volume path, if it's a symbolic link, we may get wrong results.
To safely set fsGroup on local volume, we need to implement these two methods correctly for all volume types both on the host and in container:
- get volume path kubelet can access
- paths on the host and in container are different
- get mount references
- for directories, we cannot use its mount source (device field) to identify mount references, because directories on same filesystem have same mount source (e.g. tmpfs), we need to check filesystem's major:minor and directory root path on it
Here is current status:
| | (A) volume-path (host) | (B) volume-path (container) | (C) mount-refs (host) | (D) mount-refs (container) |
| --- | --- | --- | --- | --- |
| (1) dir | OK | FAIL | FAIL | FAIL |
| (2) dir-link | OK | FAIL | FAIL | FAIL |
| (3) dir-bindmounted | OK | FAIL | FAIL | FAIL |
| (4) dir-link-bindmounted | OK | FAIL | FAIL | FAIL |
| (5) tmpfs| OK | FAIL | FAIL | FAIL |
| (6) blockfs| OK | FAIL | OK | FAIL |
| (7) block| NOTNEEDED | NOTNEEDED | NOTNEEDED | NOTNEEDED |
| (8) gce-localssd-scsi-fs| NOTTESTED | NOTTESTED | NOTTESTED | NOTTESTED |
- This PR uses `nsenter ... readlink` to resolve path in container as @msau42 @jsafrane [suggested](https://github.com/kubernetes/kubernetes/pull/61489#pullrequestreview-110032850). This fixes B1:B6 and D6, , the rest will be addressed in https://github.com/kubernetes/kubernetes/pull/62102.
- C5:D5 marked `FAIL` because `tmpfs` filesystems can share same mount source, we cannot rely on it to check mount references. e2e tests passes due to we use unique mount source string in tests.
- A7:D7 marked `NOTNEEDED` because we don't set fsGroup on block devices in local plugin. (TODO: Should we set fsGroup on block device?)
- A8:D8 marked `NOTTESTED` because I didn't test it, I leave it to `pull-kubernetes-e2e-gce`. I think it should be same as `blockfs`.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter
This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation.
https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html
```release-note
Use /usr/bin/env in all script shebangs to increase portability.
```
in cases where iptables stucks forever due to some reasons, we lost
the availability of kube-proxy, this is about adding a timeout for
the rule checking operations, as a result, it should give us a more
reliable working iptables proxy.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Support nsenter in non-systemd environments
**What this PR does / why we need it**:
In our CI, we run kubekins image for most of the jobs. This is a
debian image with upstart and does not enable systemd. So we should
* Bailout if any binary is missing other than systemd-run.
* SupportsSystemd should check the binary path to correctly
identify if the systemd-run is present or not
* Pass the errors back to the callers so kubelet is forced to
fail early when there is a problem. We currently assume
that all binaries are in the root directory by default which
is wrong.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
unpack dynamic kubelet config payloads to files
This PR unpacks the downloaded ConfigMap to a set of files on the node.
This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.
This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.
The current store dir structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta (dir for metadata, e.g. which config source is currently assigned, last-known-good)
| - current (a serialized v1 NodeConfigSource object, indicating the assigned config)
| - last-known-good (a serialized v1 NodeConfigSource object, indicating the last-known-good config)
| - checkpoints (dir for config checkpoints)
| - uid1 (dir for unpacked config, identified by uid1)
| - file1
| - file2
| - ...
| - uid2
| - ...
```
There are some likely changes to the above structure before dynamic config goes beta, such as renaming "current" to "assigned" for clarity, and extending the checkpoint identifier to include a resource version, as part of resolving #61643.
```release-note
NONE
```
/cc @luxas @smarterclayton
In our CI, we run kubekins image for most of the jobs. This is a
debian image with upstart and does not enable systemd. So we should:
* Bailout if any binary is missing other than systemd-run.
* SupportsSystemd should check the binary path to correctly
identify if the systemd-run is present or not
* Pass the errors back to the callers so kubelet is forced to
fail early when there is a problem. We currently assume
that all binaries are in the root directory by default which
is wrong.
This PR unpacks the downloaded ConfigMap to a set of files on the node.
This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.
This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.
Automatic merge from submit-queue (batch tested with PRs 62467, 62482, 62211). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix nsenter GetFileType issue in containerized kubelet
**What this PR does / why we need it**:
UnmountDevice would fail in containerized kubelet in v1.9.x, and in the end, pod with volume mount could not be scheduled from one node to another since the original volume will never be released. This PR fixed this issue.
In [nsenter_mount.GetFileType func](https://github.com/kubernetes/kubernetes/blob/master/pkg/util/mount/nsenter_mount.go#L238), return error of following code will be different than [mount_linux.GetFileType func](https://github.com/kubernetes/kubernetes/blob/master/pkg/util/mount/mount.go#L347)
```
outputBytes, err := mounter.ne.Exec("stat", []string{"-L", `--printf "%F"`, pathname}).CombinedOutput()
```
Return error and output would be like following in nsenter_mount.GetFileType func:
error: `exit status 1`
output: `/usr/bin/stat: cannot stat '2': No such file or directory`
This PR makes the return error consistent as mount_linux.GetFileType func, and finally makes [isDeviceOpened func](https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_generator.go#L1340) work.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62282
**Special notes for your reviewer**:
/assign @dixudx
**Release note**:
```
fix nsenter GetFileType issue in containerized kubelet
```
/sig node
Automatic merge from submit-queue (batch tested with PRs 61608, 62304). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove isNotDir error check
**What this PR does / why we need it**:
This check was supposed to handle the "subpath file" scenario, but:
1. It's wrong (should have been !)
2. It's not needed anymore. `IsLikelyNotMountPoint` was fixed to handle file mounts via https://github.com/kubernetes/kubernetes/pull/58433
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62324, 61459, 62475, 62476, 61914). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Return error in mount_unsupported for unsupported platforms
**What this PR does / why we need it**:
Presently, `util/mount_unsupported.go` does not return any errors. For unsupported platforms, this hides failures. This PR returns errors, thereby properly informing users attempting to run on an unsupported platform.
**Which issue(s) this PR fixes** :
Fixes https://github.com/kubernetes/kubernetes/issues/61919
**Special notes for your reviewer**:
There are a few methods that simply call through to other methods. I did not return errors from those methods.
I've also updated an error check and message in `volume/fc/fc_test.go`, since it was ignoring an error on unsupported platforms.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60990, 60947, 45275, 60565, 61091). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix service loadbalancer source range for ipvs proxy mode
**What this PR does / why we need it**:
fix service loadbalancer source range for ipvs proxy mode
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61090
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
In shell scripts inside [[ .. ]] blocks, ">" is a string comparison
operator. The return value check using it appears to work mostly by
accident, because the only values are "0" and "1". Change to -gt
operator.
Automatic merge from submit-queue (batch tested with PRs 61453, 61393, 61379, 61373, 61494). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use inner volume name instead of outer volume name for subpath directory
**What this PR does / why we need it**:
Fixes volume reconstruction for PVCs with subpath
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61372
**Special notes for your reviewer**:
**Release note**:
```release-note
ACTION REQUIRED: In-place node upgrades to this release from versions 1.7.14, 1.8.9, and 1.9.4 are not supported if using subpath volumes with PVCs. Such pods should be drained from the node first.
```