This PR add comments for the background why plugin gets loopback
device and removes loopback device even if operation_generator has
same functionality.
Automatic merge from submit-queue (batch tested with PRs 57572, 57512, 57770). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
RBD Plugin: Pass monitors addresses in a comma-separed list instead of trying one by one.
**What this PR does / why we need it**:
In production, monitors may crash (or have a network problem), if we try monitors one by one, rbd
command will hang a long time (e.g. `rbd map -m <unconnectable_host_ip>`
on linux 4.4 timed out in 6 minutes) when trying a unconnectable monitor. This is unacceptable.
Actually, we can simply pass a comma-separated list monitor addresses to `rbd`
command utility. Kernel rbd/libceph modules will pick monitor randomly
and try one by one, `rbd` command utility succeed soon if there is a
good one in monitors list.
[Docs](http://docs.ceph.com/docs/jewel/man/8/rbd/#cmdoption-rbd-m) about `-m` option of `rbd` is wrong, 'rbd' utility simply pass '-m <mon>' parameter to kernel rbd/libceph modules, which
takes a comma-seprated list of one or more monitor addresses (e.g. ip1[:port1][,ip2[:port2]...]) in its first version in linux (see 602adf4002/net/ceph/ceph_common.c (L239)). Also, libceph choose monitor randomly, so we can simply pass all addresses without randomization (see 602adf4002/net/ceph/mon_client.c (L132)).
From what I saw, there is no need to iterate monitor hosts one by one.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
Run `rbd map` against unconnectable monitor address logs on Linux 4.4:
```
root@myhost:~# uname -a
Linux myhost 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@myhost:~# time rbd map kubernetes-dynamic-pvc-941ff4d2-b951-11e7-8836-049fca8e58df --pool <pool> --id <id> -m <unconnectable_host_ip> --key=<password>
rbd: sysfs write failed
2017-12-20 18:55:11.810583 7f7ec56863c0 0 monclient(hunting): authenticate timed out after 300
2017-12-20 18:55:11.810638 7f7ec56863c0 0 librados: client.<id> authentication error (110) Connection timed out
rbd: couldn't connect to the cluster!
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (110) Connection timed out
real 6m0.018s
user 0m0.052s
sys 0m0.064s
```
We can simply pass a comma-separated list of monitors, if there is a good one in them, `rbd map` succeed soon.
```
root@myhost:~# time rbd map kubernetes-dynamic-pvc-941ff4d2-b951-11e7-8836-049fca8e58df --pool <pool> --id <id> -m <unconnectable_host_ip>,<good_host_ip> --key=<password>
/dev/rbd3
real 0m0.426s
user 0m0.008s
sys 0m0.008s
```
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix rbd volume ConstructVolumeSpec
**What this PR does / why we need it**:
1. rbd plugin.ConstructVolumeSpec() construct volume spec with fake value, cause kubelet volume manager will create two volumesInUse in node Status.
2. change plugin.GetVolumeName(), create volumeName using rbd pool instead of monitors, because monitors is a group of IPs, which makes the volumeName too long. Also, this is to fit plugin.ConstructVolumeSpec() since makeGlobalPDName only uses rbd pool and image.
```
before fix:
volumesAttached:
- devicePath: ""
name: kubernetes.io/rbd/[xxxxxxx:6789 xxxxxxxxx:6789]:volume-9a106847-4def-4d1e-9603-4c7099b22a31
volumesInUse:
- 'kubernetes.io/rbd/[]:'
- kubernetes.io/rbd/[xxxxxxx:6789 xxxxxxxxx:6789]:volume-9a106847-4def-4d1e-9603-4c7099b22a31
after fix:
volumesAttached:
- devicePath: ""
name: kubernetes.io/rbd/volumes:volume-9a106847-4def-4d1e-9603-4c7099b22a31
volumesInUse:
- kubernetes.io/rbd/volumes:volume-9a106847-4def-4d1e-9603-4c7099b22a31
trying one by one.
In production, monitors may crash (or have a network problem), if we try
monitors one by one, rbd command will hang a long time (e.g. `rbd map -m
<unconnectable_host_ip>` on linux 4.4 timed out in 6 minutes) when
trying a unconnectable monitor. This is unacceptable.
Actually, we can simply pass a comma-separed list monitor addresses
to `rbd` command utility. Kernel rbd/libceph modules will pick
monitor randomly and try one by one, `rbd` command utility succeed soon
if there is a good one in monitors list.
Automatic merge from submit-queue (batch tested with PRs 56480, 56675, 56624, 56648, 56658). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
minor fix error typo of rbd volume teardown
fix typo in rbd volume plugin `diskTearDown`
Automatic merge from submit-queue (batch tested with PRs 54604, 55781, 55806, 55935, 55991). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use core client with explicit version
**What this PR does / why we need it**:
Core client without explicit version has been deprecated, change them to the one with explicit version.
**Which issue(s) this PR fixes**:
Fixes partially #55993
**Special notes for your reviewer**:
/cc @kubernetes/sig-api-machinery-pr-reviews
/cc @caesarxuchao @k82cn @sttts @kevin-wangzefeng
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
RBD Plugin: Fix bug in checking command not found error.
**What this PR does / why we need it**:
Fix bug in error checking logic.
`Error()` method of command not found error returned from `command.Run/Output` is not "executable file not found in $PATH". Actually, it's `exec: "<command>": executable file not found in $PATH`.
I followed the logic in https://github.com/kubernetes/kubernetes/blob/v1.9.0-alpha.1/pkg/kubectl/cmd/util/editor/editor.go#L129 to detect command not found error.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
https://play.golang.org/p/yZJxtouUQL
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rename Detach() parameter.
`Detach()` does not get device name, it gets volume name. Parameters named `deviceMountPath` or `deviceName` just confuses developers.
Note that this PR just renames parameters here and there, there should be no behavior change.
@kubernetes/sig-storage-pr-reviews
/assign @gnufied @jingxu97
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update volume OWNERS to reflect active sig-storage reviewers
**What this PR does / why we need it**:
Update sig-storage reviewers to add new members and remove those that don't have as much time to review storage PRs. Approvers are unchanged.
**Special notes for your reviewer**:
For all those that have been removed, please approve. If you want to remain as a reviewer, let me know and I will add you back.
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 54635, 54250, 54657, 54696, 54700). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Don't cache exec and mounter in RBD volume plugin
#51608 has broken containerized RBD mount utilities proposed in https://github.com/kubernetes/kubernetes/pull/53440.
Volume plugin can get a different exec and mounter implementation with every call, it must not be cached.
```release-note
NONE
```
/sig storage
/assign @rootfs
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix incorrect log
**What this PR does / why we need it**:
fix incorrect log in nfs_test.go
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
NONE
With central attachdetach controller, we don't need to lock the image
any more. But for backward compatibility, we should:
1) Check if the image is still used by nodes running old kubelet in
attaching.
2) Clean old rbd.json file and remove lock if found in detaching.
1) Modify rbdPlugin to implement volume.AttachableVolumePlugin
interface.
2) Add rbdAttacher/rbdDetacher structs to implement
volume.Attacher/Detacher interfaces.
3) Add mount.SafeFormatAndMount/mount.Exec fields to rbdPlugin, and
setup them in rbdPlugin.Init for later uses.
Attacher/Mounter/Unmounter/Detacher reference rbdPlugin to use mounter
and exec. This simplifies code.
4) Add testcase struct to abstract RBD Plugin test case, etc.
5) Add newRBD constructor to unify rbd struct initialization.
Automatic merge from submit-queue (batch tested with PRs 54229, 54380, 54302, 54454). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
RBD Plugin: rbdStatus only check output of successful `rbd status` run
**What this PR does / why we need it**:
Current RBDUtil.rbdStatus implementation never return error in any cases, even if command not found or `rbd status` never succeed. This PR change it to only check output of successful `rbd status` run, and return error on other cases.
Because there are maybe network problem or ceph cluster unresponsive conditions which will cause `rbd status` command to fail. We cannot assume there is no watchers on given image. It's better to return error, and let the caller to decide what to do.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
RBD Plugin: Omit volume.MetricsProvider field and add some testcases.
**What this PR does / why we need it**:
Embedded struct `volume.MetricProvider` is capitalized and should be omitted in JSON marshalling. It's also a unmarshalable struct, will cause error in `RBDUtil.loadRBD`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
It's a bug I introduced in https://github.com/kubernetes/kubernetes/pull/48486. It's my bad, sorry about that.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)
rbd: default image format to v2 instead of deprecated v1
**What this PR does / why we need it**:
Image format v1 has been deprecated since the Infernalis release of
Ceph over two years ago.
**Release note**:
```StorageClass Ceph RBD now defaults to using the v2 image format ```
Automatic merge from submit-queue (batch tested with PRs 51114, 51233, 51024, 51053, 51197)
rbd: Use VolumeHost.GetExec() to execute stuff in volume plugins
**What this PR does / why we need it**:
This PR updates rbd volume plugin to use `VolumeHost.GetExec()` to execute utilities like mkfs and lsblk instead of simple `os/exec`. This prepares the volume plugin to run these utilities in containers instead of running them on the host + makes the volume plugin more independent and less hardcoded.
See proposal in https://github.com/kubernetes/community/pull/589.
Note that this PR does **not** change place where the utilities are executed - `VolumeHost.GetExec()` still leads directly to `os/exec`. It will be changed when the aforementioned proposal is merged and implemented.
@kubernetes/sig-storage-pr-reviews
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
RBD Plugin: Log RBD Attach/Mount/Unmout actions in addition to Detach
**What this PR does / why we need it**:
Currently, RBD Plugin can log a info message for a successful action of RBD Unmap, e.g.:
```
I0822 09:32:31.595162 15177 rbd_util.go:349] rbd: successfully unmap device /dev/rbd0
```
This PR adds logs for another three important actions: Attach, Mount and Unmount.
Logging these actions and associated info is *very* useful in diagnosing problems.
**Special notes for your reviewer**:
Example RBD Plugin logs of successful pod volume attaching and mounting:
```
I0822 09:30:27.512015 15177 rbd_util.go:148] lock list output "2017-08-22 09:30:27.493889 7fa4ae3c23c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.kube.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory\n"
W0822 09:30:27.547513 15177 rbd_util.go:460] rbd: no watchers on kubernetes-dynamic-pvc-83bfd49e-871c-11e7-b88e-000c291fbe71
I0822 09:30:27.704703 15177 rbd_util.go:315] rbd: successfully map image kube/kubernetes-dynamic-pvc-83bfd49e-871c-11e7-b88e-000c291fbe71 to /dev/rbd0
I0822 09:30:27.965603 15177 rbd_util.go:322] rbd: successfully mount image kube/kubernetes-dynamic-pvc-83bfd49e-871c-11e7-b88e-000c291fbe71 at /var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-83bfd49e-871c-11e7-b88e-000c291fbe71
```
Example RBD Plugin logs of successful pod volume detaching and unmouting:
```
I0822 09:32:31.380124 15177 rbd_util.go:334] rbd: successfully umount mountpoint /var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-83bfd49e-871c-11e7-b88e-000c291fbe71
I0822 09:32:31.459867 15177 rbd_util.go:148] lock list output "2017-08-22 09:32:31.443643 7f2bb8ab53c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.kube.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory\nThere is 1 exclusive lock on this image.\nLocker ID Address \nclient.64117 kubelet_lock_magic_k8s 192.168.2.128:0/4124042516 \n"
I0822 09:32:31.595162 15177 rbd_util.go:349] rbd: successfully unmap device /dev/rbd0
```
It does not add too much logs, but admins/ops can know what RBD plugin are doing internally and exact time a RBD image is mapped, mounted or unmounted (in addition to unmapped).
**Release note**:
```release-note
NONE
```