Automatic merge from submit-queue
Add volume reconstruct/cleanup logic in kubelet volume manager
Currently kubelet volume management works on the concept of a desired
world and an actual world of states. The volume manager periodically
compares the two worlds and performs volume mount/unmount and/or
attach/detach operations. When kubelet restarts, the caches of both
worlds are lost. Although the desired world can be recovered through the
apiserver, the actual world cannot. As a result, some volumes may never
be cleaned up if their information has already been deleted from the
apiserver. This change reconstructs the actual world by reading the pod
directories from disk. Reconstructed volume information is added to both
the desired world and the actual world if it cannot be found in either.
The rest of the logic is the same as before: the desired world populator
may remove the volume entry if it no longer exists in the apiserver, and
the volume manager then invokes unmount to clean it up.
Fixes https://github.com/kubernetes/kubernetes/issues/27653
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).
This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume cleanup never blocks the
`syncPod` loop.
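The compare-and-act idea behind the new volume manager can be sketched as follows. This is a simplified illustration, not the kubelet reconciler. Volumes are plain string keys such as "pod-a/vol-1" (hypothetical), and reconcile only computes which mounts and unmounts are needed. The real manager also performs attach/detach and updates the actual world after each operation. Running the loop in its own goroutine keeps this work off the `syncPod` path.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// reconcile compares the two worlds and returns the operations needed to
// move the actual world toward the desired world.
func reconcile(desired, actual map[string]bool) (toMount, toUnmount []string) {
	for v := range desired {
		if !actual[v] {
			toMount = append(toMount, v)
		}
	}
	for v := range actual {
		if !desired[v] {
			toUnmount = append(toUnmount, v)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(toMount)
	sort.Strings(toUnmount)
	return toMount, toUnmount
}

// runPeriodically calls step every period until stop is closed.
func runPeriodically(period time.Duration, stop <-chan struct{}, step func()) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			step()
		}
	}
}

func main() {
	desired := map[string]bool{"pod-a/vol-1": true, "pod-b/vol-2": true}
	actual := map[string]bool{"pod-b/vol-2": true, "pod-c/vol-3": true}

	stop, done := make(chan struct{}), make(chan struct{})
	go func() {
		// The volume loop runs independently of any pod sync loop.
		runPeriodically(10*time.Millisecond, stop, func() {
			mount, unmount := reconcile(desired, actual)
			fmt.Println("mount:", mount, "unmount:", unmount)
		})
		close(done)
	}()
	time.Sleep(25 * time.Millisecond)
	close(stop)
	<-done
}
```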
If the disk belongs to a multipath device, we make sure to mount the mpio
device instead of the raw device.
Heuristics:
- Log in to the target: /dev/disk/by-path/iqn-example.com.2999 -> /dev/sde
- Check whether sde exists in /sys/block/[dm-X]/slaves/xx
- If it does, mount /dev/[dm-X], which appears as /dev/mapper/mpiodevicename in the mount output
See examples/iscsi for more details.
- Add volume.MetricsProvider function to Volume interface.
- Add volume.MetricsDu for providing metrics by executing "du".
- Add volume.MetricsNil for unsupported Volumes.
This enables the use of software or hardware iSCSI transports, viz.
be2iscsi, bnx2i, cxgb3i, cxgb4i, qla4xxx, iser and ocs. The default
transport (tcp) is named "default".
Use of non-default transports changes the disk path to the following format:
/dev/disk/by-path/pci-<pci_id>-ip-<portal>-iscsi-<iqn>-lun-<lun_id>
Code comments currently give the default iSCSI mount path as
kubernetes.io/pod/iscsi/<portal>-iqn-<iqn>-lun-<id>, but the path
actually used is
kubernetes.io/iscsi/iscsi/<portal>-iqn-<iqn>-lun-<id>
This results in a final path similar to:
kubernetes.io/iscsi/iscsi/...iqn-iqn...-lun-N
Both "iscsi" and "iqn" appear twice for no reason, since the spec already
requires "iqn" to be part of an IQN. This is also wrong on multiple
levels, as the actual allowed naming formats are:
iqn.2001-04.com.example:storage:diskarrays-sn-a8675309
eui.02004567A425678D
(RFC 3720 3.2.6.3)
and in the second case "iqn-eui" in the path would be misleading.
Change this to a more reasonable path of
kubernetes.io/iscsi/<portal>-<iqn>-lun-<id>
which also aligns with how the /dev/disk/by-path and sysfs entries
are created for iSCSI devices on Linux.
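The before and after layouts can be compared with a small sketch. The helpers oldMountPath and newMountPath are hypothetical, and pluginDir is taken as the relative "kubernetes.io/iscsi" prefix used in the text. The two target names are the RFC 3720 examples quoted above, and the portal is a made-up example.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// oldMountPath reproduces the old layout: an extra "iscsi" directory and an
// "-iqn-" prefix in front of the target name.
func oldMountPath(pluginDir, portal, iqn string, lun int) string {
	return filepath.Join(pluginDir, "iscsi", fmt.Sprintf("%s-iqn-%s-lun-%d", portal, iqn, lun))
}

// newMountPath uses <pluginDir>/<portal>-<iqn>-lun-<id>; the target name
// already carries its own "iqn." or "eui." prefix.
func newMountPath(pluginDir, portal, iqn string, lun int) string {
	return filepath.Join(pluginDir, fmt.Sprintf("%s-%s-lun-%d", portal, iqn, lun))
}

func main() {
	pluginDir := "kubernetes.io/iscsi"
	portal := "192.0.2.10:3260" // hypothetical portal
	for _, name := range []string{
		"iqn.2001-04.com.example:storage:diskarrays-sn-a8675309",
		"eui.02004567A425678D",
	} {
		fmt.Println("old:", oldMountPath(pluginDir, portal, name, 0))
		fmt.Println("new:", newMountPath(pluginDir, portal, name, 0))
	}
}
```

For the EUI name, the old helper yields kubernetes.io/iscsi/iscsi/192.0.2.10:3260-iqn-eui.02004567A425678D-lun-0, which contains the misleading "iqn-eui". The new helper yields kubernetes.io/iscsi/192.0.2.10:3260-eui.02004567A425678D-lun-0.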
* -- *
Update iSCSI README and sample json file
There seems to have been quite a skew in recent updates to these
files, adding wrong information or information that no longer keeps the
sample config in line with the README.
Fixed the following issues:
* Fix a discrepancy in the sample JSON, which used the initiator IQN from
the previously linked example as the target IQN (which was just wrong).
* Generate the sample output and README from the same JSON config.
* Remove the recommendation to edit the initiator name; this is not
required (open-iscsi warns against editing it manually and provides a
utility for this).
* Update the docker inspect command to one that works.
* Use separate LUNs for separate mount points instead of reusing them.
This code was originally added because the first mount call did not
respect the ro option. This no longer seems to be the case, so there
is no need to remount.
Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>