It takes a variable amount of time for the multipath daemon
to create /dev/dm-XX in response to new LUNs being discovered.
The old iscsi_util code only discovered the multipath device
if it was created quickly enough, but in a significant number
of cases, kubelet would grab one of the individual paths and
put a filesystem it on before multipathd could construct a
multipath device.
This change waits for the multipath device to get created for
up to 10 seconds, but only if the PV actually had more than
one portal.
This change ensures that iSCSI block devices are deleted after
unmounting, and implements scanning of individual LUNs rather
than scanning the whole iSCSI bus.
In cases where an iSCSI bus is in use by more than one attachment,
detaching used to leave behind phantom block devices, which could
cause I/O errors, long timeouts, or even corruption in the case
when the underlying LUN number was recycled. This change makes
sure to flush references to the block devices after unmounting.
The original iSCSI code scanned the whole target every time a LUN
was attached. On storage controllers that export multiple LUNs on
the same target IQN, this led to a situation where nodes would
see SCSI disks that they weren't supposed to -- possibly dozens or
hundreds of extra SCSI disks. This caused 3 significant problems:
1) The large number of disks wasted resources on the node and
caused a minor drag on performance.
2) The scanning of all the devices caused a huge number of uevents
from the kernel, causing udev to bog down for multiple minutes in
some cases, triggering timeouts and other transient failures.
3) Because Kubernetes was not tracking all the "extra" LUNs that
got discovered, they would not get cleaned up until the last LUN
on a particular target was detached, causing a logout. This led
to significant complications:
In the time window between when a LUN was unintentially scanned,
and when it was removed due to a logout, if it was deleted on the
backend, a phantom reference remained on the node. In the best
case, the phantom LUN would cause I/O errors and timeouts in the
udev system. In the worst case, the backend could reuse the LUN
number for a new volume, and if that new volume were to be
scheduled to a pod with a phantom reference to the old LUN by the
same number, the initiator could get confused and possibly corrupt
data on that volume.
To avoid these problems, the new implementation only scans for
the specific LUN number it expects to see. It's worth noting that
the default behavior of iscsiadm is to automatically scan the
whole bus on login. That behavior can be disabled by setting
node.session.scan = manual
in iscsid.conf, and for the reasons mentioned above, it is
strongly recommended to set that option. This change still works
regardless of the setting in iscsid.conf, and while automatic
scanning will cause some problems, this change doesn't make the
problems any worse, and can make things better in some cases.
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
Lubomir I. Ivanov <neolit123@gmail.com>
applied ammend for:
pkg/cloudprovider/provivers/vsphere/nodemanager.go
Google's configure-helper.sh script bind-mounts /var/lib/kubelet somewhere
into /home/kubernetes and thus every mount that Kubernetes does is visible
twice in /proc/mounts.
iSCSI and RBD should not rely on counting on entries in /proc/mounts and
unmount device when Kubernetes thinks it's unusued. Kubernetes tracks
the mounts by itself and most of other volume plugins rely on it safely.
This PR add comments for the background why plugin gets loopback
device and removes loopback device even if operation_generator has
same functionality.
If the default iSCSI node.startup is set to automatic, if there is a node failure,
any pods on that node will get rescheduled to another node. If the failed node is
later brought back up it will then try to log back in to any iSCSI sessions it had
prior to the failure, which may no longer exist or may be now in-use by the other
nodes.
It appears most platforms keep the open-iscsi default of node.startup-automatic.
But in case this system-wide setting has been changed, and just to be explicit, this
sets node.startup values for kubernetes controlled volumes to manual.
Closes issue #21305
This PR makes following changes.
- Simplify volume tearDown path for iSCSI and FC using
util.UnmountPath().
- Log lastErr during iscsi connection
If iscsid fails to connect second portal, currently
the error is ignored silently. The lastErr should be
logged to find the root cause of problem.
- Remove iscsi plugin directory after iscsi connection
is successfully closed.
WaitForAttach failed consistently with this error:
Heuristic determination of mount point failed:stat /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.128.0.3:3260-iqn.2003-01.org.linux-iscsi.f21.x8664:sn.4b0aae584f7c-lun-0: no such file or directory
We should ignore "no such file or directory" eror, the directory is created
just few lines below.
This PR adds iSCSI initiatorname parameter to ISCSIVolumeSource
to enable automatic configuration of initiator name per volume.
This would allow for more fine grained configuration, and remove
the need to configure the initiator name on the host by
administrator.
fixes: #47311
If iscsiTransport is not tcp, iSCSI plugin tries to
find devicepath using filepath.Glob but never updates
devicepath with the filepath.Glob result.
This patch fixes the problem.
Fixes#47253
When using iscsi storage with multiple target portal (TP)
addresses and multipathing the volume manager logs on to
the IQN for all portal addresses, but when a pod gets
destroyed the volume manager only logs out for the primary
TP and sessions for another TPs are always remained.
This patch adds methods to store and load iscsi disk
configrations, then uses the stored config at DetachDisk
path.
Fix#45394
If it does belong to the device then we make sure we mount the mpio device instead of
the raw device.
Heuristics
Login into /dev/disk/by-path/iqn-example.com.2999 -> /dev/sde
Check if sde existsin in /sys/block/[dm-X]/slaves/xx
If it does mount /dev/[dm-x] which will look like /dev/mapper/mpiodevicename in mount
examples/iscsi has more details
This enables use of software or hardware transports viz. be2iscsi,
bnx2i, cxgb3i, cxgb4i, qla4xx, iser and ocs. The default transport
(tcp) happens to be called "default".
Use of non-default transports changes the disk path to the following format:
/dev/disk/by-path/pci-<pci_id>-ip-<portal>-iscsi-<iqn>-lun-<lun_id>
Code comments currently claim the default iscsi mount path as
kubernetes.io/pod/iscsi/<portal>-iqn-<iqn>-lun-<id>, however actual
path being used is
kubernetes.io/iscsi/iscsi/<portal>-iqn-<iqn>-lun-<id>
This leads to ultimate path being similar to this :
kubernetes.io/iscsi/iscsi/...iqn-iqn...-lun-N
Both iscsi and iqn are repated twice for no reason, since "iqn" is
required by spec to be part of an iqn. This is also wrong on
multiple leves as actual allowed naming formats are :
iqn.2001-04.com.example:storage:diskarrays-sn-a8675309
eui.02004567A425678D
(RFC 3720 3.2.6.3)
and in the second case "iqn-eui" in the path would be misleading.
Change this to a more reasonable path of
kubernetes.io/iscsi/<portal>-<iqn>-lun-<id>
which also aligns up with how the /dev/by-path and sysfs entries
are created for iscsi devices on linux
* -- *
Update iSCSI README and sample json file
There seems to have been quite a skew in recent updates to these
files adding in wrong info or info that no longer lines up the
sample config with the README.
Fixed the following issues :
* Fix discrepancy in samples json using initiator iqn from previous
linked example as target iqn (which was just wrong)
* Generate sample output and README from the same json config provided.
* Remove recommendation to edit initiator name, this is not required
(open-iscsi warns against editing this manually and provides a utility
for the same)
* Update docker inspect command to one that works.
* Use separate LUNs for separate mount points instead of re-using.
IsLikelyNotMountPoint determines if a directory is not a mountpoint.
It is fast but not necessarily ALWAYS correct. If the path is in fact
a bind mount from one part of a mount to another it will not be detected.
mkdir /tmp/a /tmp/b; mount --bin /tmp/a /tmp/b; IsLikelyNotMountPoint("/tmp/b")
will return true. When in fact /tmp/b is a mount point. So this patch
renames the function and switches it from a positive to a negative (I
could think of a good positive name). This should make future users of
this function aware that it isn't quite perfect, but probably good
enough.