Automatic merge from submit-queue
Rackspace improvements (OpenStack Cinder)
This adds PV support via Cinder on Rackspace clusters. Rackspace Cloud Block Storage is pretty much vanilla OpenStack Cinder, so there is no need for a separate volume plugin. Instead, I refactored the Cinder/OpenStack interaction a bit (by introducing a CinderProvider interface, sketched below, and moving the device path detection logic to the OpenStack part).
Right now this is limited to `AttachDisk` and `DetachDisk`. Creation and deletion of Block Storage is not in the scope of this PR.
Also, the `ExternalID` and `InstanceID` cloud provider methods have been implemented for Rackspace.
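As an illustration of the refactoring, a minimal sketch of what such an interface could look like; the method set and signatures here are assumptions, not the exact ones from this PR:

    package cinder

    // CinderProvider (illustrative): any cloud provider that can serve Cinder
    // block storage (OpenStack, Rackspace) would implement this, so the Cinder
    // volume plugin no longer depends on the OpenStack provider directly.
    type CinderProvider interface {
        // AttachDisk attaches the volume to the instance and returns the
        // device path it appears under on the host.
        AttachDisk(instanceID, volumeID string) (string, error)
        // DetachDisk detaches the volume from the instance.
        DetachDisk(instanceID, volumeID string) error
        // GetDevicePath resolves the local device path of an attached volume.
        GetDevicePath(volumeID string) string
    }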
Automatic merge from submit-queue
Add mpio support for iscsi
This allows the iscsi volume to check if an iscsi device belongs to an mpio device.
If it does belong to the mpio device, then we make sure we mount the mpio device instead of
the raw device.
The code is based on the current FibreChannel volume support for mpio.
Example:
/dev/disk/by-path/iqn-example.com.2999 -> /dev/sde
Then we check
/sys/block/[dm-X]/slaves/xx
until we find the [dm-X] containing /dev/sde and mount it
Additional work that can be done in the future:
1. Add multiple portal support to iscsi
2. Move the FibreChannel volume provider to use the code that has been extracted
Heuristics:
Log in to /dev/disk/by-path/iqn-example.com.2999 -> /dev/sde
Check if sde exists in /sys/block/[dm-X]/slaves/xx
If it does, mount /dev/[dm-X], which will show up as /dev/mapper/mpiodevicename in mount output
examples/iscsi has more details
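A rough Go sketch of that heuristic, purely illustrative (not the PR's actual code), scanning /sys/block for the dm device that claims the raw disk:

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // findMultipathDevice returns the dm device whose slaves directory
    // contains the given raw device (e.g. "sde"), so the caller can mount
    // /dev/dm-X (a.k.a. /dev/mapper/<name>) instead of /dev/sde.
    func findMultipathDevice(rawDevice string) (string, bool) {
        dms, _ := filepath.Glob("/sys/block/dm-*")
        for _, dm := range dms {
            slave := filepath.Join(dm, "slaves", rawDevice)
            if _, err := os.Stat(slave); err == nil {
                return "/dev/" + filepath.Base(dm), true
            }
        }
        return "", false
    }

    func main() {
        if dm, ok := findMultipathDevice("sde"); ok {
            fmt.Println("mount", dm, "instead of /dev/sde")
        }
    }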
Automatic merge from submit-queue
Additional go vet fixes
Mostly:
- pass lock by value (see the sketch below)
- bad syntax for struct tag value
- example functions not formatted properly
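For reference, a minimal illustration of the first category; copying a struct that embeds a sync.Mutex copies the lock, which go vet flags (this example is not taken from the fixed code):

    package main

    import (
        "fmt"
        "sync"
    )

    type counter struct {
        mu sync.Mutex
        n  int
    }

    // report copies counter and therefore its mutex; go vet reports
    // "passes lock by value" here.
    func report(c counter) int {
        return c.n
    }

    // reportFixed takes a pointer, so the lock is shared, not copied.
    func reportFixed(c *counter) int {
        c.mu.Lock()
        defer c.mu.Unlock()
        return c.n
    }

    func main() {
        c := &counter{}
        fmt.Println(reportFixed(c))
        _ = report
    }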
AWS has a soft support limit of 40 attached EBS devices. Assuming there is just
one root device, use the rest for persistent volumes.
The devices will have names /dev/xvdba - /dev/xvdcm, leaving /dev/sda - /dev/sdz
to the system.
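A small sketch of how that range of names can be enumerated; the helper name is made up and the real allocator in the AWS provider may differ:

    package main

    import "fmt"

    // ebsDeviceNames lists the 39 device names reserved for persistent
    // volumes: /dev/xvdba through /dev/xvdcm (26 + 13 names), leaving the
    // root device and /dev/sda - /dev/sdz to the system.
    func ebsDeviceNames() []string {
        names := []string{}
        for _, second := range "bc" {
            for _, last := range "abcdefghijklmnopqrstuvwxyz" {
                name := fmt.Sprintf("/dev/xvd%c%c", second, last)
                names = append(names, name)
                if name == "/dev/xvdcm" {
                    return names
                }
            }
        }
        return names
    }

    func main() {
        fmt.Println(len(ebsDeviceNames())) // 39
    }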
Also, add better error handling and propagate the error
"Too many EBS volumes attached to node XYZ" to the pod.
In podSecurityPolicy:
1. Rename .seLinuxContext to .seLinux
2. Rename .seLinux.type to .seLinux.rule
3. Rename .runAsUser.type to .runAsUser.rule
4. Rename .seLinux.SELinuxOptions
1 and 2 as suggested by thockin in #22159.
I added 3 for consistency with 2.
Similar to #11543, the local hostname is not guaranteed to be the node
name, as the AWS cloud provider looks up the node name using
`private-dns-name`. This value can be different, for example when using a
private hosted zone.
The previous code uses GetHostName(), which fails in this case. Instead,
pass in an empty string so the AWS cloud provider will use the cached
self instance to find the instance ID.
Authors: @balooo, @dogan-sky, @jsravn
This is a first-aid bandage to let the admission controller ignore persistent
volumes that are currently being provisioned and thus may not yet exist in
the external cloud infrastructure.
Volume names now have the format <cluster-name>-dynamic-<pv-name>.
pv-name is guaranteed to be unique within a Kubernetes cluster; adding
<cluster-name> ensures we don't conflict with any other cluster running
in the same cloud project (kube-controller-manager --cluster-name=XXX).
'kubernetes' is the default cluster name.
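A minimal sketch of the naming scheme, with a hypothetical helper name:

    package provisioner

    import "fmt"

    // provisionedVolumeName builds the cloud volume name described above:
    // <cluster-name>-dynamic-<pv-name>. pv-name is unique within the cluster,
    // and the cluster-name prefix keeps clusters in the same cloud project
    // from colliding.
    func provisionedVolumeName(clusterName, pvName string) string {
        if clusterName == "" {
            clusterName = "kubernetes" // default cluster name
        }
        return fmt.Sprintf("%s-dynamic-%s", clusterName, pvName)
    }

    // provisionedVolumeName("kubernetes", "pv-gce-oxwts") == "kubernetes-dynamic-pv-gce-oxwts"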
* Metrics will not be exposed until they are hooked up to a handler.
* Metrics are not cached and expose a DoS vector; this must be fixed before release, or the stats should not be exposed through an API endpoint.
We are (sadly) using a copy-and-paste of the GCE PD code for AWS EBS.
This code hasn't been updated in a while, and it seems that the GCE code
has some code to make volume mounting more robust that we should copy.
Add a mutex to guard SetUpAt() and TearDownAt() calls - they should not
run in parallel. There is a race in these calls when there are two pods
using the same volume, one of them is dying and the other one starting.
TearDownAt() checks that a volume is not needed by any pods and detaches the
volume. It does so by counting how many times the volume is mounted
(the GetMountRefs() call below).
When SetUpAt() of the starting pod has already attached the volume but not yet
mounted it, TearDownAt() of the dying pod will detach it - GetMountRefs() does not
count this volume.
These two threads run in parallel:
    dying pod.TearDownAt("myVolume")         starting pod.SetUpAt("myVolume")
      |                                        |
      |                                        AttachDisk("myVolume")
      refs, err := mount.GetMountRefs()        |
      Unmount("myDir")                         |
      if refs == 1 {                           |
      |                                        Mount("myVolume", "myDir")
      |                                        |
      |   DetachDisk("myVolume")               |
      |                                        start containers - OOPS! The volume is detached!
      |
      finish the pod cleanup
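A simplified sketch of the guard; the type and field names are illustrative, not the plugin's actual ones:

    package cinder

    import "sync"

    // cinderPlugin holds one mutex that serializes SetUpAt and TearDownAt, so
    // the mount-reference counting in TearDownAt cannot race with a SetUpAt
    // that has attached the disk but not mounted it yet.
    type cinderPlugin struct {
        volumeLock sync.Mutex
    }

    type cinderVolume struct {
        plugin *cinderPlugin
    }

    func (v *cinderVolume) SetUpAt(dir string) error {
        v.plugin.volumeLock.Lock()
        defer v.plugin.volumeLock.Unlock()
        // AttachDisk and Mount happen under the lock.
        return nil
    }

    func (v *cinderVolume) TearDownAt(dir string) error {
        v.plugin.volumeLock.Lock()
        defer v.plugin.volumeLock.Unlock()
        // Unmount, count remaining references, and DetachDisk only if this
        // was the last user, all under the same lock.
        return nil
    }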
Also, add some logs to the Cinder plugin for easier debugging in the future, add
a test, and update the fake mounter to know about bind mounts.
GCE disks don't have tags, so we must encode the tags into the Description field.
It's encoded as JSON, which is both human and machine readable:
description: '{"kubernetes.io/created-for/pv/name":"pv-gce-oxwts","kubernetes.io/created-for/pvc/name":"myclaim","kubernetes.io/created-for/pvc/namespace":"default"}'
We adapt the existing code to work across all zones in a region.
We require a feature flag to enable Ubernetes-Lite.
Reasons:
* There are some behavioural changes if users create volumes with
the same name in two zones.
* We don't want to make one API call per zone if we're not running
Ubernetes-Lite.
* Ubernetes-Lite is still experimental.
There isn't a parallel flag implemented for AWS, because at the moment
there would be no behaviour changes from this.
This synchronizes Cinder with the AWS EBS code, where we already tag volumes with
claim.Namespace and claim.Name (and pv.Name, as suggested in a separate PR).
- Add a volume.MetricsProvider function to the Volume interface (see the sketch below).
- Add volume.MetricsDu for providing metrics via executing "du".
- Add volume.MetricsNil for unsupported volumes.
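A hedged sketch of the shape of these types; the actual definitions in pkg/volume may differ in detail:

    package volume

    // Metrics holds usage data for a volume (illustrative fields).
    type Metrics struct {
        Used      int64
        Capacity  int64
        Available int64
    }

    // MetricsProvider is implemented by volumes that can report usage.
    type MetricsProvider interface {
        GetMetrics() (*Metrics, error)
    }

    // MetricsDu gathers metrics by running "du" against the volume path.
    type MetricsDu struct {
        path string
    }

    // MetricsNil is used by volumes that do not support metrics; its
    // GetMetrics would simply return an "unsupported" error.
    type MetricsNil struct{}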
This enables the use of software or hardware transports, viz. be2iscsi,
bnx2i, cxgb3i, cxgb4i, qla4xx, iser and ocs. The default transport
(tcp) happens to be called "default".
Use of non-default transports changes the disk path to the following format:
/dev/disk/by-path/pci-<pci_id>-ip-<portal>-iscsi-<iqn>-lun-<lun_id>
Code comments currently claim the default iscsi mount path as
kubernetes.io/pod/iscsi/<portal>-iqn-<iqn>-lun-<id>, however the actual
path being used is
kubernetes.io/iscsi/iscsi/<portal>-iqn-<iqn>-lun-<id>
This leads to the final path being similar to this:
kubernetes.io/iscsi/iscsi/...iqn-iqn...-lun-N
Both iscsi and iqn are repeated twice for no reason, since "iqn" is
required by the spec to be part of an iqn. This is also wrong on
multiple levels, as the actual allowed naming formats are:
iqn.2001-04.com.example:storage:diskarrays-sn-a8675309
eui.02004567A425678D
(RFC 3720 3.2.6.3)
and in the second case "iqn-eui" in the path would be misleading.
Change this to a more reasonable path of
kubernetes.io/iscsi/<portal>-<iqn>-lun-<id>
which also lines up with how the /dev/disk/by-path and sysfs entries
are created for iscsi devices on linux.
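A sketch of building the corrected global mount path; the function and argument names are illustrative:

    package iscsi

    import "path/filepath"

    // makeGlobalPDPath builds the per-LUN mount directory under the kubelet
    // plugins dir using the corrected layout:
    //   <plugins-dir>/kubernetes.io/iscsi/<portal>-<iqn>-lun-<id>
    func makeGlobalPDPath(pluginsDir, portal, iqn, lun string) string {
        return filepath.Join(pluginsDir, "kubernetes.io/iscsi", portal+"-"+iqn+"-lun-"+lun)
    }

    // e.g. makeGlobalPDPath("/var/lib/kubelet/plugins",
    //     "10.0.0.2:3260", "iqn.2001-04.com.example:storage:diskarrays-sn-a8675309", "0")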
* -- *
Update iSCSI README and sample json file
There seems to have been quite a bit of skew in recent updates to these
files, adding wrong information or information that no longer keeps the
sample config in line with the README.
Fixed the following issues:
* Fix a discrepancy in the sample json, which used the initiator iqn from a
previously linked example as the target iqn (which was just wrong).
* Generate the sample output and README from the same json config provided.
* Remove the recommendation to edit the initiator name; this is not required
(open-iscsi warns against editing this manually and provides a utility
for the same).
* Update docker inspect command to one that works.
* Use separate LUNs for separate mount points instead of re-using.
Remount was originally needed to ensure rw/ro was set correctly. There is no longer such a need, since mount now goes through the exec interface.
Signed-off-by: Huamin Chen <hchen@redhat.com>
Flocker [1] is an open-source container data volume manager for
Dockerized applications.
This PR adds a volume plugin for Flocker.
The plugin interfaces with the Flocker Control Service REST API [2] to
attach the volume to the pod.
Each kubelet host should run Flocker agents (Container Agent and Dataset
Agent).
The kubelet will also require environment variables that contain the
host and port of the Flocker Control Service (see Flocker architecture
[3] for more; a sketch of reading them follows below):
- `FLOCKER_CONTROL_SERVICE_HOST`
- `FLOCKER_CONTROL_SERVICE_PORT`
The contribution introduces a new 'flocker' volume type to the API with
fields:
- `datasetName`: the name of the dataset in Flocker, added to metadata;
- `size`: a human-readable number that indicates the maximum size of the
requested dataset.
Full documentation can be found in docs/user-guide/volumes.md and examples
can be found in the examples/ folder.
[1] https://clusterhq.com/flocker/introduction/
[2] https://docs.clusterhq.com/en/1.3.1/reference/api.html
[3] https://docs.clusterhq.com/en/1.3.1/concepts/architecture.html
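For illustration, a sketch of how a client could derive the control service endpoint from those environment variables; the URL scheme and error handling are assumptions, not the plugin's exact code:

    package flocker

    import (
        "fmt"
        "net"
        "os"
    )

    // controlServiceEndpoint reads the two required environment variables and
    // builds the base URL of the Flocker Control Service REST API.
    func controlServiceEndpoint() (string, error) {
        host := os.Getenv("FLOCKER_CONTROL_SERVICE_HOST")
        port := os.Getenv("FLOCKER_CONTROL_SERVICE_PORT")
        if host == "" || port == "" {
            return "", fmt.Errorf("FLOCKER_CONTROL_SERVICE_HOST and FLOCKER_CONTROL_SERVICE_PORT must be set on the kubelet host")
        }
        return "https://" + net.JoinHostPort(host, port), nil
    }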
rbd: if rbd image is not formatted, format it to the designated filesystem type
rbd: update example README.md and include instructions to get base64 encoded Ceph secret
if rbd fails to lock image, unmap the image before exiting
Signed-off-by: Huamin Chen <hchen@redhat.com>
GlusterFS by default uses a log file based on the mountpoint path munged into a
file, i.e. `/mnt/foo/bar` becomes `/var/log/glusterfs/mnt-foo-bar.log`.
In certain Kubernetes environments this can result in a log file name that exceeds
the 255-character limit most filesystems impose on filenames, causing the mount
to fail. Instead, use the `log-file` mount option to place the log file under
the kubelet plugin directory with a filename of our choosing, keeping it fairly
persistent for troubleshooting.
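Roughly, the change amounts to passing an explicit log-file option when mounting; a sketch in which the path layout and helper name are assumptions:

    package glusterfs

    import "path/filepath"

    // mountOptions returns the options passed to mount.glusterfs. Placing the
    // log under the kubelet plugin directory keeps the filename short and
    // predictable instead of letting GlusterFS derive it from the (possibly
    // very long) mountpoint path.
    func mountOptions(pluginDir, volumeName string) []string {
        logFile := filepath.Join(pluginDir, volumeName, "glusterfs.log")
        return []string{"log-file=" + logFile}
    }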
A lot of packages use StringSet, but they don't use anything else from
the util package. Moving StringSet into another package will shrink
their dependency trees significantly.
This code was originally added because the first mount call did not
respect the ro option. This no longer seems to be the case, so there
is no need to use remount.
Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
The GCE PD plugin uses safe_format_and_mount found on standard GCE images:
https://github.com/GoogleCloudPlatform/compute-image-packages/blob/master/google-startup-scripts/usr/share/google/safe_format_and_mount
On custom images where this is not available, pods fail to format and
mount GCE PDs. This patch uses Linux utilities in a similar way to the
safe_format_and_mount script to format and mount GCE PD and AWS EBS
devices. That is: first attempt a mount. If the mount fails, try using 'file' to
investigate the device. If 'file' fails to get any information about
the device and simply returns "data", then assume the device is not
formatted; format it and attempt to mount it again (sketched below).
Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
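A condensed sketch of that flow; the helper names and the exact 'file' invocation are illustrative:

    package mount

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    type mountFunc func(source, target, fstype string, options []string) error

    // formatAndMount tries a plain mount first; only if that fails and
    // file(1) reports the device as bare "data" (no recognizable filesystem)
    // does it run mkfs and retry the mount.
    func formatAndMount(mount mountFunc, source, target, fstype string, options []string) error {
        if err := mount(source, target, fstype, options); err == nil {
            return nil
        }
        out, err := exec.Command("file", "-s", source).CombinedOutput()
        if err != nil {
            return err
        }
        parts := strings.SplitN(string(out), ":", 2)
        if len(parts) != 2 || strings.TrimSpace(parts[1]) != "data" {
            // file(1) recognized something on the device; do not reformat.
            return fmt.Errorf("device %s has existing content: %s", source, out)
        }
        if err := exec.Command("mkfs."+fstype, source).Run(); err != nil {
            return err
        }
        return mount(source, target, fstype, options)
    }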
IsLikelyNotMountPoint determines if a directory is not a mountpoint.
It is fast but not necessarily ALWAYS correct. If the path is in fact
a bind mount from one part of a mount to another it will not be detected.
mkdir /tmp/a /tmp/b; mount --bind /tmp/a /tmp/b; IsLikelyNotMountPoint("/tmp/b")
will return true, when in fact /tmp/b is a mount point. So this patch
renames the function and switches it from a positive to a negative (I
could not think of a good positive name). This should make future users of
this function aware that it isn't quite perfect, but probably good
enough.
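For context, the usual heuristic behind a check like this compares the device number of the path with that of its parent; a sketch of the general technique, not necessarily the exact implementation:

    package mount

    import (
        "os"
        "path/filepath"
        "syscall"
    )

    // IsLikelyNotMountPoint returns true if the directory appears NOT to be a
    // mount point: it lives on the same device as its parent. A bind mount
    // within the same filesystem has the same device number and therefore
    // slips through, hence only "likely".
    func IsLikelyNotMountPoint(file string) (bool, error) {
        stat, err := os.Stat(file)
        if err != nil {
            return true, err
        }
        parentStat, err := os.Stat(filepath.Dir(file))
        if err != nil {
            return true, err
        }
        return stat.Sys().(*syscall.Stat_t).Dev == parentStat.Sys().(*syscall.Stat_t).Dev, nil
    }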