mirror of https://github.com/k3s-io/k3s
Proposal for external dynamic provisioners
parent
a47ac1fa30
commit
95197536fc
|
@ -64,10 +64,7 @@ types of volumes within a single cloud.
|
||||||
|
|
||||||
One of our goals is to enable administrators to create out-of-tree
|
One of our goals is to enable administrators to create out-of-tree
|
||||||
provisioners, that is, provisioners whose code does not live in the Kubernetes
|
provisioners, that is, provisioners whose code does not live in the Kubernetes
|
||||||
project. Our experience since the 1.2 release with dynamic provisioning has
|
project.
|
||||||
shown that it is impossible to anticipate every aspect and manner of
|
|
||||||
provisioning that administrators will want to perform. The proposed design
|
|
||||||
should not prevent future work to allow out-of-tree provisioners.
|
|
||||||
|
|
||||||
## Design
|
## Design
|
||||||
|
|
||||||
|
@ -75,37 +72,60 @@ This design represents the minimally viable changes required to provision based
|
||||||
|
|
||||||
We propose that:
|
We propose that:
|
||||||
|
|
||||||
1. For the base implementation, storage class and volume selectors are mutually exclusive.
|
1. For both in-tree and out-of-tree storage provisioners, the PV created by the
|
||||||
|
provisioners must match the PVC that led to its creation. If a provisioner
|
||||||
|
is unable to provision such a matching PV, it reports an error to the
|
||||||
|
user.
|
||||||
|
|
||||||
2. An API object will be incubated in storage.k8s.io/v1beta1 to hold the `StorageClass`
|
2. The above point also applies to the PVC label selector. If a user submits a PVC
|
||||||
|
with a label selector, the provisioner must provision a PV with matching
|
||||||
|
labels. This directly implies that the provisioner understands the meaning
|
||||||
|
behind these labels: if a user submits a claim with a selector that wants
|
||||||
|
a PV with label "region" not in "[east,west]", the provisioner must
|
||||||
|
understand what the label "region" means, which regions are available, and
|
||||||
|
choose e.g. "north".
|
||||||
|
|
||||||
|
In other words, provisioners should either refuse to provision a volume for
|
||||||
|
a PVC that has a selector, or select a few labels that are allowed in
|
||||||
|
selectors (such as the "region" example above), implement the necessary logic
|
||||||
|
for their parsing, document them and refuse any selector that references
|
||||||
|
unknown labels.
|
||||||
|
|
||||||
|
3. An API object will be incubated in storage.k8s.io/v1beta1 to hold the `StorageClass`
|
||||||
API resource. Each StorageClass object contains parameters required by the provisioner to provision volumes of that class. These parameters are opaque to the user.
|
API resource. Each StorageClass object contains parameters required by the provisioner to provision volumes of that class. These parameters are opaque to the user.
|
||||||
|
|
||||||
3. `PersistentVolume.Spec.Class` attribute is added to volumes. This attribute
|
4. `PersistentVolume.Spec.Class` attribute is added to volumes. This attribute
|
||||||
is optional and specifies which `StorageClass` instance represents
|
is optional and specifies which `StorageClass` instance represents
|
||||||
storage characteristics of a particular PV.
|
storage characteristics of a particular PV.
|
||||||
|
|
||||||
During incubation, `Class` is an annotation and not
|
During incubation, `Class` is an annotation and not
|
||||||
actual attribute.
|
actual attribute.
|
||||||
|
|
||||||
4. `PersistentVolume` instances do not require labels by the provisioner.
|
5. `PersistentVolume` instances do not require labels by the provisioner.
|
||||||
|
|
||||||
5. `PersistentVolumeClaim.Spec.Class` attribute is added to claims. This
|
6. `PersistentVolumeClaim.Spec.Class` attribute is added to claims. This
|
||||||
attribute specifies that only a volume with equal
|
attribute specifies that only a volume with equal
|
||||||
`PersistentVolume.Spec.Class` value can satisfy a claim.
|
`PersistentVolume.Spec.Class` value can satisfy a claim.
|
||||||
|
|
||||||
During incubation, `Class` is just an annotation and not
|
During incubation, `Class` is just an annotation and not
|
||||||
actual attribute.
|
actual attribute.
|
||||||
|
|
||||||
6. The existing provisioner plugin implementations be modified to accept
|
7. The existing provisioner plugin implementations be modified to accept
|
||||||
parameters as specified via `StorageClass`.
|
parameters as specified via `StorageClass`.
|
||||||
|
|
||||||
7. The persistent volume controller be modified to invoke provisioners using `StorageClass` configuration and bind claims with `PersistentVolumeClaim.Spec.Class` to volumes with equivalent `PersistentVolume.Spec.Class`
|
8. The persistent volume controller be modified to invoke provisioners using `StorageClass` configuration and bind claims with `PersistentVolumeClaim.Spec.Class` to volumes with equivalent `PersistentVolume.Spec.Class`
|
||||||
|
|
||||||
8. The existing alpha dynamic provisioning feature be phased out in the
|
9. The existing alpha dynamic provisioning feature be phased out in the
|
||||||
next release.
|
next release.
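
The selector rule in the proposal list above (accept only a small set of documented labels and refuse everything else) can be sketched as follows. This is a minimal illustration, not the proposed implementation: the `Requirement` type, the `knownRegions` list and the `chooseRegion` function are hypothetical names, and the only supported label is assumed to be `region`.

```go
package main

import (
	"errors"
	"fmt"
)

// Requirement is a simplified, hypothetical stand-in for one selector
// expression; it is not the Kubernetes API type.
type Requirement struct {
	Key      string
	Operator string // "In" or "NotIn"
	Values   []string
}

// knownRegions is an assumed list of regions this provisioner can serve.
var knownRegions = []string{"east", "west", "north"}

// chooseRegion returns a region that satisfies every requirement. It refuses
// selectors that reference unknown labels or operators, and selectors that no
// known region can satisfy.
func chooseRegion(reqs []Requirement) (string, error) {
	allowed := make(map[string]bool, len(knownRegions))
	for _, r := range knownRegions {
		allowed[r] = true
	}
	for _, req := range reqs {
		if req.Key != "region" {
			return "", fmt.Errorf("unsupported selector label %q", req.Key)
		}
		listed := make(map[string]bool, len(req.Values))
		for _, v := range req.Values {
			listed[v] = true
		}
		switch req.Operator {
		case "In":
			// Keep only regions that are explicitly listed.
			for _, r := range knownRegions {
				if !listed[r] {
					allowed[r] = false
				}
			}
		case "NotIn":
			// Drop every listed region.
			for _, r := range knownRegions {
				if listed[r] {
					allowed[r] = false
				}
			}
		default:
			return "", fmt.Errorf("unsupported selector operator %q", req.Operator)
		}
	}
	// Walk the regions in declaration order so the choice is deterministic.
	for _, r := range knownRegions {
		if allowed[r] {
			return r, nil
		}
	}
	return "", errors.New("no known region satisfies the selector")
}

func main() {
	reqs := []Requirement{{Key: "region", Operator: "NotIn", Values: []string{"east", "west"}}}
	region, err := chooseRegion(reqs)
	fmt.Println(region, err)
}
```

A provisioner that does not want to interpret any labels can skip all of this and simply refuse every claim that carries a selector.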
|
||||||
|
|
||||||
### Controller workflow for provisioning volumes
|
### Controller workflow for provisioning volumes
|
||||||
|
|
||||||
|
0. A Kubernetes administrator can configure the name of a default StorageClass. This
|
||||||
|
StorageClass instance is then used when a user requests a dynamically
|
||||||
|
provisioned volume, but does not specify a StorageClass. In other words,
|
||||||
|
`claim.Spec.Class == ""`
|
||||||
|
(or annotation `volume.beta.kubernetes.io/storage-class == ""`).
|
||||||
|
|
||||||
1. When a new claim is submitted, the controller attempts to find an existing
|
1. When a new claim is submitted, the controller attempts to find an existing
|
||||||
volume that will fulfill the claim.
|
volume that will fulfill the claim.
|
||||||
|
|
||||||
|
@ -125,30 +145,280 @@ We propose that:
|
||||||
periodically retries finding a matching volume or storage class again until
|
periodically retries finding a matching volume or storage class again until
|
||||||
a match is found. The claim is `Pending` during this period.
|
a match is found. The claim is `Pending` during this period.
|
||||||
|
|
||||||
4. With StorageClass instance, the controller finds volume plugin specified by
|
4. With StorageClass instance, the controller updates the claim:
|
||||||
StorageClass.Provisioner.
|
* `claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] = storageClass.Provisioner`
|
||||||
|
|
||||||
5. All provisioners are in-tree; they implement an interface called
|
* **In-tree provisioning**
|
||||||
`ProvisionableVolumePlugin`, which has a method called `NewProvisioner`
|
|
||||||
that returns a new provisioner.
|
|
||||||
|
|
||||||
6. The controller calls volume plugin `Provision` with Parameters from the `StorageClass` configuration object.
|
The controller tries to find an internal volume plugin referenced by
|
||||||
|
`storageClass.Provisioner`. If it is found:
|
||||||
|
|
||||||
7. If `Provision` returns an error, the controller generates an event on the
|
5. The internal provisioner implements the interface `ProvisionableVolumePlugin`,
|
||||||
claim and goes back to step 1., i.e. it will retry provisioning periodically
|
which has a method called `NewProvisioner` that returns a new provisioner.
|
||||||
|
|
||||||
8. If `Provision` returns no error, the controller creates the returned
|
6. The controller calls volume plugin `Provision` with Parameters
|
||||||
`api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`
|
from the `StorageClass` configuration object.
|
||||||
and makes it already bound to the claim
|
|
||||||
|
|
||||||
1. If the create operation for the `api.PersistentVolume` fails, it is
|
7. If `Provision` returns an error, the controller generates an event on the
|
||||||
retried
|
claim and goes back to step 1., i.e. it will retry provisioning
|
||||||
|
periodically.
|
||||||
|
|
||||||
2. If the create operation does not succeed in a reasonable time, the
|
8. If `Provision` returns no error, the controller creates the returned
|
||||||
controller attempts to delete the provisioned volume and creates an event
|
`api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`
|
||||||
on the claim
|
and makes it already bound to the claim
|
||||||
|
|
||||||
Existing behavior is unchanged for claims that do not specify `claim.Spec.Class`.
|
1. If the create operation for the `api.PersistentVolume` fails, it is
|
||||||
|
retried
|
||||||
|
|
||||||
|
2. If the create operation does not succeed in a reasonable time, the
|
||||||
|
controller attempts to delete the provisioned volume and creates an event
|
||||||
|
on the claim
|
||||||
|
|
||||||
|
Existing behavior is unchanged for claims that do not specify
|
||||||
|
`claim.Spec.Class`.
|
||||||
|
|
||||||
|
* **Out of tree provisioning**
|
||||||
|
|
||||||
|
Following step 4. above, the controller tries to find an internal plugin for the
|
||||||
|
`StorageClass`. If it is not found, the controller takes no action; it just
|
||||||
|
periodically goes back to step 1., i.e. it tries to find an available matching PV.
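
The split between in-tree and out-of-tree handling described above can be sketched as a lookup by provisioner name. This is only an illustration under simplifying assumptions: `Claim`, `StorageClass`, `provisionFunc` and `dispatch` are hypothetical stand-ins, not the controller's real types, and the in-tree plugin registry is modeled as a plain map.

```go
package main

import "fmt"

const provisionerAnnotation = "volume.beta.kubernetes.io/storage-provisioner"

// Claim and StorageClass are simplified, hypothetical stand-ins for the
// API objects; only the fields needed here are modeled.
type Claim struct {
	Annotations map[string]string
}

type StorageClass struct {
	Provisioner string
	Parameters  map[string]string
}

// provisionFunc stands in for an in-tree plugin's Provision call.
type provisionFunc func(params map[string]string) error

// dispatch records the provisioner name on the claim (step 4) and, if an
// in-tree plugin is registered under that name, invokes it. It reports whether
// an in-tree plugin handled the claim. When it returns false the controller
// does nothing more and leaves the claim to an external provisioner.
func dispatch(claim *Claim, class StorageClass, inTree map[string]provisionFunc) (bool, error) {
	if claim.Annotations == nil {
		claim.Annotations = map[string]string{}
	}
	claim.Annotations[provisionerAnnotation] = class.Provisioner
	provision, ok := inTree[class.Provisioner]
	if !ok {
		return false, nil
	}
	return true, provision(class.Parameters)
}

func main() {
	inTree := map[string]provisionFunc{
		"kubernetes.io/aws-ebs": func(map[string]string) error { return nil },
	}
	claim := &Claim{}
	handled, err := dispatch(claim, StorageClass{Provisioner: "foo.org/foo-volume"}, inTree)
	fmt.Println(handled, err, claim.Annotations[provisionerAnnotation])
}
```

The annotation is written in both cases, which is what lets an external provisioner recognize the claims it is responsible for.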
|
||||||
|
|
||||||
|
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
|
||||||
|
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
|
||||||
|
interpreted as described in RFC 2119.
|
||||||
|
|
||||||
|
An external provisioner must have these features:
|
||||||
|
|
||||||
|
* It MUST have a distinct name, following the Kubernetes plugin naming scheme
|
||||||
|
`<vendor name>/<provisioner name>`, e.g. `gluster.org/gluster-volume`.
|
||||||
|
|
||||||
|
* The provisioner SHOULD send events on a claim to report any errors
|
||||||
|
related to provisioning a volume for the claim. This way, users get the same
|
||||||
|
experience as with internal provisioners.
|
||||||
|
|
||||||
|
* The provisioner MUST also implement a deleter. It must be able to delete
|
||||||
|
storage assets it created. It MUST NOT assume that any other internal or
|
||||||
|
external plugin is present.
|
||||||
|
|
||||||
|
The external provisioner runs in a separate process which watches claims, be
|
||||||
|
it an external storage appliance, a daemon or a Kubernetes pod. For every
|
||||||
|
claim creation or update, it implements these steps:
|
||||||
|
|
||||||
|
1. The provisioner checks whether
|
||||||
|
`claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] == <provisioner name>`.
|
||||||
|
All other claims MUST be ignored.
|
||||||
|
|
||||||
|
2. The provisioner MUST check that the claim is unbound, i.e. its
|
||||||
|
`claim.Spec.VolumeName` is empty. Bound claims MUST be ignored.
|
||||||
|
|
||||||
|
*The race condition where the provisioner provisions a new PV for a claim and
|
||||||
|
at the same time Kubernetes binds the same claim to another PV that was
|
||||||
|
just created by an admin is discussed below.*
|
||||||
|
|
||||||
|
3. It tries to find a StorageClass instance referenced by annotation
|
||||||
|
`claim.Annotations["volume.beta.kubernetes.io/storage-class"]`. If not
|
||||||
|
found, it SHOULD report an error (by sending an event to the claim) and it
|
||||||
|
SHOULD retry periodically with step 1.
|
||||||
|
|
||||||
|
4. The provisioner MUST parse arguments in the `StorageClass` and
|
||||||
|
`claim.Spec.Selector` and provision an appropriate storage asset that matches
|
||||||
|
both the parameters and the selector.
|
||||||
|
When it encounters unknown parameters in `storageClass.Parameters` or
|
||||||
|
`claim.Spec.Selector` or the combination of these parameters is impossible
|
||||||
|
to achieve, it SHOULD report an error and it MUST NOT provision a volume.
|
||||||
|
All errors found during parsing or provisioning SHOULD be sent as events
|
||||||
|
on the claim and the provisioner SHOULD retry periodically with step 1.
|
||||||
|
|
||||||
|
As parsing (and understanding) claim selectors is hard, the sentence
|
||||||
|
"MUST parse ... `claim.Spec.Selector`" will in the typical case lead to a simple
|
||||||
|
refusal of claims that have any selector:
|
||||||
|
|
||||||
|
```go
|
||||||
|
if pvc.Spec.Selector != nil {
|
||||||
|
	return errors.New("can't parse PVC selector") // needs the standard "errors" package
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
5. When the volume is provisioned, the provisioner MUST create a new PV
|
||||||
|
representing the storage asset and save it in Kubernetes. When this fails,
|
||||||
|
it SHOULD retry creating the PV a few times. If all attempts fail, it
|
||||||
|
MUST delete the storage asset. All errors SHOULD be sent as events to the
|
||||||
|
claim.
|
||||||
|
|
||||||
|
The created PV MUST have these properties:
|
||||||
|
|
||||||
|
* `pv.Spec.ClaimRef` MUST point to the claim that led to its creation
|
||||||
|
(including the claim UID).
|
||||||
|
|
||||||
|
*This way, the PV will be bound to the claim.*
|
||||||
|
|
||||||
|
* `pv.Annotations["pv.kubernetes.io/provisioned-by"]` MUST be set to the name
|
||||||
|
of the external provisioner. This provisioner will be used to delete the
|
||||||
|
volume.
|
||||||
|
|
||||||
|
*The provisioner/deleter should not assume there is any other
|
||||||
|
provisioner/deleter available that would delete the volume.*
|
||||||
|
|
||||||
|
* `pv.Annotations["volume.beta.kubernetes.io/storage-class"]` MUST be set
|
||||||
|
to the name of the storage class requested by the claim.
|
||||||
|
|
||||||
|
*So the created PV matches the claim.*
|
||||||
|
|
||||||
|
* The provisioner MAY store any other information to the created PV as
|
||||||
|
annotations. It SHOULD save any information that is needed to delete the
|
||||||
|
storage asset there, as the appropriate StorageClass instance may not exist
|
||||||
|
when the volume is deleted. However, references to a Secret instance
|
||||||
|
or direct username/password to a remote storage appliance MUST NOT be
|
||||||
|
stored there, see issue #34822.
|
||||||
|
|
||||||
|
* `pv.Labels` MUST be set to match `claim.spec.selector`. The provisioner
|
||||||
|
MAY add additional labels.
|
||||||
|
|
||||||
|
*So the created PV matches the claim.*
|
||||||
|
|
||||||
|
* `pv.Spec` MUST be set to match requirements in `claim.Spec`, especially
|
||||||
|
access mode and PV size. The provisioned volume size MUST NOT be smaller
|
||||||
|
than size requested in the claim, however it MAY be larger.
|
||||||
|
|
||||||
|
*So the created PV matches the claim.*
|
||||||
|
|
||||||
|
* `pv.Spec.PersistentVolumeSource` MUST be set to point to the created
|
||||||
|
storage asset.
|
||||||
|
|
||||||
|
* `pv.Spec.PersistentVolumeReclaimPolicy` SHOULD be set to `Delete` unless
|
||||||
|
the user manually configures another reclaim policy.
|
||||||
|
|
||||||
|
* `pv.Name` MUST be unique. Internal provisioners use a name based on
|
||||||
|
`claim.UID` to produce conflicts when two provisioners accidentally
|
||||||
|
provision a PV for the same claim, however external provisioners can use
|
||||||
|
any mechanism to generate a unique PV name.
|
||||||
|
|
||||||
|
Example of a claim that is to be provisioned by an external provisioner for
|
||||||
|
`foo.org/foo-volume`:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: PersistentVolumeClaim
|
||||||
|
metadata:
|
||||||
|
annotations:
|
||||||
|
volume.beta.kubernetes.io/storage-class: myClass
|
||||||
|
volume.beta.kubernetes.io/storage-provisioner: foo.org/foo-volume
|
||||||
|
name: fooclaim
|
||||||
|
namespace: default
|
||||||
|
resourceVersion: "53"
|
||||||
|
uid: 5a294561-7e5b-11e6-a20e-0eb6048532a3
|
||||||
|
spec:
|
||||||
|
accessModes:
|
||||||
|
- ReadWriteOnce
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
storage: 4Gi
|
||||||
|
# volumeName: must be empty!
|
||||||
|
```
|
||||||
|
|
||||||
|
Example of the created PV:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: PersistentVolume
|
||||||
|
metadata:
|
||||||
|
annotations:
|
||||||
|
pv.kubernetes.io/provisioned-by: foo.org/foo-volume
|
||||||
|
volume.beta.kubernetes.io/storage-class: myClass
|
||||||
|
foo.org/provisioner: "any other annotations as needed"
|
||||||
|
labels:
|
||||||
|
foo.org/my-label: "any labels as needed"
|
||||||
|
generateName: "foo-volume-"
|
||||||
|
spec:
|
||||||
|
accessModes:
|
||||||
|
- ReadWriteOnce
|
||||||
|
awsElasticBlockStore:
|
||||||
|
fsType: ext4
|
||||||
|
volumeID: aws://us-east-1d/vol-de401a79
|
||||||
|
capacity:
|
||||||
|
storage: 4Gi
|
||||||
|
claimRef:
|
||||||
|
apiVersion: v1
|
||||||
|
kind: PersistentVolumeClaim
|
||||||
|
name: fooclaim
|
||||||
|
namespace: default
|
||||||
|
resourceVersion: "53"
|
||||||
|
uid: 5a294561-7e5b-11e6-a20e-0eb6048532a3
|
||||||
|
persistentVolumeReclaimPolicy: Delete
|
||||||
|
```
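
The checks an external provisioner performs on a claim, and the fields it sets on the resulting PV, can be sketched as below. The sketch works under stated assumptions: `Claim`, `Volume`, `ErrIgnored` and `buildVolume` are hypothetical names, the types are simplified stand-ins for the API objects, selectors are refused outright, and creating the storage asset itself is left out.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	provisionerAnnotation   = "volume.beta.kubernetes.io/storage-provisioner"
	classAnnotation         = "volume.beta.kubernetes.io/storage-class"
	provisionedByAnnotation = "pv.kubernetes.io/provisioned-by"
)

// Claim and Volume are simplified, hypothetical stand-ins for the API objects.
type Claim struct {
	Name, Namespace, UID string
	Annotations          map[string]string
	VolumeName           string
	HasSelector          bool
	RequestedBytes       int64
}

type Volume struct {
	Annotations                         map[string]string
	ClaimName, ClaimNamespace, ClaimUID string
	CapacityBytes                       int64
	ReclaimPolicy                       string
}

// ErrIgnored marks claims that belong to another provisioner or are bound.
var ErrIgnored = errors.New("claim is not handled by this provisioner")

// buildVolume applies steps 1, 2 and 4 to the claim and, if it qualifies,
// returns the PV description required by step 5.
func buildVolume(provisionerName string, c Claim) (*Volume, error) {
	if c.Annotations[provisionerAnnotation] != provisionerName {
		return nil, ErrIgnored // step 1: claim is meant for someone else
	}
	if c.VolumeName != "" {
		return nil, ErrIgnored // step 2: claim is already bound
	}
	if c.HasSelector {
		return nil, errors.New("claim selectors are not supported") // step 4
	}
	return &Volume{
		Annotations: map[string]string{
			provisionedByAnnotation: provisionerName,
			classAnnotation:         c.Annotations[classAnnotation],
		},
		// The claim reference, including the UID, pre-binds the PV.
		ClaimName:      c.Name,
		ClaimNamespace: c.Namespace,
		ClaimUID:       c.UID,
		CapacityBytes:  c.RequestedBytes, // never smaller than requested
		ReclaimPolicy:  "Delete",
	}, nil
}

func main() {
	claim := Claim{
		Name: "fooclaim", Namespace: "default", UID: "5a294561-7e5b-11e6-a20e-0eb6048532a3",
		Annotations: map[string]string{
			classAnnotation:       "myClass",
			provisionerAnnotation: "foo.org/foo-volume",
		},
		RequestedBytes: 4 << 30, // 4Gi
	}
	pv, err := buildVolume("foo.org/foo-volume", claim)
	fmt.Println(pv, err)
}
```

A real provisioner would also fill in the volume source and retry the PV creation, deleting the storage asset if every attempt fails.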
|
||||||
|
|
||||||
|
As a result, Kubernetes has a PV that represents the storage asset and is bound
|
||||||
|
to the claim. When everything went well, Kubernetes completed binding of the
|
||||||
|
claim to the PV.
|
||||||
|
|
||||||
|
Kubernetes was not blocked in any way during the provisioning and could
|
||||||
|
have either bound the claim to another PV that was created by a user, or the
|
||||||
|
claim may have been deleted by the user. In both cases, Kubernetes will mark
|
||||||
|
the PV to be deleted using the protocol below.
|
||||||
|
|
||||||
|
The external provisioner MAY save any annotations to the claim that is
|
||||||
|
provisioned, however the claim may be modified or even deleted by the user at
|
||||||
|
any time.
|
||||||
|
|
||||||
|
|
||||||
|
### Controller workflow for deleting volumes
|
||||||
|
|
||||||
|
When the controller decides that a volume should be deleted it performs these
|
||||||
|
steps:
|
||||||
|
|
||||||
|
1. The controller changes `pv.Status.Phase` to `Released`.
|
||||||
|
|
||||||
|
2. The controller looks for `pv.Annotations["pv.kubernetes.io/provisioned-by"]`.
|
||||||
|
If found, it uses this provisioner/deleter to delete the volume.
|
||||||
|
|
||||||
|
3. If the volume is not annotated by `pv.kubernetes.io/provisioned-by`, the
|
||||||
|
controller inspects `pv.Spec` and finds the in-tree deleter for the volume.
|
||||||
|
|
||||||
|
4. If the deleter found by steps 2. or 3. is internal, the controller calls it and deletes
|
||||||
|
the storage asset together with the PV that represents it.
|
||||||
|
|
||||||
|
5. If the deleter is not known to Kubernetes, the controller does not do anything.
|
||||||
|
|
||||||
|
6. External deleters MUST watch for PV changes. When
|
||||||
|
`pv.Status.Phase == Released && pv.Annotations['pv.kubernetes.io/provisioned-by'] == <deleter name>`,
|
||||||
|
the deleter:
|
||||||
|
|
||||||
|
* It MUST check the reclaim policy of the PV and ignore all PVs whose
|
||||||
|
`Spec.PersistentVolumeReclaimPolicy` is not `Delete`.
|
||||||
|
|
||||||
|
* It MUST delete the storage asset.
|
||||||
|
|
||||||
|
* Only after the storage asset has been successfully deleted, it MUST delete the
|
||||||
|
PV object in Kubernetes.
|
||||||
|
|
||||||
|
* Any error SHOULD be sent as an event on the PV being deleted and the
|
||||||
|
deleter SHOULD retry to delete the volume periodically.
|
||||||
|
|
||||||
|
* The deleter SHOULD NOT use any information from the StorageClass instance
|
||||||
|
referenced by the PV. This is different to internal deleters, which
|
||||||
|
need the StorageClass instance to be present at the time of deletion to read
|
||||||
|
Secret instances (see Gluster provisioner for example), however we would
|
||||||
|
like to phase out this behavior.
|
||||||
|
|
||||||
|
Note that watching `pv.Status` has been frowned upon in the past, however in
|
||||||
|
this particular case we could use it quite reliably to trigger deletion.
|
||||||
|
It's not trivial to find out if a PV is not needed and should be deleted.
|
||||||
|
*Alternatively, an annotation could be used.*
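
The external deleter rules above amount to a predicate on the PV plus a strict ordering of the two delete operations. The following is a minimal sketch, assuming a hypothetical simplified `Volume` type and caller-supplied `deleteAsset` and `deletePV` callbacks. Event reporting and periodic retries are left out.

```go
package main

import "fmt"

const provisionedByAnnotation = "pv.kubernetes.io/provisioned-by"

// Volume is a simplified, hypothetical stand-in for a PersistentVolume.
type Volume struct {
	Phase         string
	ReclaimPolicy string
	Annotations   map[string]string
}

// shouldDelete reports whether the named deleter is responsible for the PV:
// the PV is Released, was provisioned by this deleter, and has the Delete
// reclaim policy.
func shouldDelete(pv Volume, deleterName string) bool {
	return pv.Phase == "Released" &&
		pv.Annotations[provisionedByAnnotation] == deleterName &&
		pv.ReclaimPolicy == "Delete"
}

// reclaim deletes the storage asset first and removes the PV object only
// after the asset has been deleted successfully.
func reclaim(pv Volume, deleterName string, deleteAsset, deletePV func() error) error {
	if !shouldDelete(pv, deleterName) {
		return nil // not ours, or not ready for deletion
	}
	if err := deleteAsset(); err != nil {
		// Keep the PV object so that the deletion can be retried later.
		return fmt.Errorf("deleting storage asset: %w", err)
	}
	return deletePV()
}

func main() {
	pv := Volume{
		Phase:         "Released",
		ReclaimPolicy: "Delete",
		Annotations:   map[string]string{provisionedByAnnotation: "foo.org/foo-volume"},
	}
	err := reclaim(pv, "foo.org/foo-volume",
		func() error { fmt.Println("storage asset deleted"); return nil },
		func() error { fmt.Println("PV object deleted"); return nil })
	fmt.Println("error:", err)
}
```

Keeping the PV object until the asset is gone means a failed deletion leaves a visible record that can be retried, rather than an orphaned storage asset.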
|
||||||
|
|
||||||
|
### Security considerations
|
||||||
|
|
||||||
|
Both internal and external provisioners and deleters may need access to
|
||||||
|
credentials (e.g. username+password) of an external storage appliance to
|
||||||
|
provision and delete volumes.
|
||||||
|
|
||||||
|
* For internal provisioners, a Secret instance in a well secured namespace
|
||||||
|
should be used. A pointer to the Secret instance shall be a parameter of the
|
||||||
|
StorageClass and it MUST NOT be copied around the system e.g. in annotations
|
||||||
|
of PVs. See issue #34822.
|
||||||
|
|
||||||
|
* External provisioners running in a pod should have appropriate credentials
|
||||||
|
mounted as a Secret inside the pods that run the provisioner. The namespace with the pods
|
||||||
|
and Secret instance should be well secured.
|
||||||
|
|
||||||
### `StorageClass` API
|
### `StorageClass` API
|
||||||
|
|
||||||
|
@ -253,7 +523,7 @@ parameters:
|
||||||
|
|
||||||
0. Annotation `volume.alpha.kubernetes.io/storage-class` is used instead of `claim.Spec.Class` and `volume.Spec.Class` during incubation.
|
0. Annotation `volume.alpha.kubernetes.io/storage-class` is used instead of `claim.Spec.Class` and `volume.Spec.Class` during incubation.
|
||||||
|
|
||||||
1. `claim.Spec.Selector` and `claim.Spec.Class` are mutually exclusive. User can either match existing volumes with `Selector` XOR match existing volumes with `Class` and get dynamic provisioning by using `Class`. This simplifies initial PR and also provisioners.
|
1. `claim.Spec.Selector` and `claim.Spec.Class` are mutually exclusive for now (1.4). User can either match existing volumes with `Selector` XOR match existing volumes with `Class` and get dynamic provisioning by using `Class`. This simplifies initial PR and also provisioners. This limitation may be lifted in future releases.
|
||||||
|
|
||||||
# Cloud Providers
|
# Cloud Providers
|
||||||
|
|
||||||
|
|