Commit Graph

163 Commits (7e670fb847430f91319ebefd67deb870b9970428)

Author SHA1 Message Date
Jan Safranek 169076e7da Fix initialization of volume controller caches.
Fix PersistentVolumeController.initializeCaches() to pass pointers to volume
or claim to storeObjectUpdate() and add extra functions to enforce that the
right types are checked in the future.


Fixes #28076
2016-06-27 13:08:02 +02:00
Angus Salkeld b4f7e67d25 Fix startup type error in initializeCaches
The following error was getting logged:
PersistentVolumeController can't initialize caches, expected list of volumes, got:
&{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink:/api/v1/persistentvolumes ResourceVersion:11} Items:[]}
2016-06-25 10:15:27 +10:00
k8s-merge-robot 07471cf90f Merge pull request #27553 from justinsb/pvc_zone_spreading_2
Automatic merge from submit-queue

AWS/GCE: Spread PetSet volume creation across zones, create GCE volumes in non-master zones

Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
    
We hash the volume name so we don't bias to the first few zones.
    
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset.  In that case we hash
the base name.
2016-06-22 01:22:16 -07:00
k8s-merge-robot ec518005a8 Merge pull request #27567 from saad-ali/blockKubeletOnAttachController
Automatic merge from submit-queue

Kubelet Volume Manager Wait For Attach Detach Controller and Backoff on Error

* Closes https://github.com/kubernetes/kubernetes/issues/27483
  * Modified Attach/Detach controller to report `Node.Status.AttachedVolumes` on successful attach (unique volume name along with device path).
  * Modified Kubelet Volume Manager wait for Attach/Detach controller to report success before proceeding with attach.
* Closes https://github.com/kubernetes/kubernetes/issues/27492
  * Implemented an exponential backoff mechanism for for volume manager and attach/detach controller to prevent operations (attach/detach/mount/unmount/wait for controller attach/etc) from executing back to back unchecked.
* Closes https://github.com/kubernetes/kubernetes/issues/26679
  * Modified volume `Attacher.WaitForAttach()` methods to uses the device path reported by the Attach/Detach controller in `Node.Status.AttachedVolumes` instead of calling out to cloud providers.
2016-06-20 20:36:08 -07:00
saadali e716ddc771 Controller wait for attach and exponential backoff
Modify attach/detach controller to keep track of volumes to report
attached in Node VolumeToAttach status.

Modify kubelet volume manager to wait for volume to show up in Node
VolumeToAttach status.

Implement exponential backoff for errors in volume manager and attach
detach controller
2016-06-20 18:19:55 -07:00
k8s-merge-robot d8b463dfd2 Merge pull request #27128 from markturansky/disable_provisioning
Automatic merge from submit-queue

Allow disabling of dynamic provisioning

Allow administrators to opt-out of dynamic provisioning.  Provisioning is still on by default, which is the current behavior.

Per a conversation with @jsafrane, a boolean toggle was added and plumbed through into the controller.  Deliberate disabling will simply return nil from `provisionClaim` whereas a misconfigured provisioner will continue on and generate error events for the PVC.

@kubernetes/rh-storage @saad-ali @thockin  @abhgupta
2016-06-20 02:10:43 -07:00
k8s-merge-robot 0730ffbff7 Merge pull request #27434 from jsafrane/pv-events-message
Automatic merge from submit-queue

Fill PV.Status.Message with deleter/recycler errors.

Instead of empty `Message` `kubectl describe pv` now shows:

```
Name:		nfs
Labels:		<none>
Status:		Failed
Claim:		default/nfs
Reclaim Policy:	Recycle
Access Modes:	RWX
Capacity:	1Mi
Message:	Recycler failed: Pod was active on the node longer than specified deadline
Source:
    Type:	NFS (an NFS mount that lasts the lifetime of a pod)
    Server:	10.999.999.999
    Path:	/
    ReadOnly:	false
```

This is actually a regression since 1.2

@kubernetes/sig-storage
2016-06-20 01:36:28 -07:00
markturansky 16ec36c591 added toggle to disable dynamic provisioning 2016-06-20 01:15:23 -04:00
Justin Santa Barbara e711cbf912 GCE/AWS: Spread PetSet volume creation across zones
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.

We hash the volume name so we don't bias to the first few zones.

If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset.  In that case we hash
the base name.

Fixes #27256
2016-06-17 23:27:31 -04:00
goltermann 218645b346 Fix several spelling errors in comments. 2016-06-17 10:41:18 -07:00
saadali 542f2dc708 Introduce new kubelet volume manager
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).

This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
2016-06-15 09:34:08 -07:00
saadali 9b6a505f8a Rename UniqueDeviceName to UniqueVolumeName
Rename UniqueDeviceName to UniqueVolumeName and move helper functions
from attacherdetacher to volumehelper package.
Introduce UniquePodName alias
2016-06-15 09:32:12 -07:00
Jan Safranek 449e9f49d3 Fill PV.Status.Message with deleter/recycler errors. 2016-06-15 14:56:31 +02:00
Jan Safranek 6081bd61f0 Rework PV controller to use util/goroutinemap 2016-06-09 13:49:04 +02:00
k8s-merge-robot bd2bc25308 Merge pull request #25865 from jsafrane/devel/pv-convert-from-12
Automatic merge from submit-queue

volume controller: Convert PersistentVolumes from Kubernetes 1.2

In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a claim for dynamic volume was detected, Kubernetes did:

- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV

In refactored volume provisioner, Kubernetes allocates the storage asset first and then creates a Kubernetes PV instance already with the correct pointer to the storage asset.

To support seamles upgrade from 1.2 to 1.3 we need to remove these unprovisioned template PVs. The new controller does not use them, it will see PVC for dynamic provisioning and create real PV instead.

See https://github.com/pmorie/pv-haxxz/pull/3 for pseudocode.
2016-06-03 23:27:13 -07:00
k8s-merge-robot 4877153727 Merge pull request #26772 from jsafrane/flake-controller-cache-empty
Automatic merge from submit-queue

Wait for all volumes/claims to get synced in unit test.

Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.

Fixes #26712
2016-06-03 17:05:22 -07:00
Jan Safranek 27b11c5342 Convert PersistentVolumes from Kubernetes 1.2
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a
claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV

In refactored volume provisioner, Kubernetes allocates the storage asset first
and then creates a Kubernetes PV instance already with the correct pointer
to the storage asset.

To support seamles upgrade from 1.2 to 1.3 we need to remove these
unprovisioned template PVs. The new controller does not use them, it will see
PVC for dynamic provisioning and create real PV instead.
2016-06-03 14:26:06 +02:00
k8s-merge-robot 59e008dbcb Merge pull request #26733 from pmorie/pv-controller-typos
Automatic merge from submit-queue

Fix typo and linewrap comments in PV controller

Fix some typos and linewrap long comments that I found while going over this code investigating something.
2016-06-03 04:51:08 -07:00
Jan Safranek 962505ad01 Wait for all volumes/claims to get synced in unit test.
Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.
2016-06-03 10:53:56 +02:00
k8s-merge-robot a41d84408c Merge pull request #26518 from jsafrane/initial-sync
Automatic merge from submit-queue

Fill controller caches on startup

The controller needs to fill its caches before it starts binding/recycling/ deleting or provisioning volumes and claims. This was done using blocking initial 'xxx added' from going through syncClaim/syncVolume. However, when the caches were full, the controller waited for the next sync period to do actual binding/recycling etc.

In this patch, the controller fills its caches directly from etcd and then processes initial 'xxx added' events to reconcile the world and bind/recycle/ delete/provision stuff, resulting in faster binding after startup.

Fixes #25967 (properly)
2016-06-02 21:44:56 -07:00
Paul Morie 277c0a4e90 Fix typo and linewrap comments in PV controller 2016-06-02 15:50:07 -04:00
k8s-merge-robot 335da9b125 Merge pull request #26410 from jsafrane/fix-test-race
Automatic merge from submit-queue

Fix data race in volume controller unit test.

Reactor must be locked when fiddling with reactor.volumes and reactor.claims. Therefore add new functions to add/delete volume/claim with sending an event.

Fixes #26345
2016-06-02 04:25:08 -07:00
Jan Safranek ee74cc4354 Fix fake event recorder race
Event recorder should wait for some time to get all expected events, the event
may be written by another goroutine that just have finished.

It should not slow down the test in most cases, only when there is a bug and
expected event is not sent.
2016-06-01 10:16:35 +02:00
Jan Safranek 2d43e4549e Fix data race in volume controller unit test.
Reactor must be locked when fiddling with reactor.volumes and reactor.claims.
Therefore add new functions to add/delete volume/claim with sending an event.
2016-06-01 08:35:33 +02:00
k8s-merge-robot 04f77dd602 Merge pull request #26556 from jsafrane/fix-format
Automatic merge from submit-queue

Fix log arguments.

'i' is not printed.
@kubernetes/sig-storage
2016-05-31 21:24:50 -07:00
k8s-merge-robot 38d5be4f36 Merge pull request #26555 from jsafrane/stabilize-test-flakes
Automatic merge from submit-queue

Stabilize controller unit tests.

Remove test "5-1", it's flaky as it depends on order of execution of goroutines. When the controller starts, existing claim is enqueued as "initial sync event" and a new volume is enqueued to separate goroutine. It is not deterministic which goroutine processes its events first and there is no way how to tell that the claim event was processed.

Also, force resync of the controllers after the test to make sure all events are processed.

Fixes unit test flakes.
@kubernetes/sig-storage
2016-05-31 17:06:12 -07:00
Jan Safranek 21059e8b6d Fix log arguments.
'i' is not printed.
2016-05-31 12:12:15 +02:00
Jan Safranek 011eac7c8b Stabilize controller unit tests.
Remove test "5-1", it's flaky as it depends on order of execution of
goroutines. When the controller starts, existing claim is enqueued as
"initial sync event" and a new volume is enqueued to separate goroutine.
It is not deterministic which goroutine processes its events first and
there is no way how to tell that the claim event was processed.

Also, force resync of the controllers after the test to make sure all
events are processed.
2016-05-31 12:07:47 +02:00
Paul Morie 4ffa3c6754 Add label selector to match criteria for claims to volumes 2016-05-30 12:11:12 -04:00
Paul Morie faa112bad1 Add selector to PersistentVolumeClaim 2016-05-30 12:09:50 -04:00
k8s-merge-robot 9aeeef1d81 Merge pull request #26414 from jsafrane/reduce-sync-period
Automatic merge from submit-queue

Reduce volume controller sync period

fixes #24236 and most probably also fixes #25294.
Needs #25881! With the cache, binder is not affected by sync period. Without the cache, binding of 1000 PVCs takes more than 5 minutes (instead of ~70 seconds).

15 seconds were chosen by fair 2d10 roll :-)
2016-05-30 05:54:51 -07:00
Jan Safranek df161c3a7e Fill controller caches on startup
The controller needs to fill its caches before it starts binding/recycling/
deleting or provisioning volumes and claims. This was done using blocking
initial 'xxx added' from going through syncClaim/syncVolume. However, when
the caches were full, the controller waited for the next sync period to do
actual binding/recycling etc.

In this patch, the controller fills its caches directly from etcd and then
processes initial 'xxx added' events to reconcile the world and bind/recycle/
delete/provision stuff, resulting in faster binding after startup.

Fixes #25967 (properly)
2016-05-30 13:16:45 +02:00
k8s-merge-robot 5643b7498f Merge pull request #25881 from jsafrane/devel/pv-add-cache
Automatic merge from submit-queue

volume controller: Add cache with the latest version of PVs and PVCs

When the controller binds a PV to PVC, it saves both objects to etcd. However, there is still an old version of these objects in the controller Informer cache. So, when a new PVC comes, the PV is still seen as available and may get bound to the new PVC. This will be blocked by etcd, still, it creates unnecessary traffic that slows everything down.

To make everything worse, when periodic sync with the old PVC is performed, this PVC is seen by the controller as Pending (while it's already Bound on etcd) and will be bound to a different PV. Writing to this PV won't be blocked by etcd, only subsequent write of the PVC fails. So, the controller will need to roll back the PV in another transaction(s). The controller can keep itself pretty busy this way.

Also, we save bound PVs (and PVCs) as two transactions - we save say PV.Spec first and then .Status. The controller gets "PV.Spec updated" event from etcd and tries to fix the Status, as it seems to the controller it's outdated. This write again fails - there already is a correct version in etcd.

As we can't influence the Informer cache, it is read-only to the controller, this patch introduces second cache in the controller, which holds latest and greatest version on PVs and PVCs to prevent these useless writes to etcd . It gets updated with events from etcd *and* after etcd confirms successful save of PV/PVC modified by the controller.

The cache stores only *pointers* to PVs/PVCs, so in ideal case it shares the actual object data with the informer cache. They will diverge only for a short time when the controller modifies something and the informer cache did not get update events yet.

@kubernetes/sig-storage
2016-05-30 04:13:18 -07:00
Jan Safranek 2aa9f1dd8f Reduce volume controller sync period 2016-05-30 09:59:31 +02:00
Alex Mohr 9803393a67 Merge pull request #25960 from jsafrane/do-not-sort-bind
volume controller: Speed up binding by not sorting volumes
2016-05-26 15:47:14 -07:00
k8s-merge-robot da7d3c189a Merge pull request #25869 from jsafrane/devel/operation-logs
Automatic merge from submit-queue

volume controller: use better operation names

Using volume/claim.UID in the operation name is not really useful, as UIDs are not logged by rest of the controller. On the other hand, volume.Name and claim.Namespace/Name is logged pretty often and it would help to log these also in operation name. Still, I'd prefer to have the operation name really unique to be protected from users deleting a volume and quickly creating another one with the same name, so UID is still part of the operation name.

This has been already proven to be very useful in controller debugging.
2016-05-25 17:58:07 -07:00
Brendan Burns 88663fc58b Add some extra checking in the tests to prevent flakes. 2016-05-23 16:25:02 -07:00
k8s-merge-robot 62a8394eb4 Merge pull request #25263 from jsafrane/devel/adopt-recycle-pod
Automatic merge from submit-queue

volume recycler: Don't start a new recycler pod if one already exists.

Recycling is a long duration process and when the recycler controller is restarted in the meantime, it should not start a new recycler pod if there is one already running.

This means that the recycler pod must have deterministic name based on name of the recycled PV, we then get name conflicts when creating the pod.

Two things need to be changed:

- recycler controller and recycler plugins must pass the PV.Name to place, where the pod is created. This is most of the patch and it should be pretty straightforward.

- create recycler pod with deterministic name and check "already exists" error.

When at it, remove useless 'resourceVersion' argument and make log messages starting with lowercase.

There is an unit test to check the behavior + there is an e2e test that checks that regular recycling is not broken (it does not try to run two recycler pods in parallel as the recycler is single-threaded now).
2016-05-21 02:28:26 -07:00
Jan Safranek c7da3abd5b volume controller: Speed up binding by not sorting volumes
The binder sorts all available volumes first, then it filters out volumes
that cannot be bound by processing each volume in a loop and then finds
the smallest matching volume by binary search.

So, if we process every available volume in a loop, we can also remember the
smallest matching one and save us potentially long sorting (and quick binary
search).
2016-05-20 12:26:39 +02:00
Jan Safranek 0279232360 volume controller: Add cache with the latest version of PVs and PVCs
When the controller binds a PV to PVC, it saves both objects to etcd.
However, there is still an old version of these objects in the controller
Informer cache. So, when a new PVC comes, the PV is still seen as available
and may get bound to the new PVC. This will be blocked by etcd, still, it
creates unnecessary traffic that slows everything down.

Also, we save bound PV/PVC as two transactions - we save PV/PVC.Spec first
and then .Status. The controller gets "PV/PVC.Spec updated" event from etcd
and tries to fix the Status, as it seems to the controller it's outdated.
This write again fails - there already is a correct version in etcd.

We can't influence the Informer cache, it is read-only to the controller.

To prevent these useless writes to etcd, this patch introduces second cache
in the controller, which holds latest and greatest version on PVs and PVCs.
It gets updated with events from etcd *and* after etcd confirms successful
save of PV/PVC modified by the controller.

The cache stores only *pointers* to PVs/PVCs, so in ideal case it shares the
actual object data with the informer cache. They will diverge only when
the controller modifies something and the informer cache did not get update
events yet.
2016-05-19 16:09:06 +02:00
Jan Safranek e9a6ec29a0 volume controller: use better operation names
Using volume/claim.UID in the operation name is not really useful, as UIDs
are not logged by rest of the controller. On the other hand, volume.Name and
claim.Namespace/Name is logged pretty often and it would help to log these
also in operation name.

This has been already proven to be very useful in controller debugging.
2016-05-19 14:19:33 +02:00
Jan Safranek 0ee9160f88 volume recycler: Don't start a new recycler pod if one already exists.
Recycling is a long duration process and when the recycler controller is
restarted in the meantime, it should not start a new recycler pod if there is
one already running.

This means that the recycler pod must have deterministic name based on name
of the recycled PV, we then get name conflicts when creating the pod.

Two things need to be changed:
- recycler controller and recycler plugins must pass the PV.Name to place,
  where the pod is created.

- create recycler pod with deterministic name and check "already exists" error.

When at it, remove useless 'resourceVersion' argument and make log messages
starting with lowercase.
2016-05-19 12:58:25 +02:00
Jan Safranek 61d630ddf7 volume controller: Fix method name in a log message
It's deleteVolume, not deleteClaim.
2016-05-19 12:54:17 +02:00
Jan Safranek 01b20d8e77 Generate shorter provisioned PV names.
GCE PD names are generated out of provisioned PV.Name, therefore it should be
as short as possible and still unique.
2016-05-18 10:06:51 +02:00
Jan Safranek 79b91b9ee0 Refactor persistent volume initialization
There should be only one initialization function, shared by the real
controller and unit tests.
2016-05-18 10:06:51 +02:00
Jan Safranek 7f549511e2 Big move and rename
- remove persistentvolume_ prefix from all files
- split controller.go into controller.go and controller_base.go (to have them
  under 1500 lines for github)
2016-05-18 10:06:51 +02:00
Jan Safranek c5fe1f943c Fixed binder logging
- we need the original volume/claim in error paths
- don't report version conflicts as errors (they happen pretty often and we
  recover from them)
2016-05-18 10:06:51 +02:00
Jan Safranek 41adcc5496 Speed up binding of provisioned volumes
This fixes e2e test for provisioning - it expects that provisioned volumes
are bound quickly.

Majority of this patch is update of test framework needs to initialize the
controller appropriately.
2016-05-18 10:06:51 +02:00
Jan Safranek c6f05c8056 provisioning: Add unit testso for provisioning errors. 2016-05-18 10:06:51 +02:00
Jan Safranek c24b33793c unit test: Add possibility to inject kubeclient errors. 2016-05-18 10:06:51 +02:00