Automatic merge from submit-queue
Add storageClass.mountOptions and use it in all applicable plugins
split off from https://github.com/kubernetes/kubernetes/pull/50919 and still dependent on it. cc @gnufied
issue: https://github.com/kubernetes/features/issues/168
```release-note
Add mount options field to StorageClass. The options listed there are automatically added to PVs provisioned using the class.
```
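A rough Go sketch of what a provisioner does with the new field (the helper is hypothetical; import paths assume the current k8s.io/api layout):
```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
)

// copyMountOptions is a hypothetical helper: at provisioning time the
// class's MountOptions are copied verbatim onto the PV being created.
func copyMountOptions(class *storagev1.StorageClass, pv *v1.PersistentVolume) {
	pv.Spec.MountOptions = append([]string(nil), class.MountOptions...)
}
```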
When the API server crashes *after* saving a provisioned PV and before sending
200 OK, the controller tries to save the PV again. In this case it gets an
AlreadyExists error, which should be interpreted as success, not as an error.
In particular, the volume that corresponds to the PV should not be deleted from
the underlying storage.
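A minimal sketch of the intended handling, assuming client-go's apierrors helpers; createPV stands in for the real Create call:
```go
package sketch

import apierrors "k8s.io/apimachinery/pkg/api/errors"

// savePV sketches the intended handling; createPV stands in for the real
// PersistentVolumes().Create call.
func savePV(createPV func() error) error {
	err := createPV()
	if err == nil || apierrors.IsAlreadyExists(err) {
		// AlreadyExists means an earlier Create succeeded before the API
		// server crashed: treat it as success and keep the backing volume.
		return nil
	}
	return err // only a genuine failure may lead to deleting the volume
}
```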
Automatic merge from submit-queue (batch tested with PRs 51441, 51356, 51460)
Don't update pvc.status.capacity if pvc is already Bound
As discussed in https://github.com/kubernetes/community/pull/657#discussion_r128008128, in order for `pvc.status.Capacity < pv.Spec.Capacity` to serve as the mechanism for volume filesystem resize, the PV controller should stop updating pvc.status.Capacity every resync period.
/assign @jsafrane
/sig storage
```release-note
NONE
```
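The resize signal this enables then reduces to a quantity comparison; a minimal sketch (helper name hypothetical):
```go
package sketch

import "k8s.io/apimachinery/pkg/api/resource"

// needsFilesystemResize is hypothetical: once the controller stops
// refreshing pvc.status.Capacity on every resync, a status capacity that
// is smaller than the PV spec capacity reliably signals a pending resize.
func needsFilesystemResize(pvcStatusCap, pvSpecCap resource.Quantity) bool {
	return pvcStatusCap.Cmp(pvSpecCap) < 0
}
```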
Automatic merge from submit-queue
Add volume operation metrics to operation executor and PV controller
This PR implements the proposal for high level volume metrics https://github.com/kubernetes/community/pull/809
**Special notes for your reviewer**:
~Differences from proposal:~ all resolved
~"verify_volume" is now "verify_volumes_are_attached" + "verify_volumes_are_attached_per_node" + "verify_controller_attached_volume." Which of them do we want?~
~There is no "mount_device" metric because the MountVolume operation combines MountDevice and mount (plugin.Setup). Do we want to extract the mount_device metric or is it okay to keep mountvolume as one? For attachable volumes, MountDevice is the actual mount and Setup is a bindmount + setvolumeownership. For unattachable, mountDevice does not occur and Setup is an actual mount + setvolumeownership.~
~PV controller metrics I did not implement following the proposal at all. I did not change goroutinemap nor scheduleOperation, because provisionClaimOperation does not return an error, so it's impossible for the caller to know whether there is actually a failure worth reporting. Instead I manually create a new metric inside the function according to some conditions.~
@gnufied
I have tested the operationexecutor metrics but not provision & delete. Sample:
![screen shot 2017-08-02 at 15 01 08](https://user-images.githubusercontent.com/13111288/28889980-a7093526-7793-11e7-9aa9-ad7158be76fa.png)
**Release note**:
```release-note
Add error count and time-taken metrics for storage operations such as mount and attach, per-volume-plugin.
```
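A sketch of the metrics' shape using the Prometheus client library directly; metric and label names are illustrative, not necessarily the ones merged:
```go
package sketch

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	opDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "storage_operation_duration_seconds",
			Help: "Time taken by volume operations",
		},
		[]string{"volume_plugin", "operation_name"},
	)
	opErrors = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "storage_operation_errors_total",
			Help: "Number of failed volume operations",
		},
		[]string{"volume_plugin", "operation_name"},
	)
)

// recordOperation wraps a volume operation with the two metrics.
func recordOperation(plugin, op string, fn func() error) error {
	start := time.Now()
	if err := fn(); err != nil {
		opErrors.WithLabelValues(plugin, op).Inc()
		return err
	}
	opDuration.WithLabelValues(plugin, op).Observe(time.Since(start).Seconds())
	return nil
}
```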
This change is prerequisite for implementing iSCSI attacher
and detacher.
In order to use CHAP authentication in the iSCSI plugin after
implementing the attacher and detacher, a secret is needed in
AttachDisk(), which is called from WaitForAttach().
Obtaining the secret requires pod information, but
WaitForAttach() doesn't receive the pod.
This patch adds 'pod' as an argument of WaitForAttach()
and updates the drivers that implement WaitForAttach().
Fixes #48953
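A sketch of the changed method (surrounding interface simplified; Spec is a stand-in type):
```go
package volume

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// Spec stands in for the in-tree volume spec type.
type Spec struct{}

// Attacher shows only the changed method. The pod is passed through so
// that AttachDisk(), called from WaitForAttach(), can look up the CHAP
// secret from the pod's namespace.
type Attacher interface {
	// before: WaitForAttach(spec *Spec, devicePath string, timeout time.Duration) (string, error)
	WaitForAttach(spec *Spec, devicePath string, pod *v1.Pod, timeout time.Duration) (string, error)
}
```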
Automatic merge from submit-queue
Allow attach of volumes to multiple nodes for vSphere
This is a fix for issue #50944, where a volume could not be attached to a new node after the node to which it was previously attached was powered off.
Current behaviour:
One of the cluster worker nodes was powered off in vCenter.
Pods running on this node were rescheduled to different nodes but got stuck in ContainerCreating: attaching the volume on the new node failed with "Multi-Attach error for volume pvc-xxx, Volume is already exclusively attached to one node and can't be attached to another", so the application running in the pod had no data available. Since the volume was still attached to the powered-off node, every further attach attempt on the new node failed with the same "Multi-Attach error". This lasted 6 minutes, until the attach/detach controller forcefully detached the volume from the powered-off node; only then was the volume successfully attached to the new node and the application's data available again.
What is expected to happen:
I would want the attach/detach controller to go ahead with attaching the volume on the new node where the pod got provisioned, instead of waiting for the volume to be detached from the powered-off node. It is OK to eventually detach the volume from the powered-off node after 6 minutes. This way application downtime is low and pods are up as soon as possible.
The current fix exempts vSphere volumes/persistent volumes from the multi-attach check in the attach/detach controller.
@jingxu97 @saad-ali : Can you please take a look at it.
@tusharnt @divyenpatel @rohitjogvmw @luomiao
```release-note
Allow attach of volumes to multiple nodes for vSphere
```
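A simplified sketch of the exemption; the helper is hypothetical and the real change lives in the attach/detach controller:
```go
package sketch

// Name of the in-tree vSphere volume plugin.
const vspherePluginName = "kubernetes.io/vsphere-volume"

// allowsMultiAttach is hypothetical: for vSphere volumes the controller
// proceeds with attaching on the new node instead of waiting up to six
// minutes for a forced detach from the powered-off node.
func allowsMultiAttach(pluginName string) bool {
	return pluginName == vspherePluginName
}
```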
Automatic merge from submit-queue (batch tested with PRs 50806, 48789, 49922, 49935, 50438)
On AttachDetachController node status update, do not retry when node doesn't exist but keep the node entry in cache.
**What this PR does / why we need it**: An alternative fix for https://github.com/kubernetes/kubernetes/issues/42438 which also fixes #50721.
Instead of removing the node entry entirely from the node status update cache (which prevents the node from ever being updated even when it recovers), the node status updater now does nothing, so there won't be an update retry until the node is re-added, at which point the cache entry is set to true.
Will cherry pick to prior versions after this is merged.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50721
**Release Note**:
``` release-note
On AttachDetachController node status update, do not retry when node doesn't exist but keep the node entry in cache.
```
/assign @jingxu97
/cc @saad-ali
/sig storage
/release-note
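A sketch of the updater's behavior under this change (types and names are illustrative):
```go
package sketch

import v1 "k8s.io/api/core/v1"

// nodeGetter stands in for a NodeLister lookup.
type nodeGetter interface {
	Get(name string) (*v1.Node, error)
}

// updateNodeStatus: when the node is gone, skip the update but keep the
// cache entry, so retries resume automatically once the node is re-added
// (nodeAdd sets the entry back to true).
func updateNodeStatus(nodes nodeGetter, nodeName string) {
	node, err := nodes.Get(nodeName)
	if err != nil {
		return // do nothing; do NOT remove the node from the cache
	}
	_ = node // ...otherwise patch node.Status.VolumesAttached as before
}
```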
Automatic merge from submit-queue (batch tested with PRs 46458, 50934, 50766, 50970, 47698)
Prepare VolumeHost for running mount tools in containers
This is the first part of implementation of https://github.com/kubernetes/features/issues/278 - running mount utilities in containers.
It updates `VolumeHost` interface:
* `GetMounter()` now requires the volume plugin name, as it is going to return different mounters for different volume plugins, because the mount utilities for these plugins can be in different places.
* A new `GetExec()` method that volume plugins should use to execute any utilities; the new `Exec` interface runs them in the proper place (both are sketched below).
* `SafeFormatAndMount` is updated to the new `Exec` interface.
This is just a preparation: `GetExec` currently leads to a simple `os.Exec`, and mount utilities are executed in the same place as before. The volume plugins themselves will be updated in subsequent PRs (split into separate PRs; some plugins required lots of changes).
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
@rootfs @gnufied
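Roughly, the updated surface (a sketch with stand-in types for the in-tree mount package):
```go
package volume

// Exec runs mount utilities in the right place for the given plugin:
// today on the host via os/exec, later possibly inside a container.
type Exec interface {
	Run(cmd string, args ...string) ([]byte, error)
}

// Mounter stands in for the mounter interface returned by GetMounter.
type Mounter interface{}

// VolumeHost excerpt: both methods now take the volume plugin name,
// since different plugins may keep their mount utilities in different
// places.
type VolumeHost interface {
	GetMounter(pluginName string) Mounter
	GetExec(pluginName string) Exec
}
```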
This will help us track down problems related to pods
not getting added to the desired state of world because of events
arriving out of order, or other related problems.
We want relatively short resync period of PV/PVCs and at the same time we
don't want to force such short resync to all shared informer consumers.
Therefore we need to make our own periodic resync.
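A sketch of such a controller-private resync loop, assuming client-go's wait and workqueue helpers (names illustrative):
```go
package sketch

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// resync re-enqueues every PV and PVC on our own short period, without
// forcing that period onto other consumers of the shared informers.
func resync(pvStore, pvcStore cache.Store, queue workqueue.Interface, period time.Duration, stopCh <-chan struct{}) {
	wait.Until(func() {
		for _, obj := range pvStore.List() {
			queue.Add(obj)
		}
		for _, obj := range pvcStore.List() {
			queue.Add(obj)
		}
	}, period, stopCh)
}
```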
Only relying on the NewAttacher/Detacher call counts is not enough as they
happen in parallel to the testing/verification code and thus the actual
attaching/detaching may not be done yet, resulting in flaky test results.
Fixes #46244
Automatic merge from submit-queue
volumes: SetNodeStatusUpdateNeeded on error
If an error happened during the UpdateNodeStatuses loop, there were some
code paths where we would not call SetNodeStatusUpdateNeeded, leaking
the state. Add it to all paths by adding a function.
Part of #40583
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42252, 42251, 42249, 47512, 47887)
volumes: simplify append-to-slice code
Minor simplification - can append to empty/nil slice.
Part of #40583
```release-note
NONE
```
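For illustration, the Go property this relies on:
```go
package main

import "fmt"

func main() {
	// append allocates as needed; no need to special-case a nil slice.
	var attached []string // nil slice
	attached = append(attached, "vol-1")
	fmt.Println(len(attached)) // 1
}
```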
Automatic merge from submit-queue (batch tested with PRs 42252, 42251, 42249, 47512, 47887)
volumes: promote some logs from info -> warning
Part of #40583
```release-note
NONE
```
Automatic merge from submit-queue
volumes: add comment on getNodeAndVolume
Add comments on getNodeAndVolume to explain the code - it is a little
subtle, and it confused me on first reading.
Part of #40583
```release-note
NONE
```
Internal attach/detach controller timers should be configurable and tests
should use much shorter values.
reconcilerSyncDuration is deliberately left out of TimerConfig because it's
the only timer that is not a constant; it's configurable by the user.
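A sketch of the TimerConfig idea; field names and values are illustrative:
```go
package sketch

import "time"

// TimerConfig gathers the controller's internal timing constants so that
// production code and tests can use different values.
type TimerConfig struct {
	ReconcilerLoopPeriod                        time.Duration
	ReconcilerMaxWaitForUnmountDuration         time.Duration
	DesiredStateOfWorldPopulatorLoopSleepPeriod time.Duration
}

// DefaultTimerConfig for production, TestTimerConfig for fast tests.
var (
	DefaultTimerConfig = TimerConfig{
		ReconcilerLoopPeriod:                        100 * time.Millisecond,
		ReconcilerMaxWaitForUnmountDuration:         6 * time.Minute,
		DesiredStateOfWorldPopulatorLoopSleepPeriod: time.Minute,
	}
	TestTimerConfig = TimerConfig{
		ReconcilerLoopPeriod:                        10 * time.Millisecond,
		ReconcilerMaxWaitForUnmountDuration:         2 * time.Second,
		DesiredStateOfWorldPopulatorLoopSleepPeriod: 10 * time.Millisecond,
	}
)
```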
Automatic merge from submit-queue (batch tested with PRs 46076, 43879, 44897, 46556, 46654)
Local storage plugin
**What this PR does / why we need it**:
Volume plugin implementation for local persistent volumes. A scheduler predicate will direct already-bound PVCs to the node that the local PV is on. PVC binding still happens independently.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Part of #43640
**Release note**:
```release-note
Alpha feature: Local volume plugin allows local directories to be created and consumed as a Persistent Volume. These volumes have node affinity and pods will only be scheduled to the node that the volume is at.
```
Automatic merge from submit-queue
Optimize provisioner plugin result check logic
If findProvisionablePlugin(...) does not return an error, storageClass is certainly not nil.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46383, 45645, 45923, 44884, 46294)
Node status updater now deletes the node entry in attach updates when the node is missing from the NodeInformer cache.
- Added RemoveNodeFromAttachUpdates as part of node status updater operations.
**What this PR does / why we need it**: Fixes issue of unnecessary node status updates when node is deleted.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #42438
**Special notes for your reviewer**: Unit test added, but a more comprehensive test involving the attach/detach controller requires certain testing functionality that is currently absent and will require a larger effort. It will be added at a later time.
There is an edge case caused by the following steps:
1) A node is deleted and restarted. The node exists, but is not yet recognized by Kubernetes.
2) A pod requiring a volume attach is created with nodeName explicitly set to this node.
This leaves the pod stuck in the ContainerCreating state. This is low-priority since it's a specific edge case that can be avoided.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45990, 45544, 45745, 45742, 45678)
Refactor reconciler volume log and error messages
**What this PR does / why we need it**:
Utilizes volume-specific error and log messages introduced in #44969, inside files that also log volume information.
Specifically:
- pkg/kubelet/volumemanager/reconciler/reconciler.go,
- pkg/controller/volume/attachdetach/reconciler/reconciler.go, and
- pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go
**Which issue this PR fixes**: fixes #40905
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44722, 44704, 44681, 44494, 39732)
Fix issue #34242: Attach/detach should recover from a crash
When the attach/detach controller crashes and a pod with an attached PV is deleted afterwards, the controller will never detach the pod's attached volumes. To prevent this, the controller should try to recover the state from the node status and figure out which volumes to detach. This requires some changes in the volume providers too: the only information available from the nodes is the volume name and the device path, so the controller needs to find the correct volume plugin and reconstruct the volume spec just from the name. This required a small change in the volume plugin interface as well (sketched below).
Fixes Issue #34242.
cc: @jsafrane @jingxu97
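The volume plugin interface change, sketched (stand-in types; only the new method shown):
```go
package volume

// Spec stands in for the full volume spec type.
type Spec struct{}

// VolumePlugin excerpt: the recovery path only knows a volume's name and
// mount/device path from the node status, so each plugin must be able to
// reconstruct a usable Spec from just that.
type VolumePlugin interface {
	// ...existing methods...
	ConstructVolumeSpec(volumeName, mountPath string) (*Spec, error)
}
```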
Automatic merge from submit-queue (batch tested with PRs 44469, 44566, 44467, 44526)
WaitForCacheSync before running attachdetach controller
@gnufied you wrote the test and @ncdc the TODO comment. Let's just run the PV and PVC informers; we don't care about them in this test. But we want to be able to stop the pod informer at will, hence not simply using informers.Start, as I understand it.
```release-note
NONE
```
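The gist as a sketch; the synced functions are whichever informers the controller consumes:
```go
package sketch

import (
	"fmt"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/tools/cache"
)

// waitForCaches blocks until all caches are primed before the controller
// loops start, so the reconciler never acts on a half-filled view.
func waitForCaches(stopCh <-chan struct{}, synced ...cache.InformerSynced) bool {
	if !cache.WaitForCacheSync(stopCh, synced...) {
		utilruntime.HandleError(fmt.Errorf("timed out waiting for caches to sync"))
		return false
	}
	return true
}
```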
Automatic merge from submit-queue
Remove alphaProvisioner in PVController and AlphaStorageClassAnnotation
remove alpha annotation and alphaProvisioner
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Improve event msg for PV controller when using external provisioner
**Which issue this PR fixes**:
Fixes part of #42121
**Special notes for your reviewer**:
@jsafrane, as many of our users are confused by the original message, can we fix the message first and then consider how to control the count of the events?
Automatic merge from submit-queue (batch tested with PRs 41775, 39678, 42629, 42524, 43028)
matchPredicate does not fit findByClaim()
matchPredicate takes two arguments, both of type PV, and is not used in findByClaim(), so remove it.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43453, 42992)
make sure that the volume satisfies the requirements of the claim before binding
Check that the volume requested by the claim satisfies the requirements of the claim before binding, in syncUnboundClaim when claim.Spec.VolumeName is not set, even though the volume was requested by the user.
**Release note**:
```release-note
NONE
```
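A simplified sketch of such a check (only capacity shown; the real check also covers access modes, class, and selector):
```go
package sketch

import v1 "k8s.io/api/core/v1"

// volumeSatisfiesClaim is a hypothetical reduction of the check: before
// binding in syncUnboundClaim, verify the volume is big enough for the
// claim. The real check also covers access modes, class, and selector.
func volumeSatisfiesClaim(volume *v1.PersistentVolume, claim *v1.PersistentVolumeClaim) bool {
	requested := claim.Spec.Resources.Requests[v1.ResourceStorage]
	capacity := volume.Spec.Capacity[v1.ResourceStorage]
	return capacity.Cmp(requested) >= 0
}
```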
Automatic merge from submit-queue
Adding gnufied as reviewer for volume controller
I have helped review several PRs and made new
PRs in this area.
cc @childsb @saad-ali
Automatic merge from submit-queue
Make Constants Public so that They Can Be Used in an Ext. Provisioner
Out-of-tree external provisioners have the same purpose as in-tree provisioners. As external provisioners work with PV and PVC datastructures it's an advantage to import certain Kubernetes packages instead of copy-pasting the Kubernetes code.
That's why the constants are made public so that they can be used in an external provisioner.
@jsafrane @kubernetes/sig-storage-pr-reviews
```release-note
NONE
```
Automatic merge from submit-queue
update pkg/controller/volume/OWNER to add appropriate approvers for both volume controllers
Update pkg/controller/volume approvers so that the attach/detach and binding controllers have approvers.
Automatic merge from submit-queue
Attach/detach controller: fix potential race in constructor
**What this PR does / why we need it**:
There is a potential race condition in the attach/detach controller: the "constructor" first installs informer event handlers and then creates and initializes the other data structures. However, there is no guarantee that an event cannot arrive before the data structures required by the event handlers are ready. This may result in nil pointer dereferences and potential crashes (e.g. the nodeAdd method calls adc.actualStateOfWorld.SetNodeStatusUpdateNeeded even though actualStateOfWorld might still be nil).
It should be enough to simply move the event handler installation to the end of the constructor function.
**Release note**:
```release-note
NONE
```
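A minimal sketch of the reordering, with stubbed types; the point is only that handler registration comes last:
```go
package sketch

import "k8s.io/client-go/tools/cache"

type stateStore struct{} // actual/desired state of world stand-in

type controller struct {
	state *stateStore
}

func (c *controller) onNodeAdd(obj interface{}) {
	_ = c.state // safe: never nil by the time events can arrive
}

// newController initializes every field first and installs informer event
// handlers last, so an early event can no longer hit a nil field.
func newController(nodeInformer cache.SharedIndexInformer) *controller {
	c := &controller{}
	c.state = &stateStore{} // build all data structures first
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: c.onNodeAdd,
	})
	return c
}
```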
Automatic merge from submit-queue
fix createProvisionedPV result err judgment bug
When ctrl.kubeClient.Core().PersistentVolumes().Create(volume) returns no error but storeVolumeUpdate() fails, the PV was in fact saved, yet err is not nil; we should not run the code in the following `if err != nil {}` block, which deletes the storage asset.
The same applies to the deletion retries below.
The error variables are renamed to make this clear.
**Release note**:
```release-note
NONE
```
@jsafrane @saad-ali PTAL. Thanks
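The corrected flow as a sketch; createPV and storeVolumeUpdate are hypothetical stand-ins for the real calls:
```go
package sketch

import "log"

// saveProvisionedPV keeps the Create error and the local cache-store error
// in separate variables, so a failed storeVolumeUpdate can no longer be
// mistaken for a failed Create and trigger deletion of the storage asset.
// createPV and storeVolumeUpdate are stand-ins for the real calls.
func saveProvisionedPV(createPV func() error, storeVolumeUpdate func() error) error {
	if createErr := createPV(); createErr != nil {
		return createErr // only this error may lead to asset deletion
	}
	if storeErr := storeVolumeUpdate(); storeErr != nil {
		log.Printf("PV saved but local cache update failed: %v", storeErr)
	}
	return nil // the PV is saved; do not delete the storage asset
}
```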
Automatic merge from submit-queue (batch tested with PRs 42522, 42545, 42556, 42006, 42631)
optimize the binding logic of bindClaimToVolume
Extract a shouldSetBoundByController variable so the volume name doesn't need to be checked twice.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41306, 42187, 41666, 42275, 42266)
Implement bulk polling of volumes
This implements bulk volume polling using ideas presented by
Justin in https://github.com/kubernetes/kubernetes/pull/39564,
but changes the implementation to use an interface
so it doesn't affect other implementations.
cc @justinsb
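Sketched as an optional interface (shape illustrative; plugins that don't implement it keep the per-volume path):
```go
package sketch

// NodeName and VolumeSpec stand in for the real types.
type NodeName string
type VolumeSpec struct{}

// BulkVolumeVerifier is an optional interface a volume plugin can
// implement to verify many attachments in a single cloud API call
// instead of polling each volume individually. Plugins that don't
// implement it keep the existing per-volume path.
type BulkVolumeVerifier interface {
	BulkVerifyVolumes(volumesByNode map[NodeName][]*VolumeSpec) (map[NodeName]map[*VolumeSpec]bool, error)
}
```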
Automatic merge from submit-queue (batch tested with PRs 41814, 41922, 41957, 41406, 41077)
pv_controller: Do not report exponential backoff as error.
It's not an error when a recycle/delete/provision operation cannot be started
because it failed recently. It will be restarted automatically when the
backoff expires.
This just pollutes logs without any useful information:
```
E0214 08:00:30.428073 77288 pv_controller.go:1410] error scheduling operaion "delete-pvc-1fa0e8b4-f2b5-11e6-a8bb-fa163ecb84eb[1fbd52ee-f2b5-11e6-a8bb-fa163ecb84eb]": Failed to create operation with name "delete-pvc-1fa0e8b4-f2b5-11e6-a8bb-fa163ecb84eb[1fbd52ee-f2b5-11e6-a8bb-fa163ecb84eb]". An operation with that name failed at 2017-02-14 08:00:15.631133152 -0500 EST. No retries permitted until 2017-02-14 08:00:31.631133152 -0500 EST (16s). Last error: "Cannot delete the volume \"11a4faea-bfc7-4713-88b3-dec492480dba\", it's still attached to a node".
```
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)
changes to cleanup the volume plugin for recycle
**What this PR does / why we need it**:
Code cleanup: instead of creating a new interface from the plugin that then calls a function to recycle a volume, the function is added to the plugin itself.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #26230
**Special notes for your reviewer**:
Took the same approach as closed PR #28432.
Do you want the approach to be the same for NewDeleter(), NewMounter(), and NewUnmounter(), and should they be in this same PR, or in separate PRs?
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove alpha provisioning
This is the first part of https://github.com/kubernetes/features/issues/36
@kubernetes/sig-storage-misc
**Release note**:
```release-note
Alpha version of dynamic volume provisioning is removed in this release. The annotation
"volume.alpha.kubernetes.io/storage-class" no longer has any special meaning. A default storage class
and the DefaultStorageClass admission plugin can be used to preserve similar behavior of a Kubernetes
cluster; see https://kubernetes.io/docs/user-guide/persistent-volumes/#class-1 for details.
```
Automatic merge from submit-queue
Replace hand-written informers with generated ones
Replace existing uses of hand-written informers with generated ones.
Follow-up commits will switch the use of one-off informers to shared
informers.
This is a precursor to #40097. That PR will switch one-off informers to shared informers for the majority of the code base (but not quite all of it...).
NOTE: this does create a second set of shared informers in the kube-controller-manager. This will be resolved back down to a single factory once #40097 is reviewed and merged.
There are a couple of places where I expanded the # of caches we wait for in the calls to `WaitForCacheSync` - please pay attention to those. I also added in a commented-out wait in the attach/detach controller. If @kubernetes/sig-storage-pr-reviews is ok with enabling the waiting, I'll do it (I'll just need to tweak an integration test slightly).
@deads2k @sttts @smarterclayton @liggitt @soltysh @timothysc @lavalamp @wojtek-t @gmarek @sjenning @derekwaynecarr @kubernetes/sig-scalability-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 40855, 40859)
PV binding: send an event when there are no PVs to bind
This is similar to the scheduler, which says "no nodes available to schedule pods"
when it can't schedule a pod.
@kubernetes/sig-storage-pr-reviews
This fix prevents the PV controller from forcefully overwriting the provisioned volume's name with the generated PV name. Instead, it allows dynamic provisioner implementers to set the name of the volume to a value that they choose.
Automatic merge from submit-queue (batch tested with PRs 40126, 40565, 38777, 40564, 40572)
Do not swallow error in asw.updateNodeStatusUpdateNeeded
Ref #39056
Bubble the error up to `SetNodeUpdateStatusNeeded` and log it out.
NOTE: This does not modify interface of `SetNodeUpdateStatusNeeded`
Automatic merge from submit-queue (batch tested with PRs 39064, 40294)
Refactor persistent volume tests
This is an attempt to make the binder tests a bit more concise. The PVCs are created by a "templating" function. There is also a handful of PVs in the tests, but those vary quite a bit more and I don't think a similar approach would save us much code.
Reference:
https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29006#-KPJuVeDE0O6TvDP9jia
@jsafrane: I hope this is what you had in mind.
Automatic merge from submit-queue (batch tested with PRs 39628, 39551, 38746, 38352, 39607)
Increasing times on reconciling volumes fixing impact to AWS.
**What this PR does / why we need it**:
We are currently blocked by API timeouts with PV volumes. See https://github.com/kubernetes/kubernetes/issues/39526. This is a workaround, not a fix.
**Special notes for your reviewer**:
A second PR will be dropped with CLI cobra options in it, but we are starting with increasing the reconciliation periods. I am dropping this without major testing and will test on our AWS account. Will be marked WIP until I run smoke tests.
**Release note**:
```release-note
Provide kubernetes-controller-manager flags to control volume attach/detach reconciler sync. The duration of the syncs can be controlled, and the syncs can be shut off as well.
```
PV controller should not use Controller.Requeue, as it is not available in
shared informers. We need to implement our own work queues instead, where we
can enqueue volumes/claims as we want.
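A sketch using client-go's workqueue package:
```go
package main

import "k8s.io/client-go/util/workqueue"

func main() {
	// Own queues instead of Controller.Requeue: we can enqueue a claim
	// or volume key whenever we decide it needs another sync pass.
	claimQueue := workqueue.NewNamed("claims")
	defer claimQueue.ShutDown()
	claimQueue.Add("default/my-claim") // key of a claim to re-process
}
```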
Automatic merge from submit-queue
Curating Owners: pkg/controller
cc @jsafrane @mikedanese @bprashanth @derekwaynecarr @thockin @saad-ali
In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone **lgtms** and then someone
experienced in the project **approves**), we are adding new reviewers to
existing owners files.
## If You Care About the Process:
We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
perfectly: people that have made mechanical code changes (e.g change the
copyright header across all directories) end up as reviewers in lots of
places.
Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).
At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
## TLDR:
As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable; please edit the OWNERS file to add
the names of people that should be reviewing code in the future to the **reviewers** section. You probably do NOT need to modify the **approvers** section.
3. Notify me if you want some OWNERS file to be removed. Being an approver or reviewer
of a parent directory makes you a reviewer/approver of the subdirectories too, so not all
OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example)
Automatic merge from submit-queue (batch tested with PRs 38377, 36365, 36648, 37691, 38339)
Exponential back off when volume delete fails
**What this PR does / why we need it**:
This PR implements the ability in pv_controller to back off when deleting a volume fails in the plugin API.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Partly fixes #38295, but I think volume deletion is the most problematic thing happening in pv_controller without any sort of backoff.
After this change the attempts of volume deletion look like:
```
controller : I1208 00:18:35.532061 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:20:50.578325 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:23:05.563488 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:25:20.599158 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:27:35.560009 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:29:50.594967 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:32:05.539168 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:34:20.581665 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
```
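A sketch of per-operation backoff bookkeeping; names, initial delay, and cap are illustrative:
```go
package sketch

import "time"

const (
	initialBackoff = 15 * time.Second // illustrative
	maxBackoff     = 10 * time.Minute // illustrative cap
)

// backoffEntry tracks retry state for one operation, e.g. one volume delete.
type backoffEntry struct {
	delay     time.Duration
	nextRetry time.Time
}

// fail doubles the delay up to the cap and schedules the next attempt.
func (b *backoffEntry) fail(now time.Time) {
	if b.delay == 0 {
		b.delay = initialBackoff
	} else {
		b.delay *= 2
	}
	if b.delay > maxBackoff {
		b.delay = maxBackoff
	}
	b.nextRetry = now.Add(b.delay)
}

// ready reports whether another attempt is allowed yet.
func (b *backoffEntry) ready(now time.Time) bool {
	return !now.Before(b.nextRetry)
}
```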
This makes pv_controller back off exponentially when
deleting a volume fails in the cloud API. It ensures that
we aren't making too many calls to the cloud API.
This adds tests for code introduced here :
https://github.com/kubernetes/kubernetes/issues/26994
Via an integration test we can now verify that if a pod delete
event is somehow missed by the AttachDetach controller, it still
gets cleaned up by the Desired State of World populator.
Automatic merge from submit-queue (batch tested with PRs 37608, 37103, 37320, 37607, 37678)
Fix issue #37377: Report an event on successful PVC provisioning
This is a simple patch to fix issue #37377: on successful PVC provisioning an event is emitted so it's clear the provisioning actually succeeded.
cc: @jsafrane
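The emitted event, sketched with the standard recorder; the reason string is illustrative:
```go
package sketch

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// emitProvisionSuccess posts the success event on the claim.
func emitProvisionSuccess(recorder record.EventRecorder, claim *v1.PersistentVolumeClaim, volumeName string) {
	recorder.Event(claim, v1.EventTypeNormal, "ProvisioningSucceeded",
		fmt.Sprintf("Successfully provisioned volume %s", volumeName))
}
```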
Automatic merge from submit-queue
SetNodeUpdateStatusNeeded whenever nodeAdd event is received
**What this PR does / why we need it**:
Bug fix: call SetNodeStatusUpdateNeeded for a node whenever its API object is added. This ensures that we don't lose the attached list of volumes on the node when its API object is deleted and recreated.
fixes https://github.com/kubernetes/kubernetes/issues/37586 and https://github.com/kubernetes/kubernetes/issues/37585
**Special notes for your reviewer**:
Automatic merge from submit-queue
controller manager refactors
The controller manager needs some significant cleanup. This starts us down the path by respecting parameters like `stopCh`, simplifying discovery checks, removing unnecessary parameters, preventing unnecessary fatals, and using our client builder.
@sttts @ncdc
Automatic merge from submit-queue
Fix kubectl Strategic Merge Patch compatibility
As @smarterclayton pointed out in [comment1](https://github.com/kubernetes/kubernetes/pull/35647#pullrequestreview-8290820) and [comment2](https://github.com/kubernetes/kubernetes/pull/35647#pullrequestreview-8290847) on PR #35647,
we cannot assume the API servers publish their version, or that they all share the same version.
This PR removes all the calls of GetServerSupportedSMPatchVersion().
Change the behavior of `apply` and `edit` to retry with the old patch version if the new version fails.
Other usages of SMPatch default to the new version, since they don't update lists of primitives.
fixes #36916
cc: @pwittrock @smarterclayton