Automatic merge from submit-queue (batch tested with PRs 67709, 67556). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Fix volume scheduling issue with pod affinity and anti-affinity
**What this PR does / why we need it**:
The previous design of the volume scheduler had volume assume + bind done before pod assume + bind. This causes issues when trying to evaluate future pods with pod affinity/anti-affinity because the pod has not been assumed while the volumes have been decided.
This PR changes the design so that the volumes and the pod are assumed first, followed by volume and pod binding. Volume binding waits (asynchronously) for the operations to complete or error. This eliminates the subsequent passes through the scheduler to wait for volume binding to complete (although pod events or resyncs may still cause the pod to run through scheduling while binding is still in progress). This design also aligns better with the scheduler framework design, so it will make a future migration easier.
Many changes had to be made in the volume scheduler to handle this new design, mostly around:
* How we cache pending binding operations. Now, any delayed binding PVC that is not fully bound must have a cached binding operation. This also means bind API updates may be repeated.
* Waiting for the bind operation to fully complete, and detecting failure conditions to abort the bind and retry scheduling.
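As a rough illustration of the ordering described above, here is a minimal Go sketch. The `steps` struct and the `schedulePod` function are hypothetical names made up for this example; they are not the scheduler's real API, and the real code handles caching and error recovery that is omitted here.

```go
package main

import "fmt"

// steps bundles hypothetical callbacks standing in for the real scheduler
// operations; the names are assumptions made for this sketch.
type steps struct {
	assumeVolumes func() error
	assumePod     func() error
	bindVolumes   func() error
	bindPod       func() error
}

// schedulePod assumes the volumes and the pod synchronously, then performs
// both binds in a goroutine. The returned channel delivers the bind result.
func schedulePod(s steps) (<-chan error, error) {
	if err := s.assumeVolumes(); err != nil {
		return nil, fmt.Errorf("assume volumes: %w", err)
	}
	if err := s.assumePod(); err != nil {
		return nil, fmt.Errorf("assume pod: %w", err)
	}
	done := make(chan error, 1)
	go func() {
		// A volume bind failure aborts the pod bind so scheduling can be retried.
		if err := s.bindVolumes(); err != nil {
			done <- fmt.Errorf("bind volumes: %w", err)
			return
		}
		done <- s.bindPod()
	}()
	return done, nil
}

func main() {
	say := func(msg string) func() error {
		return func() error { fmt.Println(msg); return nil }
	}
	done, err := schedulePod(steps{
		assumeVolumes: say("assume volumes"),
		assumePod:     say("assume pod"),
		bindVolumes:   say("bind volumes"),
		bindPod:       say("bind pod"),
	})
	if err != nil {
		fmt.Println("scheduling failed:", err)
		return
	}
	fmt.Println("bind result:", <-done)
}
```

Because both assumes finish before any bind starts, a later pod evaluated for affinity/anti-affinity already sees the assumed pod.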
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #65131
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes issue where pod scheduling may fail when using local PVs and pod affinity and anti-affinity without the default StatefulSet OrderedReady pod management policy
```
Automatic merge from submit-queue (batch tested with PRs 66840, 68159). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
CSI Cluster Registry and Node Info CRDs Improvements
**What this PR does / why we need it**:
https://github.com/kubernetes/kubernetes/pull/67803 merged before I could address @lavalamp's feedback. This PR addresses his feedback.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Follow up on PR https://github.com/kubernetes/kubernetes/pull/67803
**Special notes for your reviewer**:
**Release note**:
```release-note
```
/assign @lavalamp
/assign @thockin
CC @jsafrane @vladimirvivien @verult @gnufied @childsb
* FindPodVolumes
  * Prebound PVCs are treated like unbound immediate PVCs and will error
  * Always check for fully bound PVCs and cache bindings for not fully
    bound PVCs
* BindPodVolumes
  * Retry API updates for not fully bound PVCs even if the assume cache
    already marked it
  * Wait for PVCs to be fully bound after making the API updates
  * Error when detecting binding/provisioning failure conditions
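The "wait for PVCs to be fully bound" and "error on failure conditions" items above can be pictured with a small polling loop. This is only a sketch under stated assumptions: `claimState` and `waitForBound` are hypothetical names, and the real implementation checks actual PVC and PV objects rather than two booleans.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// claimState is a hypothetical, simplified view of a PVC for this sketch.
type claimState struct {
	Name       string
	FullyBound bool // binding has fully completed
	Failed     bool // a binding or provisioning failure condition was seen
}

// waitForBound polls check until every claim is fully bound, a failure is
// detected, or the timeout elapses.
func waitForBound(check func() []claimState, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		allBound := true
		for _, c := range check() {
			if c.Failed {
				return fmt.Errorf("binding failed for claim %q, retry scheduling", c.Name)
			}
			if !c.FullyBound {
				allBound = false
			}
		}
		if allBound {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for claims to be bound")
		}
		time.Sleep(interval)
	}
}

func main() {
	polls := 0
	check := func() []claimState {
		polls++
		// Pretend the claim becomes bound on the third poll.
		return []claimState{{Name: "data", FullyBound: polls >= 3}}
	}
	err := waitForBound(check, 10*time.Millisecond, time.Second)
	fmt.Println("polls:", polls, "err:", err)
}
```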
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Add gnufied as approver for attach/detach controller
Hopefully gnufied has reviewed and made enough fixes in this
area to understand the code thoroughly.
```release-note
None
```
/assign @saad-ali @jsafrane
Automatic merge from submit-queue (batch tested with PRs 67745, 67432, 67569, 67825, 67943). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Fix VMware VM freezing bug by reverting #51066
**What this PR does / why we need it**: kube-controller-manager, vSphere-specific: when the controller tries to attach a volume to Node A that is already attached to Node B, Node A freezes until the volume is attached. Kubernetes keeps trying to attach the volume because it thinks the volume is 'multi-attachable' when it is not. #51066 is the culprit.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/vmware/kubernetes/issues/500 / https://github.com/vmware/kubernetes/issues/502 (same issue)
**Special notes for your reviewer**:
- Repro:
vSphere installation, any k8s version from 1.8 and above, pod with attached PV/PVC/VMDK:
1. cordon the node the pod is running on
2. `kubectl delete po/[pod] --force --grace-period=0`
3. the pod is immediately rescheduled to a new node. Get the new node's name from `kubectl describe [pod]` and attempt to ping it or SSH into it.
4. pings/SSH fail to reach the new node, and `kubectl get node` shows it as 'NotReady'. The new node is frozen until the volume is attached: usually a 1 minute freeze for 1 volume in a low-load cluster, and many minutes more with higher loads and more volumes involved.
- Patch verification:
Tested a custom patched 1.9.10 kube-controller-manager with #51066 reverted and the above bug is resolved - can't repro it anymore. New node doesn't freeze at all, and attaching happens quite quickly, in a few seconds.
**Release note**:
```
Fix vSphere VM freezing bug by reverting #51066
```
Automatic merge from submit-queue (batch tested with PRs 67062, 67169, 67539, 67504, 66876). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Double check PVC if not found in syncVolume
**What this PR does / why we need it**:
Double check PVC if not found in syncVolume.
If a PV is bound by an external PV binder (e.g. kube-scheduler), it is possible under heavy load that the corresponding PVC has not yet been synced to the controller's local cache.
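The double check can be sketched as a cache lookup with a fallback to an authoritative source. The `claim` type, the `lookup` function type, and `findClaim` are hypothetical names for this example; the real controller works with informer caches and the API server client.

```go
package main

import "fmt"

// claim is a hypothetical stand-in for a PVC.
type claim struct{ Name string }

// lookup reports whether a claim with the given name exists in some store.
type lookup func(name string) (claim, bool)

// findClaim consults the local cache first and only falls back to the
// authoritative source on a miss, so a stale cache alone never makes the
// caller conclude that the claim is gone.
func findClaim(name string, cache, live lookup) (claim, bool) {
	if c, ok := cache(name); ok {
		return c, true
	}
	return live(name)
}

func main() {
	staleCache := func(string) (claim, bool) { return claim{}, false }
	apiServer := func(name string) (claim, bool) { return claim{Name: name}, true }

	if c, ok := findClaim("data", staleCache, apiServer); ok {
		fmt.Println("claim still exists, do not reclaim the PV:", c.Name)
	}
}
```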
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66287
**Special notes for your reviewer**:
**Release note**:
```release-note
Double check PVC if not found in syncVolume to prevent reclaiming PV wrongly.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Attacher/Detacher refactor for local storage
Proposal link: https://github.com/kubernetes/community/pull/2438
**What this PR does / why we need it**:
Refactor Attacher/Detacher for plugins that only need to mount a device but do not need to attach it, such as the local storage plugin.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
```release-note
Attacher/Detacher refactor for local storage
```
/sig storage
/kind feature
Automatic merge from submit-queue (batch tested with PRs 66491, 66587, 66856, 66657, 66923). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
PVC Protection: Wait for Pod delete
Currently, the PVC protection controller will remove its finalizer when
all Pods using a PVC reach at least a Terminating state. However,
certain volumes cannot be guaranteed to be unmounted until a Pod is
deleted. Only Pods not in the current pods list can be considered
deleted, so we're removing the exception that skipped Terminating Pods.
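A minimal sketch of the resulting check, assuming a simplified `pod` type and a hypothetical `claimInUse` helper (neither is the controller's real code): any pod still present in the list counts as a user of the PVC, whether or not it is Terminating.

```go
package main

import "fmt"

// pod is a hypothetical, simplified pod for this sketch.
type pod struct {
	Name        string
	Terminating bool     // deletion requested, but the object still exists
	Claims      []string // names of the PVCs the pod references
}

// claimInUse reports whether any pod still in the list references the claim.
// Terminating pods are deliberately not skipped.
func claimInUse(pods []pod, claimName string) bool {
	for _, p := range pods {
		for _, c := range p.Claims {
			if c == claimName {
				return true
			}
		}
	}
	return false
}

func main() {
	pods := []pod{{Name: "web-0", Terminating: true, Claims: []string{"data"}}}
	fmt.Println("keep finalizer:", claimInUse(pods, "data"))
	fmt.Println("keep finalizer after pod is gone:", claimInUse(nil, "data"))
}
```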
```release-note
NONE
```
Resolves: #65552
Signed-off-by: Jose A. Rivera <jarrpa@redhat.com>
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
attachdetach controller: attach volumes immediately when Pod's PVCs are bound
**What this PR does / why we need it**:
Let the attachdetach controller attach volumes immediately when a Pod's PVCs are bound.
Currently, the attachdetach controller calls `util.ProcessPodVolume` to add pod volumes into `desiredStateOfWorld` on these events:
- podAdd event
- podUpdate event
- podDelete event
- periodic `desiredStateOfWorldPopulator.findAndAddActivePod`
But if a pod is created while its PVCs are not yet bound, no volumes are added to `desiredStateOfWorld` [because the PVCs are not bound](https://github.com/kubernetes/kubernetes/blob/v1.12.0-alpha.0/pkg/controller/volume/attachdetach/util/util.go#L99). When the PV controller binds the PVCs successfully, the attachdetach controller does not add the pod's volumes immediately because it does not watch PVC events.
It waits until a pod update event is triggered (which normally does not happen, because the kubelet reports no new status) or `desiredStateOfWorldPopulator.findAndAddActivePod` is called (possibly 0~3 minutes later, see [timer configs](https://github.com/kubernetes/kubernetes/blob/v1.12.0-alpha.0/pkg/controller/volume/attachdetach/attach_detach_controller.go)).
In the worst case, pod start time can be very long (~3 minutes + ~2 minutes of kubelet max exponential backoff), for example: https://github.com/kubernetes/kubernetes/issues/64549#issuecomment-409440546.
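One way to picture the fix is a PVC event handler that, once a claim is bound, selects the pods referencing it so their volumes can be processed right away. The `pvc` and `pod` types and the `podsToSync` helper below are hypothetical simplifications, not the controller's real types or functions.

```go
package main

import "fmt"

// pvc and pod are hypothetical, simplified objects for this sketch.
type pvc struct {
	Namespace, Name string
	Bound           bool
}

type pod struct {
	Namespace, Name string
	Claims          []string
}

// podsToSync returns the names of pods in the claim's namespace that
// reference the claim. It returns nothing while the claim is unbound,
// because there is no volume to attach yet.
func podsToSync(c pvc, pods []pod) []string {
	if !c.Bound {
		return nil
	}
	var names []string
	for _, p := range pods {
		if p.Namespace != c.Namespace {
			continue
		}
		for _, claimName := range p.Claims {
			if claimName == c.Name {
				names = append(names, p.Name)
				break
			}
		}
	}
	return names
}

func main() {
	pods := []pod{
		{Namespace: "default", Name: "web-0", Claims: []string{"data"}},
		{Namespace: "other", Name: "web-1", Claims: []string{"data"}},
	}
	bound := pvc{Namespace: "default", Name: "data", Bound: true}
	fmt.Println("pods to process now:", podsToSync(bound, pods))
}
```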
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #64549
**Special notes for your reviewer**:
**Release note**:
```release-note
attachdetach controller attaches volumes immediately when Pod's PVCs are bound
```
Automatic merge from submit-queue (batch tested with PRs 65570, 65616). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Retry scheduling on StorageClass events
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56163
**Special notes for your reviewer**:
I have taken over #60006.
It's hard to test in e2e because we cannot tell which event triggered a pod's rescheduling (periodic service/node events also move pods to the active queue). ~~I'll add integration tests for this functionality after [this PR](https://github.com/kubernetes/kubernetes/pull/65296) gets merged.~~ (already added)
**Release note**:
```release-note
NONE
```
There are two motivations for this change:
(1) CSI plugins are soon going to support volume expansion. For such
plugins, the admission controller doesn't know whether they are
capable of supporting volume expansion.
(2) Currently, the admission controller rejects PVC updates for in-tree
plugins that don't support volume expansion (e.g., NFS, iSCSI). This change
allows external controllers to expand volumes, similar to how external
provisioners operate.
Automatic merge from submit-queue (batch tested with PRs 66076, 65792, 65649). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubernetes: fix printf format errors
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
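For readers unfamiliar with this class of error, here is a small illustrative example (not taken from the changed files; `describe` is a hypothetical function). The mismatched call is shown only in a comment, since vet would flag it if it were compiled in.

```go
package main

import "fmt"

// describe formats a count and a node name.
func describe(name string, count int) string {
	// A call such as fmt.Sprintf("%d volumes on %d", count, name) passes a
	// string to the %d verb, which is the kind of mismatch vet's printf
	// check reports. The verbs below match the argument types.
	return fmt.Sprintf("%d volumes on %s", count, name)
}

func main() {
	fmt.Println(describe("node-a", 2))
}
```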
```release-note
NONE
```
Lubomir I. Ivanov <neolit123@gmail.com>
applied amend for:
pkg/cloudprovider/providers/vsphere/nodemanager.go
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add lichuqiang as reviewer of persistentvolume controller (for volume scheduling)
I've been working on the storage topology-aware feature for quite some time now, and I really hope I can help with reviews.
```release-note
NONE
```
/assign @msau42
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Volume topology aware dynamic provisioning: work based on new API
**What this PR does / why we need it**:
The PR has been split into 3 parts:
Part1: https://github.com/kubernetes/kubernetes/pull/63232 for basic scheduler and PV controller plumbing
Part2: https://github.com/kubernetes/kubernetes/pull/63233 for API change
This PR itself includes the work based on the API change:
- Dynamic provisioning allowed topologies scheduler work
- Update provisioning interface to be aware of selected node and topology
**Which issue(s) this PR fixes**
Feature: https://github.com/kubernetes/features/issues/561
Design: https://github.com/kubernetes/community/issues/2168
**Special notes for your reviewer**:
/sig storage
/sig scheduling
/assign @msau42 @jsafrane @saad-ali @bsalamat
@kubernetes/sig-storage-pr-reviews
@kubernetes/sig-scheduling-pr-reviews
**Release note**:
```release-note
Volume topology aware dynamic provisioning
```
Automatic merge from submit-queue (batch tested with PRs 60012, 63692, 63977, 63960, 64008). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Only override objects from informer when version has increased.
**What this PR does / why we need it**:
We don't want an informer resync to override assumed volumes if the version has not increased.
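A minimal sketch of the idea, assuming the resource version can be parsed as an integer for comparison (an assumption of this example; the `object` and `cache` types and the `update` method are hypothetical names, not the real cache implementation):

```go
package main

import (
	"fmt"
	"strconv"
)

// object is a hypothetical cached item with a numeric-string version.
type object struct {
	Name            string
	ResourceVersion string
	Data            string
}

type cache struct{ items map[string]object }

func newCache() *cache { return &cache{items: map[string]object{}} }

// update stores obj only when its version is strictly newer than the stored
// one, so a resync that redelivers the same version cannot overwrite
// locally assumed state. It reports whether the object was stored.
func (c *cache) update(obj object) (bool, error) {
	newV, err := strconv.ParseInt(obj.ResourceVersion, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse new version: %w", err)
	}
	if old, ok := c.items[obj.Name]; ok {
		oldV, err := strconv.ParseInt(old.ResourceVersion, 10, 64)
		if err != nil {
			return false, fmt.Errorf("parse stored version: %w", err)
		}
		if newV <= oldV {
			return false, nil
		}
	}
	c.items[obj.Name] = obj
	return true, nil
}

func main() {
	c := newCache()
	c.update(object{Name: "pv-1", ResourceVersion: "5", Data: "assumed"})
	stored, _ := c.update(object{Name: "pv-1", ResourceVersion: "5", Data: "resync"})
	fmt.Println("resync overwrote assumed state:", stored)
	stored, _ = c.update(object{Name: "pv-1", ResourceVersion: "6", Data: "updated"})
	fmt.Println("newer version stored:", stored)
}
```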
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #63467
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
should use time.Since instead of time.Now().Sub
**What this PR does / why we need it**:
should use time.Since instead of time.Now().Sub
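For illustration, `time.Since(start)` is documented as shorthand for `time.Now().Sub(start)`, so the change is purely stylistic. The `measure` helper below is a hypothetical example, not code from this PR.

```go
package main

import (
	"fmt"
	"time"
)

// measure runs fn and returns how long it took.
func measure(fn func()) time.Duration {
	start := time.Now()
	fn()
	// Preferred form; equivalent to time.Now().Sub(start).
	return time.Since(start)
}

func main() {
	d := measure(func() { time.Sleep(5 * time.Millisecond) })
	fmt.Println("took at least 5ms:", d >= 5*time.Millisecond)
}
```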
**Special notes for your reviewer**: