Commit Graph

219 Commits (dc734edb3c21012b2baa08f7645f2ab8915406d8)

Author SHA1 Message Date
Mike Danese a765d59932 move informer and controller to pkg/client/cache
Signed-off-by: Mike Danese <mikedanese@google.com>
2016-09-15 12:50:08 -07:00
Jing Xu efaceb28cc Fix race condition in updating attached volume between master and node
This PR tries to fix issue #29324. This cause of this issue is a race
condition happens when marking volumes as attached for node status. This
PR tries to clean up the logic of when and where to mark volumes as
attached/detached. Basically the workflow as follows,
1. When volume is attached sucessfully, the volume and node info is
added into nodesToUpdateStatusFor to mark the volume as attached to the
node.
2. When detach request comes in, it will check whether it is safe to
detach now. If the check passes, remove the volume from volumesToReportAsAttached
to indicate the volume is no longer considered as attached now.
Afterwards, reconciler tries to update node status and trigger detach
operation. If any of these operation fails, the volume is added back to
the volumesToReportAsAttached list showing that it is still attached.

These steps should make sure that kubelet get the right (might be
outdated) information about which volume is attached or not. It also
garantees that if detach operation is pending, kubelet should not
trigger any mount operations.
2016-09-12 13:51:08 -07:00
Jing Xu b9157b7524 Post event message for volume attachment
This PR is to add event message when attaching volume fails to help
users to debug. For detach failure, may address in a different PR since
it requires more data structure change.
2016-09-01 16:24:36 -07:00
Kubernetes Submit Queue 3d7a105d9b Merge pull request #30903 from jingxu97/cherrypick-8-19
Automatic merge from submit-queue

Avoid failure message flush log when node no longer exist

When node is deleted, attach-detach controller cache may contain stale
information of this node, and update node status fails in reconciler
loop. This message easily flush the log file. This PR is just a quick
fix of this issue. More complete fix including make controller cache
up to date will be addressed in another PR.
2016-08-19 15:45:58 -07:00
Kubernetes Submit Queue 6ce405c6ee Merge pull request #27778 from screeley44/k8-vol-executor
Automatic merge from submit-queue

Add Events for operation_executor to show status of mounts, failed/successful to show in describe events

Fixes #27590 
@saad-ali @pmorie @erinboyd

After talking with @pmorie last week about the above issue, I decided to poke around and see if I could remedy.  The refactoring broke my previous UXP merged PR's that correctly showed failed mount errors in the describe events.  However, Not sure I implemented correctly, but it tested out and seems to be working, let me know what I missed or if this is not the correct approach.

```
Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  2m		2m		1	{default-scheduler }			Normal		Scheduled	Successfully assigned nfs-bb-pod1 to 127.0.0.1
  44s		44s		1	{kubelet 127.0.0.1}			Warning		FailedMount	Unable to mount volumes for pod "nfs-bb-pod1_default(a94f64f1-37c9-11e6-9aa5-52540073d346)": timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
  44s		44s		1	{kubelet 127.0.0.1}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
  38s		38s		1	{kubelet }				Warning		FailedMount	Unable to mount volumes for pod "a94f64f1-37c9-11e6-9aa5-52540073d346": Mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data99 /var/lib/kubelet/pods/a94f64f1-37c9-11e6-9aa5-52540073d346/volumes/kubernetes.io~nfs/nfsvol nfs []
Output: mount.nfs: Connection timed out

Resolution hint: Check and make sure the NFS Server exists (ensure that correct IPAddress/Hostname was given) and is available/reachable.
Also make sure firewall ports are open on both client and NFS Server (2049 v4 and 2049, 20048 and 111 for v3).
Use commands telnet <nfs server> <port> and showmount <nfs server> to help test connectivity.
```
2016-08-19 08:27:48 -07:00
Jing Xu 70deeb0ae4 node not exist during node status update should not block others
When node is deleted, attach-detach controller cache may contain stale
information of this node, and update node status fails in reconciler
loop. But one node update failure should not block updating other nodes.
Also the warning message easily flush the log file. This PR is just a quick
fix of this issue. More complete fix including make sure controller cache
up to date will be addressed in another PR.
2016-08-18 13:51:30 -07:00
Kubernetes Submit Queue 9696a27aa0 Merge pull request #30737 from saad-ali/fix29358Round2
Automatic merge from submit-queue

Skip safe to detach check if node API object no longer exists

Fixes #29358
2016-08-18 04:00:05 -07:00
Scott Creeley 782d7d9815 Add Events for operation_executor to show status of mounts, failed or successful 2016-08-17 09:53:47 -04:00
saadali 0c72568247 Skip safe to detach if node api obj doesn't exist 2016-08-16 21:30:51 -07:00
Avesh Agarwal 52a60fe3be Fix default resource limits (node capacities) for downward api volumes 2016-08-16 14:41:17 -04:00
Dominika Hodovska 816f6d32ca Collapse duplicate informer creation paths 2016-08-04 09:02:13 +02:00
Paul Morie c884297990 Fix collisions issues / timeouts for mounts
For non-attachable volumes, do not call GetVolumeName on the plugin and instead
generate a unique name based on the identity of the pod and the name of the volume
within the pod.
2016-07-27 17:53:50 -04:00
saadali 89fd358c52 Assume volume detached if node doesn't exist
Fixes #29358
2016-07-22 22:07:32 -07:00
k8s-merge-robot 99e24da2ff Merge pull request #29077 from saad-ali/fixIssue29051NamespaceDeletion
Automatic merge from submit-queue

Fix "PVC Volume not detached if pod deleted via namespace deletion" issue

Fixes #29051: "PVC Volume not detached if pod deleted via namespace deletion"

This PR:
* Fixes a bug in `desired_state_of_the_world_populator.go` to check the value of `exists` returned by the `podInformer` so that it can delete pods even if the delete event is missed (or fails).
* Reduces the desired state of the world populators sleep period from 5 min to 1 min (reducing the amount of time a volume would remain attached if a volume delete event is missed or fails).
2016-07-20 20:40:32 -07:00
saadali afd8a58e5c Reduce DSW populator sleep period from 5 min to 1 2016-07-20 01:03:04 -07:00
saadali d210c2231f Check pod exist in attach controller DSW populator
Fix bug in desired_state_of_the_world_populator.go to check exists so
that it can delete pods even if the delete event is missed (or fails)
2016-07-20 01:03:04 -07:00
saadali 88d495026d Allow mounts to run in parallel for non-attachable
Allow mount volume operations to run in parallel for non-attachable
volume plugins.

Allow unmount volume operations to run in parallel for all volume
plugins.
2016-07-19 21:54:26 -07:00
Morgan Bauer 69719167a3
close channel to prevent memory leak
- wait.JitterUntil goroutine is never cleaned up when used with wait.NeverStop
 - fixup comment
2016-07-06 09:34:20 -07:00
saadali 0dd17fff22 Reorganize volume controllers and manager 2016-07-01 18:50:25 -07:00