github/k3s - k3s - https://git.xinac.net

Commit Graph

Author	SHA1	Message	Date
Justin Santa Barbara	54195d590f	Use strongly-typed types.NodeName for a node name We had another bug where we confused the hostname with the NodeName. To avoid this happening again, and to make the code more self-documenting, we use types.NodeName (a typedef alias for string) whenever we are referring to the Node.Name. A tedious but mechanical commit therefore, to change all uses of the node name to use types.NodeName Also clean up some of the (many) places where the NodeName is referred to as a hostname (not true on AWS), or an instanceID (not true on GCE), etc.	2016-09-27 10:47:31 -04:00
Kubernetes Submit Queue	0a4316f11e	Merge pull request #32807 from jingxu97/stateupdateNeeded-9-15 Automatic merge from submit-queue Fix race condition in setting node statusUpdateNeeded flag This PR fixes the race condition in setting the node statusUpdateNeeded flag in the master's attachdetach controller. This flag indicates whether a node's status has been updated by the node_status_updater. When the updater finishes updating a node's status, the flag is set to false. When the node status changes, such as when a volume is detached or a new volume is attached to the node, the flag is set to true so that the updater can update the status again. The previous workflow had a race condition, as follows: 1. The updater gets the currently attached volume list from the node that needs to be updated. 2. A new volume A is attached to the same node right after step 1, and the flag is set to TRUE. 3. The updater updates the node's attached volume list (which does not include volume A) and then sets the flag to FALSE. The result is that volume A is never added to the attached volume list, so on the node side this volume is never attached. In this PR, the flag is set to FALSE when the updater gets the attached volume list (as one atomic operation). In the above example, the flag is TRUE again after step 2, and in step 3 the updater does not set the flag if the update is successful. The flag is therefore still TRUE, and the node status is updated in the next round of updates.	2016-09-23 11:25:16 -07:00
Jing Xu	14cad206f5	Fix race condition in setting node statusUpdateNeeded flag This PR fixes the race condition in setting the node statusUpdateNeeded flag in the master's attachdetach controller. This flag indicates whether a node's status has been updated by the node_status_updater. When the updater finishes updating a node's status, the flag is set to false. When the node status changes, such as when a volume is detached or a new volume is attached to the node, the flag is set to true so that the updater can update the status again. The previous workflow had a race condition, as follows: 1. The updater gets the currently attached volume list from the node that needs to be updated. 2. A new volume A is attached to the same node right after step 1, and the flag is set to TRUE. 3. The updater updates the node's attached volume list (which does not include volume A) and then sets the flag to FALSE. The result is that volume A is never added to the attached volume list, so on the node side this volume is never attached. In this PR, the flag is set to FALSE when the updater gets the attached volume list (as one atomic operation). In the above example, the flag is TRUE again after step 2, and in step 3 the updater does not set the flag if the update is successful. The flag is therefore still TRUE, and the node status is updated in the next round of updates. This PR also changes a unit test due to the workflow changes.	2016-09-22 14:02:30 -07:00
Mike Danese	a765d59932	move informer and controller to pkg/client/cache Signed-off-by: Mike Danese <mikedanese@google.com>	2016-09-15 12:50:08 -07:00
Jing Xu	efaceb28cc	Fix race condition in updating attached volume between master and node This PR tries to fix issue #29324. The cause of this issue is a race condition that happens when marking volumes as attached for node status. This PR tries to clean up the logic of when and where to mark volumes as attached/detached. The workflow is basically as follows: 1. When a volume is attached successfully, the volume and node info is added into nodesToUpdateStatusFor to mark the volume as attached to the node. 2. When a detach request comes in, it checks whether it is safe to detach now. If the check passes, the volume is removed from volumesToReportAsAttached to indicate that it is no longer considered attached. Afterwards, the reconciler tries to update the node status and trigger the detach operation. If any of these operations fails, the volume is added back to the volumesToReportAsAttached list, showing that it is still attached. These steps should make sure that kubelet gets the right (though possibly outdated) information about which volumes are attached. It also guarantees that if a detach operation is pending, kubelet should not trigger any mount operations.	2016-09-12 13:51:08 -07:00
Jing Xu	b9157b7524	Post event message for volume attachment This PR is to add event message when attaching volume fails to help users to debug. For detach failure, may address in a different PR since it requires more data structure change.	2016-09-01 16:24:36 -07:00
Kubernetes Submit Queue	3d7a105d9b	Merge pull request #30903 from jingxu97/cherrypick-8-19 Automatic merge from submit-queue Avoid failure message flooding the log when node no longer exists When a node is deleted, the attach-detach controller cache may contain stale information about this node, and the node status update fails in the reconciler loop. This message easily floods the log file. This PR is just a quick fix for this issue. A more complete fix, including keeping the controller cache up to date, will be addressed in another PR.	2016-08-19 15:45:58 -07:00
Kubernetes Submit Queue	6ce405c6ee	Merge pull request #27778 from screeley44/k8-vol-executor Automatic merge from submit-queue Add Events for operation_executor to show status of mounts, failed/successful to show in describe events Fixes #27590 @saad-ali @pmorie @erinboyd After talking with @pmorie last week about the above issue, I decided to poke around and see if I could remedy. The refactoring broke my previous UXP merged PR's that correctly showed failed mount errors in the describe events. However, Not sure I implemented correctly, but it tested out and seems to be working, let me know what I missed or if this is not the correct approach. ``` Events: FirstSeen LastSeen Count From SubobjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned nfs-bb-pod1 to 127.0.0.1 44s 44s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "nfs-bb-pod1_default(a94f64f1-37c9-11e6-9aa5-52540073d346)": timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol] 44s 44s 1 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol] 38s 38s 1 {kubelet } Warning FailedMount Unable to mount volumes for pod "a94f64f1-37c9-11e6-9aa5-52540073d346": Mount failed: exit status 32 Mounting arguments: nfs1.rhs:/opt/data99 /var/lib/kubelet/pods/a94f64f1-37c9-11e6-9aa5-52540073d346/volumes/kubernetes.io~nfs/nfsvol nfs [] Output: mount.nfs: Connection timed out Resolution hint: Check and make sure the NFS Server exists (ensure that correct IPAddress/Hostname was given) and is available/reachable. Also make sure firewall ports are open on both client and NFS Server (2049 v4 and 2049, 20048 and 111 for v3). 
Use commands telnet <nfs server> <port> and showmount <nfs server> to help test connectivity. ```	2016-08-19 08:27:48 -07:00
Jing Xu	70deeb0ae4	A missing node during node status update should not block others When a node is deleted, the attach-detach controller cache may contain stale information about this node, and the node status update fails in the reconciler loop. But one node update failure should not block updating other nodes. The warning message also easily floods the log file. This PR is just a quick fix for this issue. A more complete fix, including making sure the controller cache is up to date, will be addressed in another PR.	2016-08-18 13:51:30 -07:00
Kubernetes Submit Queue	9696a27aa0	Merge pull request #30737 from saad-ali/fix29358Round2 Automatic merge from submit-queue Skip safe to detach check if node API object no longer exists Fixes #29358	2016-08-18 04:00:05 -07:00
Scott Creeley	782d7d9815	Add Events for operation_executor to show status of mounts, failed or successful	2016-08-17 09:53:47 -04:00
saadali	0c72568247	Skip safe to detach if node api obj doesn't exist	2016-08-16 21:30:51 -07:00
Avesh Agarwal	52a60fe3be	Fix default resource limits (node capacities) for downward api volumes	2016-08-16 14:41:17 -04:00
Dominika Hodovska	816f6d32ca	Collapse duplicate informer creation paths	2016-08-04 09:02:13 +02:00
Paul Morie	c884297990	Fix collisions issues / timeouts for mounts For non-attachable volumes, do not call GetVolumeName on the plugin and instead generate a unique name based on the identity of the pod and the name of the volume within the pod.	2016-07-27 17:53:50 -04:00
saadali	89fd358c52	Assume volume detached if node doesn't exist Fixes #29358	2016-07-22 22:07:32 -07:00
k8s-merge-robot	99e24da2ff	Merge pull request #29077 from saad-ali/fixIssue29051NamespaceDeletion Automatic merge from submit-queue Fix "PVC Volume not detached if pod deleted via namespace deletion" issue Fixes #29051: "PVC Volume not detached if pod deleted via namespace deletion" This PR: * Fixes a bug in `desired_state_of_the_world_populator.go` to check the value of `exists` returned by the `podInformer` so that it can delete pods even if the delete event is missed (or fails). * Reduces the desired state of the world populators sleep period from 5 min to 1 min (reducing the amount of time a volume would remain attached if a volume delete event is missed or fails).	2016-07-20 20:40:32 -07:00
saadali	afd8a58e5c	Reduce DSW populator sleep period from 5 min to 1	2016-07-20 01:03:04 -07:00
saadali	d210c2231f	Check pod exist in attach controller DSW populator Fix bug in desired_state_of_the_world_populator.go to check exists so that it can delete pods even if the delete event is missed (or fails)	2016-07-20 01:03:04 -07:00
saadali	88d495026d	Allow mounts to run in parallel for non-attachable Allow mount volume operations to run in parallel for non-attachable volume plugins. Allow unmount volume operations to run in parallel for all volume plugins.	2016-07-19 21:54:26 -07:00
Morgan Bauer	69719167a3	close channel to prevent memory leak - wait.JitterUntil goroutine is never cleaned up when used with wait.NeverStop - fixup comment	2016-07-06 09:34:20 -07:00
saadali	0dd17fff22	Reorganize volume controllers and manager	2016-07-01 18:50:25 -07:00

22 Commits (9e1960c5076628a864aee24ee339073fe5e490c1)
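
Note on 14cad206f5 / 0a4316f11e: the statusUpdateNeeded race described in those messages can be illustrated with a minimal Go sketch. This is not the controller's actual code. The nodeStatus type and the AddVolume, GetVolumesToReport, and MarkUpdateFailed methods are hypothetical names chosen for illustration, and re-setting the flag after a failed update is an assumption (the commit message only says the flag is left alone on success). The point shown is that the snapshot of the attached list and the clearing of the flag happen under one lock, so an attach that lands afterwards leaves the flag TRUE for the next round.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeStatus is a hypothetical, simplified stand-in for the per-node
// bookkeeping described in the commit message. It is not the real type.
type nodeStatus struct {
	mu                 sync.Mutex
	attached           []string
	statusUpdateNeeded bool
}

// AddVolume records a newly attached volume and marks the node as
// needing a status update.
func (n *nodeStatus) AddVolume(name string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.attached = append(n.attached, name)
	n.statusUpdateNeeded = true
}

// GetVolumesToReport returns a copy of the attached list and clears the
// flag in the same critical section. An attach that happens after this
// call sets the flag again, so it cannot be lost.
func (n *nodeStatus) GetVolumesToReport() ([]string, bool) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if !n.statusUpdateNeeded {
		return nil, false
	}
	n.statusUpdateNeeded = false
	return append([]string(nil), n.attached...), true
}

// MarkUpdateFailed re-sets the flag so the update is retried.
// Assumption: the commit message does not spell out the failure path.
func (n *nodeStatus) MarkUpdateFailed() {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.statusUpdateNeeded = true
}

func main() {
	n := &nodeStatus{}
	n.AddVolume("vol-1")

	snapshot, _ := n.GetVolumesToReport() // step 1: snapshot taken, flag cleared
	n.AddVolume("vol-A")                  // step 2: attach lands mid-update, flag set
	fmt.Println("reported:", snapshot)    // step 3: update succeeds, flag untouched

	next, needed := n.GetVolumesToReport() // next round still sees pending work
	fmt.Println("next round needed:", needed, "volumes:", next)
}
```

With the old ordering (snapshot first, flag set to FALSE only after the update), step 3 would overwrite the TRUE written in step 2 and vol-A would never be reported.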