Commit Graph

167 Commits (750881c0ab62a9f59d53582bc33826caa167aa83)

Author SHA1 Message Date
Cheng Xing f9dc2d5ca3 Node status updater now deletes the node entry in attach updates when node is missing in NodeInformer cache. Fixes #42438.
- Added RemoveNodeFromAttachUpdates as part of node status updater operations.
2017-05-24 18:31:47 -07:00
NickrenREN add091b1fb fix regression in UX experience for double attach volume
send event when volume is not allowed to multi-attach
2017-05-25 09:27:24 +08:00
Michelle Au dd46c7f88e Local volume plugin 2017-05-22 14:44:51 -07:00
Michelle Au 06f25b03eb Check volume node affinity before mount 2017-05-22 14:44:06 -07:00
FengyunPan 4a6e1f2a1d Don't return err when volume's status is 'attaching'
When volume's status is 'attaching', its attachments will be None,
controllermanager can't get device path and make some failed event.
But it is normal, let's fix it.
2017-05-12 19:53:50 +08:00
Kubernetes Submit Queue 5b3d0bbe66 Merge pull request #44714 from jamiehannaford/unix_user_type
Automatic merge from submit-queue (batch tested with PRs 44590, 44969, 45325, 45208, 44714)

Use dedicated UnixUserID and UnixGroupID types

**What this PR does / why we need it**:

DRYs up type definitions by using the dedicated types in apimachinery 

**Which issue this PR fixes**

#38120

**Release note**:
```release-note
UIDs and GIDs now use apimachinery types
```
2017-05-05 14:08:17 -07:00
Jamie Hannaford 9440a68744 Use dedicated Unix User and Group ID types 2017-05-05 14:07:38 +02:00
Ian Chakeres bbc8859176 Refactor volume operation log and error messages 2017-05-04 13:29:01 -07:00
Tomas Smetana 852c44ae59 Fix issue #34242: Attach/detach should recover from a crash
When the attach/detach controller crashes and a pod with attached PV is deleted
afterwards the controller will never detach the pod's attached volumes. To
prevent this the controller should try to recover the state from the nodes
status.
2017-04-20 13:04:50 +02:00
Mike Danese a05c3c0efd autogenerated 2017-04-14 10:40:57 -07:00
Kubernetes Submit Queue b625085230 Merge pull request #42325 from tsmetana/remove-unused-method-from-og
Automatic merge from submit-queue

Remove unused method from operation_generator

This is only a removal of the GerifyVolumeIsSafeToDetach [sic] method from operation_executor. The method is not called from anywhere, moreover there is a private method named verifyVolumeIsSafeToDetach (which is being used). This looks like a cut&paste mistake that deserves to be cleaned.
```release-note
NONE
```
2017-03-31 10:56:40 -07:00
Kubernetes Submit Queue 803369b9cc Merge pull request #42006 from screeley44/error-events3
Automatic merge from submit-queue (batch tested with PRs 42522, 42545, 42556, 42006, 42631)

Fixes MountVolume.NewMounter errors not displayed to users via describe events

Fixes #42004 

This fixes the problem of mount errors being eaten and not displayed to users again.  Specifically erros caught in MountVolume.NewMounter (like missing endpoints, etc...)

Current behavior for any mount failure:

```
Events:
  FirstSeen    LastSeen    Count    From            SubObjectPath    Type        Reason        Message
  ---------    --------    -----    ----            -------------    --------    ------        -------
  12m        12m        1    default-scheduler            Normal        Scheduled    Successfully assigned glusterfs-bb-pod1 to 127.0.0.1
  10m        1m        5    kubelet, 127.0.0.1            Warning        FailedMount    Unable to mount volumes for pod "glusterfs-bb-pod1_default(67c9dfa7-f9f5-11e6-aee2-5254003a59cf)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". list of unattached/unmounted volumes=[glusterfsvol]
  10m        1m        5    kubelet, 127.0.0.1            Warning        FailedSync    Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". list of unattached/unmounted volumes=[glusterfsvol]
```

New Behavior:

For example on glusterfs - deliberately didn't create endpoints, now correct message is displayed:
```
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  2m		2m		1	default-scheduler			Normal		Scheduled	Successfully assigned glusterfs-bb-pod1 to 127.0.0.1
  54s		54s		1	kubelet, 127.0.0.1			Warning		FailedMount	Unable to mount volumes for pod "glusterfs-bb-pod1_default(8edd2c25-fa09-11e6-92ae-5254003a59cf)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". With error timed out waiting for the condition. list of unattached/unmounted volumes=[glusterfsvol]
  54s		54s		1	kubelet, 127.0.0.1			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". With error timed out waiting for the condition. list of unattached/unmounted volumes=[glusterfsvol]
  2m		6s		814	kubelet, 127.0.0.1			Warning		FailedMount	MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/8edd2c25-fa09-11e6-92ae-5254003a59cf-glusterfsvol" (spec.Name: "glusterfsvol") pod "8edd2c25-fa09-11e6-92ae-5254003a59cf" (UID: "8edd2c25-fa09-11e6-92ae-5254003a59cf") with: endpoints "glusterfs-cluster" not found
```
2017-03-24 15:10:33 -07:00
Hemant Kumar 786da1de12 Impement bulk polling of volumes
This implements Bulk volume polling using ideas presented by
justin in https://github.com/kubernetes/kubernetes/pull/39564

But it changes the implementation to use an interface
and doesn't affect other implementations.
2017-03-02 14:59:59 -05:00
Scott Creeley 762ca8e8a9 adding some debug 2017-03-01 13:30:21 -05:00
Hemant Kumar 2d3008fc56 Implement support for mount options in PVs
Add support for mount options via annotations on PVs
2017-03-01 11:50:40 -05:00
Tomas Smetana 58edea18de Remove unused method from operation_generator 2017-03-01 10:42:53 +01:00
James Ravn 9992bd23c2 Mark detached only if no pending operations
To safely mark a volume detached when the volume controller manager is used.

An example of one such problem:

1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
   operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler detects volume is no longer in desired state of world,
   removes it from volumes in use
7. MountVolume tries to mark volume in use, throws an error because
   volume is no longer in actual state of world list.
8. controller-manager tries to detach the volume, this fails because it
   is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.
2017-02-13 11:51:44 +00:00
Luca Bruno 85b1def175
test: update to use mounttest:0.8 and mounttest-user:0.5 2017-02-02 20:41:18 +00:00
deads2k 8a12000402 move client/record 2017-01-31 19:14:13 -05:00
Dr. Stefan Schimanski a0137e9b28 Update generated files 2017-01-25 19:49:45 +01:00
Dr. Stefan Schimanski d7eb3b6870 pkg/util: move uuid and strategicpatch into k8s.io/apimachinery 2017-01-25 19:45:09 +01:00
Kubernetes Submit Queue f56b606985 Merge pull request #36520 from apelisse/owners-pkg-volume
Automatic merge from submit-queue

Curating Owners: pkg/volume

cc @jsafrane @spothanis @agonzalezro @justinsb @johscheuer @simonswine @nelcy @pmorie @quofelix @sdminonne @thockin @saad-ali @rootfs

In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone lgtms and then someone
experienced in the project approves), we are adding new reviewers to
existing owners files.


If You Care About the Process:
------------------------------

We did this by algorithmically figuring out who’s contributed code to
the project and in what directories.  Unfortunately, that doesn’t work
well: people that have made mechanical code changes (e.g change the
copyright header across all directories) end up as reviewers in lots of
places.

Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).

At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.

Also, see https://github.com/kubernetes/contrib/issues/1389.

TLDR:
-----

As an owner of a sig/directory and a leader of the project, here’s what
we need from you:

1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.

2. The pull-request is made editable, please edit the `OWNERS` file to
remove the names of people that shouldn't be reviewing code in the
future in the **reviewers** section. You probably do NOT need to modify
the **approvers** section. Names asre sorted by relevance, using some
secret statistics.

3. Notify me if you want some OWNERS file to be removed.  Being an
approver or reviewer of a parent directory makes you a reviewer/approver
of the subdirectories too, so not all OWNERS files may be necessary.

4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example)
2017-01-17 19:56:39 -08:00
Clayton Coleman 9a2a50cda7
refactor: use metav1.ObjectMeta in other types 2017-01-17 16:17:19 -05:00
deads2k 77b4d55982 mechanical 2017-01-16 09:35:12 -05:00
deads2k 6a4d5cd7cc start the apimachinery repo 2017-01-11 09:09:48 -05:00
Jeff Grafton 20d221f75c Enable auto-generating sources rules 2017-01-05 14:14:13 -08:00
Kubernetes Submit Queue fd7408d076 Merge pull request #39288 from rkouj/unit-test-operation-executor
Automatic merge from submit-queue

Add unit tests for operation_executor

Add unit test for `Unmount operations should start in parallel for all volume plugins`

cc: @saad-ali
2017-01-04 18:52:22 -08:00
rkouj 7ebcab8c19 Add unit tests for operation_executor 2016-12-29 18:37:22 -08:00
Mike Danese 161c391f44 autogenerated 2016-12-29 13:04:10 -08:00
rkouj e7e3c55ad7 Add unit tests for MountVolume() of operation executor 2016-12-27 16:07:06 -08:00
rkouj d5f7610b82 Refactor operation_executor to make it unit testable 2016-12-27 15:12:16 -08:00
Harry Zhang 5a7661b483 Raise markVolMountedErr instead of mount err 2016-12-26 07:52:51 +00:00
Chao Xu 03d8820edc rename /release_1_5 to /clientset 2016-12-14 12:39:48 -08:00
Mike Danese c87de85347 autoupdate BUILD files 2016-12-12 13:30:07 -08:00
Wojciech Tyczynski aa7da5231f Update bazel files 2016-12-09 09:42:02 +01:00
Wojciech Tyczynski e8d1cba875 GetOptions in client calls 2016-12-09 09:42:01 +01:00
Jing Xu bb8b54af18 Fix unmountDevice issue caused by shared mount in GCI
This is a fix on top #38124. In this fix, we move the logic to filter
out shared mount references into operation_executor's UnmountDevice
function to avoid this part is being used by other types volumes such as
rdb, azure etc. This filter function should be only needed during
unmount device for GCI image.
2016-12-08 13:34:45 -08:00
Jess Frazelle 4d27212149
fix golint errors on master for 1.6
Signed-off-by: Jess Frazelle <acidburn@google.com>
2016-12-05 15:01:33 -08:00
Kubernetes Submit Queue e94118411c Merge pull request #36900 from vwfs/volume_reconciler_verbosity
Automatic merge from submit-queue

Reduce verbosity of volume reconciler

**What this PR does / why we need it**:
It reduces the log verbosity for attaching of volumes

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:
```release-note
Reduce verbosity of volume reconciler when attaching volumes
```

Set logging level for information about attaching of volumes to from 1 to 4
Otherwise the log is spammed with one line per 100ms while attaching is
in progress and afterwards as long as the volume is attached.
2016-11-28 11:42:10 -08:00
Chao Xu bcc783c594 run hack/update-all.sh 2016-11-23 15:53:09 -08:00
Chao Xu bb675d395f dependencies: pkg/volume 2016-11-23 15:53:09 -08:00
Alexander Block 1c35e3c275 Add missing nodeName parameter to log call 2016-11-19 13:56:38 +01:00
Rajat Ramesh Koujalagi d81e216fc6 Better messaging for missing volume components on host to perform mount 2016-11-09 15:16:11 -08:00
Antoine Pelisse fd510b1207 Update OWNERS approvers and reviewers: pkg/volume 2016-11-09 10:17:36 -08:00
Kubernetes Submit Queue 00269a6c60 Merge pull request #35434 from rootfs/deviceopen
Automatic merge from submit-queue

refactor DeviceOpened() so it won't return error if device doesn't exist

<!--  Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->

**What this PR does / why we need it**:
DeviceOpened() is called after device is unmounted but before detached. Some volumes such as rbd don't support 3rd party detach, they have to be detached during unmount. Once detached, the device path vanishes. This causes false alarm when DeviceOpened() is called.

The fix is to ignore error IsNotExist 

**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes #

**Special notes for your reviewer**:
@kubernetes/sig-storage 

**Release note**:

<!--  Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access) 
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`. 
-->

``` release-note
```

Signed-off-by: Huamin Chen hchen@redhat.com
2016-11-04 07:40:03 -07:00
Huamin Chen 901e084a98 checking if the device path is valid before calling DeviceOpened() to avoid false negative on devices that don't exist any more
Signed-off-by: Huamin Chen <hchen@redhat.com>
2016-11-03 15:02:15 -04:00
Jing Xu abbde43374 Add sync state loop in master's volume reconciler
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.

This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.

To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.

More details are explained in PR #33760
2016-10-28 09:24:53 -07:00
Jing Xu b02481708a Fix volume states out of sync problem after kubelet restarts
When kubelet restarts, all the information about the volumes will be
gone from actual/desired states. When update node status with mounted
volumes, the volume list might be empty although there are still volumes
are mounted and in turn causing master to detach those volumes since
they are not in the mounted volumes list. This fix is to make sure only
update mounted volumes list after reconciler starts sync states process.
This sync state process will scan the existing volume directories and
reconstruct actual states if they are missing.

This PR also fixes the problem during orphaned pods' directories. In
case of the pod directory is unmounted but has not yet deleted (e.g.,
interrupted with kubelet restarts), clean up routine will delete the
directory so that the pod directoriy could be cleaned up (it is safe to
delete directory since it is no longer mounted)

The third issue this PR fixes is that during reconstruct volume in
actual state, mounter could not be nil since it is required for creating
container.VolumeMap. If it is nil, it might cause nil pointer exception
in kubelet.

Details are in proposal PR #33203
2016-10-25 12:29:12 -07:00
Mike Danese 3b6a067afc autogenerated 2016-10-21 17:32:32 -07:00
Jedrzej Nowak f0988b95e7 Typos and englishify pkg/volume 2016-10-03 22:39:33 +02:00
Justin Santa Barbara 54195d590f Use strongly-typed types.NodeName for a node name
We had another bug where we confused the hostname with the NodeName.

To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.

A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName

Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
2016-09-27 10:47:31 -04:00
Jing Xu efaceb28cc Fix race condition in updating attached volume between master and node
This PR tries to fix issue #29324. This cause of this issue is a race
condition happens when marking volumes as attached for node status. This
PR tries to clean up the logic of when and where to mark volumes as
attached/detached. Basically the workflow as follows,
1. When volume is attached sucessfully, the volume and node info is
added into nodesToUpdateStatusFor to mark the volume as attached to the
node.
2. When detach request comes in, it will check whether it is safe to
detach now. If the check passes, remove the volume from volumesToReportAsAttached
to indicate the volume is no longer considered as attached now.
Afterwards, reconciler tries to update node status and trigger detach
operation. If any of these operation fails, the volume is added back to
the volumesToReportAsAttached list showing that it is still attached.

These steps should make sure that kubelet get the right (might be
outdated) information about which volume is attached or not. It also
garantees that if detach operation is pending, kubelet should not
trigger any mount operations.
2016-09-12 13:51:08 -07:00
Jing Xu b9157b7524 Post event message for volume attachment
This PR is to add event message when attaching volume fails to help
users to debug. For detach failure, may address in a different PR since
it requires more data structure change.
2016-09-01 16:24:36 -07:00
Kubernetes Submit Queue 6ce405c6ee Merge pull request #27778 from screeley44/k8-vol-executor
Automatic merge from submit-queue

Add Events for operation_executor to show status of mounts, failed/successful to show in describe events

Fixes #27590 
@saad-ali @pmorie @erinboyd

After talking with @pmorie last week about the above issue, I decided to poke around and see if I could remedy.  The refactoring broke my previous UXP merged PR's that correctly showed failed mount errors in the describe events.  However, Not sure I implemented correctly, but it tested out and seems to be working, let me know what I missed or if this is not the correct approach.

```
Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  2m		2m		1	{default-scheduler }			Normal		Scheduled	Successfully assigned nfs-bb-pod1 to 127.0.0.1
  44s		44s		1	{kubelet 127.0.0.1}			Warning		FailedMount	Unable to mount volumes for pod "nfs-bb-pod1_default(a94f64f1-37c9-11e6-9aa5-52540073d346)": timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
  44s		44s		1	{kubelet 127.0.0.1}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
  38s		38s		1	{kubelet }				Warning		FailedMount	Unable to mount volumes for pod "a94f64f1-37c9-11e6-9aa5-52540073d346": Mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data99 /var/lib/kubelet/pods/a94f64f1-37c9-11e6-9aa5-52540073d346/volumes/kubernetes.io~nfs/nfsvol nfs []
Output: mount.nfs: Connection timed out

Resolution hint: Check and make sure the NFS Server exists (ensure that correct IPAddress/Hostname was given) and is available/reachable.
Also make sure firewall ports are open on both client and NFS Server (2049 v4 and 2049, 20048 and 111 for v3).
Use commands telnet <nfs server> <port> and showmount <nfs server> to help test connectivity.
```
2016-08-19 08:27:48 -07:00
Scott Creeley 782d7d9815 Add Events for operation_executor to show status of mounts, failed or successful 2016-08-17 09:53:47 -04:00
saadali 0c72568247 Skip safe to detach if node api obj doesn't exist 2016-08-16 21:30:51 -07:00
Jing Xu f19a1148db This change supports robust kubelet volume cleanup
Currently kubelet volume management works on the concept of desired
and actual world of states. The volume manager periodically compares the
two worlds and perform volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the cache of those two worlds are
gone. Although desired world can be recovered through apiserver, actual
world can not be recovered which may cause some volumes cannot be cleaned
up if their information is deleted by apiserver. This change adds the
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both desired
world and actual world if it cannot be found in either world. The rest
logic would be as same as before, desired world populator may clean up
the volume entry if it is no longer in apiserver, and then volume
manager should invoke unmount to clean it up.
2016-08-15 11:29:15 -07:00
Paul Morie c884297990 Fix collisions issues / timeouts for mounts
For non-attachable volumes, do not call GetVolumeName on the plugin and instead
generate a unique name based on the identity of the pod and the name of the volume
within the pod.
2016-07-27 17:53:50 -04:00
saadali 88d495026d Allow mounts to run in parallel for non-attachable
Allow mount volume operations to run in parallel for non-attachable
volume plugins.

Allow unmount volume operations to run in parallel for all volume
plugins.
2016-07-19 21:54:26 -07:00
Cindy Wang e13c678e3b Make volume unmount more robust using exclusive mount w/ O_EXCL 2016-07-18 16:20:08 -07:00
saadali 0dd17fff22 Reorganize volume controllers and manager 2016-07-01 18:50:25 -07:00
David McMahon ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
saadali e06b32b1ef Mark VolumeInUse before checking if it is Attached
Ensure that kublet marks VolumeInUse before checking if it is Attached.
Also ensures that the attach/detach controller always fetches a fresh
copy of the node object before detach (instead ofKubelet relying on node
informer cache).
2016-06-28 14:05:59 -07:00
saadali dfe8e606c1 Fix device path used by volume WaitForAttach 2016-06-22 12:56:58 -07:00
saadali e716ddc771 Controller wait for attach and exponential backoff
Modify attach/detach controller to keep track of volumes to report
attached in Node VolumeToAttach status.

Modify kubelet volume manager to wait for volume to show up in Node
VolumeToAttach status.

Implement exponential backoff for errors in volume manager and attach
detach controller
2016-06-20 18:19:55 -07:00
saadali d72f88bf3a Modify Attach method to return device path 2016-06-19 23:54:02 -07:00
saadali 542f2dc708 Introduce new kubelet volume manager
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).

This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
2016-06-15 09:34:08 -07:00