Automatic merge from submit-queue (batch tested with PRs 61203, 61071). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Fix issue with race condition during pod deletion
This PR fixes two issues:
1. When desired_state_populator removes pod volume state, it should check
whether the actual state already has the volume before deleting it, to
make sure the actual state has a chance to add the volume into the state
(see the sketch below).
2. When checking whether a pod volume still exists, it checks not only the
actual state but also the volume's disk directory, because the actual
state might not reflect the real world when kubelet starts.
fixes issue #60645
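A minimal sketch of the ordering check in point 1. The types and method names here are hypothetical stand-ins for kubelet's actual/desired state-of-world interfaces, not the real API:
```go
package main

import "fmt"

// actualState is a hypothetical stand-in for kubelet's actual state of world.
type actualState struct {
	volumes map[string]bool
}

func (a *actualState) VolumeExists(name string) bool { return a.volumes[name] }

// desiredState is a hypothetical stand-in for the desired state of world.
type desiredState struct {
	volumes map[string]bool
}

// deletePodVolume removes a volume from the desired state only after the
// actual state has recorded it, so the reconciler gets a chance to observe
// the mount and later unmount it cleanly.
func (d *desiredState) deletePodVolume(name string, a *actualState) {
	if !a.VolumeExists(name) {
		// The mount may still be in flight; keep the desired-state entry
		// until the actual state catches up.
		return
	}
	delete(d.volumes, name)
	fmt.Printf("removed %s from desired state\n", name)
}

func main() {
	a := &actualState{volumes: map[string]bool{"vol-1": true}}
	d := &desiredState{volumes: map[string]bool{"vol-1": true}}
	d.deletePodVolume("vol-1", a)
}
```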
Automatic merge from submit-queue (batch tested with PRs 60888, 61225). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Mark reconstructed volumes as reported InUse
When a newly started kubelet finds a directory where a volume should be,
it can be fairly confident that the volume was mounted by the previous
kubelet, and therefore the volume must have been in node.status.volumesInUse.
We can therefore mark reconstructed volumes as already reported, so a
subsequent reconcile() can fix the directory and put the mounted volume
into the actual state of world.
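A minimal sketch of that marking step; the type and field names are hypothetical stand-ins for the reconstructed-volume bookkeeping:
```go
package main

import "fmt"

// reconstructedVolume stands in for the volume information kubelet rebuilds
// by scanning pod directories on disk after a restart.
type reconstructedVolume struct {
	volumeName    string
	reportedInUse bool
}

// markReconstructedAsReported applies the reasoning above: a directory on
// disk implies the previous kubelet already published the volume in
// node.status.volumesInUse, so treat it as reported and let a later
// reconcile() mount it into the actual state of world.
func markReconstructedAsReported(vols []*reconstructedVolume) {
	for _, v := range vols {
		v.reportedInUse = true
		fmt.Printf("volume %s marked as reported in use\n", v.volumeName)
	}
}

func main() {
	vols := []*reconstructedVolume{{volumeName: "kubernetes.io/gce-pd/disk-1"}}
	markReconstructedAsReported(vols)
}
```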
Fixes: #60645
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
cc: @gnufied @jingxu97
Users must not be allowed to step outside the volume with subPath.
Therefore the final subPath directory must be "locked" somehow
and checked to verify that it is inside the volume. On Windows, we lock
the directories. On Linux, we bind-mount the final subPath into
/var/lib/kubelet/pods/<uid>/volume-subpaths/<container name>/<subPathName>;
it can't be changed to a symlink by the user once it's bind-mounted.
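A simplified sketch of the containment check on Linux, using the hypothetical helper name checkSubPathInVolume. The check alone is racy (the user could swap in a symlink right after it runs), which is exactly why the real fix bind-mounts the resolved directory:
```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// checkSubPathInVolume resolves symlinks in the requested subPath and
// rejects any result that escapes the volume root.
func checkSubPathInVolume(volumeRoot, subPath string) (string, error) {
	resolved, err := filepath.EvalSymlinks(filepath.Join(volumeRoot, subPath))
	if err != nil {
		return "", err
	}
	root, err := filepath.EvalSymlinks(volumeRoot)
	if err != nil {
		return "", err
	}
	if resolved != root && !strings.HasPrefix(resolved, root+string(filepath.Separator)) {
		return "", fmt.Errorf("subPath %q escapes volume %q", subPath, volumeRoot)
	}
	return resolved, nil
}

func main() {
	root, _ := os.MkdirTemp("", "volume")
	defer os.RemoveAll(root)
	_ = os.MkdirAll(filepath.Join(root, "data", "logs"), 0o755)
	_ = os.Symlink("/etc", filepath.Join(root, "link"))

	if p, err := checkSubPathInVolume(root, "data/logs"); err == nil {
		fmt.Println("resolved inside volume:", p)
	}
	if _, err := checkSubPathInVolume(root, "link"); err != nil {
		fmt.Println(err) // the symlink points outside the volume
	}
}
```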
Automatic merge from submit-queue (batch tested with PRs 60342, 60505, 59218, 52900, 60486). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Fixed log calls in VolumeManager
Use glog.Infof() instead of glog.Info()
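The difference in one example; glog.Info joins its arguments like fmt.Sprint, so format verbs are printed literally:
```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	flag.Parse()
	volumeName := "vol-1"

	// Wrong: prints the literal "%v" followed by the argument.
	glog.Info("Starting reconciler for volume %v", volumeName)

	// Right: formats like fmt.Sprintf.
	glog.Infof("Starting reconciler for volume %v", volumeName)

	glog.Flush()
}
```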
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
Automatic merge from submit-queue (batch tested with PRs 60236, 60332, 57375, 60451, 57408). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
improve code comment
**What this PR does / why we need it**:
improve code comment
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
use GetUniqueVolumeNameFromSpec instead of implementing it manually for kubelet volume manager
**What this PR does / why we need it**:
`volumeName` is only used for attachable plugins, so we should resolve it inside the `if` statement. Besides, we can use the already existing `GetUniqueVolumeNameFromSpec` method instead of invoking `GetVolumeName` and `GetUniqueVolumeName` manually.
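A sketch of the shape of that cleanup, with stub functions standing in for the real plugin helpers (the real ones take a volume plugin and a volume spec):
```go
package main

import "fmt"

// Stubs mirroring the helpers named above.
func getVolumeName(spec string) (string, error) { return spec, nil }

func getUniqueVolumeName(pluginName, volumeName string) string {
	return fmt.Sprintf("%s/%s", pluginName, volumeName)
}

// getUniqueVolumeNameFromSpec combines the two calls, so callers no longer
// invoke GetVolumeName and GetUniqueVolumeName manually.
func getUniqueVolumeNameFromSpec(pluginName, spec string) (string, error) {
	volumeName, err := getVolumeName(spec)
	if err != nil {
		return "", err
	}
	return getUniqueVolumeName(pluginName, volumeName), nil
}

func main() {
	// Resolve the unique name only in the attachable branch, per the PR.
	attachable := true
	if attachable {
		name, _ := getUniqueVolumeNameFromSpec("kubernetes.io/gce-pd", "disk-1")
		fmt.Println(name) // kubernetes.io/gce-pd/disk-1
	}
}
```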
**Release note**:
```release-note
NONE
```
/sig storage
/kind cleanup
Automatic merge from submit-queue (batch tested with PRs 59767, 56454, 59237, 59730, 55479). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Block Volume: Refactor volumehandler in operationexecutor
**What this PR does / why we need it**:
Based on discussion with @saad-ali at #51494, we need to refactor the volumehandler in operationexecutor
for the Block Volume feature. We don't need to add volumehandler as a separate object.
```
VolumeHandler does not need to be a separate object that is constructed inline like this.
You can create a new operation, e.g. UnmountOperation to which you pass the spec,
and it can return either a UnmountVolume or UnmapVolume.
```
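A minimal sketch of that suggestion: a single generator inspects the volume mode and returns the matching operation, instead of a separate VolumeHandler object. The types here are illustrative, not the real operationexecutor API:
```go
package main

import "fmt"

// volumeMode is a stand-in for the PersistentVolume volumeMode
// (Filesystem vs Block).
type volumeMode int

const (
	filesystem volumeMode = iota
	block
)

// generateUnmountOperation returns either an unmount (filesystem) or an
// unmap (block) operation for the given volume.
func generateUnmountOperation(mode volumeMode, volumeName string) func() error {
	if mode == block {
		return func() error {
			fmt.Printf("unmapping block volume %s\n", volumeName)
			return nil
		}
	}
	return func() error {
		fmt.Printf("unmounting filesystem volume %s\n", volumeName)
		return nil
	}
}

func main() {
	op := generateUnmountOperation(block, "vol-1")
	_ = op()
}
```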
**Which issue(s) this PR fixes**: no related issue.
**Special notes for your reviewer**:
@saad-ali @msau42
**Release note**:
```release-note
NONE
```
This PR is the first part of the redesign of the volume reconstruction work. The
changes include:
1. Remove the dependency on the volume spec stored in the actual state for the
volume cleanup process (UnmountVolume and UnmountDevice).
Modify the AttachedVolume struct to add DeviceMountPath so that the volume
unmount operation can use this information instead of constructing it from
the volume spec (see the sketch after this message).
2. Modify the reconciler's volume reconstruction process (syncStates). The current
workflow is: when kubelet restarts, syncStates() is called only once before the
reconciler starts its loop.
a. If the volume plugin supports reconstruction, it will use the
reconstructed volume spec information to update the actual state, as before.
b. If the volume plugin cannot support reconstruction, it will use the
scanned mount path information to clean up the mounts.
In this PR, all the plugins still support reconstruction (except
glusterfs), so reconstruction of some plugins will still have issues.
The next PR will modify those plugins that cannot support reconstruction
well.
This PR addresses issues #52683 and #54108 (this PR includes the changes to
update devicePath after local attach finishes).
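A sketch of the struct change in point 1; the field set here is trimmed to what the description mentions:
```go
package main

import "fmt"

// attachedVolume records DeviceMountPath at mount time so that
// UnmountDevice can use it directly instead of reconstructing it from a
// (possibly unavailable) volume spec.
type attachedVolume struct {
	VolumeName      string
	DevicePath      string
	DeviceMountPath string
}

func unmountDevice(v attachedVolume) error {
	fmt.Printf("unmounting device for %s at %s\n", v.VolumeName, v.DeviceMountPath)
	return nil
}

func main() {
	v := attachedVolume{
		VolumeName:      "kubernetes.io/gce-pd/disk-1",
		DevicePath:      "/dev/sdb",
		DeviceMountPath: "/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/disk-1",
	}
	_ = unmountDevice(v)
}
```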
This message is verbose and repeated over and over again in log files,
creating a lot of noise. Leave the message in, but require a -v in
order to actually log it.
Fixes #29059
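The change amounts to gating the call behind a verbosity check; the level shown here is illustrative:
```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	flag.Parse()

	// Before: logged unconditionally on every pass, flooding the logs.
	// glog.Infof("Volume %q is already mounted", "vol-1")

	// After: only emitted when running with -v=4 or higher.
	glog.V(4).Infof("Volume %q is already mounted", "vol-1")

	glog.Flush()
}
```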
Automatic merge from submit-queue (batch tested with PRs 45990, 45544, 45745, 45742, 45678)
Refactor reconciler volume log and error messages
**What this PR does / why we need it**:
Utilizes volume-specific error and log messages introduced in #44969, inside files that also log volume information.
Specifically:
- pkg/kubelet/volumemanager/reconciler/reconciler.go,
- pkg/controller/volume/attachdetach/reconciler/reconciler.go, and
- pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go
**Which issue this PR fixes**: fixes #40905
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This makes it safe to mark a volume detached when the volume controller manager is used.
An example of one such problem:
1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler detects volume is no longer in desired state of world,
removes it from volumes in use
7. MountVolume tries to mark the volume in use, and throws an error because
the volume is no longer in the actual state of world list.
8. controller-manager tries to detach the volume; this fails because it
is still mounted to the OS.
9. EBS gets stuck indefinitely in a busy state trying to detach.
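A condensed sketch of the fix this implies: a volume stays in volumesInUse while any operation is still in flight, so steps 6-9 cannot race a half-finished MountVolume. All types here are hypothetical simplifications of the real caches:
```go
package main

import "fmt"

// volumeWorld condenses the state involved in the race: desired state,
// actual (mounted) state, in-flight operations, and the reported list.
type volumeWorld struct {
	desired, mounted, pending map[string]bool
	volumesInUse              map[string]bool
}

// pruneVolumesInUse drops a volume only when it is absent from the desired
// state AND not mounted AND no operation is in flight, closing the window
// in steps 5-7 where a mid-mount volume was removed.
func (w *volumeWorld) pruneVolumesInUse() {
	for name := range w.volumesInUse {
		if !w.desired[name] && !w.mounted[name] && !w.pending[name] {
			delete(w.volumesInUse, name)
			fmt.Printf("volume %s safe to report as detached\n", name)
		}
	}
}

func main() {
	w := &volumeWorld{
		desired:      map[string]bool{},              // pod already deleted (step 5)
		mounted:      map[string]bool{},              // mount not yet recorded (step 4)
		pending:      map[string]bool{"vol-1": true}, // MountVolume still running
		volumesInUse: map[string]bool{"vol-1": true},
	}
	w.pruneVolumesInUse() // vol-1 stays reported; detach is deferred
}
```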
During state reconstruction when kubelet restarts, outerVolumeSpecName
cannot be recovered by scanning the disk directories, but this
information is used by the volume manager to check whether a pod's volume is
mounted or not. There are two possible cases:
1. The pod is not deleted while kubelet restarts, so the desired state
should have the information; reconciler.updateState() will use this
information to update.
2. The pod is deleted during this period; the reconciler has to use
InnerVolumeSpecName, but that should be OK since this information will not
be used for volume cleanup (unmount).
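A sketch of the two cases, with hypothetical field names mirroring the inner/outer spec names above:
```go
package main

import "fmt"

// reconstructedVolume stands in for state rebuilt from disk; the outer spec
// name cannot be read back from the directory layout.
type reconstructedVolume struct {
	innerVolumeSpecName string
	outerVolumeSpecName string
}

// resolveOuterSpecName prefers the name from the desired state when the pod
// still exists there (case 1), otherwise falls back to the inner name,
// which is sufficient for unmount-only cleanup (case 2).
func resolveOuterSpecName(v *reconstructedVolume, desiredOuterName string, inDesiredState bool) {
	if inDesiredState {
		v.outerVolumeSpecName = desiredOuterName
		return
	}
	v.outerVolumeSpecName = v.innerVolumeSpecName
}

func main() {
	v := &reconstructedVolume{innerVolumeSpecName: "pvc-1234"}
	resolveOuterSpecName(v, "", false)
	fmt.Println(v.outerVolumeSpecName) // pvc-1234
}
```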
When kubelet restarts, all the information about the volumes is
gone from the actual/desired states. When updating node status with mounted
volumes, the volume list might be empty although there are still volumes
mounted, in turn causing the master to detach those volumes since
they are not in the mounted volumes list. This fix makes sure the mounted
volumes list is only updated after the reconciler starts its sync states
process. The sync states process scans the existing volume directories and
reconstructs the actual states if they are missing.
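A condensed sketch of that gating, using a hypothetical reconciler type: the node status update is skipped entirely, rather than sent with an empty list, until the scan has run:
```go
package main

import "fmt"

type reconciler struct {
	statesSynced   bool
	mountedVolumes []string
}

// syncStates stands in for the scan that reads existing volume directories
// and reconstructs missing actual-state entries.
func (r *reconciler) syncStates() {
	r.statesSynced = true
}

// volumesInUse reports ok=false until syncStates has run, so the caller
// skips the node status update instead of publishing an empty list.
func (r *reconciler) volumesInUse() ([]string, bool) {
	if !r.statesSynced {
		return nil, false
	}
	return r.mountedVolumes, true
}

func main() {
	r := &reconciler{mountedVolumes: []string{"vol-1"}}
	if _, ok := r.volumesInUse(); !ok {
		fmt.Println("skipping node status update until states are synced")
	}
	r.syncStates()
	vols, _ := r.volumesInUse()
	fmt.Println(vols) // [vol-1]
}
```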
This PR also fixes a problem with orphaned pods' directories. In
case a pod directory is unmounted but not yet deleted (e.g.,
interrupted by a kubelet restart), the cleanup routine will delete the
directory so that the pod directory can be cleaned up (it is safe to
delete the directory since it is no longer mounted).
The third issue this PR fixes is that when reconstructing a volume into the
actual state, the mounter must not be nil since it is required for creating
container.VolumeMap. If it is nil, it might cause a nil pointer exception
in kubelet.
Details are in proposal PR #33203
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit, therefore, to change all uses of the
node name to use types.NodeName.
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
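The typedef itself is a one-liner; the value is that the compiler now rejects accidental mixing of node names with plain strings:
```go
package main

import "fmt"

// NodeName mirrors the typedef described above: a named type over string.
type NodeName string

func detachVolume(volumeName string, node NodeName) {
	fmt.Printf("detaching %s from node %s\n", volumeName, node)
}

func main() {
	var node NodeName = "ip-10-0-0-1.ec2.internal"
	detachVolume("vol-1", node)

	hostname := "some-host" // a plain string, e.g. from os.Hostname()
	_ = hostname
	// detachVolume("vol-1", hostname) // compile error: string is not NodeName
}
```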
Currently kubelet volume management works on the concept of desired
and actual worlds of state. The volume manager periodically compares the
two worlds and performs volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the caches of those two worlds are
gone. Although the desired world can be recovered through the apiserver, the
actual world cannot be recovered, which may leave some volumes never cleaned
up if their information is deleted by the apiserver. This change adds
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both the desired
world and the actual world if it cannot be found in either world. The rest
of the logic is the same as before: the desired world populator may clean up
a volume entry if it is no longer in the apiserver, and then the volume
manager should invoke unmount to clean it up.
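A simplified sketch of the disk scan; the layout shown is the conventional /var/lib/kubelet/pods/&lt;uid&gt;/volumes/&lt;plugin&gt;/&lt;volume&gt; directory structure, and the real reconstruction additionally rebuilds volume specs from what it finds:
```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// scanPodVolumeDirs walks the per-pod volume directories and returns the
// volume paths found, from which the actual world can be repopulated.
func scanPodVolumeDirs(kubeletDir string) ([]string, error) {
	var found []string
	pods, err := os.ReadDir(filepath.Join(kubeletDir, "pods"))
	if err != nil {
		return nil, err
	}
	for _, pod := range pods {
		pattern := filepath.Join(kubeletDir, "pods", pod.Name(), "volumes", "*", "*")
		dirs, err := filepath.Glob(pattern)
		if err != nil {
			return nil, err
		}
		found = append(found, dirs...)
	}
	return found, nil
}

func main() {
	dirs, err := scanPodVolumeDirs("/var/lib/kubelet")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, d := range dirs {
		fmt.Println("reconstructing volume from", d)
	}
}
```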
Allow mount volume operations to run in parallel for non-attachable
volume plugins.
Allow unmount volume operations to run in parallel for all volume
plugins.
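A minimal sketch of the concurrency rule this implies: at most one in-flight operation per volume, while operations on different volumes run on their own goroutines. The type here is a hypothetical reduction of the real pending-operations tracking:
```go
package main

import (
	"fmt"
	"sync"
)

// pendingOperations serializes operations per volume name while letting
// different volumes proceed in parallel.
type pendingOperations struct {
	mu      sync.Mutex
	pending map[string]bool
	wg      sync.WaitGroup
}

// run starts op on its own goroutine unless an operation is already in
// flight for the same volume.
func (p *pendingOperations) run(volumeName string, op func()) error {
	p.mu.Lock()
	if p.pending[volumeName] {
		p.mu.Unlock()
		return fmt.Errorf("operation already pending for %s", volumeName)
	}
	p.pending[volumeName] = true
	p.mu.Unlock()

	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		op()
		p.mu.Lock()
		delete(p.pending, volumeName)
		p.mu.Unlock()
	}()
	return nil
}

func main() {
	ops := &pendingOperations{pending: map[string]bool{}}
	for _, v := range []string{"vol-1", "vol-2"} {
		v := v
		_ = ops.run(v, func() { fmt.Println("mounting", v) })
	}
	ops.wg.Wait()
}
```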