This PR fixes two issues:
1. When desired_state_populator removes podvolume state, it should check
whether the actual state already has the volume before deleting it, so
that the actual state has a chance to add the volume into the state.
2. When checking whether a podVolume still exists, it checks not only the
actual state but also the volume's disk directory, because the actual
state might not reflect the real world when kubelet starts (both checks
are sketched below).
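A minimal sketch of the two checks, assuming simplified interfaces for the desired/actual state (none of the names below are the real desired_state_of_world populator API):

```go
package populator

import "os"

// Illustrative stand-ins for the kubelet's desired/actual state of the world.
type actualState interface {
	VolumeExists(volumeName string) bool
	PodExistsInVolume(podName, volumeName string) bool
}

type desiredState interface {
	DeletePodFromVolume(podName, volumeName string)
}

type populator struct {
	actual  actualState
	desired desiredState
}

// maybeRemovePodVolume removes the pod/volume pair from the desired state
// only once the actual state has had a chance to record the volume.
func (p *populator) maybeRemovePodVolume(podName, volumeName string) {
	if !p.actual.VolumeExists(volumeName) {
		// Actual state has not picked up the volume yet; keep the desired
		// state entry and retry on the next populator iteration.
		return
	}
	p.desired.DeletePodFromVolume(podName, volumeName)
}

// podVolumeStillExists consults both the actual state and the volume's
// directory on disk, because right after a kubelet restart the actual state
// may not yet reflect volumes that are really mounted.
func (p *populator) podVolumeStillExists(podName, volumeName, volumeDir string) bool {
	if p.actual.PodExistsInVolume(podName, volumeName) {
		return true
	}
	if _, err := os.Stat(volumeDir); err == nil {
		return true
	}
	return false
}
```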
Automatic merge from submit-queue (batch tested with PRs 50381, 51307, 49645, 50995, 51523)
Remove deprecated and experimental fields from KubeletConfiguration
As we work towards providing a stable (v1) kubeletconfig API,
we cannot afford to have deprecated or "experimental" (alpha) fields
living in the KubeletConfiguration struct. This removes all existing
experimental or deprecated fields, and places them in KubeletFlags
instead.
I'm going to send another PR after this one that organizes the remaining
fields into substructures for readability. Then, we should try to move
to v1 ASAP (maybe not v1 in 1.8, given how close we are, but definitely in 1.9).
It makes far more sense to focus on a clean API in kubeletconfig v2
than to try to further clean up the existing "API" that everyone
already depends on.
fixes: #51657
**Release note**:
```release-note
NONE
```
With bug #27653, kubelet could remove mounted volumes and delete user data.
The bug itself is fixed; however, our trust in kubelet is significantly lower.
Let's add an extra version of RemoveAll that does not cross a mount boundary
(the equivalent of rm -rf --one-file-system).
It calls lstat(path) three times for each removed directory: once in
RemoveAllOneFilesystem and twice in IsLikelyNotMountPoint. However, this way
it is platform independent, and the directory being removed by kubelet
should be almost empty.
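A rough sketch of the idea, assuming a stand-in isLikelyNotMountPoint helper (the real kubelet code uses the mounter's IsLikelyNotMountPoint and may differ in detail):

```go
package removeall

import (
	"fmt"
	"os"
	"path/filepath"
)

// isLikelyNotMountPoint is a placeholder for the mounter check used by
// kubelet; the real implementation is platform specific.
func isLikelyNotMountPoint(path string) (bool, error) {
	// ... elided in this sketch ...
	return true, nil
}

// removeAllOneFilesystem deletes path recursively, but refuses to descend
// into (or delete) anything that looks like a mount point, i.e. the
// equivalent of rm -rf --one-file-system.
func removeAllOneFilesystem(path string) error {
	info, err := os.Lstat(path)
	if err != nil {
		if os.IsNotExist(err) {
			return nil
		}
		return err
	}
	if !info.IsDir() {
		// Plain files and symlinks are removed directly; os.Remove on a
		// symlink only removes the link, never its target.
		return os.Remove(path)
	}
	notMnt, err := isLikelyNotMountPoint(path)
	if err != nil {
		return err
	}
	if !notMnt {
		return fmt.Errorf("cannot delete directory %s: it is a mount point", path)
	}
	entries, err := os.ReadDir(path)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		if err := removeAllOneFilesystem(filepath.Join(path, entry.Name())); err != nil {
			return err
		}
	}
	return os.Remove(path)
}
```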
When kubelet restarts, all the information about the volumes is gone
from the actual/desired states. When updating node status with mounted
volumes, the volume list might be empty although volumes are still
mounted, which in turn causes the master to detach those volumes since
they are not in the mounted volumes list. This fix makes sure the
mounted volumes list is only updated after the reconciler starts its
sync states process. This sync states process scans the existing volume
directories and reconstructs the actual state if entries are missing.
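A minimal sketch of the guard, with illustrative names and behavior (statesSynced roughly corresponds to asking the reconciler whether its sync states process has run):

```go
package volumemanager

// Illustrative types only; not the real volume manager API.
type volumeManager struct {
	statesSynced       func() bool     // has the reconciler finished its sync states scan?
	mountedVolumeNames func() []string // mounted volumes from the actual state of the world
}

// GetVolumesInUse feeds the node status update. Returning nil while the
// states have not been synced avoids reporting an empty list that the
// attach/detach controller could read as "nothing mounted, safe to detach".
func (vm *volumeManager) GetVolumesInUse() []string {
	if !vm.statesSynced() {
		return nil
	}
	return vm.mountedVolumeNames()
}
```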
This PR also fixes a problem with cleaning up orphaned pods' directories.
If a pod directory is unmounted but has not yet been deleted (e.g., the
cleanup was interrupted by a kubelet restart), the cleanup routine will
delete the directory so that the pod directory can be cleaned up (it is
safe to delete the directory since it is no longer mounted).
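A hedged sketch of that cleanup decision; the directory layout and the mount check are simplified stand-ins for the real housekeeping code:

```go
package kubelet

import (
	"fmt"
	"os"
	"path/filepath"
)

// cleanupOrphanedPodDir removes an orphaned pod directory, but only when
// none of its volume directories is still a mount point. The mount check is
// a stand-in for the mounter's IsLikelyNotMountPoint.
func cleanupOrphanedPodDir(podDir string, isLikelyNotMountPoint func(string) (bool, error)) error {
	volumesDir := filepath.Join(podDir, "volumes")
	entries, err := os.ReadDir(volumesDir)
	if err != nil && !os.IsNotExist(err) {
		return err
	}
	for _, e := range entries {
		notMnt, err := isLikelyNotMountPoint(filepath.Join(volumesDir, e.Name()))
		if err != nil {
			return err
		}
		if !notMnt {
			// Still mounted: keep the directory to avoid deleting user data.
			return fmt.Errorf("pod directory %s still has mounted volumes", podDir)
		}
	}
	// Everything is unmounted (or the unmount already happened before the
	// interruption), so removing the directory is safe.
	return os.RemoveAll(podDir)
}
```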
The third issue this PR fixes is that when reconstructing a volume in the
actual state, the mounter must not be nil since it is required for creating
container.VolumeMap. If it is nil, it might cause a nil pointer exception
in kubelet.
Details are in proposal PR #33203
Currently it is possible that the mounter object stored in the
MountedVolume data structure in the actual state of the kubelet volume
manager is nil if this information is recovered from the state sync
process. This will cause a nil pointer issue when calculating stats in
volume_stat_calculator.
A quick fix is to not return the volume if its mounter is nil. A more
complete fix is to also recover the mounter object when reconstructing the
volume data structure, which will be addressed in PR #33616.
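A sketch of the quick fix, using a simplified MountedVolume stand-in rather than the real volume manager types:

```go
package stats

// MountedVolume here is a simplified stand-in; in kubelet the mounter can be
// nil when the entry was reconstructed by the state sync process.
type MountedVolume struct {
	Name    string
	Mounter interface{ GetPath() string }
}

// volumesForStats filters out volumes whose mounter was not recovered, so
// the stat calculation never dereferences a nil mounter.
func volumesForStats(volumes []MountedVolume) []MountedVolume {
	out := make([]MountedVolume, 0, len(volumes))
	for _, v := range volumes {
		if v.Mounter == nil {
			// Skip until the mounter is reconstructed as well (the more
			// complete fix referenced above).
			continue
		}
		out = append(out, v)
	}
	return out
}
```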
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if the attach/detach controller
is not enabled).
This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes unmount/detach
from the `syncPod()` path so volume cleanup never blocks the
`syncPod` loop.
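A very rough sketch of the reconciliation idea this introduces (illustrative interfaces, not the actual volume manager types): a dedicated loop drives the actual state toward the desired state, so syncPod never has to do mount/unmount work itself.

```go
package reconciler

import "time"

// Illustrative stand-ins for the desired/actual states of the world and the
// operation executor.
type desiredState interface {
	VolumesToMount() []string
}

type actualState interface {
	MountedVolumes() []string
	VolumeMounted(name string) bool
}

type operationExecutor interface {
	Mount(name string)
	Unmount(name string)
}

type reconciler struct {
	desired desiredState
	actual  actualState
	ops     operationExecutor
}

// run periodically reconciles the actual state against the desired state.
func (r *reconciler) run(stopCh <-chan struct{}) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			// Unmount volumes that are mounted but no longer desired.
			desired := make(map[string]bool)
			for _, v := range r.desired.VolumesToMount() {
				desired[v] = true
			}
			for _, v := range r.actual.MountedVolumes() {
				if !desired[v] {
					r.ops.Unmount(v)
				}
			}
			// Mount volumes that are desired but not yet in the actual state.
			for _, v := range r.desired.VolumesToMount() {
				if !r.actual.VolumeMounted(v) {
					r.ops.Mount(v)
				}
			}
		}
	}
}
```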