Commit Graph

84 Commits (bb29efb0c36ad047a560cb8d1593354b021279a1)

Author SHA1 Message Date
Kubernetes Submit Queue 6dfe5c49f6 Merge pull request #38865 from vwfs/ext4_no_lazy_init
Automatic merge from submit-queue

Enable lazy initialization of ext3/ext4 filesystems

**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #30752, fixes #30240

**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```

**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.

These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount

Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```

The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.

When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system. 

As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory. 

I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.

As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.
2017-01-18 09:09:52 -08:00
deads2k 6a4d5cd7cc start the apimachinery repo 2017-01-11 09:09:48 -05:00
Jeff Grafton 20d221f75c Enable auto-generating sources rules 2017-01-05 14:14:13 -08:00
Mike Danese 161c391f44 autogenerated 2016-12-29 13:04:10 -08:00
Alexander Block 13a2bc8afb Enable lazy initialization of ext3/ext4 filesystems 2016-12-18 11:08:51 +01:00
Mike Danese c87de85347 autoupdate BUILD files 2016-12-12 13:30:07 -08:00
Jing Xu bb8b54af18 Fix unmountDevice issue caused by shared mount in GCI
This is a fix on top #38124. In this fix, we move the logic to filter
out shared mount references into operation_executor's UnmountDevice
function to avoid this part is being used by other types volumes such as
rdb, azure etc. This filter function should be only needed during
unmount device for GCI image.
2016-12-08 13:34:45 -08:00
Jing Xu 896e0b867e Fix unmount issue cuased by GCI mounter
this is a workaround for the unmount device issue caused by gci mounter. In GCI cluster, if gci mounter is used for mounting, the container started by mounter script will cause additional mounts created in the container. Since these mounts are irrelavant to the original mounts, they should be not considered when checking the mount references. By comparing the mount path prefix, those additional mounts can be filtered out.

Plan to work on better approach to solve this issue.
2016-12-06 12:24:07 -08:00
Jing Xu 37136e9780 Enable containerized mounter only for nfs and glusterfs types
This change is to only enable containerized mounter for nfs and
glusterfs types. For other types such as tmpfs, ext2/3/4 or empty type,
we should still use mount from $PATH
2016-12-02 15:06:24 -08:00
Pengfei Ni f584ed4398 Fix package aliases to follow golang convention 2016-11-30 15:40:50 +08:00
Jing Xu 3d3e44e77e fix issue in converting aws volume id from mount paths
This PR is to fix the issue in converting aws volume id from mount
paths. Currently there are three aws volume id formats supported. The
following lists example of those three formats and their corresponding
global mount paths:
1. aws:///vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/vol-123456)
2. aws://us-east-1/vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455)
3. vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455)

For the first two cases, we need to check the mount path and convert
them back to the original format.
2016-11-16 22:35:48 -08:00
Vishnu kannan dd8ec911f3 Revert "Revert "Merge pull request #35821 from vishh/gci-mounter-scope""
This reverts commit 402116aed4.
2016-11-08 11:09:10 -08:00
saadali 402116aed4 Revert "Merge pull request #35821 from vishh/gci-mounter-scope"
This reverts commit 973fa6b334, reversing
changes made to 41b5fe86b6.
2016-11-03 20:23:25 -07:00
Vishnu Kannan 414e4ae549 Revert "Adding a root filesystem override for kubelet mounter"
This reverts commit e861a5761d.
2016-11-02 15:18:09 -07:00
Vishnu Kannan 1ecc12f724 [Kubelet] Do not use custom mounter script for bind mounts, ext* and tmpfs mounts
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2016-11-02 15:18:08 -07:00
Kubernetes Submit Queue 9302dcb6fb Merge pull request #35721 from apprenda/fix_mounter_struct
Automatic merge from submit-queue

Fixes PR #35652

This is breaking the build. Fixes #35564

/cc @vishh @sjenning

Fixes PR #35652
2016-10-27 15:20:48 -07:00
Paulo Pires 01adb460de
Fixes PR #35652 2016-10-27 15:55:01 -04:00
Vishnu kannan 7fd03c4b6e Fix source and target path with overriden rootfs in mount utility package
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-10-27 09:46:33 -07:00
Vishnu kannan e861a5761d Adding a root filesystem override for kubelet mounter
This is useful for supporting hostPath volumes via containerized
mounters in kubelet.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-10-26 21:42:59 -07:00
Huamin Chen 758e8b8b8f add IsNotMountPoint() to mount_unsupported.go
Signed-off-by: Huamin Chen <hchen@redhat.com>
2016-10-25 20:20:17 -04:00
Jing Xu b02481708a Fix volume states out of sync problem after kubelet restarts
When kubelet restarts, all the information about the volumes will be
gone from actual/desired states. When update node status with mounted
volumes, the volume list might be empty although there are still volumes
are mounted and in turn causing master to detach those volumes since
they are not in the mounted volumes list. This fix is to make sure only
update mounted volumes list after reconciler starts sync states process.
This sync state process will scan the existing volume directories and
reconstruct actual states if they are missing.

This PR also fixes the problem during orphaned pods' directories. In
case of the pod directory is unmounted but has not yet deleted (e.g.,
interrupted with kubelet restarts), clean up routine will delete the
directory so that the pod directoriy could be cleaned up (it is safe to
delete directory since it is no longer mounted)

The third issue this PR fixes is that during reconstruct volume in
actual state, mounter could not be nil since it is required for creating
container.VolumeMap. If it is nil, it might cause nil pointer exception
in kubelet.

Details are in proposal PR #33203
2016-10-25 12:29:12 -07:00
Michael Taufen dba917c5b7 Include mount command in Kubelet mounter output 2016-10-24 05:50:24 -07:00
Mike Danese 3b6a067afc autogenerated 2016-10-21 17:32:32 -07:00
Klaus Ma d2e18d6bd3 Added 'mounterPath' to Mounter interface in 'mount_unsupported.go'. 2016-10-21 16:33:43 +08:00
Jing Xu 34ef93aa0c Add mounterPath to mounter interface
In order to be able to use new mounter library, this PR adds the
mounterPath flag to kubelet which passes the flag to the mount
interface. If flag is empty, mount uses default mount path.
2016-10-20 14:15:27 -07:00
Jędrzej Nowak 7a7c36261e fixed absense to absence 2016-10-14 16:28:46 +02:00
Brendan Burns 07c8f9a173 Don't return an error if a file doesn't exist for IsPathDevice(...) 2016-09-05 20:45:22 -07:00
Morgan Bauer 92a043e833
ensure pkg/util/mount compiles & crosses
- move compile time check from linux code to generic code
2016-08-21 17:47:24 -07:00
Tamer Tas fe039573b7 pkg/util/mount: remove method redeclaration
Fix the `GetDeviceNameFromMount` method thats declared twice.
2016-08-19 16:49:28 +03:00
Kubernetes Submit Queue 2bc5414de6 Merge pull request #30666 from feiskyer/fix-mount
Automatic merge from submit-queue

Fix pkg/util/mount for osx

Fix #30665.
2016-08-17 11:37:53 -07:00
Jing Xu 89de4f2f55 Add GetDeviceNameFromMount in mount_unsupported.go
Add GetDeviceNameFromMount in mount_unsupported.go
2016-08-16 16:34:10 -07:00
Pengfei Ni 12d7c4f380 Fix mount for osx 2016-08-16 08:26:15 +08:00
Jing Xu f19a1148db This change supports robust kubelet volume cleanup
Currently kubelet volume management works on the concept of desired
and actual world of states. The volume manager periodically compares the
two worlds and perform volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the cache of those two worlds are
gone. Although desired world can be recovered through apiserver, actual
world can not be recovered which may cause some volumes cannot be cleaned
up if their information is deleted by apiserver. This change adds the
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both desired
world and actual world if it cannot be found in either world. The rest
logic would be as same as before, desired world populator may clean up
the volume entry if it is no longer in apiserver, and then volume
manager should invoke unmount to clean it up.
2016-08-15 11:29:15 -07:00
Scott Creeley 11d1289afa Add volume and mount logging 2016-07-21 09:10:00 -04:00
Cindy Wang e13c678e3b Make volume unmount more robust using exclusive mount w/ O_EXCL 2016-07-18 16:20:08 -07:00
Davanum Srinivas 2b0ed014b7 Use Go canonical import paths
Add canonical imports only in existing doc.go files.
https://golang.org/doc/go1.4#canonicalimports

Fixes #29014
2016-07-16 13:48:21 -04:00
David McMahon ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
k8s-merge-robot 51dd3d562d Merge pull request #27380 from rootfs/fix-nsenter-list
Automatic merge from submit-queue

in nsenter mounter, read  hosts PID 1 /proc/mounts to list the mounts

fix #27378
2016-06-19 18:38:54 -07:00
Huamin Chen 8e67a308ed in nsenter mounter, read hosts PID 1 /proc/mounts to list the mounts
Signed-off-by: Huamin Chen <hchen@redhat.com>
2016-06-14 17:54:59 +00:00
Jing Xu 809dae1978 Fix bug in isLikelyNotMountPoint function
In nsenter_mount.go/isLikelyNotMountPoint function, the returned output
from findmnt command misses the last letter. Modify the code to make sure
that output has the full target path. fix #26421 #25056 #22911
2016-06-13 17:28:38 -07:00
chengyli 1fb33aa2da fix cinder volume dir umount issue #24717
fix https://github.com/kubernetes/kubernetes/issues/24717
If kubelet root-dir is a symlink, the pod's cinder volume dir can't be
umounted even after pod is deleted.
This patch reads target path of symlink before comparing with entries in
proc mounts.
2016-04-29 00:50:33 +08:00
k8s-merge-robot 04df711a35 Merge pull request #24035 from jsafrane/devel/nsenter-error-messages
Automatic merge from submit-queue

Add path to log messages.

It is useful to see the path that failed to mount in the log messages.

Fixes #23990.
2016-04-14 03:47:26 -07:00
Jan Safranek be2ca8dec3 Add path to log messages.
It is useful to see the path that failed to mount in the log messages.
2016-04-08 09:41:09 +02:00
Jan Safranek 92f181174b Fixed mounting with containerized kubelet.
Kubelet was not able to mount volumes when running inside a container and
using nsenter mounter,

NsenterMounter.IsLikelyNotMountPoint() should return ErrNotExist when the
checked directory does not exists as the regular mounted does this and
some volume plugins depend on this behavior.
2016-03-24 17:27:11 +01:00
Filip Grzadkowski 217d9f38e7 Don't return error when findmnt exits with error.
Fixes #21303
2016-02-19 14:09:33 +01:00
laushinka 7ef585be22 Spelling fixes inspired by github.com/client9/misspell 2016-02-18 06:58:05 +07:00
Jan Safranek 220163f67d Fixed races in Cinder volume attach/detach.
Add a mutex to guard SetUpAt() and TearDownAt() calls - they should not
run in parallel.  There is a race in these calls when there are two pods
using the same volume, one of them is dying and the other one starting.

TearDownAt() checks that a volume is not needed by any pods and detaches the
volume. It does so by counting how many times is the volume mounted
(GetMountRefs() call below).

When SetUpAt() of the starting pod already attached the volume and did not mount
it yet, TearDownAt() of the dying pod will detach it - GetMountRefs() does not
count with this volume.

These two threads run in parallel:

 dying pod.TearDownAt("myVolume")          starting pod.SetUpAt("myVolume")
   |                                       |
   |                                       AttachDisk("myVolume")
   refs, err := mount.GetMountRefs()       |
   Unmount("myDir")                        |
   if refs == 1 {                          |
   |  |                                    Mount("myVolume", "myDir")
   |  |                                    |
   |  DetachDisk("myVolume")               |
   |                                       start containers - OOPS! The volume is detached!
   |
   finish the pod cleanup


Also, add some logs to cinder plugin for easier debugging in the future, add
a test and update the fake mounter to know about bind mounts.
2016-02-02 14:38:49 +01:00
Mike Danese 82509a46a7 Merge pull request #19585 from smarterclayton/skip_mount
Skip format and mount test on windows or mac
2016-01-14 17:03:44 -08:00
Clayton Coleman 389d5e48cb Skip format and mount test on windows or mac 2016-01-12 21:43:50 -05:00
Sami Wagiaalla c18f342ac6 Use constants for fsck return values 2015-12-08 10:51:12 -05:00