Automatic merge from submit-queue (batch tested with PRs 51805, 51725, 50925, 51474, 51638)
Flexvolume dynamic plugin discovery: Prober unit tests and basic e2e test.
**What this PR does / why we need it**: Tests for changes introduced in PR #50031 .
As part of the prober unit test, I mocked filesystem, filesystem watch, and Flexvolume plugin initialization.
Moved the filesystem event goroutine to watcher implementation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51147
**Special notes for your reviewer**:
First commit contains added functionality of the mock filesystem.
Second commit is the refactor for moving mock filesystem into a common util directory.
Third commit is the unit and e2e tests.
**Release note**:
```release-note
NONE
```
/release-note-none
/sig storage
/assign @saad-ali @liggitt
/cc @mtaufen @chakri-nelluri @wongma7
Automatic merge from submit-queue
Make /var/lib/kubelet as shared during startup
This is part of ~~https://github.com/kubernetes/community/pull/589~~https://github.com/kubernetes/community/pull/659
We'd like kubelet to be able to consume mounts from containers in the future, therefore kubelet should make sure that `/var/lib/kubelet` has shared mount propagation to be able to see these mounts.
On most distros, root directory is already mounted with shared mount propagation and this code will not do anything. On older distros such as Debian Wheezy, this code detects that `/var/lib/kubelet` is a directory on `/` which has private mount propagation and kubelet bind-mounts `/var/lib/kubelet` as rshared.
Both "regular" linux mounter and `NsenterMounter` are updated here.
@kubernetes/sig-storage-pr-reviews @kubernetes/sig-node-pr-reviews
@vishh
Release note:
```release-note
Kubelet re-binds /var/lib/kubelet directory with rshared mount propagation during startup if it is not shared yet.
```
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)
Task 1: Tainted node by condition.
**What this PR does / why we need it**:
Tainted node by condition for MemoryPressure, OutOfDisk and so on.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001
**Release note**:
```release-note
Tainted nodes by conditions as following:
* 'node.kubernetes.io/network-unavailable=:NoSchedule' if NetworkUnavailable is true
* 'node.kubernetes.io/disk-pressure=:NoSchedule' if DiskPressure is true
* 'node.kubernetes.io/memory-pressure=:NoSchedule' if MemoryPressure is true
* 'node.kubernetes.io/out-of-disk=:NoSchedule' if OutOfDisk is true
```
Kubelet makes sure that /var/lib/kubelet is rshared when it starts.
If not, it bind-mounts it with rshared propagation to containers
that mount volumes to /var/lib/kubelet can benefit from mount propagation.
fsGroup check will be enforcing that if a volume has already been
mounted by one pod and another pod wants to mount it but has a different
fsGroup value, this mount operation will not be allowed.
Automatic merge from submit-queue (batch tested with PRs 46458, 50934, 50766, 50970, 47698)
kubeadm: Warn in preflight checks if KubernetesVersion is of a newer branch than kubeadm
**What this PR does / why we need it**:
see https://github.com/kubernetes/kubeadm/issues/307
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubeadm/issues/307
**Special notes for your reviewer**:
**Release note**:
Automatic merge from submit-queue (batch tested with PRs 46458, 50934, 50766, 50970, 47698)
Prepare VolumeHost for running mount tools in containers
This is the first part of implementation of https://github.com/kubernetes/features/issues/278 - running mount utilities in containers.
It updates `VolumeHost` interface:
* `GetMounter()` now requires volume plugin name, as it is going to return different mounter to different volume plugings, because mount utilities for these plugins can be on different places.
* New `GetExec()` method that should volume plugins use to execute any utilities. This new `Exec` interface will execute them on proper place.
* `SafeFormatAndMount` is updated to the new `Exec` interface.
This is just a preparation, `GetExec` right now leads to simple `os.Exec` and mount utilities are executed on the same place as before. Also, the volume plugins will be updated in subsequent PRs (split into separate PRs, some plugins required lot of changes).
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
@rootfs @gnufied
Automatic merge from submit-queue (batch tested with PRs 50693, 50831, 47506, 49119, 50871)
Use reflect.DeepEqual to replace slicesEqual
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/50952
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
verify pkg/util contains no code
**What this PR does / why we need it**:
verify util packages contain no go codes
related issue: #49923
Automatic merge from submit-queue (batch tested with PRs 50550, 50768)
Cleanup locking in configz
**What this PR does / why we need it**:
- Reduce scope of lock in `write()` method
- Use the read lock in `write()` method
**Release note**:
```release-note
NONE
```
/kind cleanup
@mikedanese
p.s. looks like the `Set()` method could be removed if the value is accepted as an argument to `New()`. I.e. looks like to code re-sets the value.
Automatic merge from submit-queue (batch tested with PRs 50537, 49699, 50160, 49025, 50205)
AddOrUpdateTaint should ignore duplicate Taint.
The parameter of AddOrUpdateTaint is Taint pointer, so should use
Taint object itself to compare with the node's taint list to ignore
duplicate taint.
While doing #49384, found this issue and fixed.
Fixed part of #49384, other test cases will be added in the following patch
**Release note**:
```
None
```
For iptables save and restore operations, kube-proxy currently uses
the IPv4 versions of the iptables save and restore utilities
(iptables-save and iptables-restore, respectively). For IPv6 operation,
the IPv6 versions of these utilities needs to be used
(ip6tables-save and ip6tables-restore, respectively).
Both this change and PR #48551 are needed to get Kubernetes services
to work in an IPv6-only Kubernetes cluster (along with setting
'--bind-address ::0' on the kube-proxy command line. This change
was alluded to in a discussion on services for issue #1443.
fixes#50474
Automatic merge from submit-queue
Run mount in its own systemd scope.
Kubelet needs to run /bin/mount in its own cgroup.
- When kubelet runs as a systemd service, "systemctl restart kubelet" may kill all processes in the same cgroup and thus terminate fuse daemons that are needed for gluster and cephfs mounts.
- When kubelet runs in a docker container, restart of the container kills all fuse daemons started in the container.
Killing fuse daemons is bad, it basically unmounts volumes from running pods.
This patch runs mount via "systemd-run --scope /bin/mount ...", which makes sure that any fuse daemons are forked in its own systemd scope (= cgroup) and they will survive restart of kubelet's systemd service or docker container.
This helps with #34965
As a downside, each new fuse daemon will run in its own transient systemd service and systemctl output may be cluttered.
@kubernetes/sig-storage-pr-reviews
@kubernetes/sig-node-pr-reviews
```release-note
fuse daemons for GlusterFS and CephFS are now run in their own systemd scope when Kubernetes runs on a system with systemd.
```
Automatic merge from submit-queue
Switch from package syscall to golang.org/x/sys/unix
**What this PR does / why we need it**:
The syscall package is locked down and the comment in https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24 advises to switch code to use the corresponding package from golang.org/x/sys. This PR does so and replaces usage of package syscall with package golang.org/x/sys/unix where applicable. This will also allow to get updates and fixes
without having to use a new go version.
In order to get the latest functionality, golang.org/x/sys/ is re-vendored. This also allows to use Eventfd() from this package instead of calling the eventfd() C function.
**Special notes for your reviewer**:
This follows previous works in other Go projects, see e.g. moby/moby#33399, cilium/cilium#588
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50029, 48517, 49739, 49866, 49782)
iptables_test should not run on OSX or Windows
**What this PR does / why we need it**:
Fix for failing tests. Let's just skip these on darwin and windows
platforms as iptables is not available on these.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#48509
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Kubelet needs to run /bin/mount in its own cgroup.
- When kubelet runs as a systemd service, "systemctl restart kubelet" may kill
all processes in the same cgroup and thus terminate fuse daemons that are
needed for gluster and cephfs mounts.
- When kubelet runs in a docker container, restart of the container kills all
fuse daemons started in the container.
Killing fuse daemons is bad, it basically unmounts volumes from running pods.
This patch runs mount via "systemd-run --scope /bin/mount ...", which makes
sure that any fuse daemons are forked in its own systemd scope (= cgroup) and
they will survive restart of kubelet's systemd service or docker container.
As a downside, each new fuse daemon will run in its own transient systemd
service and systemctl output may be cluttered.
Automatic merge from submit-queue (batch tested with PRs 49420, 49296, 49299, 49371, 46514)
Refactoring taint functions to reduce sprawl
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45060
**Special notes for your reviewer**:
@gmarek @timothysc @k82cn @jayunit100 - I moved some fn's to helpers and some to utils. LMK, if you are ok with this change.
**Release note**:
```release-note
NONE
```
The syscall package is locked down and the comment in [1] advises to
switch code to use the corresponding package from golang.org/x/sys. Do
so and replace usage of package syscall with package
golang.org/x/sys/unix where applicable.
[1] https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24
This will also allow to get updates and fixes for syscall wrappers
without having to use a new go version.
Errno, Signal and SysProcAttr aren't changed as they haven't been
implemented in /x/sys/. Stat_t from syscall is used if standard library
packages (e.g. os) require it. syscall.SIGTERM is used for
cross-platform files.
Automatic merge from submit-queue
Fix findmnt parsing in containerized kubelet
NsEnterMounter should not stop parsing findmnt output on the first space but on the last one, just in case the mount point name itself contains a space.
Fixes#49106
```release-note
Fixed unmounting of vSphere volumes when kubelet runs in a container.
```
@kubernetes/sig-storage-pr-reviews
NsEnterMounter should not stop parsing findmnt output on the first space but
on the last one, just in case the mount point name itself contains a space.
Automatic merge from submit-queue
Fix subPath existence check to not follow symlink
**What this PR does / why we need it**:
Volume mounting logic introduced in #43775 and #45623 checks
for subPath existence before attempting to create a directory,
should subPath not be present.
This breaks if subPath is a dangling symlink, os.Stat returns
"do not exist" status, yet `os.MkdirAll` can't create directory
as symlink is present at the given path.
This patch makes existence check to use os.Lstat which works for
normal files/directories as well as doesn't not attempt to follow
symlink, therefore it's "do not exist" status is more reliable when
making a decision whether to create directory or not.
subPath symlinks can be dangling in situations where kubelet is
running in a container itself with access to docker socket, such
as CoreOS's kubelet-wrapper script
**Release note**:
```release-note
Fix pods failing to start when subPath is a dangling symlink from kubelet point of view, which can happen if it is running inside a container
```
Automatic merge from submit-queue
fix system language judging bug in loadSystemLanguage
Signed-off-by: allencloud <allen.sun@daocloud.io>
**What this PR does / why we need it**:
This PR removes some unused code in loadSystemLanguage. Since in code `pieces := strings.Split(langStr, ".")`, even `langStr` is an empty string, `piece` is a slice with one element of empty string, so there is no chance that len(pieces) == 0.
According to these, I think it is OK to remove the unused code in loadSystemLanguage.
According to the discuss we had, finally we decided to use a more accurate way to change the code, using `if len(pieces) != 1` to make the decision.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Local storage teardown fix
**What this PR does / why we need it**: Local storage uses bindmounts and the method IsLikelyNotMountPoint does not detect these as mountpoints. Therefore, local PVs are not properly unmounted when they are deleted.
**Which issue this PR fixes**: fixes#48331
**Special notes for your reviewer**:
You can use these e2e tests to reproduce the issue and validate the fix works appropriately https://github.com/kubernetes/kubernetes/pull/47999
The existing method IsLikelyNotMountPoint purposely does not check mountpoints reliability (4c5b22d4c6/pkg/util/mount/mount_linux.go (L161)), since the number of mountpoints can be large. 4c5b22d4c6/pkg/util/mount/mount.go (L46)
This implementation changes the behavior for local storage to detect mountpoints reliably, and avoids changing the behavior for any other callers to a UnmountPath.
**Release note**:
```
Fixes bind-mount teardown failure with non-mount point Local volumes (issue https://github.com/kubernetes/kubernetes/issues/48331).
```
Added IsNotMountPoint method to mount utils (pkg/util/mount/mount.go)
Added UnmountMountPoint method to volume utils (pkg/volume/util/util.go)
Call UnmountMountPoint method from local storage (pkg/volume/local/local.go)
IsLikelyNotMountPoint behavior was not modified, so the logic/behavior for UnmountPath is not modified
Automatic merge from submit-queue (batch tested with PRs 47234, 48410, 48514, 48529, 48348)
Remove unused sub-pkgs in pkg/util
**What this PR does / why we need it**:
Remove no longer used sug-pkgs in pkg/util
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48386
**Special notes for your reviewer**:
/cc @deads2k
**Release note**:
```release-note
NONE
```
Volume mounting logic introduced in #43775 and #45623 checks
for subPath existence before attempting to create a directory,
should subPath not be present.
This breaks if subPath is a dangling symlink, os.Stat returns
"do not exist" status, yet `os.MkdirAll` can't create directory
as symlink is present at the given path.
This patch makes existence check to use os.Lstat which works for
normal files/directories as well as doesn't not attempt to follow
symlink, therefore it's "do not exist" status is more reliable when
making a decision whether to create directory or not.
subPath symlinks can be dangling in situations where kubelet is
running in a container itself with access to docker socket, such
as CoreOS's kubelet-wrapper script
Automatic merge from submit-queue (batch tested with PRs 48264, 48324, 48125, 47944, 47489)
fix CopyStrings and ShuffleStrings for slice when slice is nil
Signed-off-by: allencloud <allen.sun@daocloud.io>
**What this PR does / why we need it**:
This PR fixes two functions in util/slice.go, in which I think `CopyStrings` and `ShuffleStrings` miss one case. The case is input data is nil, in this case I think the data returned should be nil as well rather than a non-nil slice with 0 element.
In addition, I added some test code for this.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE, I did not raise a issue for this code. I ran into this when code learning.
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)
Consolidate sysctl commands for kubelet
**What this PR does / why we need it**:
These commands are important enough to be in the Kubelet itself.
By default, Ubuntu 14.04 and Debian Jessie have these set to 200 and
20000. Without this setting, nodes are limited in the number of
containers that they can start.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#26005
**Special notes for your reviewer**:
I had a difficult time writing tests for this. It is trivial to create a fake sysctl for testing, but the Kubelet does not have any tests for the prior settings.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46235, 44786, 46833, 46756, 46669)
Add Japanese translation for kubectl
**What this PR does / why we need it**:
I messed up the original PR(#45562) which was already been reviewed and approved. This PR provides first attempt to translate kubectl in Japanese (related to #40645 and #40591).
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
No issues
**Special notes for your reviewer**:
Should be reviewed by member of Japanese k8s community (I stayed in Japan for 4 years, but my language is not as good as native Japanese)
Automatic merge from submit-queue (batch tested with PRs 43275, 45014, 46449, 46488, 46525)
Fix typo
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 44774, 46266, 46248, 46403, 46430)
kube-proxy: ratelimit runs of iptables by sync-period flags
This bounds how frequently iptables can be synced. It will be no more often than every 10 seconds and no less often than every 1 minute, by default.
@timothysc FYI
@dcbw @freehan FYI
Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366)
Add Simplified Chinese translation for kubectl
What this PR does / why we need it:
This PR provides first attempt to translate kubectl in Simplified Chinese.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
No issues
Special notes for your reviewer:
Although I'm a native speaker for Mandarin Chinese, but I think translation is a whole different knowledge which I'm not good at it, so this pr absolutely need to be polished.
@adohe @mengqiy @resouer @k82cn @caesarxuchao @wanghaoran1988 sorry I think there are so many folks who are good at Chinese I haven't mention, feel free to leave a comment on it : )
also cc @brendandburns
Automatic merge from submit-queue (batch tested with PRs 46022, 46055, 45308, 46209, 43590)
Remove Save() from iptables interface
This is what @thockin requested in one of the reviews.
Automatic merge from submit-queue (batch tested with PRs 38990, 45781, 46225, 44899, 43663)
migrate set generation to go_genrule
Depends on https://github.com/kubernetes/release/pull/238
Includes these changes:
- Modified so that IPv6 CIDRs can be converted correctly.
- Added test cases for IPv6 addresses.
- Split UTs for hexCIDR() and asciiCIDR() so that masking can be tested.
- Add UTs for failure cases.
Note: Some code that calls hexCIDR() builds a CIDR from the pod IP string
and the concatenation of "/32". These should, in the future, use "128",
if/when the pod IP is IPv6. Not addressed as part of this commit.
Automatic merge from submit-queue
Make fake iptables' Save operation more realistic
Ref https://github.com/kubernetes/kubernetes/pull/45622#issuecomment-301624384 (2nd point)
This would make fake IPtables actually return the iptable contents it stores.
cc @kubernetes/sig-scalability-misc @wojtek-t
Module remotecommand originally part of kubernetes/pkg/client/unversioned was moved
to client-go/tools, and will be used as authoritative in kubectl, e2e and other places.
Module remotecommand relies on util/exec module which will be copied to client-go/pkg/util
Automatic merge from submit-queue
util/iptables: grab iptables locks if iptables-restore doesn't support --wait
When iptables-restore doesn't support --wait (which < 1.6.2 don't), it may
conflict with other iptables users on the system, like docker, because it
doesn't acquire the iptables lock before changing iptables rules. This causes
sporadic docker failures when starting containers.
To ensure those don't happen, essentially duplicate the iptables locking
logic inside util/iptables when we know iptables-restore doesn't support
the --wait option.
Unfortunately iptables uses two different locking mechanisms, one until
1.4.x (abstract socket based) and another from 1.6.x (/run/xtables.lock
flock() based). We have to grab both locks, because we don't know what
version of iptables-restore exists since iptables-restore doesn't have
a --version option before 1.6.2. Plus, distros (like RHEL) backport the
/run/xtables.lock patch to 1.4.x versions.
Related: https://github.com/kubernetes/kubernetes/pull/43575
See also: https://github.com/openshift/origin/pull/13845
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1417234
@kubernetes/rh-networking @kubernetes/sig-network-misc @eparis @knobunc @danwinship @thockin @freehan
Automatic merge from submit-queue
fix the typos of e.g.
fix the typos of e.g.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
These commands are important enough to be in the Kubelet itself.
By default, Ubuntu 14.04 and Debian Jessie have these set to 200 and
20000. Without this setting, nodes are limited in the number of
containers that they can start.
Several of these loops overlap, and when they are the reason a failure
is happening it is difficult to sort them out. Slighly misalign these
loops to make their impact obvious.
When iptables-restore doesn't support --wait (which < 1.6.2 don't), it may
conflict with other iptables users on the system, like docker, because it
doesn't acquire the iptables lock before changing iptables rules. This causes
sporadic docker failures when starting containers.
To ensure those don't happen, essentially duplicate the iptables locking
logic inside util/iptables when we know iptables-restore doesn't support
the --wait option.
Unfortunately iptables uses two different locking mechanisms, one until
1.4.x (abstract socket based) and another from 1.6.x (/run/xtables.lock
flock() based). We have to grab both locks, because we don't know what
version of iptables-restore exists since iptables-restore doesn't have
a --version option before 1.6.2. Plus, distros (like RHEL) backport the
/run/xtables.lock patch to 1.4.x versions.
Related: https://github.com/kubernetes/kubernetes/pull/43575
See also: https://github.com/openshift/origin/pull/13845
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1417234
Automatic merge from submit-queue (batch tested with PRs 40060, 44860, 44865, 44825, 44162)
Remove unused chmod/chown abstractions
**What this PR does / why we need it**: Simplifies the code
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43575, 44672)
util/iptables: check for and use new iptables-restore 'wait' argument
iptables-restore did not previously perform any locking, meaning that
when callers (like kube-proxy) asked iptables-restore to write large
numbers of rules, the iptables-restore process might run in parallel
with other 'iptables' invocations in kubelet (hostports), docker,
and other software. This causes errors like:
"CNI request failed with status 400: 'Failed to ensure that nat chain
POSTROUTING jumps to MASQUERADE: error checking rule: exit status 4:
iptables: Resource temporarily unavailable."
or from Docker:
"Failed to allocate and map port 1095-1095: iptables failed:
iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 1095
-j DNAT --to-destination 10.1.0.2:1095 ! -i lbr0: iptables:
Resource temporarily unavailable.\n (exit status 4)"
iptables-restore "wait" functionality was added in iptables git
commit 999eaa241212d3952ddff39a99d0d55a74e3639e which
is not yet in a release.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1417234
@eparis @knobunc @kubernetes/rh-networking @kubernetes/sig-network-misc @freehan @thockin @brendandburns
iptables-restore did not previously perform any locking, meaning that
when callers (like kube-proxy) asked iptables-restore to write large
numbers of rules, the iptables-restore process might run in parallel
with other 'iptables' invocations in kubelet (hostports), docker,
and other software. This causes errors like:
"CNI request failed with status 400: 'Failed to ensure that nat chain
POSTROUTING jumps to MASQUERADE: error checking rule: exit status 4:
iptables: Resource temporarily unavailable."
or from Docker
"Failed to allocate and map port 1095-1095: iptables failed:
iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 1095
-j DNAT --to-destination 10.1.0.2:1095 ! -i lbr0: iptables:
Resource temporarily unavailable.\n (exit status 4)"
iptables-restore "wait" functionality was added in iptables git
commit 999eaa241212d3952ddff39a99d0d55a74e3639e but is NOT YET
in a released version of iptables.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1417234
Automatic merge from submit-queue (batch tested with PRs 44222, 44614, 44292, 44638)
Smarter generic getters and describers
Makes printers and describers smarter for generic resources.
This traverses unstructured objects and prints their attributes for generic resources (TPR, federated API, etc) in `kubectl get` and `kubectl describe`. Makes use of the object's field names to come up with a best guess for describer labels and get headers, and field value types to understand how to better print it, indent, etc.
A nice intermediate solution while we don't have [get and describe extensions](https://github.com/kubernetes/community/pull/308).
Examples:
```
$ kubectl get serviceclasses
NAME KIND BINDABLE BROKER NAME OSB GUID
user-provided-service ServiceClass.v1alpha1.servicecatalog.k8s.io false ups-broker 4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468
```
```
$ kubectl describe serviceclasses/user-provided-service
Name: user-provided-service
Namespace:
Labels: <none>
Annotations: FOO=BAR
openshift.io/deployment.phase=test
OSB Metadata: <nil>
Kind: ServiceClass
Metadata:
Self Link: /apis/servicecatalog.k8s.io/v1alpha1/serviceclassesuser-provided-service
UID: 1509bd96-1b05-11e7-98bd-0242ac110006
Resource Version: 256
Creation Timestamp: 2017-04-06T20:10:29Z
Broker Name: ups-broker
Bindable: false
Plan Updatable: false
OSB GUID: 4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468
API Version: servicecatalog.k8s.io/v1alpha1
Plans:
Name: default
OSB GUID: 86064792-7ea2-467b-af93-ac9694d96d52
OSB Free: true
OSB Metadata: <nil>
Events: <none>
```
**Release note**:
```release-note
Improved output on 'kubectl get' and 'kubectl describe' for generic objects.
```
PTAL @pmorie @pwittrock @kubernetes/sig-cli-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 40055, 42085, 44509, 44568, 43956)
Fix gofmt errors
**What this PR does / why we need it**:
There were some gofmt errors on master. Ran the following to fix:
```
hack/verify-gofmt.sh | grep ^diff | awk '{ print $2 }' | xargs gofmt -w -s
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: none
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44406, 41543, 44071, 44374, 44299)
Decouple remotecommand
Refactored unversioned/remotecommand to decouple it from undesirable dependencies:
- term package now is not required, and functionality required to resize terminal size can be plugged in directly in kubectl
- in order to remove dependency on kubelet package - constants from kubelet/server/remotecommand were moved to separate util package (pkg/util/remotecommand)
- remotecommand_test.go moved to pkg/client/tests module
Automatic merge from submit-queue (batch tested with PRs 43378, 43216, 43384, 43083, 43428)
Do not reformat devices with partitions
`lsblk` reports FSTYPE of devices with partition tables as empty string `""`,
which is indistinguishable from empty devices. We must look for dependent
devices (i.e. partitions) to see that the device is really empty and report
error otherwise.
The main point of this patch is to run `lsblk` without `"-n"`, i.e. print all
dependent devices and check it output.
Sample output:
```
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
10s 10s 1 default-scheduler Normal Scheduled Successfully assigned testpod to ip-172-18-11-149.ec2.internal
2s 2s 1 kubelet, ip-172-18-11-149.ec2.internal Warning FailedMount MountVolume.MountDevice failed for volume "kubernetes.io/aws-ebs/vol-0fa9da8b91913b187" (spec.Name: "vol") pod "b74f68c5-0d6a-11e7-9233-0e11251010c0" (UID: "b74f68c5-0d6a-11e7-9233-0e11251010c0") with: failed to mount the volume as "ext4", it already contains unknown data, probably partitions. Mount error: mount failed: exit status 32
Mounting command: mount
Mounting arguments: /dev/xvdbb /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-0fa9da8b91913b187 ext4 [defaults]
Output: mount: wrong fs type, bad option, bad superblock on /dev/xvdbb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
```
Without this patch, the device would be reformatted and all data in the device partitions would be lost.
Fixes#13212
Release note:
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41139, 41186, 38882, 37698, 42034)
Make kubelet never delete files on mounted filesystems
With bug #27653, kubelet could remove mounted volumes and delete user data.
The bug itself is fixed, however our trust in kubelet is significantly lower.
Let's add an extra version of RemoveAll that does not cross mount boundary
(rm -rf --one-file-system).
It calls lstat(path) three times for each removed directory - once in
RemoveAllOneFilesystem and twice in IsLikelyNotMountPoint, however this way
it's platform independent and the directory that is being removed by kubelet
should be almost empty.
lsblk reports FSTYPE of devices with partition tables as empty string "",
which is indistinguishable from empty devices. We must look for dependent
devices (i.e. partitions) to see that the device is really empty and report
error otherwise.
I checked that LVM, LUKS and MD RAID have their own FSTYPE in lsblk output,
so it should be only a partition table that has empty FSTYPE.
The main point of this patch is to run lsblk without "-n", i.e. print all
dependent devices and check if they're there.
With this PR, the second call to `Acquire()` will block unless the lock is released (process exits).
Also removed the memory mutex in the previous code since we don't need `Release()` here so no need to save and protect the local fd.
Fix#42929.
Automatic merge from submit-queue (batch tested with PRs 42316, 41618, 42201, 42113, 42191)
Support unqualified and partially qualified domain name in DNS query in Windows kube-proxy
**What this PR does / why we need it**:
In Windows container networking, --dns-search is not currently supported on Windows Docker. Besides, even with --dns-suffix, inside Windows container DNS suffix is not appended to DNS query names. That makes unqualified domain name or partially qualified domain name in DNS query not able to resolve.
This PR provides a solution to resolve unqualified domain name or partially qualified domain name in DNS query for Windows container in Windows kube-proxy. It uses well-known Kubernetes DNS suffix as well host DNS suffix search list to append to the name in DNS query. DNS packet in kube-proxy UDP stream is modified as appropriate.
This PR affects the Windows kube-proxy only.
**Special notes for your reviewer**:
This PR is based on top of Anthony Howe's commit 48647fb, 0e37f0a and 7e2c71f which is already included in the PR 41487. Please only review commit b9dfb69.
**Release note**:
```release-note
Add DNS suffix search list support in Windows kube-proxy.
```
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
Use chroot for containerized mounts
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
With bug #27653, kubelet could remove mounted volumes and delete user data.
The bug itself is fixed, however our trust in kubelet is significantly lower.
Let's add an extra version of RemoveAll that does not cross mount boundary
(rm -rf --one-file-system).
It calls lstat(path) three times for each removed directory - once in
RemoveAllOneFilesystem and twice in IsLikelyNotMountPoint, however this way
it's platform independent and the directory that is being removed by kubelet
should be almost empty.
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)
make kubectl taint command respect effect NoExecute
**What this PR does / why we need it**:
Part of feature forgiveness implementation, make kubectl taint command respect effect NoExecute.
**Which issue this PR fixes**:
Related Issue: #1574
Related PR: #39469
**Special notes for your reviewer**:
**Release note**:
```release-note
make kubectl taint command respect effect NoExecute
```
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)
Add support for attacher/detacher interface in Flex volume
Add support for attacher/detacher interface in Flex volume
This change breaks backward compatibility and requires to be release noted.
```release-note
Flex volume plugin is updated to support attach/detach interfaces. It broke backward compatibility. Please update your drivers and implement the new callouts.
```
Automatic merge from submit-queue (batch tested with PRs 40665, 41094, 41351, 41721, 41843)
Update i18n tools and process.
@fabianofranz @zen @kubernetes/sig-cli-pr-reviews
This is an update to the translation process based on feedback from folks.
The main changes are:
* `msgctx` is being removed from the files.
* String wrapping and string extraction have been separated.
* More tools from the `gettext` family of tools are being used
* Extracted strings are being sorted for canonical ordering
* A `.pot` template has been added.
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)
Show specific error when a volume is formatted by unexpected filesystem.
kubelet now detects that e.g. xfs volume is being mounted as ext3 because of
wrong volume.Spec.
Mount error is left in the error message to diagnose issues with mounting e.g.
'ext3' volume as 'ext4' - they are different filesystems, however kernel should
mount ext3 as ext4 without errors.
Example kubectl describe pod output:
```
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
41s 3s 7 {kubelet ip-172-18-3-82.ec2.internal} Warning FailedMount MountVolume.MountDevice failed for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-ba79c81d" (spec.Name: "pvc-ce175cbb-6b82-11e6-9fe4-0e885cca73d3") pod "3d19cb64-6b83-11e6-9fe4-0e885cca73d3" (UID: "3d19cb64-6b83-11e6-9fe4-0e885cca73d3") with: failed to mount the volume as "ext4", it's already formatted with "xfs". Mount error: mount failed: exit status 32
Mounting arguments: /dev/xvdba /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-ba79c81d ext4 [defaults]
Output: mount: wrong fs type, bad option, bad superblock on /dev/xvdba,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
```
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576)
Fix regex match doc of procfs.PidOf
Fixes#41247.
cc @bboreham
OSX 10.11.x has `/var` symlinked to `/private/var`, which was tripping
up logic in `mount.GetMountRefs`
This fixes unit tests for pkg/volume/fc and pkg/volume/iscsi
kubelet now detects that e.g. xfs volume is being mounted as ext3 because of
wrong volume.Spec.
Mount error is left in the error message to diagnose issues with mounting e.g.
'ext3' volume as 'ext4' - they are different filesystems, however kernel should
mount ext3 as ext4 without errors.
Automatic merge from submit-queue
Add initial french translations for kubectl
Add initial French translations, mostly as an example of how to add a new language.
@fabianofranz @kubernetes/sig-cli-pr-reviews
Move over only the conversions that are needed, create a new scheme that
is private to meta and only accessible via ParameterCodec. Move half of
pkg/util/labels/.readonly to pkg/apis/meta/v1/labels.go
Automatic merge from submit-queue
Improve TerminationMessagePath to be more flexible
* Support `terminationMessagePolicy: fallbackToLogsOnError` which allows pod authors to get useful information from containers as per kubernetes/community#154
* Set an upper bound on the size of the termination message path or log output to prevent callers from DoSing the master
* Add tests for running as root, non-root, and for the new terminationMessagePolicy cases.
I set the limit to 4096 bytes, but this may be too high for large pod containers. Probably need to set an absolute bound, i.e. max message size allowed is 20k total, and we truncate if we're above that limit.
Fixes#31839, #23569
```release-note
A new field `terminationMessagePolicy` has been added to containers that allows a user to request `FallbackToLogsOnError`, which will read from the container's logs to populate the termination message if the user does not write to the termination message log file. The termination message file is now properly readable for end users and has a maximum size (4k bytes) to prevent abuse. Each pod may have up to 12k bytes of termination messages before the contents of each will be truncated.
```
These files have been created lately, so we don't have much information
about them anyway, so let's just:
- Remove assignees and make them approvers
- Copy approves as reviewers
Enforce the following limits:
12kb for total message length in container status
4kb for the termination message path file
2kb or 80 lines (whichever is shorter) from the log on error
Fallback to log output if the user requests it.
Automatic merge from submit-queue
Enable lazy initialization of ext3/ext4 filesystems
**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#30752, fixes#30240
**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```
**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.
These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount
Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```
The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.
When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system.
As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory.
I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.
As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.
Automatic merge from submit-queue
Enable streaming proxy redirects by default (beta)
Prerequisite to moving CRI to Beta.
I'd like to enable this early in our 1.6 cycle to get plenty of test coverage before release.
@yujuhong @liggitt
```release-note
Follow redirects for streaming requests (exec/attach/port-forward) in the apiserver by default (alpha -> beta).
```
Automatic merge from submit-queue
Remove packages which are now apimachinery
Removes all the content from the packages that were moved to `apimachinery`. This will force all vendoring projects to figure out what's wrong. I had to leave many empty marker packages behind to have verify-godep succeed on vendoring heapster.
@sttts straight deletes and simple adds
Automatic merge from submit-queue (batch tested with PRs 38592, 39949, 39946, 39882)
move api/errors to apimachinery
`pkg/api/errors` is a set of helpers around `meta/v1.Status` that help to create and interpret various apiserver errors. Things like `.NewNotFound` and `IsNotFound` pairings. This pull moves it into apimachinery for use by the clients and servers.
@smarterclayton @lavalamp First commit is the move plus minor fitting. Second commit is straight replace and generation.
Automatic merge from submit-queue (batch tested with PRs 39807, 37505, 39844, 39525, 39109)
Update deployment equality helper
@mfojtik @janetkuo this is split out of https://github.com/kubernetes/kubernetes/pull/38714 to reduce the size of that PR, ptal
Automatic merge from submit-queue
run staging client-go update
Chasing to see what real problems we have in staging-client-go.
@sttts you get similar results?
Automatic merge from submit-queue
replace global registry in apimachinery with global registry in k8s.io/kubernetes
We'd like to remove all globals, but our immediate problem is that a shared registry between k8s.io/kubernetes and k8s.io/client-go doesn't work. Since client-go makes a copy, we can actually keep a global registry with other globals in pkg/api for now.
@kubernetes/sig-api-machinery-misc @lavalamp @smarterclayton @sttts
Automatic merge from submit-queue (batch tested with PRs 39834, 38665)
Use parallel list for deleting items from a primitive list with merge strategy
Implemented parallel list for deleting items from a primitive list with merge strategy. Ref: [design doc](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#list-of-primitives)
fixes#35163 and #32398
When using parallel list, we don't need to worry about version skew.
When an old APIServer gets a new patch like:
```yaml
metadata:
$deleteFromPrimitiveList/finalizers:
- b
finalizers:
- c
```
It won't fail and work as before, because the parallel list will be dropped during json decoding.
Remaining issue: There is no check when creating a set (primitive list with merge strategy). Duplicates may get in.
It happens in two cases:
1) Creation using POST
2) Creating a list that doesn't exist before using PATCH
Fixing the first case is the beyond the scope of this PR.
The second case can be fixed in this PR if we need that.
cc: @pwittrock @kubernetes/kubectl @kubernetes/sig-api-machinery
```release-note
Fix issue around merging lists of primitives when using PATCH or kubectl apply.
```
Automatic merge from submit-queue (batch tested with PRs 34488, 39511, 39619, 38342, 39491)
Make StrategicPatch delete all matching maps in a merging list
fixes#38332
```release-note
NONE
```
cc: @lavalamp @pwittrock
Automatic merge from submit-queue (batch tested with PRs 34488, 39511, 39619, 38342, 39491)
use fake clock in lruexpiration cache test
when the system clock is extremely slow(usually see in VMs), this [check](https://github.com/kubernetes/kubernetes/blob/master/pkg/util/cache/lruexpirecache.go#L74) might still return the value.
```go
if c.clock.Now().After(e.(*cacheEntry).expireTime) {
go c.remove(key)
return nil, false
}
```
that means even we set the ttl to be 0 second, the after check might still be false(because the clock is too slow, and thus equals).
the change here helps to reduce flakes.
Automatic merge from submit-queue
snip pkg/util/strings dependency
The `pkg/util/strings` package looks to be largely used by volumes, which are independent of the bits used by genericapiserver which aren't used by anyone else. This moves the single function (used no where else) to its point of use.
@sttts
Automatic merge from submit-queue
Avoid unnecessary memory allocations
Low-hanging fruits in saving memory allocations. During our 5000-node kubemark runs I've see this:
ControllerManager:
- 40.17% k8s.io/kubernetes/pkg/util/system.IsMasterNode
- 19.04% k8s.io/kubernetes/pkg/controller.(*PodControllerRefManager).Classify
Scheduler:
- 42.74% k8s.io/kubernetes/plugin/pkg/scheduler/algrorithm/predicates.(*MaxPDVolumeCountChecker).filterVolumes
This PR is eliminating all of those.
Automatic merge from submit-queue
Begin paths for internationalization in kubectl
This is just the first step, purposely simple so we can get the interface correct.
@kubernetes/sig-cli @deads2k