Automatic merge from submit-queue (batch tested with PRs 41498, 44487)
Use len of pods in stateful set error
**What this PR does / why we need it**:
Sync stateful set reports wrong error, we need to fix it.
**Release note**:
```release-note
`NONE`
```
Automatic merge from submit-queue
cinder: Add support for the KVM virtio-scsi driver
**What this PR does / why we need it**:
The VirtIO SCSI driver for KVM changes the way disks appear in /dev/disk/by-id.
This adds support for the new format.
Without this, volume attaching on an openstack cluster using this kvm driver doesn't work
**Special notes for your reviewer**:
Does this need e2e tests? I couldn't find anywhere to add another openstack configuration used in the e2e tests.
Wiki page about this: https://wiki.openstack.org/wiki/Virtio-scsi-for-bdm
**Release note**:
```release-note
cinder: Add support for the KVM virtio-scsi driver
```
Automatic merge from submit-queue
Update kubelet to use the network-plugin-dir if the cni-bin-dir flag is not set.
This is a fix for the regession raised in issue #44683
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Mark Stackdriver Logging e2e tests with a feature
Makes Stackdriver Logging e2e tests, except for the most basic one, run in the separate tests suites, prepared by https://github.com/kubernetes/test-infra/pull/2542
The servicecontroller documents that the master is excluded from the
LoadBalancer / NodePort, but this is broken for clusters where we are
using taints for the master (as introduced in 1.6), instead of marking
the master as unschedulable.
This restores the desired documented behaviour, by excluding nodes that
are labeled as masters with the new 1.6 labels, even if they use the new
1.6 taints.
Fix#33884
Using Ubuntu on GCE to run cluster e2e tests requires slightly different
node.yaml and master.yaml files than GCI, because Ubuntu uses systemd as
PID 1, wheras GCI uses upstart with a systemd delegate. Therefore the
e2e tests fail using those files since the kubernetes services are not
brought back up after a node/master reboot.
Automatic merge from submit-queue
De-Flake Volume E2E: force GCEPD detach to prevent timeout
**What this PR does / why we need it**:
Fix flake`[k8s.io] Volumes [Volume] [k8s.io] PD should be mountable [Flaky] 5m38s.
Flake occurs as a result of an automated detach taking longer than 5 minutes, which exceeds the timeout limit of the delete function.
This PR adds explicit detach and wait func calls before the deletion. By forcing the detach and giving GCE an appropriate timeout limit, this should squash the timeout flake. This also significantly shortens cleanup time.
This PR does not remove the [Flaky] tag. Once this PR is merged, I'll keep an eye on the test grid for ~1 week. If no flakes surface, I'll submit a PR to pull the tag off.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #43977
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44722, 44704, 44681, 44494, 39732)
Fix issue #34242: Attach/detach should recover from a crash
When the attach/detach controller crashes and a pod with attached PV is deleted afterwards the controller will never detach the pod's attached volumes. To prevent this the controller should try to recover the state from the nodes status and figure out which volumes to detach. This requires some changes in the volume providers too: the only information available from the nodes is the volume name and the device path. The controller needs to find the correct volume plugin and reconstruct the volume spec just from the name. This required a small change also in the volume plugin interface.
Fixes Issue #34242.
cc: @jsafrane @jingxu97
Automatic merge from submit-queue (batch tested with PRs 44722, 44704, 44681, 44494, 39732)
Don't rebuild endpoints map in iptables kube-proxy all the time.
@thockin - i think that this PR should help with yours https://github.com/kubernetes/kubernetes/pull/41030 - it (besides performance improvements) clearly defines when update because of endpoints is needed. If we do the same for services (I'm happy to help with it), i think it should be much simpler.
But please take a look if it makes sense from your perspective too.
Automatic merge from submit-queue (batch tested with PRs 44722, 44704, 44681, 44494, 39732)
prevent installation of docker from upstream
**What this PR does / why we need it**: Disallows installation of upstream docker from PPA in the Juju kubernetes-worker charm.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Disallows installation of upstream docker from PPA in the Juju kubernetes-worker charm.
```
Automatic merge from submit-queue (batch tested with PRs 44594, 44651)
remove strings.compare(), use string native operation
I notice we use strings.Compare() in some code, we can remove it and use native operation.
Automatic merge from submit-queue
Delete deprecated node phase in kubect describe node.
**What this PR does / why we need it**:
Since NodePhase is no longer used, delete it in `kubect describe node` result.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
ref: https://github.com/kubernetes/kubernetes/pull/44388
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42177, 42176, 44721)
Removed fluentd-gcp manifest pod
```release-note
Fluentd manifest pod is no longer created on non-registered master when creating clusters using kube-up.sh.
```
Automatic merge from submit-queue (batch tested with PRs 42177, 42176, 44721)
Job: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings Job into full compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
This ensures that Job does not fight with other controllers over control of Pods.
Ref: #24433
**Special notes for your reviewer**:
**Release note**:
```release-note
Job controller now respects ControllerRef to avoid fighting over Pods.
```
cc @erictune @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42177, 42176, 44721)
CronJob: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings CronJob into compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
This ensures that other controllers do not fight over control of objects that a CronJob owns.
**Special notes for your reviewer**:
**Release note**:
```release-note
CronJob controller now respects ControllerRef to avoid fighting with other controllers.
```
cc @erictune @kubernetes/sig-apps-pr-reviews
iptables-restore did not previously perform any locking, meaning that
when callers (like kube-proxy) asked iptables-restore to write large
numbers of rules, the iptables-restore process might run in parallel
with other 'iptables' invocations in kubelet (hostports), docker,
and other software. This causes errors like:
"CNI request failed with status 400: 'Failed to ensure that nat chain
POSTROUTING jumps to MASQUERADE: error checking rule: exit status 4:
iptables: Resource temporarily unavailable."
or from Docker
"Failed to allocate and map port 1095-1095: iptables failed:
iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 1095
-j DNAT --to-destination 10.1.0.2:1095 ! -i lbr0: iptables:
Resource temporarily unavailable.\n (exit status 4)"
iptables-restore "wait" functionality was added in iptables git
commit 999eaa241212d3952ddff39a99d0d55a74e3639e but is NOT YET
in a released version of iptables.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1417234
Automatic merge from submit-queue (batch tested with PRs 44555, 44238)
openstack: remove field flavor_to_resource
I believe there is no usage about `flavor_to_resource`, and I think there is no need to build that information, too.
cc @anguslees
**Release note:**
```
NONE
```
Automatic merge from submit-queue
Remove the old docker-multinode files that were built into the hyperkube image
**What this PR does / why we need it**:
ref: https://goo.gl/VxSaKx
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
The hyperkube image has been slimmed down and no longer includes addon manifests and other various scripts. These were introduced for the now removed docker-multinode setup system.
```
cc @jbeda @brendandburns @bgrant0607 @justinsb @mikedanese
Automatic merge from submit-queue
Switch to pointer to policy rule, visit and short circuit during authorization
Ref #40015
* Switches policy rule helper methods to work with pointers
* Switches authorization to use a short-circuiting visitor
Best-case, authorization short-circuits early and avoids accumulating rules it never needs to check
Worst-case (a forbidden request), it still checks all the applicable rules, but requires less allocation to do so
$ go test ./plugin/pkg/auth/authorizer/rbac/... -bench=. -benchmem -run Bench
on master:
```
BenchmarkAuthorize/allow_list_pods-8 300000 4373 ns/op 3840 B/op 26 allocs/op
BenchmarkAuthorize/allow_update_pods/status-8 300000 5121 ns/op 3840 B/op 26 allocs/op
BenchmarkAuthorize/forbid_educate_dolphins-8 300000 4706 ns/op 3840 B/op 26 allocs/op
```
with short-circuiting and policy rule pointer changes:
```
BenchmarkAuthorize/allow_list_pods-8 2000000 930 ns/op 64 B/op 2 allocs/op
BenchmarkAuthorize/allow_update_pods/status-8 1000000 1656 ns/op 64 B/op 2 allocs/op
BenchmarkAuthorize/forbid_educate_dolphins-8 500000 3395 ns/op 1488 B/op 25 allocs/op
```
Automatic merge from submit-queue
Refactoring reorganize taints function in kubectl to expose operations
**What this PR does / why we need it**:
This adds some UX functionality when specifying taints using kubectl.
For example:
```
./kubectl.sh taint nodes XYZ dedicated1=abca2:NoSchedule
node "XYZ" tainted
./kubectl.sh taint nodes XYZ dedicated1=abca1:NoSchedule --overwrite=True
node "XYZ overwritten
./kubectl.sh taint nodes XYZ dedicated1-
node "XYZ" untainted
./kubectl.sh taint nodes XYZ dedicated=abca1:NoSchedule dedicated1-
node "XYZ" modified
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43167
**Release note**:
```
Fixed the output of kubectl taint node command with minor improvements.
```
When the attach/detach controller crashes and a pod with attached PV is deleted
afterwards the controller will never detach the pod's attached volumes. To
prevent this the controller should try to recover the state from the nodes
status.
Automatic merge from submit-queue
adding test for volume fstype validation
**What this PR does / why we need it**:
This PR is adding a test for volume fstype validation. Test verifies fstype specified in storage-class is being honored after volume creation.
Steps:
1. Create StorageClass with fstype set to valid type (default case included).
2. Create PVC which uses the StorageClass created in step 1.
3. Wait for PV to be provisioned.
4. Wait for PVC's status to become Bound.
5. Create pod using PVC on specific node.
6. Wait for Disk to be attached to the node.
7. Execute command in the pod to get fstype.
8. Delete pod and Wait for Volume Disk to be detached from the Node.
9. Delete PVC, PV and Storage Class.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
cc: @jeffvance @tusharnt
Automatic merge from submit-queue (batch tested with PRs 42272, 44696)
e2e test fix: Wait longer when first creating ELB
On any cloud (GCE or AWS), a lag between creating the LoadBalancer and
having it actually start serving traffic is expected. On AWS the lag is
larger, and we weren't correctly using the longer wait on our first
request.
Use a longer wait period on our first request.
Fix#44695
```release-note
NONE
```