If the mount operation exceeds the timeout, it will return an error and the
pod worker will retry in the next sync (10s or less). Compared with the
original value (i.e., 10 minutes), this frees the pod worker sooner to process
pod updates, if there are any.
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).
This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
Similarly to Nodes, PersistentVolumes are not in any namespace and we should
not block events on them. Currently, these events are rejected with
'Event "nfs.145841cf9c8cfaf0" is invalid: involvedObject.namespace: Invalid value: "": does not match involvedObject'
Automatic merge from submit-queue
kubelet/rkt - treat pod container as the infra - only network stats
As no "container name" annotation was being applied to the pod as a whole, the rkt pod container didn't have a container name label. This meant that in stats/summary it came up as a nameless container that belonged to the pod.
This was problematic as it caused double counting of container stats.
This adds a container name annotation at the pod level, which will be overridden during label creation by annotations of the same name at the container level for the containers themselves.
stats/summary will do the right thing, as it will treat the pod container the same as the infra container and only get network stats from it.
Suppress #26759
cc @kubernetes/sig-node @kubernetes/rktnetes-maintainers
Automatic merge from submit-queue
Rbac api group make subject apiversion optional
This fixes the verification for the "apiVersion" field in the RBAC subject and makes it optional. This field isn't used and currently won't pass validation if it's filled.
```yml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: admins
subject:
- kind: User
  name: admin-user
  # apiVersion: "entering anything here will fail validation"
roleRef:
  kind: ClusterRole
  name: admin
  apiVersion: rbac.authorization.k8s.io/v1alpha1
```
Automatic merge from submit-queue
Quota uses old object provided via admission
fixes https://github.com/kubernetes/kubernetes/issues/26178
@sdminonne - fixes a bug in services not intercepting updates.
/cc @liggitt
In the isLikelyNotMountPoint function in nsenter_mount.go, the output
returned from the findmnt command is missing its last letter. Modify the code
to make sure that the output has the full target path. Fixes #26421, #25056, #22911
- replaces probeVolume with scsiHostRescan to scan hot attached disks
- fixes substring match of UUID returned from AttachDisk
- changes DetachDisk to take volumePath argument instead of diskID
- fixes delayed failure at mount rather than attach disk
- removes cloning of virtual disk in AttachDisk
The previous size of 2KB was, in practice, always filled completely by
HTTP server-related stuff well above the panic itself, and truncated
before anything of real value was printed.
This increases the stack size so that panics are printed in full.
This sets AttachOptions.CommandName dynamically depending on the cobra Command
hierarchy. If the root command is named e.g. "oc" (for the OpenShift cli) this
will result in "oc attach" instead of the static "kubectl attach" before this
patch.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1341450
Automatic merge from submit-queue
Let kubelet log the DeletionTimestamp if it's not nil in update
This helps to debug if it's the kubelet to blame when a pod is not deleted.
Example output:
```
SyncLoop (UPDATE, "api"): "redis-master_default(c6782276-2dd4-11e6-b874-64510650ab1c):DeletionTimestamp=2016-06-08T23:58:12Z"
```
ref #26290
cc @Random-Liu
Automatic merge from submit-queue
Update reason_cache.go, Get method operate lru cache not threadsafe
The reason_cache wraps an LRU cache. The LRU cache modifies its linked list even on a get, so a write lock should be used for both reads and writes.
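To illustrate why a read lock is not enough here, the following is a minimal sketch of an LRU cache, not the actual reason_cache code. The `lruCache` type and its methods are hypothetical names introduced for this example. The point is that `Get` reorders the recency list, which is a mutation.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is the payload stored in each list element.
type entry struct {
	key, value string
}

// lruCache is a hypothetical, minimal LRU cache used only for illustration.
type lruCache struct {
	mu    sync.Mutex               // plain mutex: Get mutates the list, so reads need exclusive access too
	cap   int                      // maximum number of entries
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> list element holding *entry
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{cap: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get looks up a key and, on a hit, moves it to the front of the list.
// That move is a write, which is why a shared read lock would be unsafe.
func (c *lruCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Add inserts or updates a key and evicts the least recently used entry when full.
func (c *lruCache) Add(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRUCache(2)
	c.Add("a", "1")
	c.Add("b", "2")
	c.Get("a")      // touches "a", so "b" becomes the eviction candidate
	c.Add("c", "3") // evicts "b"
	_, ok := c.Get("b")
	fmt.Println("b still cached:", ok)
}
```

If `Get` only took a read lock, two concurrent readers could both call `MoveToFront` and corrupt the list.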
Automatic merge from submit-queue
Fix docker api version in kubelet
There are two variables `dockerv110APIVersion` and `dockerV110APIVersion` with
the same purpose, but different values. Remove the incorrect one and fix usage
in the file.
/cc @dchen1107 @Random-Liu
Automatic merge from submit-queue
processor listener: fix locking in pop()
Currently the lock in processorListener is used to guard pendingNotifications. But in pop(), the lock is also held around the select on the channel. This blocks the goroutine while the lock is held.
This PR changes the lock to guard the correct section only.
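The pattern can be sketched roughly as follows. This is not the actual processorListener code. The `listener` type, its fields, and the `deliver` helper are hypothetical names chosen for the example. The lock is held only while touching the pending slice, and the potentially blocking channel send happens after it is released.

```go
package main

import (
	"fmt"
	"sync"
)

// listener is a hypothetical stand-in for a type with a lock-guarded queue.
type listener struct {
	mu      sync.Mutex
	pending []string    // guarded by mu
	out     chan string // consumers read notifications from here
}

// add appends a notification under the lock.
func (l *listener) add(n string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.pending = append(l.pending, n)
}

// popOne holds the lock only long enough to dequeue one item.
func (l *listener) popOne() (string, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if len(l.pending) == 0 {
		return "", false
	}
	n := l.pending[0]
	l.pending = l.pending[1:]
	return n, true
}

// deliver sends one pending notification without holding the lock, so a slow
// consumer cannot block callers of add. In this sketch the item is dropped
// if stop fires before the send completes.
func (l *listener) deliver(stop <-chan struct{}) bool {
	n, ok := l.popOne()
	if !ok {
		return false
	}
	select {
	case l.out <- n:
		return true
	case <-stop:
		return false
	}
}

func main() {
	l := &listener{out: make(chan string, 1)}
	stop := make(chan struct{})
	l.add("pod-added")
	fmt.Println("delivered:", l.deliver(stop))
	fmt.Println("received:", <-l.out)
}
```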
Automatic merge from submit-queue
pkg/kubectl: add resource printers for rbac api group
This PR adds the necessary kubectl printers for the rbac api group which we overlooked in previous PRs.
cc @erictune
Automatic merge from submit-queue
ResourceQuota BestEffort scope aligned with Pod level QoS
This aligns quota with the changes in kubelet and CLI.
So if quota allows 10 `BestEffort` pods, it will now track properly with what the user sees with changes in 1.3.
```
apiVersion: v1
kind: ResourceQuota
metadata:
  name: best-effort
spec:
  hard:
    pods: "10"
  scopes:
  - BestEffort
```
/cc @vishh @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
AWS: cache instances during service reload to avoid rate limiting on restart
Fixes #25610 by reducing redundant calls to DescribeInstances()
```release-note
* The AWS cloudprovider will cache results from DescribeInstances() if the set of nodes hasn't changed
```
Also move int/stringSlicesEqual from servicecontroller.go to pkg/util/slice
Automatic merge from submit-queue
Extract interface for master endpoints reconciler.
Make the master endpoints reconciler an interface so its implementation can be overridden, if
desired.
xref #20975, #26574
cc @kubernetes/sig-api-machinery @lavalamp @smarterclayton @pmorie @DirectXMan12 @wojtek-t @kubernetes/rh-cluster-infra
OpenShift needs to be able to use a discovery client against a different
prefix. Make LegacyPrefix optional and parameterizable to the client. No
change to existing interfaces.
Automatic merge from submit-queue
fix recursive & non-recursive kubectl get of generic output format
This PR fixes the issues with `kubectl get` in https://github.com/kubernetes/kubernetes/issues/26466
Changes made:
- fix printing when using the generic output format in both non-recursive & recursive settings to ensure that errors are being shown
- add tests to check printing generic output in a **non-recursive** setting with non-existent pods
- clean up the **recursive** `kubectl get` tests
/cc @janetkuo
Automatic merge from submit-queue
Sets IgnoreUnknown=1 in CNI_ARGS
```release-note
release-note-none
```
K8 uses CNI_ARGS to pass pod namespace, name and infra container
id to the CNI network plugin. CNI logic will throw an error
if these args are not known to it, unless the user specifies
IgnoreUnknown as part of CNI_ARGS. This PR sets IgnoreUnknown=1
to prevent the CNI logic from erroring and blocking pod setup.
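As a rough sketch of what the resulting value might look like, assuming CNI_ARGS is a semicolon-separated list of KEY=VALUE pairs: the `buildCNIArgs` helper and the `K8S_*` key names below are assumptions made for illustration and are not taken from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// buildCNIArgs assembles a CNI_ARGS value as semicolon-separated KEY=VALUE
// pairs. The helper and the K8S_* key names are assumptions for illustration.
func buildCNIArgs(namespace, name, infraContainerID string) string {
	pairs := []string{
		// Ask the CNI argument parser to tolerate keys it does not know about.
		"IgnoreUnknown=1",
		"K8S_POD_NAMESPACE=" + namespace,
		"K8S_POD_NAME=" + name,
		"K8S_POD_INFRA_CONTAINER_ID=" + infraContainerID,
	}
	return strings.Join(pairs, ";")
}

func main() {
	fmt.Println(buildCNIArgs("default", "redis-master", "abc123"))
}
```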
https://github.com/appc/cni/pull/158
https://github.com/appc/cni/issues/126
Automatic merge from submit-queue
Listing pods only once when getting pods for RS in deployment
Fixes #26834
1. Avoid ranging over RSes and then `List` pods of each RS. Instead, `List` pods of the deployment once, and then filter pods of each RS.
2. Avoid using clientset to `List` pods in deployment controller. Use podStore instead. (TODO in some functions because the unit tests don't have podStore.)
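The first point can be sketched as below. This is a simplified illustration and not the deployment controller code. The `pod` and `replicaSet` types, the equality-only selector, and `podsByReplicaSet` are hypothetical. The pod list is obtained once and then partitioned in memory, instead of issuing one `List` per ReplicaSet.

```go
package main

import "fmt"

// pod and replicaSet are hypothetical, stripped-down types for this sketch.
type pod struct {
	Name   string
	Labels map[string]string
}

type replicaSet struct {
	Name     string
	Selector map[string]string // equality-based selector only
}

// selects reports whether every selector key/value is present in labels.
// In this sketch an empty selector matches every pod.
func selects(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// podsByReplicaSet takes a pod list that was fetched once and groups the pod
// names under each ReplicaSet whose selector matches them.
func podsByReplicaSet(pods []pod, rses []replicaSet) map[string][]string {
	result := make(map[string][]string, len(rses))
	for _, rs := range rses {
		for _, p := range pods {
			if selects(rs.Selector, p.Labels) {
				result[rs.Name] = append(result[rs.Name], p.Name)
			}
		}
	}
	return result
}

func main() {
	pods := []pod{
		{Name: "web-1-a", Labels: map[string]string{"app": "web", "hash": "1"}},
		{Name: "web-2-a", Labels: map[string]string{"app": "web", "hash": "2"}},
		{Name: "web-2-b", Labels: map[string]string{"app": "web", "hash": "2"}},
	}
	rses := []replicaSet{
		{Name: "web-1", Selector: map[string]string{"app": "web", "hash": "1"}},
		{Name: "web-2", Selector: map[string]string{"app": "web", "hash": "2"}},
	}
	fmt.Println(podsByReplicaSet(pods, rses))
}
```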
@kubernetes/deployment
Since appc requires gid to be non-empty today (https://github.com/appc/spec/issues/623),
we have to error out when gid is empty instead of using the root gid.
Make unversioned.ListMeta implement List. Update all the *List types so they implement GetListMeta.
This helps avoid using reflection to get list information.
Remove all unnecessary boilerplate, move the interfaces to the right
places, and add a test that verifies that objects implement one, the
other, but never both.
Automatic merge from submit-queue
rkt: Replace 'journalctl' with rkt's GetLogs() API.
This replaces the `journalctl` shell out with rkt's GetLogs() API.
Fixes #26997
To make this fully work, we need rkt to have this patch: https://github.com/coreos/rkt/pull/2763
cc @kubernetes/sig-node @euank @alban @iaguis @jonboulle
Automatic merge from submit-queue
AWS: support mixed plaintext/encrypted ports in ELBs via service.beta.kubernetes.io/aws-load-balancer-ssl-ports annotation
Fixes #26268
Implements the second SSL ELB annotation, per #24978
`service.beta.kubernetes.io/aws-load-balancer-ssl-ports=*` (comma-separated list of port numbers or e.g. `https`)
If not specified, all ports are secure (SSL or HTTPS).
Automatic merge from submit-queue
rkt: Do not run rkt pod inside a pre-created netns when network plugin is no-op
This fixes a panic where the returned pod network status is nil. (Fixes #26540)
This also makes lkvm stage1 able to run inside a user-defined network, where the network name needs to be 'rkt.kubernetes.io'. This is a temporary solution to the network issue for lkvm stage1.
In addition, I fixed minor issues such as passing the wrong pod UID when cleaning up the netns file.
/cc @euank @pskrzyns @jellonek @kubernetes/sig-node
I tested with no networkplugin locally, works fine.
As a reminder, we need to document this in the release. https://github.com/kubernetes/kubernetes/issues/26201
This fixed a panic where the returned pod network status is nil.
Also this makes lkvm stage1 able to run inside a user defined
network, where the network name needs to be 'rkt.kubernetes.io'.
Also fixed minor issues such as passing the wrong pod UID, ignoring
logging errors.
Automatic merge from submit-queue
rkt: Fix incomplete selinux context string when the option is partial.
Fix "EmptyDir" e2e test failures caused by https://github.com/kubernetes/kubernetes/pull/24901
As mentioned in https://github.com/kubernetes/kubernetes/pull/24901#discussion_r61372312
We should apply the selinux context of the rkt data directory (/var/lib/rkt) when users do not specify all the selinux options.
Due to my mistake, the change was lost during a rebase, which caused the regression.
After applying this PR, the e2e tests passed.
```
$ go run hack/e2e.go -v -test --test_args="--ginkgo.dryRun=false --ginkgo.focus=EmptyDir"
...
Ran 19 of 313 Specs in 199.319 seconds
SUCCESS! -- 19 Passed | 0 Failed | 0 Pending | 294 Skipped PASS
```
BTW, the test is removed because the `--no-overlay=true` flag will only be there on non-coreos distro.
cc @euank @kubernetes/sig-node
Automatic merge from submit-queue
LBaaS v2 Support for Openstack Cloud Provider Plugin
Resolves #19774.
This work is based on Gophercloud support for LBaaS v2 currently in review (this will have to merge first):
https://github.com/rackspace/gophercloud/pull/575
These changes include the addition of a new loadbalancer configuration option: **LBVersion**. If this configuration attribute is missing or is anything other than "v2", the lbaas v1 implementation will be used.
Automatic merge from submit-queue
GCE attach tests
Add basic tests for GCE attacher.
Looking at the code, it would deserve some refactoring as suggested in #25888, so mounting is not tested at all.
Automatic merge from submit-queue
Add specific error type for "operation already exists" error.
PersistentVolume controller needs to know why scheduling a new operation has failed - if the operation was already running or some other error happened.
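A minimal sketch of the idea, assuming a dedicated error type plus a helper to test for it. The names `alreadyExistsError`, `isAlreadyExists`, and `schedule` are hypothetical and are not the identifiers used in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// alreadyExistsError is a hypothetical error type signalling that an
// operation with the same name is already running.
type alreadyExistsError struct {
	operationName string
}

func (e alreadyExistsError) Error() string {
	return fmt.Sprintf("operation %q already exists", e.operationName)
}

// isAlreadyExists lets callers tell "already running" apart from other failures.
func isAlreadyExists(err error) bool {
	_, ok := err.(alreadyExistsError)
	return ok
}

// schedule is a toy scheduler: it rejects empty names with a generic error
// and duplicate names with the specific error type.
func schedule(running map[string]bool, name string) error {
	if name == "" {
		return errors.New("operation name must not be empty")
	}
	if running[name] {
		return alreadyExistsError{operationName: name}
	}
	running[name] = true
	return nil
}

func main() {
	running := map[string]bool{}
	fmt.Println(schedule(running, "provision-pv1"))
	err := schedule(running, "provision-pv1")
	fmt.Println(err, "| already exists:", isAlreadyExists(err))
}
```

With a specific type, the caller can treat "already running" as a benign condition and report only the other errors.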
Automatic merge from submit-queue
AWS: kubectl get service should print hostnames for LB services
Fixes #21526
Also test wide outputs. We only guarantee the first IP to be fully printed
if multiple ingresses are present. For AWS, which has no ingress IPs, but
only hostnames, the ELB hostname will be truncated, unless -o=wide is
specified.
Automatic merge from submit-queue
Fix NetworkPolicy validation bug
Fix bugs in NetworkPolicy resource (new in v1.3) validation.
Please add this to the v1.3 milestone.
Automatic merge from submit-queue
Preserve query strings in HTTP probes instead of escaping them
Fixes a problem reported on Slack by devth.
```release-note
* Allow the use of query strings and URI fragments in HTTP probes
```
This might also preserve fragments, for those crazy enough to pass them.
I am using url.Parse() on the path in order to get path/query/fragment
and also deliberately avoiding the addition of more fields to the API.
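A small sketch of how `url.Parse` splits such a path into its components. The `splitProbePath` helper and the sample path are hypothetical and only illustrate the standard library behavior.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitProbePath is a hypothetical helper that parses a probe path and
// returns its path, raw query string, and fragment separately.
func splitProbePath(raw string) (path, query, fragment string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	return u.Path, u.RawQuery, u.Fragment, nil
}

func main() {
	path, query, fragment, err := splitProbePath("/healthz?verbose=true#section")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("path=%q query=%q fragment=%q\n", path, query, fragment)
}
```

Keeping the query in `RawQuery` rather than in `Path` is what avoids escaping the `?` when the probe URL is rebuilt.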
Automatic merge from submit-queue
Stop 'kubectl drain' deleting pods with local storage.
Kubectl drain will not continue if there are pods with local storage unless
forced with --delete-local-data.
Fixes #23972
Automatic merge from submit-queue
Resource quantity must support leading and trailing whitespace in JSON for back-compat
For backwards compatibility reasons, we must continue to support leading or trailing whitespace on Quantity values when deserialized from JSON. We must also support numbers serialized into yaml (`cpu: 1`) and JSON (`"cpu": 1`)
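The behavior described above can be sketched with a custom `UnmarshalJSON`. This is not the real Quantity implementation. The `quantity` type and the `parseCPU` helper are hypothetical, and the sketch only normalizes the raw text without parsing units.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// quantity is a hypothetical stand-in that only keeps the normalized text.
type quantity struct {
	raw string
}

// UnmarshalJSON accepts either a JSON string (trimming surrounding
// whitespace) or a bare JSON number such as 1.
func (q *quantity) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		q.raw = strings.TrimSpace(s)
		return nil
	}
	var n json.Number
	if err := json.Unmarshal(data, &n); err != nil {
		return err
	}
	q.raw = n.String()
	return nil
}

// parseCPU decodes a document like {"cpu": " 100m "} and returns the
// normalized cpu value.
func parseCPU(doc string) (string, error) {
	var r struct {
		CPU quantity `json:"cpu"`
	}
	if err := json.Unmarshal([]byte(doc), &r); err != nil {
		return "", err
	}
	return r.CPU.raw, nil
}

func main() {
	for _, doc := range []string{`{"cpu": " 100m "}`, `{"cpu": 1}`} {
		v, err := parseCPU(doc)
		fmt.Printf("%s -> %q (err: %v)\n", doc, v, err)
	}
}
```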
Fixes #26898
Automatic merge from submit-queue
Custom sort function for InitContainersStatuses
The order of init containers matters. Statuses shouldn't be sorted by name.
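One way to express such a sort is to order statuses by the position of each container in the pod spec. This is a sketch under that assumption and not the code from the PR. The `containerStatus` type and `sortByInitOrder` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// containerStatus is a hypothetical, stripped-down status type.
type containerStatus struct {
	Name  string
	Ready bool
}

// sortByInitOrder reorders statuses in place so they follow the order in
// which the init containers are declared, rather than alphabetical order.
func sortByInitOrder(specOrder []string, statuses []containerStatus) {
	index := make(map[string]int, len(specOrder))
	for i, name := range specOrder {
		index[name] = i
	}
	sort.SliceStable(statuses, func(a, b int) bool {
		return index[statuses[a].Name] < index[statuses[b].Name]
	})
}

func main() {
	specOrder := []string{"wait-for-db", "migrate", "cache-warm"}
	statuses := []containerStatus{
		{Name: "cache-warm"},
		{Name: "migrate", Ready: true},
		{Name: "wait-for-db", Ready: true},
	}
	sortByInitOrder(specOrder, statuses)
	for _, s := range statuses {
		fmt.Println(s.Name, s.Ready)
	}
}
```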
Automatic merge from submit-queue
Move quota usage testing for loadbalancers into unit tests
Fixes https://github.com/kubernetes/kubernetes/issues/26319
* moved testing for node port and load balancer usage in quota to unit tests
* remove node port and node port -> loadbalancer service testing out of e2e
* covered already in replenishment_controller_test scenario
Given the time it takes to even allocate a load balancer, it seems better to test that outside of this test case to avoid unnecessary flakes.
/cc @bprashanth
Fixes #26268
Implements the second SSL ELB annotation, per #24978
service.beta.kubernetes.io/aws-load-balancer-ssl-ports=* (or e.g. https)
If not specified, all ports are secure (SSL or HTTPS).
Automatic merge from submit-queue
correction on rbd volume object and defaults
- add `omitempty` to `RBDPool RadosUser Keyring SecretRef ReadOnly`
- move defaults from `pkg/volume/rbd/rbd.go` to `pkg/api/v1/defaults.go`
addressing #18885
Double slashes are not allowed in annotation keys. Moreover, using the 63
characters of the name component in an annotation key will shorten the space
for the container name.
NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
Automatic merge from submit-queue
rkt: Wrap exec errors as utilexec.ExitError
This is needed by the exec prober to distinguish error types and exit
codes correctly. Without this, the exec prober used for liveness probes
doesn't identify errors correctly and restarts aren't triggered. Fixes #26456
An alternative, and preferable solution would be to use utilexec
everywhere, but that change is much more involved and should come at a
later date. Unfortunately, until that change is made, writing tests for
this is quite difficult.
cc @yifan-gu @sjpotter
Automatic merge from submit-queue
volume controller: Convert PersistentVolumes from Kubernetes 1.2
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV
In refactored volume provisioner, Kubernetes allocates the storage asset first and then creates a Kubernetes PV instance already with the correct pointer to the storage asset.
To support a seamless upgrade from 1.2 to 1.3, we need to remove these unprovisioned template PVs. The new controller does not use them; it will see the PVC for dynamic provisioning and create a real PV instead.
See https://github.com/pmorie/pv-haxxz/pull/3 for pseudocode.
Automatic merge from submit-queue
Fix GCE attacher/detacher to ignore return value of failed calls.
The plugin should ignore any return value if err is set. Found when writing unit tests in #26615: my dummy `DiskIsAttached` returned `false, errors.New("fake error")` and the volume was **not** detached, although the log message `"Error checking if PD (%q) is already attached to current node (%q). Will continue and try detach anyway."` suggested otherwise.
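The rule can be sketched as a small decision helper. This is an illustration under stated assumptions and not the GCE plugin code. `shouldDetach` and `errFake` are hypothetical names. When the check fails, the boolean result is not trusted and detach proceeds anyway.

```go
package main

import (
	"errors"
	"fmt"
)

// errFake mimics the dummy error used in a unit test.
var errFake = errors.New("fake error")

// shouldDetach decides whether to attempt a detach based on the result of an
// "is the disk attached?" check. The attached value is only meaningful when
// err is nil, so on error the sketch continues with the detach attempt.
func shouldDetach(attached bool, err error) bool {
	if err != nil {
		// The check failed: ignore attached and try to detach anyway.
		return true
	}
	return attached
}

func main() {
	fmt.Println("check failed:", shouldDetach(false, errFake))
	fmt.Println("not attached:", shouldDetach(false, nil))
	fmt.Println("attached:", shouldDetach(true, nil))
}
```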
@saad-ali, PTAL
@kubernetes/sig-storage
Automatic merge from submit-queue
Wait for all volumes/claims to get synced in unit test.
Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.
Fixes #26712
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a
claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV
In refactored volume provisioner, Kubernetes allocates the storage asset first
and then creates a Kubernetes PV instance already with the correct pointer
to the storage asset.
To support a seamless upgrade from 1.2 to 1.3, we need to remove these
unprovisioned template PVs. The new controller does not use them; it will see
the PVC for dynamic provisioning and create a real PV instead.
Automatic merge from submit-queue
Fix typo and linewrap comments in PV controller
Fix some typos and linewrap long comments that I found while going over this code investigating something.
Automatic merge from submit-queue
Add timeout for image pulling
Fix #26300.
With this PR, if image pulling makes no progress for *1 minute*, the operation will be cancelled. Docker reports progress for every 512kB block (See [here](3d13fddd2b/pkg/progress/progressreader.go (L32))), *512kB/min* means the throughput is *<= 8.5kB/s*, which should be kind of abnormal?
It's a little hard to write unit test for this, so I just manually tested it. If I set the `defaultImagePullingStuckTimeout` to 0s, and `defaultImagePullingProgressReportInterval` to 1s, image pulling will be cancelled.
```
E0601 18:48:29.026003 46185 kube_docker_client.go:274] Cancel pulling image "nginx:latest" because of no progress for 0, latest progress: "89732b811e7f: Pulling fs layer "
E0601 18:48:29.026308 46185 manager.go:2110] container start failed: ErrImagePull: net/http: request canceled
```
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Don't allow deps with no discernible license
This updates the few deps we had with no LICENSE file to current versions that do have that file. It also disallows new deps without obvious licenses.
Automatic merge from submit-queue
Attach/Detach Controller Kubelet Changes
This PR contains changes to enable attach/detach controller proposed in #20262.
Specifically it:
* Introduces a new `enable-controller-attach-detach` kubelet flag to enable control by attach/detach controller. Default enabled.
* Removes all references to the `SafeToDetach` annotation from the controller.
* Adds the new `VolumesInUse` field to the Node Status API object.
* Modifies the controller to use `VolumesInUse` instead of `SafeToDetach` annotation to gate detachment.
* Modifies kubelet to set `VolumesInUse` before Mount and after Unmount.
* There is a bug in the `node-problem-detector` binary that causes `VolumesInUse` to get reset to nil every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9#issuecomment-221770924 opened to fix that.
  * There is a bug here in the mount/unmount code that prevents resetting `VolumesInUse` in some cases; this will be fixed by the mount/unmount refactor.
* Have controller process detaches before attaches so that volumes referenced by pods that are rescheduled to a different node are detached first.
* Fix misc bugs in controller.
* Modify GCE attacher to: remove retries, remove mutex, and not fail if volume is already attached or already detached.
Fixes #14642, #19953
```release-note
Kubernetes v1.3 introduces a new Attach/Detach Controller. This controller manages attaching and detaching volumes on-behalf of nodes that have the "volumes.kubernetes.io/controller-managed-attach-detach" annotation.
A kubelet flag, "enable-controller-attach-detach" (default true), controls whether a node sets the "controller-managed-attach-detach" annotation or not.
```
Automatic merge from submit-queue
Fill controller caches on startup
The controller needs to fill its caches before it starts binding/recycling/deleting or provisioning volumes and claims. This was done by blocking the initial 'xxx added' events from going through syncClaim/syncVolume. However, once the caches were full, the controller waited for the next sync period to do the actual binding/recycling etc.
In this patch, the controller fills its caches directly from etcd and then processes the initial 'xxx added' events to reconcile the world and bind/recycle/delete/provision stuff, resulting in faster binding after startup.
Fixes #25967 (properly)
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
enable control by controller. Default enabled.
* It removes all references to the "SafeToDetach" annotation from the controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
get reset every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9
opened to fix that.
Automatic merge from submit-queue
rkt: Get logs via syslog identifier
This change works around https://github.com/coreos/rkt/issues/2630
Without this change, logs cannot reliably be collected for containers
with short lifetimes.
With this change, logs cannot be collected on rkt versions v1.6.0 and
before.
I'd like to also bump the required rkt version, but I don't want to do that until there's a released version that can be pointed to (so the next rkt release).
I haven't added tests (which were missing) because this code will be removed if/when logs are retrieved via the API. I have run E2E tests with this merged in and verified the tests which previously failed no longer fail.
cc @yifan-gu
Automatic merge from submit-queue
read gluster log to surface glusterfs plugin errors properly in describe events
glusterfs.go does not properly expose errors, as all mount errors go to a log file. I propose we read the log file to expose the errors without asking users to 'go look at this log'.
This PR does the following:
1. adds a gluster option for log-level=ERROR to remove all noise from log file
2. change log file name and path based on PV + Pod name - so specific per PV and Pod
3. create a utility to read the last two lines of the log file when failure occurs
old behavior:
```
13s 13s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(34b18c6b-070d-11e6-8e95-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.234.147:myVol2 /var/lib/kubelet/pods/34b18c6b-070d-11e6-8e95-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster glusterfs [log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster/glusterfs.log]
Output: Mount failed. Please check the log file for more details.
```
improved behavior: (updated after suggestions from community)
```
34m 34m 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-multi-pod1_default(e7d7f790-0d4b-11e6-a275-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.123.222:myVol2 /var/lib/kubelet/pods/e7d7f790-0d4b-11e6-a275-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster2 glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster2/bb-multi-pod1-glusterfs.log]
Output: Mount failed. Please check the log file for more details.
the following error information was pulled from the log to help resolve this issue:
[2016-04-28 14:21:29.109697] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 192.168.123.222:24007 failed (Connection timed out)
[2016-04-28 14:21:29.109767] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 192.168.123.222 (Transport endpoint is not connected)
```
This PR is also an alternative approach to #24624.
Automatic merge from submit-queue
rkt: Add pod selinux support.
Currently only the pod-level selinux context is supported. In addition, when
running with selinux, we will not be able to use the overlay fs; see:
https://github.com/coreos/rkt/issues/1727#issuecomment-173203129.
cc @kubernetes/sig-node @alban @mjg59 @pmorie
Automatic merge from submit-queue
Fix data race in volume controller unit test.
Reactor must be locked when fiddling with reactor.volumes and reactor.claims. Therefore add new functions to add/delete volume/claim with sending an event.
Fixes #26345
Automatic merge from submit-queue
Add direct serializer
Fix #25589. Implemented a direct codec that doesn't do conversion, but sets the group, version and kind before serialization as Clayton suggested [here](https://github.com/kubernetes/kubernetes/issues/25589#issuecomment-219168009).
First commit is cherry-picked from #24826.
@kubernetes/sig-api-machinery
This is needed by the exec prober to distinguish error types and exit
codes correctly.
An alternative, and preferable solution would be to use utilexec
everywhere, but that change is much more involved and should come at a
later date. Unfortunately, until that change is made, writing tests for
this is quite difficult.
Automatic merge from submit-queue
Fix fake event recorder race
The event recorder should wait for some time to get all expected events, since an event may be written by another goroutine that has only just finished.
It should not slow down the test in most cases, only when there is a bug and expected event is not sent.
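A sketch of this kind of bounded wait, assuming events arrive on a channel. The `waitForEvents` helper is a hypothetical name and is not the function from the PR. It returns as soon as the expected number of events has arrived, so the timeout is only paid when an event is missing.

```go
package main

import (
	"fmt"
	"time"
)

// waitForEvents collects up to want events from the channel, giving up once
// the timeout expires. In the common case it returns immediately after the
// last expected event arrives.
func waitForEvents(events <-chan string, want int, timeout time.Duration) []string {
	var got []string
	deadline := time.After(timeout)
	for len(got) < want {
		select {
		case e := <-events:
			got = append(got, e)
		case <-deadline:
			// An expected event never arrived; return what was seen so far.
			return got
		}
	}
	return got
}

func main() {
	events := make(chan string, 2)
	go func() {
		// Simulates another goroutine that emits its event slightly late.
		time.Sleep(10 * time.Millisecond)
		events <- "Normal ProvisioningSucceeded"
	}()
	got := waitForEvents(events, 1, time.Second)
	fmt.Println(len(got), got)
}
```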
Fixes #26578
Using P2 to speed up merge and to prevent further flakes.
@kubernetes/sig-storage
The event recorder should wait for some time to get all expected events, since
an event may be written by another goroutine that has only just finished.
It should not slow down the test in most cases, only when there is a bug and
expected event is not sent.
Automatic merge from submit-queue
retry GetThirdPartyGroupVersions
GetThirdPartyGroupVersions() may return a "NotFound" error if a thirdparty group is deleted in the interim between the group-discovery and the resource-discovery. This is causing e2e flakes in all tests that run kubectl, because test/e2e/thirdparty.go is creating/deleting thirdparty groups.
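The retry can be sketched as follows. This is a generic illustration and not the actual discovery code. `errNotFound` and `retryOnNotFound` are hypothetical names. Only the "NotFound" case is retried, and any other error is returned immediately.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a "NotFound" API error.
var errNotFound = errors.New("the server could not find the requested resource")

// retryOnNotFound calls fn up to attempts times (attempts should be at least
// 1). It retries only when fn fails with errNotFound, which in this scenario
// means a group was deleted between the two discovery steps.
func retryOnNotFound(attempts int, fn func() ([]string, error)) ([]string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		result, err := fn()
		if err == nil {
			return result, nil
		}
		if err != errNotFound {
			return nil, err
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	calls := 0
	versions, err := retryOnNotFound(3, func() ([]string, error) {
		calls++
		if calls == 1 {
			// First attempt races with a group deletion.
			return nil, errNotFound
		}
		return []string{"company.com/v1"}, nil
	})
	fmt.Println(versions, err, "calls:", calls)
}
```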
Fix #26425
The e2e flakes will have the following pattern:
1. the test is calling kubectl
2. error message is `Error from server: the server could not find the requested resource`
3. in the apiserver log, you should see `GET /apis/company.com/v1: (518.944µs) 404 [[kubectl/v1.3.0 (linux/amd64) kubernetes/ae28564] 104.154.110.118:46043]`
For detail see [here](https://github.com/kubernetes/kubernetes/issues/26425#issuecomment-222844523)
cc @janetkuo @brendanburns
Reactor must be locked when fiddling with reactor.volumes and reactor.claims.
Therefore add new functions to add/delete volume/claim with sending an event.
Automatic merge from submit-queue
Stabilize controller unit tests.
Remove test "5-1"; it's flaky as it depends on the order of execution of goroutines. When the controller starts, the existing claim is enqueued as an "initial sync event" and a new volume is enqueued to a separate goroutine. It is not deterministic which goroutine processes its events first, and there is no way to tell that the claim event was processed.
Also, force resync of the controllers after the test to make sure all events are processed.
Fixes unit test flakes.
@kubernetes/sig-storage