Automatic merge from submit-queue (batch tested with PRs 43963, 43965)
Update deployment retries to a saner count
It seems that the current retries sum up to no more than 0.2s, so some transient errors may drop deployments out of the queue.
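A minimal sketch of the pattern involved, not the exact upstream diff (the maxRetries value below is hypothetical): with client-go's default controller rate limiter the per-key backoff starts at 5ms, so a five-retry budget waits only about 5+10+20+40+80 ms ≈ 0.155s in total before the key is dropped.

```go
package deployment

import "k8s.io/client-go/util/workqueue"

const maxRetries = 15 // hypothetical; the point is "more than a handful"

// handleErr retries a failed key with per-key exponential backoff and only
// gives up once the retry budget is exhausted.
func handleErr(queue workqueue.RateLimitingInterface, err error, key interface{}) {
	if err == nil {
		queue.Forget(key) // success resets the per-key backoff
		return
	}
	if queue.NumRequeues(key) < maxRetries {
		queue.AddRateLimited(key) // try again later, backing off exponentially
		return
	}
	queue.Forget(key) // budget exhausted: the key falls out of the queue
}
```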
Auto-generated via:
git grep -l [Ss]uccesfully | xargs sed -ri 's/([sS])uccesfully/\1uccessfully/g'
Then manually reverted the changes to the vendored bits.
I noticed this misspelling when running kube-scheduler with --v=4 and it is annoying.
Automatic merge from submit-queue
GC: Fix re-adoption race when orphaning dependents.
**What this PR does / why we need it**:
The GC expects that once it sees a controller with a non-nil
DeletionTimestamp, that controller will not attempt any adoption.
There was a known race condition that could cause a controller to
re-adopt something orphaned by the GC, because the controller is using a
cached value of its own spec from before DeletionTimestamp was set.
This fixes that race by doing an uncached quorum read of the controller
spec just before the first adoption attempt. It's important that this
read occurs after listing potential orphans. Note that this uncached
read is skipped if no adoptions are attempted (i.e. at steady state).
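A minimal sketch of that guard, assuming a typed client-go clientset (the helper name is illustrative, not the exact upstream one):

```go
package deployment

import (
	"context"
	"fmt"

	apps "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
)

// recheckDeletionTimestamp returns a CanAdopt-style closure. Because the
// read happens only when the closure is invoked, it is skipped entirely
// when no adoption is attempted.
func recheckDeletionTimestamp(c clientset.Interface, d *apps.Deployment) func() error {
	return func() error {
		// Quorum read straight from the API server, bypassing the lister cache.
		fresh, err := c.AppsV1().Deployments(d.Namespace).Get(context.TODO(), d.Name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if fresh.UID != d.UID {
			return fmt.Errorf("original Deployment %v/%v is gone: got uid %v, wanted %v",
				d.Namespace, d.Name, fresh.UID, d.UID)
		}
		if fresh.DeletionTimestamp != nil {
			// The GC may be orphaning our dependents right now: refuse to adopt.
			return fmt.Errorf("%v/%v has a non-nil DeletionTimestamp (%v), refusing adoption",
				d.Namespace, d.Name, fresh.DeletionTimestamp)
		}
		return nil
	}
}
```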
**Which issue this PR fixes**:
Fixes #42639
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
Deployment should ignore Pods that don't match the selector, even if
they have a ControllerRef pointing to one of the ReplicaSets it owns.
The ReplicaSet itself will orphan the Pod as soon as it syncs.
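A minimal sketch of that rule (helper name illustrative): the selector match is required even for Pods whose ControllerRef points at an owned ReplicaSet.

```go
package deployment

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// podsMatchingSelector keeps only Pods whose labels match the Deployment's
// selector; a Pod owned by one of our ReplicaSets but with non-matching
// labels is dropped here, and the ReplicaSet controller will orphan it on
// its next sync.
func podsMatchingSelector(pods []*v1.Pod, selector labels.Selector) []*v1.Pod {
	var matched []*v1.Pod
	for _, pod := range pods {
		if selector.Matches(labels.Set(pod.Labels)) {
			matched = append(matched, pod)
		}
	}
	return matched
}
```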
Although Deployment already applied its ControllerRef to adopt matching
ReplicaSets, it actually still used label selectors to list objects that
it controls. That meant it didn't actually follow the rules of
ControllerRef, so it could still fight with other controller types.
This should mean that the special handling for overlapping Deployments
is no longer necessary, since each Deployment will only see objects that
it owns (via ControllerRef).
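A simplified sketch of listing by ControllerRef (the actual change also adopts and releases through a ControllerRefManager; names here are illustrative):

```go
package deployment

import (
	apps "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	appslisters "k8s.io/client-go/listers/apps/v1"
)

// ownedReplicaSets uses the label selector only to find candidates;
// ownership is decided by the ControllerRef UID, so another controller
// type with an overlapping selector can no longer fight over them.
func ownedReplicaSets(lister appslisters.ReplicaSetLister, d *apps.Deployment) ([]*apps.ReplicaSet, error) {
	selector, err := metav1.LabelSelectorAsSelector(d.Spec.Selector)
	if err != nil {
		return nil, err
	}
	candidates, err := lister.ReplicaSets(d.Namespace).List(selector)
	if err != nil {
		return nil, err
	}
	var owned []*apps.ReplicaSet
	for _, rs := range candidates {
		if ref := metav1.GetControllerOf(rs); ref != nil && ref.UID == d.UID {
			owned = append(owned, rs)
		}
	}
	return owned, nil
}
```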
Make rollbacks re-entrant in the Deployment controller; otherwise fast
enqueues of a Deployment may end up in undesired behavior: redundant
rollbacks.
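A hedged sketch of one way to make the rollback idempotent, assuming the real deployment.kubernetes.io/revision annotation and a hypothetical clearRollbackMarker helper:

```go
package deployment

import (
	"strconv"

	apps "k8s.io/api/apps/v1"
)

// rollbackIfNeeded becomes a no-op when the Deployment already sits at the
// requested revision: it only clears the rollback marker, so a second fast
// enqueue cannot trigger the same rollback twice.
func rollbackIfNeeded(d *apps.Deployment, newRS *apps.ReplicaSet, toRevision int64) error {
	current, err := strconv.ParseInt(newRS.Annotations["deployment.kubernetes.io/revision"], 10, 64)
	if err != nil {
		return err
	}
	if toRevision == current {
		// Re-entrant path: nothing to roll back, just drop the marker.
		return clearRollbackMarker(d)
	}
	// ... otherwise copy the target ReplicaSet's template into d and update ...
	return nil
}

// clearRollbackMarker is a stand-in for removing the rollback request from
// the Deployment via an update call.
func clearRollbackMarker(d *apps.Deployment) error { return nil }
```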
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)
controller: fix requeueing progressing deployments
Drop the secondary queue and instead re-add the Deployment directly to
the main queue, either rate-limited or after the required amount of time
we need to wait. In this way we can always be sure that we will sync the
Deployment again if its progress has yet to resolve into a complete
(NewReplicaSetAvailable) or TimedOut condition.
This should also simplify the deployment controller a bit.
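A minimal sketch of that requeue, using the standard client-go workqueue and cache helpers:

```go
package deployment

import (
	"time"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// enqueueAfter re-adds the object's key to the same main queue the workers
// consume, after the remaining slice of progressDeadlineSeconds, so the
// Deployment is guaranteed a future sync that can resolve its progress
// into a Complete (NewReplicaSetAvailable) or TimedOut condition.
func enqueueAfter(queue workqueue.RateLimitingInterface, obj interface{}, after time.Duration) {
	key, err := cache.MetaNamespaceKeyFunc(obj)
	if err != nil {
		utilruntime.HandleError(err)
		return
	}
	queue.AddAfter(key, after) // or queue.AddRateLimited(key) on sync errors
}
```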
Fixes https://github.com/kubernetes/kubernetes/issues/39785. Once this change soaks, I will move the test out of the flaky suite.
@kubernetes/sig-apps-misc
To prepare for implementing ControllerRef across all controllers,
this pushes the common adopt/orphan logic into ControllerRefManager
so each controller doesn't have to duplicate it.
This also shares the adopt/orphan logic between Pods and ReplicaSets,
so it lives in only one place.
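A simplified sketch of the shared claim decision (the real ControllerRefManager adds patching, error aggregation, and NotFound tolerance; this is just the core logic described above):

```go
package controllerref

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// claim decides, for one candidate, whether the controller should own it
// afterwards: release what is owned but no longer wanted, adopt what is
// wanted but unowned, and leave everything else alone.
func claim(controller, obj metav1.Object, match func(metav1.Object) bool,
	adopt, release func(metav1.Object) error) (owned bool, err error) {
	if ref := metav1.GetControllerOf(obj); ref != nil {
		if ref.UID != controller.GetUID() {
			return false, nil // owned by someone else: not ours to touch
		}
		if match(obj) {
			return true, nil // ours and still matching: keep it
		}
		return false, release(obj) // ours but no longer matching: orphan it
	}
	if controller.GetDeletionTimestamp() != nil || !match(obj) {
		return false, nil // unowned and unwanted, or we are being deleted
	}
	return true, adopt(obj) // unowned and matching: adopt it
}
```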
Automatic merge from submit-queue (batch tested with PRs 41196, 41252, 41300, 39179, 41449)
controller: cleanup workload controllers a bit
* Switches glog.Errorf to utilruntime.HandleError in DS and RC controllers (see the sketch below)
* Drops a couple of unused variables in the DS, SS, and Deployment controllers
* Updates some comments
@kubernetes/sig-apps-misc
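A minimal before/after sketch of the first bullet, assuming the standard utilruntime package:

```go
package daemon

import (
	"fmt"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// Before: glog.Errorf("error syncing %q: %v", key, err)
// After: the error goes through the registered handlers, which log it and
// apply a rudimentary backoff that prevents tight error loops.
func reportSyncError(key string, err error) {
	utilruntime.HandleError(fmt.Errorf("error syncing %q: %v", key, err))
}
```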
Deployments get cleaned up only when they are paused, when they are scaled
up or down, or when the strategy that drives rollouts completes. This means
that stuck deployments that fall into none of the above categories never
get cleaned up. Since cleanup is already safe by itself (we only delete old
replica sets that are synced by the replica set controller and have no
replicas), we can execute it for every deployment whenever there is no
intention to roll back.
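A hedged sketch of the resulting call site, using the pre-apps/v1 RollbackTo spec field (names illustrative):

```go
package deployment

import extensions "k8s.io/api/extensions/v1beta1"

// cleanupIfPossible runs on every sync. Cleanup only deletes old
// ReplicaSets that are fully synced and have zero replicas, so calling it
// unconditionally is safe whenever no rollback is pending.
func cleanupIfPossible(d *extensions.Deployment, oldRSs []*extensions.ReplicaSet) {
	if d.Spec.RollbackTo == nil { // no intention to roll back
		cleanupDeployment(oldRSs, d) // prune beyond spec.revisionHistoryLimit
	}
}

// cleanupDeployment stands in for the existing history-pruning helper.
func cleanupDeployment(oldRSs []*extensions.ReplicaSet, d *extensions.Deployment) {}
```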