Automatic merge from submit-queue (batch tested with PRs 44741, 44853, 44572, 44797, 44439)
controller: fix saturation check in Deployments
Fixes https://github.com/kubernetes/kubernetes/issues/44436
@kubernetes/sig-apps-bugs
I'll cherry-pick this back to 1.6 and 1.5
Automatic merge from submit-queue (batch tested with PRs 44447, 44456, 43277, 41779, 43942)
Clean up pre-ControllerRef compatibility logic
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43323
**Special notes for your reviewer**:
No
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43900, 44152, 44324)
make deployment unit tests need to respect subresources
Fixes#42569
I check all the unit test code related to `Matches` method, seems there's only one line we could change to not break previous testing logic
@kargakis ptal, thanks
/assign @kargakis
Automatic merge from submit-queue (batch tested with PRs 43963, 43965)
Update deployment retries to a saner count
It seems that the current retries sum up to no more than 0.2s so some transient errors may drop deployments out of the queue.
Auto-generated via:
git grep -l [Ss]uccesfully | xargs sed -ri 's/([sS])uccesfully/\1uccessfully/g'
I noticed this when running kube-scheduler with --v4 and it is annoying.
Then manually reverted changed to the vendored bits.
The cache was sometimes catching up while we were testing the case
where the cache is not yet caught up.
Before this fix, I could reproduce the failure with the following
command. After the fix, it passes.
```
go test -count 100000 -run TestSyncDeploymentDeletionRace
```
Automatic merge from submit-queue
GC: Fix re-adoption race when orphaning dependents.
**What this PR does / why we need it**:
The GC expects that once it sees a controller with a non-nil
DeletionTimestamp, that controller will not attempt any adoption.
There was a known race condition that could cause a controller to
re-adopt something orphaned by the GC, because the controller is using a
cached value of its own spec from before DeletionTimestamp was set.
This fixes that race by doing an uncached quorum read of the controller
spec just before the first adoption attempt. It's important that this
read occurs after listing potential orphans. Note that this uncached
read is skipped if no adoptions are attempted (i.e. at steady state).
**Which issue this PR fixes**:
Fixes#42639
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
The GC expects that once it sees a controller with a non-nil
DeletionTimestamp, that controller will not attempt any adoption.
There was a known race condition that could cause a controller to
re-adopt something orphaned by the GC, because the controller is using a
cached value of its own spec from before DeletionTimestamp was set.
This fixes that race by doing an uncached quorum read of the controller
spec just before the first adoption attempt. It's important that this
read occurs after listing potential orphans. Note that this uncached
read is skipped if no adoptions are attempted (i.e. at steady state).
Automatic merge from submit-queue
kubectl: Use v1.5-compatible ownership logic when listing dependents.
**What this PR does / why we need it**:
This restores compatibility between kubectl 1.6 and clusters running Kubernetes 1.5.x. It introduces transitional ownership logic in which the client considers ControllerRef when it exists, but does not require it to exist.
If we were to ignore ControllerRef altogether (pre-1.6 client behavior), we would introduce a new failure mode in v1.6 because controllers that used to get stuck due to selector overlap will now make progress. For example, that means when reaping ReplicaSets of an overlapping Deployment, we would risk deleting ReplicaSets belonging to a different Deployment that we aren't about to delete.
This transitional logic avoids such surprises in 1.6 clusters, and does no worse than kubectl 1.5 did in 1.5 clusters. To prevent this when kubectl 1.5 is used against 1.6 clusters, we can cherrypick this change.
**Which issue this PR fixes**:
Fixes#43159
**Special notes for your reviewer**:
**Release note**:
```release-note
```
In particular, we should not assume ControllerRefs are necessarily set.
However, we can still use ControllerRefs that do exist to avoid
interfering with controllers that do use it.
This effectively reverts the client-side changes in
cec3899b96.
We have to maintain the old behavior on the client side to support
version skew when talking to old servers that set the annotation.
However, the new server-side behavior is still to NOT set the
annotation.
When the upgrade test operates on Deployments in a pre-1.6 cluster
(i.e. during the Setup phase), it needs to use the v1.5 deployment/util
logic. In particular, the v1.5 logic does not filter children to only
those with a matching ControllerRef.
Deployment should ignore Pods that don't match the selector, even if
they have a ControllerRef pointing to one of the ReplicaSets it owns.
The ReplicaSet itself will orphan the Pod as soon as it syncs.
The list functions in deployment/util are used outside the Deployment
controller itself. Therefore, they don't do actual adoption/orphaning.
However, they still need to avoid listing things that don't belong.
Although Deployment already applied its ControllerRef to adopt matching
ReplicaSets, it actually still used label selectors to list objects that
it controls. That meant it didn't actually follow the rules of
ControllerRef, so it could still fight with other controller types.
This should mean that the special handling for overlapping Deployments
is no longer necessary, since each Deployment will only see objects that
it owns (via ControllerRef).
Automatic merge from submit-queue (batch tested with PRs 31783, 41988, 42535, 42572, 41870)
controller: ensure deployment rollback is re-entrant
Make rollbacks re-entrant in the Deployment controller, otherwise
fast enqueues of a Deployment may end up in undesired behavior
- redundant rollbacks.
Fixes https://github.com/kubernetes/kubernetes/issues/36703
@kubernetes/sig-apps-bugs
Make rollbacks re-entrant in the Deployment controller, otherwise
fast enqueues of a Deployment may end up in undesired behavior
- redundant rollbacks.
Automatic merge from submit-queue
kubectl: respect deployment strategy parameters for rollout status
Fixes https://github.com/kubernetes/kubernetes/issues/40496
`rollout status` now respects the strategy parameters for a RollingUpdate Deployment. This means that it will exit as soon as minimum availability is reached for a rollout (note that if you allow maximum availability, `rollout status` will succeed as soon as the new pods are created)
@janetkuo @AdoHe ptal
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)
controller: fix requeueing progressing deployments
Drop the secondary queue and add either ratelimited or after the
required amount of time that we need to wait directly in the main
queue. In this way we can always be sure that we will sync back
the Deployment if its progress has yet to resolve into a complete
(NewReplicaSetAvailable) or TimedOut condition.
This should also simplify the deployment controller a bit.
Fixes https://github.com/kubernetes/kubernetes/issues/39785. Once this change soaks, I will move the test out of the flaky suite.
@kubernetes/sig-apps-misc
Automatic merge from submit-queue (batch tested with PRs 41667, 41820, 40910, 41645, 41361)
Refactor ControllerRefManager
**What this PR does / why we need it**:
To prepare for implementing ControllerRef across all controllers (https://github.com/kubernetes/community/pull/298), this pushes the common adopt/orphan logic into ControllerRefManager so each controller doesn't have to duplicate it.
This also shares the adopt/orphan logic between Pods and ReplicaSets, so it lives in only one place.
**Which issue this PR fixes**:
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
Drop the secondary queue and add either ratelimited or after the
required amount of time that we need to wait directly in the main
queue. In this way we can always be sure that we will sync back
the Deployment if its progress has yet to resolve into a complete
(NewReplicaSetAvailable) or TimedOut condition.
To prepare for implementing ControllerRef across all controllers,
this pushes the common adopt/orphan logic into ControllerRefManager
so each controller doesn't have to duplicate it.
This also shares the adopt/orphan logic between Pods and ReplicaSets,
so it lives in only one place.
Automatic merge from submit-queue (batch tested with PRs 41196, 41252, 41300, 39179, 41449)
controller: cleanup workload controllers a bit
* Switches glog.Errorf to utilruntime.HandleError in DS and RC controllers
* Drops a couple of unused variables in the DS, SS, and Deployment controllers
* Updates some comments
@kubernetes/sig-apps-misc
* Switches glog.Errorf to utilruntime.HandleError in DS and RC controllers
* Drops a couple of unused variables in the DS, SS, and Deployment controllers
* Updates some comments
Automatic merge from submit-queue (batch tested with PRs 34543, 40606)
sync client-go and move util/workqueue
The vision of client-go is that it provides enough utilities to build a reasonable controller. It has been copying `util/workqueue`. This makes it authoritative.
@liggitt I'm getting really close to making client-go authoritative ptal.
approved based on https://github.com/kubernetes/kubernetes/issues/40363
Automatic merge from submit-queue
controller: don't run informers in unit tests when unnecessary
Fixes https://github.com/kubernetes/kubernetes/issues/39908
@mfojtik it seems that using informers makes the deployment sync for the initial relist so this races with the enqueue that these tests are testing.
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
move the discovery and dynamic clients
Moved the dynamic client, discovery client, testing/core, and testing/cache to `client-go`. Dependencies on api groups we don't have generated clients for have dropped out, so federation, kubeadm, and imagepolicy.
@caesarxuchao @sttts
approved based on https://github.com/kubernetes/kubernetes/issues/40363
Deployments get cleaned up only when they are paused, they get scaled up/down,
or when the strategy that drives rollouts completes. This means that stuck
deployments that fall into none of the above categories will not get cleaned
up. Since cleanup is already safe by itself (we only delete old replica sets
that are synced by the replica set controller and have no replicas) we can
execute it for every deployment when there is no intention to rollback.
Automatic merge from submit-queue
Updated unit tests
@janetkuo updated the flaky unit test to have the same structure with regard to uncasting as the rest of the tests. ptal
Automatic merge from submit-queue (batch tested with PRs 39807, 37505, 39844, 39525, 39109)
Update deployment equality helper
@mfojtik @janetkuo this is split out of https://github.com/kubernetes/kubernetes/pull/38714 to reduce the size of that PR, ptal
Automatic merge from submit-queue
Curating Owners: pkg/controller
cc @jsafrane @mikedanese @bprashanth @derekwaynecarr @thockin @saad-ali
In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone **lgtms** and then someone
experienced in the project **approves**), we are adding new reviewers to
existing owners files.
## If You Care About the Process:
We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
perfectly: people that have made mechanical code changes (e.g change the
copyright header across all directories) end up as reviewers in lots of
places.
Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).
At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
## TLDR:
As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable, please edit the OWNERS file to add
the names of people that should be reviewing code in the future in the **reviewers** section. You probably do NOT need to modify the **approvers** section.
3. Notify me if you want some OWNERS file to be removed. Being an approver or reviewer
of a parent directory makes you a reviewer/approver of the subdirectories too, so not all
OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example)
Automatic merge from submit-queue (batch tested with PRs 38608, 38299)
controller: set unavailableReplicas correctly when scaling down
```
deployment_controller.go:299] Error syncing deployment
e2e-tests-kubectl-2l7xx/e2e-test-nginx-deployment:
Deployment.extensions "e2e-test-nginx-deployment" is invalid:
status.unavailableReplicas: Invalid value: -1:
must be greater than or equal to 0
```
The validation error above occurs usually when a Deployment is
scaled down. In such a case we should default unavailableReplicas
to 0 instead of making an invalid api call.
@kubernetes/deployment
Automatic merge from submit-queue (batch tested with PRs 38377, 36365, 36648, 37691, 38339)
controller: sync stuck deployments in a secondary queue
@kubernetes/deployment this makes Deployments not depend on a tight resync interval in order to estimate progress.
deployment_controller.go:299] Error syncing deployment
e2e-tests-kubectl-2l7xx/e2e-test-nginx-deployment:
Deployment.extensions "e2e-test-nginx-deployment" is invalid:
status.unavailableReplicas: Invalid value: -1:
must be greater than or equal to 0
The validation error above occurs usually when a Deployment is
scaled down. In such a case we should default unavailableReplicas
to 0 instead of making an invalid api call.
Automatic merge from submit-queue
Restore event messages for replica sets in the deployment controller
Needed to unblock release upgrade tests (see https://github.com/kubernetes/kubernetes/issues/36453)
@kubernetes/deployment ptal
Automatic merge from submit-queue
Update how we detect overlapping deployments
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#24152
**Special notes for your reviewer**: cc @kubernetes/deployment
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
When looking for overlapping deployments, we should also find other deployments that select current deployment's pods,
not just the ones whose pods are selected by current deployment.
This commit adds support for failing deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined as progressDeadlineSeconds then the deployment will be
marked as failed by adding a condition with a ProgressDeadlineExceeded
reason in it. Progress in the context of a deployment means the creation
or adoption of a new replica set, scaling up new pods, and scaling down
old pods.
When looking for overlapping deployments, we should also find other deployments that select current deployment's pods,
not just the ones whose pods are selected by current deployment.
Automatic merge from submit-queue
Let release_1_5 clientset include multiple versions of a group
Fix#35237
This PR make versioned clientset to include multiple versions of a group. Currently only `batch` has `v1` and `v2alpha1`. The clientset interface now looks like:
```go
BatchV2alpha1() v2alpha1batch.BatchV2alpha1Interface
BatchV1() v1batch.BatchV1Interface
// Deprecated: please explicitly pick a version if possible.
Batch() v1batch.BatchV1Interface
```
Commit "update client-gen to say internalversion rather than unversioned" fixes https://github.com/kubernetes/kubernetes/issues/24481.
cc @kubernetes/sig-api-machinery @soltysh @deads2k @nikhiljindal
```release-note
release_1_5 clientset supports multiple versions of a group.
```