Automatic merge from submit-queue (batch tested with PRs 51031, 51705, 51888, 51727, 51684). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Address panic in TestCancelAndReadd
This further increases the timeouts in `timed_workers_test` to reduce the chance of a race condition against the wait group.
Signed-off-by: Monis Khan <mkhan@redhat.com>
Fixes#51704
/assign @gmarek
cc @davidopp @timothysc
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
should use time.Since instead of time.Now().Sub
**What this PR does / why we need it**:
should use time.Since instead of time.Now().Sub
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
```
NONE
Automatic merge from submit-queue (batch tested with PRs 52679, 52285). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Improve cloud-cidr-allocator's performance
Fixes https://github.com/kubernetes/kubernetes/issues/52284
This makes the changes I suggested on that issue. Also it makes the cloud cidr allocator more similar to range allocator.
cc @kubernetes/sig-network-pr-reviews @kubernetes/sig-scalability-misc @wojtek-t @bowei
Automatic merge from submit-queue (batch tested with PRs 51081, 52725). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
daemon_controller: fix typo.
**What this PR does / why we need it**:
I found a small typo while implementing #48841.
With the existing code, some edge cases might lead to the wrong pods
being deleted.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Eliminate hangs/throttling of node heartbeat
Fixes https://github.com/kubernetes/kubernetes/issues/48638Fixes#50304
Stops kubelet from wedging when updating node status if unable to establish tcp connection.
Notes that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out, so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them
```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Changes the node cloud controller to use its name for events
**What this PR does / why we need it**:
Updates the event recorder component to be the `cloud-node-controller` instead of `cloudcontrollermanager`, which aligns with how other controllers are setup like the daemonset controller which uses `daemonset-controller` and the ccm uses `cloud-controller-manager`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Use `cloud-node-controller` for cloud node controller events
```
/cc @luxas @wlan0 @jhorwit2
Automatic merge from submit-queue (batch tested with PRs 51796, 52223)
Fix pod and node names switched around in error message.
**What this PR does / why we need it**: This PR fixes a pod name and a node name switched around in an error message from the daemon controller.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: No issue that I know of
**Special notes for your reviewer**: -
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)
Ignore pods for quota marked for deletion whose node is unreachable
**What this PR does / why we need it**:
Traditionally, we charge to quota all pods that are in a non-terminal phase. We have a user report that noted the behavior change in kube 1.5 for the node controller to no longer force delete pods whose nodes have been lost. Instead, the pod is marked for deletion, and the reason is updated to state that the node is unreachable. The user expected the quota to be released. If the user was at their quota limit, their application may not be able to create a new replica given the current behavior. As a result, this PR ignores pods marked for deletion that have exceeded their grace period.
**Which issue this PR fixes**
xref https://bugzilla.redhat.com/show_bug.cgi?id=1455743
fixes https://github.com/kubernetes/kubernetes/issues/52436
**Release note**:
```release-note
Ignore pods marked for deletion that exceed their grace period in ResourceQuota
```
Automatic merge from submit-queue
Fix swallowed errors in various volume packages
**What this PR does / why we need it**: Fixes swallowed errors in various volume packages.
**Release note**:
```release-note NONE
```
Automatic merge from submit-queue (batch tested with PRs 52007, 52196, 52169, 52263, 52291)
Remove links to GCE/AWS cloud providers from PersistentVolumeCo…
…ntroller
**What this PR does / why we need it**:
We should be able to build a cloud-controller-manager without having to
pull in code specific to GCE and AWS clouds. Note that this is a tactical
fix for now, we should have allow PVLabeler to be passed into the
PersistentVolumeController, maybe come up with better interfaces etc. Since
it is too late to do all that for 1.8, we just move cloud specific code
to where they belong and we check for PVLabeler method and use it where
needed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#51629
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
We should be able to build a cloud-controller-manager without having to
pull in code specific to GCE and AWS clouds. Note that this is a tactical
fix for now, we should have allow PVLabeler to be passed into the
PersistentVolumeController, maybe come up with better interfaces etc. Since
it is too late to do all that for 1.8, we just move cloud specific code
to where they belong and we check for PVLabeler method and use it where
needed.
Fixes#51629
If the previous condition has been a successful rollout then we
shouldn't try to estimate any progress. Scenario:
* progressDeadlineSeconds is smaller than the difference between
now and the time the last rollout finished in the past.
* the creation of a new ReplicaSet triggers a resync of the
Deployment prior to the cached copy of the Deployment getting
updated with the status.condition that indicates the creation
of the new ReplicaSet.
The Deployment will be resynced and eventually its Progressing
condition will catch up with the state of the world.
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 52091, 52071)
Bugfix: Improve how JobController use queue for backoff
**What this PR does / why we need it**:
In some cases, the backoff delay for a given Job is reset unnecessarily.
the PR improves how JobController uses queue for backoff:
- Centralize the key "forget" and "re-queue" process in only on method.
- Change the signature of the syncJob method in order to return the
information if it is necessary to forget the backoff delay for a given
key.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Links to #51153
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 48552, 51876)
Disable default paging in list watches
For 1.8 this will be off by default. In 1.9 it will be on by default.
Add tests and rename some fields to use the `chunking` terminology.
Note that the pager may be used for other things besides chunking.
Follow on to #48921, we left the field on to get some exercise in the normal code paths, but needs to be disabled for 1.8.
@liggitt let's merge on wednesday.
Automatic merge from submit-queue (batch tested with PRs 52097, 52054)
Move paused deployment e2e tests to integration
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #52113
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Centralize the key "forget" and "requeue" process in only on method.
Change the signature of the syncJob method in order to return the
information if it is necessary to forget the backoff delay for a given
key.
For 1.8 this will be off by default. In 1.9 it will be on by default.
Add tests and rename some fields to use the `chunking` terminology.
Note that the pager may be used for other things besides chunking.
Automatic merge from submit-queue (batch tested with PRs 51956, 50708)
Move autoscaling/v2 from alpha1 to beta1
This graduates autoscaling/v2alpha1 to autoscaling/v2beta1. The move is more-or-less just a straightforward rename.
Part of kubernetes/features#117
```release-note
v2 of the autoscaling API group, including improvements to the HorizontalPodAutoscaler, has moved from alpha1 to beta1.
```
Automatic merge from submit-queue (batch tested with PRs 51603, 51653)
Graduate metrics/v1alpha1 to v1beta1
This introduces v1beta1 of the resource metrics API, previously in alpha.
The v1alpha1 version remains for compatibility with the Heapster legacy version
of the resource metrics API, which is compatible with the v1alpha1 version. It also
renames the v1beta1 version to `resource-metrics.metrics.k8s.io`.
The HPA controller's REST clients (but not the legacy client) have been migrated as well.
Part of kubernetes/features#118.
```release-note
Migrate the metrics/v1alpha1 API to metrics/v1beta1. The HorizontalPodAutoscaler
controller REST client now uses that version. For v1beta1, the API is now known as
resource-metrics.metrics.k8s.io.
```
Automatic merge from submit-queue (batch tested with PRs 51603, 51653)
fix taint controller panic
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51586
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Enable batch/v1beta1.CronJobs by default
This PR re-applies the cronjobs->beta back (https://github.com/kubernetes/kubernetes/pull/51720) with the fix from @shyamjvs.
Fixes#51692
@apelisse @dchen1107 @smarterclayton ptal
@janetkuo @erictune fyi
Job failure policy integration in JobController. From the
JobSpec.BackoffLimit the JobController will define the backoff
duration between Job retry.
It use the ```workqueue.RateLimitingInterface``` to store the number of
"retry" as "requeue" and the default Job backoff initial duration is set
during the initialization of the ```workqueue.RateLimiter.
Since the number of retry for each job is store in a local structure
"JobController.queue" if the JobController restarts the number of retries
will be lost and the backoff duration will be reset to 0.
Add e2e test for Job backoff failure policy
Automatic merge from submit-queue (batch tested with PRs 50602, 51561, 51703, 51748, 49142)
Slow-start batch pod creation of rs, rc, ds, jobs
Prevent too-large replicas from generating enormous numbers
of events by creating only a few pods at a time, then increasing
the batch size when pod creations succeed. Stop creating batches
of pods when any pod creation errors are encountered.
Todo:
- [x] Add automated tests
- [x] Test ds
Fixes https://github.com/kubernetes/kubernetes/issues/49145
**Release note**:
```release-note
controllers backoff better in face of quota denial
```
Automatic merge from submit-queue (batch tested with PRs 51666, 49829, 51058, 51004, 50938)
Add test items for job utils
**What this PR does / why we need it**:
Add test item for job util
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51666, 49829, 51058, 51004, 50938)
Fixed integer overflow when matching PVPVC claims
Fixes#49911
Fixed integer overflow when matching PVPVC claims. Added test to guard this behavior.
Automatic merge from submit-queue (batch tested with PRs 51553, 51538, 51663, 51069, 51737)
Consistent Names for ControllerRevisions, ReplicaSets, and objects using GenerateName
**What this PR does / why we need it**:
Adds the rand.SafeEncodeString function and uses this function to generate names for ReplicaSets and ControllerRevisions.
```release-note
The names generated for ControllerRevision and ReplicaSet are consistent with the GenerateName functionality of the API Server and will not contain "bad words".
```
Automatic merge from submit-queue (batch tested with PRs 51583, 51283, 51374, 51690, 51716)
Add IPAM controller for synchronizing node pod CIDR range allocations between the cluster and the cloud (alpha feature)
```release-note
IPAM controller unifies handling of node pod CIDR range allocation.
It is intended to supersede the logic that is currently in range_allocator
and cloud_cidr_allocator. (ALPHA FEATURE)
Note: for this change, the other allocators still exist and are the default.
It supports two modes:
* CIDR range allocations done within the cluster that are then propagated out to the cloud provider.
* Cloud provider managed IPAM that is then reflected into the cluster.
```
Fixes https://github.com/kubernetes/kubernetes/issues/51826
Automatic merge from submit-queue (batch tested with PRs 51335, 51364, 51130, 48075, 50920)
Graduate custom metrics API to v1beta1
This graduates custom-metrics.metrics.k8s.io/v1alpha1 to custom-metrics.metrics.k8s.io/v1beta1. The move is more-or-less just a straightforward rename.
Part of kubernetes/features#117 and kubernetes/features#118
```release-note
the custom metrics API (custom-metrics.metrics.k8s.io) has moved from v1alpha1 to v1beta1
```
Automatic merge from submit-queue
update deprecated interface and fix bug not return when list pod failed in cronjob_controller.go
**What this PR does / why we need it**:
remove some unused redundant code, and fix bug: when list pod failed,
job still deleted but pod may still exist in func `deleteJob`
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```