Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Eliminate hangs/throttling of node heartbeat
Fixes https://github.com/kubernetes/kubernetes/issues/48638Fixes#50304
Stops kubelet from wedging when updating node status if unable to establish tcp connection.
Notes that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out, so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them
```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Changes the node cloud controller to use its name for events
**What this PR does / why we need it**:
Updates the event recorder component to be the `cloud-node-controller` instead of `cloudcontrollermanager`, which aligns with how other controllers are setup like the daemonset controller which uses `daemonset-controller` and the ccm uses `cloud-controller-manager`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Use `cloud-node-controller` for cloud node controller events
```
/cc @luxas @wlan0 @jhorwit2
Automatic merge from submit-queue (batch tested with PRs 51796, 52223)
Fix pod and node names switched around in error message.
**What this PR does / why we need it**: This PR fixes a pod name and a node name switched around in an error message from the daemon controller.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: No issue that I know of
**Special notes for your reviewer**: -
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)
Ignore pods for quota marked for deletion whose node is unreachable
**What this PR does / why we need it**:
Traditionally, we charge to quota all pods that are in a non-terminal phase. We have a user report that noted the behavior change in kube 1.5 for the node controller to no longer force delete pods whose nodes have been lost. Instead, the pod is marked for deletion, and the reason is updated to state that the node is unreachable. The user expected the quota to be released. If the user was at their quota limit, their application may not be able to create a new replica given the current behavior. As a result, this PR ignores pods marked for deletion that have exceeded their grace period.
**Which issue this PR fixes**
xref https://bugzilla.redhat.com/show_bug.cgi?id=1455743
fixes https://github.com/kubernetes/kubernetes/issues/52436
**Release note**:
```release-note
Ignore pods marked for deletion that exceed their grace period in ResourceQuota
```
Automatic merge from submit-queue
Fix swallowed errors in various volume packages
**What this PR does / why we need it**: Fixes swallowed errors in various volume packages.
**Release note**:
```release-note NONE
```
Fix#49256
When `ScalingLimited = true` for the `hpa`, there is an accompanying
status message, describing why scaling is limited. Previously if the desired
replica count was 0, and spec.minReplicas > 0, the status message
indicated "the desired replica count was less than the min replica
count". This was particularly confusing when `spec.MinReplicas = 1`. If
there was no `spec.minReplicas`, then the status message indicated "the
desired replica count was zero" which is more informative.
Update the calculation of status message so that if the desired replica
count is 0, we always display the clearer "the desired replica count was
zero" status message, even if spec.minReplicas > 0.
Signed-off-by: mattjmcnaughton <mattjmcnaughton@gmail.com>
Automatic merge from submit-queue (batch tested with PRs 52007, 52196, 52169, 52263, 52291)
Remove links to GCE/AWS cloud providers from PersistentVolumeCo…
…ntroller
**What this PR does / why we need it**:
We should be able to build a cloud-controller-manager without having to
pull in code specific to GCE and AWS clouds. Note that this is a tactical
fix for now, we should have allow PVLabeler to be passed into the
PersistentVolumeController, maybe come up with better interfaces etc. Since
it is too late to do all that for 1.8, we just move cloud specific code
to where they belong and we check for PVLabeler method and use it where
needed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#51629
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Currently some of the imports of `apimachinery` use
`k8s.io/kubernetes/staging/src/k8s.io/apimachinery...`. Replace
these with `k8s.io/apimachinery`, as is in use throughout the rest
of the code base.
Signed-off-by: mattjmcnaughton <mattjmcnaughton@gmail.com>
Address `golint` errors in `pkg/controller/podautoscaler`. Note,
I did not address issues around exported types/functions missing
comments, because I'm not sure what the convention within the k8s project is.
Signed-off-by: mattjmcnaughton <mattjmcnaughton@gmail.com>
We should be able to build a cloud-controller-manager without having to
pull in code specific to GCE and AWS clouds. Note that this is a tactical
fix for now, we should have allow PVLabeler to be passed into the
PersistentVolumeController, maybe come up with better interfaces etc. Since
it is too late to do all that for 1.8, we just move cloud specific code
to where they belong and we check for PVLabeler method and use it where
needed.
Fixes#51629
If the previous condition has been a successful rollout then we
shouldn't try to estimate any progress. Scenario:
* progressDeadlineSeconds is smaller than the difference between
now and the time the last rollout finished in the past.
* the creation of a new ReplicaSet triggers a resync of the
Deployment prior to the cached copy of the Deployment getting
updated with the status.condition that indicates the creation
of the new ReplicaSet.
The Deployment will be resynced and eventually its Progressing
condition will catch up with the state of the world.
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 52091, 52071)
Bugfix: Improve how JobController use queue for backoff
**What this PR does / why we need it**:
In some cases, the backoff delay for a given Job is reset unnecessarily.
the PR improves how JobController uses queue for backoff:
- Centralize the key "forget" and "re-queue" process in only on method.
- Change the signature of the syncJob method in order to return the
information if it is necessary to forget the backoff delay for a given
key.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Links to #51153
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 48552, 51876)
Disable default paging in list watches
For 1.8 this will be off by default. In 1.9 it will be on by default.
Add tests and rename some fields to use the `chunking` terminology.
Note that the pager may be used for other things besides chunking.
Follow on to #48921, we left the field on to get some exercise in the normal code paths, but needs to be disabled for 1.8.
@liggitt let's merge on wednesday.
Automatic merge from submit-queue (batch tested with PRs 52097, 52054)
Move paused deployment e2e tests to integration
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #52113
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Centralize the key "forget" and "requeue" process in only on method.
Change the signature of the syncJob method in order to return the
information if it is necessary to forget the backoff delay for a given
key.
For 1.8 this will be off by default. In 1.9 it will be on by default.
Add tests and rename some fields to use the `chunking` terminology.
Note that the pager may be used for other things besides chunking.
Automatic merge from submit-queue (batch tested with PRs 51956, 50708)
Move autoscaling/v2 from alpha1 to beta1
This graduates autoscaling/v2alpha1 to autoscaling/v2beta1. The move is more-or-less just a straightforward rename.
Part of kubernetes/features#117
```release-note
v2 of the autoscaling API group, including improvements to the HorizontalPodAutoscaler, has moved from alpha1 to beta1.
```
Automatic merge from submit-queue (batch tested with PRs 51603, 51653)
Graduate metrics/v1alpha1 to v1beta1
This introduces v1beta1 of the resource metrics API, previously in alpha.
The v1alpha1 version remains for compatibility with the Heapster legacy version
of the resource metrics API, which is compatible with the v1alpha1 version. It also
renames the v1beta1 version to `resource-metrics.metrics.k8s.io`.
The HPA controller's REST clients (but not the legacy client) have been migrated as well.
Part of kubernetes/features#118.
```release-note
Migrate the metrics/v1alpha1 API to metrics/v1beta1. The HorizontalPodAutoscaler
controller REST client now uses that version. For v1beta1, the API is now known as
resource-metrics.metrics.k8s.io.
```
Automatic merge from submit-queue (batch tested with PRs 51603, 51653)
fix taint controller panic
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51586
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Enable batch/v1beta1.CronJobs by default
This PR re-applies the cronjobs->beta back (https://github.com/kubernetes/kubernetes/pull/51720) with the fix from @shyamjvs.
Fixes#51692
@apelisse @dchen1107 @smarterclayton ptal
@janetkuo @erictune fyi
Job failure policy integration in JobController. From the
JobSpec.BackoffLimit the JobController will define the backoff
duration between Job retry.
It use the ```workqueue.RateLimitingInterface``` to store the number of
"retry" as "requeue" and the default Job backoff initial duration is set
during the initialization of the ```workqueue.RateLimiter.
Since the number of retry for each job is store in a local structure
"JobController.queue" if the JobController restarts the number of retries
will be lost and the backoff duration will be reset to 0.
Add e2e test for Job backoff failure policy
Automatic merge from submit-queue (batch tested with PRs 50602, 51561, 51703, 51748, 49142)
Slow-start batch pod creation of rs, rc, ds, jobs
Prevent too-large replicas from generating enormous numbers
of events by creating only a few pods at a time, then increasing
the batch size when pod creations succeed. Stop creating batches
of pods when any pod creation errors are encountered.
Todo:
- [x] Add automated tests
- [x] Test ds
Fixes https://github.com/kubernetes/kubernetes/issues/49145
**Release note**:
```release-note
controllers backoff better in face of quota denial
```
Automatic merge from submit-queue (batch tested with PRs 51666, 49829, 51058, 51004, 50938)
Add test items for job utils
**What this PR does / why we need it**:
Add test item for job util
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51666, 49829, 51058, 51004, 50938)
Fixed integer overflow when matching PVPVC claims
Fixes#49911
Fixed integer overflow when matching PVPVC claims. Added test to guard this behavior.
Automatic merge from submit-queue (batch tested with PRs 51553, 51538, 51663, 51069, 51737)
Consistent Names for ControllerRevisions, ReplicaSets, and objects using GenerateName
**What this PR does / why we need it**:
Adds the rand.SafeEncodeString function and uses this function to generate names for ReplicaSets and ControllerRevisions.
```release-note
The names generated for ControllerRevision and ReplicaSet are consistent with the GenerateName functionality of the API Server and will not contain "bad words".
```
Automatic merge from submit-queue (batch tested with PRs 51583, 51283, 51374, 51690, 51716)
Add IPAM controller for synchronizing node pod CIDR range allocations between the cluster and the cloud (alpha feature)
```release-note
IPAM controller unifies handling of node pod CIDR range allocation.
It is intended to supersede the logic that is currently in range_allocator
and cloud_cidr_allocator. (ALPHA FEATURE)
Note: for this change, the other allocators still exist and are the default.
It supports two modes:
* CIDR range allocations done within the cluster that are then propagated out to the cloud provider.
* Cloud provider managed IPAM that is then reflected into the cluster.
```
Fixes https://github.com/kubernetes/kubernetes/issues/51826
Automatic merge from submit-queue (batch tested with PRs 51335, 51364, 51130, 48075, 50920)
Graduate custom metrics API to v1beta1
This graduates custom-metrics.metrics.k8s.io/v1alpha1 to custom-metrics.metrics.k8s.io/v1beta1. The move is more-or-less just a straightforward rename.
Part of kubernetes/features#117 and kubernetes/features#118
```release-note
the custom metrics API (custom-metrics.metrics.k8s.io) has moved from v1alpha1 to v1beta1
```
Automatic merge from submit-queue
update deprecated interface and fix bug not return when list pod failed in cronjob_controller.go
**What this PR does / why we need it**:
remove some unused redundant code, and fix bug: when list pod failed,
job still deleted but pod may still exist in func `deleteJob`
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
IPAM controller unifies handling of node pod CIDR range allocation. It
is intended to supersede the logic that is currently in range_allocator
and cloud_cidr_allocator.
Note: for this change, the other allocators still exist and are the
default.
It supports two modes:
* CIDR range allocations done within the cluster that are then
propagated out to the cloud provider.
* Cloud provider managed IPAM that is then reflected into the cluster.
Prevent too-large replicas from generating enormous numbers
of events by creating only a few pods at a time, then increasing
the batch size when pod creations succeed. Stop creating batches
of pods when any pod creation errors are encountered.
Add a feature gate in the apiserver to control whether paging can be
used. Add controls to the storage factory that allow it to be disabled
per resource. Use a JSON encoded continuation token that can be
versioned. Create a 410 error if the continuation token is expired.
Adds GetContinue() to ListMeta.
Automatic merge from submit-queue
Use json-iterator instead of ugorji for JSON.
@smarterclayton @wojtek-t
Fixes#36120
xref #18762
```release-note
Switch JSON marshal/unmarshal to json-iterator library. Performance should be close to previous with no generated code.
```
Automatic merge from submit-queue (batch tested with PRs 51628, 51637, 51490, 51279, 51302)
Ensure that DaemonSet respects termination
**What this PR does / why we need it**:
#43077 correctly prevents the DaemonSet controller from adopting deleted Pods, but, as pointed out in #50477, the controller now has no sensitivity to the termination lifecycle (i.e TerminationGracePeriodSeconds) of the Pods it creates. This PR attempts to balance the two. DaemonSet controller will now consider deleted Pods owned by a DaemonSet during creation, but it will not consider deleted Pods as targets for adoption.
fixes#50477
```release-note
#43077 introduced a condition where DaemonSet controller did not respect the TerminationGracePeriodSeconds of the Pods it created. This is now corrected.
```
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)
Add a persistent volume label controller to the cloud-controller-manager
Part of https://github.com/kubernetes/features/issues/88
Outstanding concerns needing input:
- [x] Why 5 threads for controller processing?
- [x] Remove direct linkage to aws/gce cloud providers [#51629]
- [x] Modify shared informers to allow added event handlers ability to include uninitialized objects/using unshared informer #48893
- [x] Use cache.MetaNamespaceKeyFunc in event handler?
I'm willing to work on addressing the removal of the direct linkage to aws/gce after this PR gets in.
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)
Task 1: Tainted node by condition.
**What this PR does / why we need it**:
Tainted node by condition for MemoryPressure, OutOfDisk and so on.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001
**Release note**:
```release-note
Tainted nodes by conditions as following:
* 'node.kubernetes.io/network-unavailable=:NoSchedule' if NetworkUnavailable is true
* 'node.kubernetes.io/disk-pressure=:NoSchedule' if DiskPressure is true
* 'node.kubernetes.io/memory-pressure=:NoSchedule' if MemoryPressure is true
* 'node.kubernetes.io/out-of-disk=:NoSchedule' if OutOfDisk is true
```
Automatic merge from submit-queue (batch tested with PRs 51480, 49616, 50123, 50846, 50404)
Use 'Infof' instead of 'Errorf' for a debug log
Outputing error log for a debug is confused.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51707, 51662, 51723, 50163, 51633)
update GC controller to wait until controllers have been initialized …
fixes#51013
Alternative to https://github.com/kubernetes/kubernetes/pull/51492 which keeps those few controllers (only one) from starting the informers early.
This further increases the timeouts in timed_workers_test to reduce
the chance of a race condition against the wait group.
Signed-off-by: Monis Khan <mkhan@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 50775, 51397, 51168, 51465, 51536)
Enable batch/v1beta1.CronJobs by default
This PR moves to CronJobs beta entirely, enabling `batch/v1beta1` by default.
Related issue: #41039
@erictune @janetkuo ptal
```release-note
Promote CronJobs to batch/v1beta1.
```
Automatic merge from submit-queue
Add storageClass.mountOptions and use it in all applicable plugins
split off from https://github.com/kubernetes/kubernetes/pull/50919 and still dependent on it. cc @gnufied
issue: https://github.com/kubernetes/features/issues/168
```release-note
Add mount options field to StorageClass. The options listed there are automatically added to PVs provisioned using the class.
```
Automatic merge from submit-queue (batch tested with PRs 44719, 48454)
check job ActiveDeadlineSeconds
**What this PR does / why we need it**:
enqueue a sync task after ActiveDeadlineSeconds
**Which issue this PR fixes** *:
fixes#32149
**Special notes for your reviewer**:
**Release note**:
```release-note
enqueue a sync task to wake up jobcontroller to check job ActiveDeadlineSeconds in time
```
Automatic merge from submit-queue (batch tested with PRs 44719, 48454)
Fix handling of APIserver errors when saving provisioned PVs.
When API server crashes *after* saving a provisioned PV and before sending
200 OK, the controller tries to save the PV again. In this case, it gets
AlreadyExists error, which should be interpreted as success and not as error.
Especially, a volume that corresponds to the PV should not be deleted in the
underlying storage.
Fixes#44372
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 50919, 51410, 50099, 51300, 50296)
Remove failure check from deployment controller
@kubernetes/sig-apps-pr-reviews this check is useless w/o automatic rollback so I am removing it.
When API server crashes *after* saving a provisioned PV and before sending
200 OK, the controller tries to save the PV again. In this case, it gets
AlreadyExists error, which should be interpreted as success and not as error.
Especially, a volume that corresponds to the PV should not be deleted in the
underlying storage.
Automatic merge from submit-queue (batch tested with PRs 51441, 51356, 51460)
Don't update pvc.status.capacity if pvc is already Bound
As discussed here https://github.com/kubernetes/community/pull/657#discussion_r128008128, in order for `pvc.status.Capacity < pv.Spec.Capcity` to be the mechanism for volume filesystem* resize, the pv controller should stop updating pvc.status.Capacity every resync period.
/assign @jsafrane
/sig storage
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51441, 51356, 51460)
fix the bad position of code comment
**What this PR does / why we need it**:
The position of code comment is wrong and move it to the right position
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Automatic merge from submit-queue
Add volume operation metrics to operation executor and PV controller
This PR implements the proposal for high level volume metrics https://github.com/kubernetes/community/pull/809
**Special notes for your reviewer**:
~Differences from proposal:~ all resolved
~"verify_volume" is now "verify_volumes_are_attached" + "verify_volumes_are_attached_per_node" + "verify_controller_attached_volume." Which of them do we want?~
~There is no "mount_device" metric because the MountVolume operation combines MountDevice and mount (plugin.Setup). Do we want to extract the mount_device metric or is it okay to keep mountvolume as one? For attachable volumes, MountDevice is the actual mount and Setup is a bindmount + setvolumeownership. For unattachable, mountDevice does not occur and Setup is an actual mount + setvolumeownership.~
~PV controller metrics I did not implement following the proposal at all. I did not change goroutinemap nor scheduleOperation. Because provisionClaimOperation does not return an error, so it's impossible for the caller to know if there is actually a failure worth reporting. So I manually create a new metric inside the function according to some conditions.~
@gnufied
I have tested the operationexecutor metrics but not provision & delete. Sample:
![screen shot 2017-08-02 at 15 01 08](https://user-images.githubusercontent.com/13111288/28889980-a7093526-7793-11e7-9aa9-ad7158be76fa.png)
**Release note**:
```release-note
Add error count and time-taken metrics for storage operations such as mount and attach, per-volume-plugin.
```
Automatic merge from submit-queue
simplify disruption controller finder logic
**What this PR does / why we need it**:
Address some comments from https://github.com/kubernetes/kubernetes/pull/45003 and simplify the PDB controller logic as part of issue https://github.com/kubernetes/kubernetes/issues/42284
@enisoc @kargakis @caesarxuchao
Also it feels like we can get rid of the finders all together since with controller ref, each pod has only controller. Let me know if i should remove that finders all together ?
This change is prerequisite for implementing iSCSI attacher
and detacher.
In order to use chap authentication at iSCSI plugin after
implementing attacher and detacher, secret is needed at
AttachDisk() which is called from WaitForAttach().
To obtain secret, pod information is required, but
WaitForAttach() doesn't pass pod information inside.
This patch adds 'pod' as an argument of WaitForAttach()
and adds changes to drivers who implements WaitForAttach().
Fixes#48953
Automatic merge from submit-queue (batch tested with PRs 51391, 51338, 51340, 50773, 49599)
add an starting info log of namespace controller.
**What this PR does / why we need it**:
add an starting info log of namespace controller.
**Release note**:
NA
Automatic merge from submit-queue (batch tested with PRs 51174, 51363, 51087, 51382, 51388)
Add InstanceExistsByProviderID to cloud provider interface for CCM
**What this PR does / why we need it**:
Currently, [`MonitorNode()`](02b520f0a4/pkg/controller/cloud/nodecontroller.go (L240)) in the node controller checks with the CCM if a node still exists by calling `ExternalID(nodeName)`. `ExternalID` is supposed to return the provider id of a node which is not supported on every cloud. This means that any clouds who cannot infer the provider id by the node name from a remote location will never remove nodes that no longer exist.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50985
**Special notes for your reviewer**:
We'll want to create a subsequent issue to track the implementation of these two new methods in the cloud providers.
**Release note**:
```release-note
Adds `InstanceExists` and `InstanceExistsByProviderID` to cloud provider interface for the cloud controller manager
```
/cc @wlan0 @thockin @andrewsykim @luxas @jhorwit2
/area cloudprovider
/sig cluster-lifecycle
Automatic merge from submit-queue (batch tested with PRs 51054, 51101, 50031, 51296, 51173)
Dynamic Flexvolume plugin discovery, probing with filesystem watch.
**What this PR does / why we need it**: Enables dynamic Flexvolume plugin discovery. This model uses a filesystem watch (fsnotify library), which notifies the system that a probe is necessary only if something changes in the Flexvolume plugin directory.
This PR uses the dependency injection model in https://github.com/kubernetes/kubernetes/pull/49668.
**Release Note**:
```release-note
Dynamic Flexvolume plugin discovery. Flexvolume plugins can now be discovered on the fly rather than only at system initialization time.
```
/sig-storage
/assign @jsafrane @saad-ali
/cc @bassam @chakri-nelluri @kokhang @liggitt @thockin
Automatic merge from submit-queue (batch tested with PRs 49850, 47782, 50595, 50730, 51341)
Cloud Controller Manager now sets Node.Spec.ProviderID
**What this PR does / why we need it**:
Cloud Controller Manager now sets `Node.Spec.ProviderID`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/49836.
**Special notes for your reviewer**:
* As part of an effort to move cloud controller manager into beta https://github.com/kubernetes/kubernetes/issues/48690.
Automatic merge from submit-queue (batch tested with PRs 49850, 47782, 50595, 50730, 51341)
NodeConditionPredicates should return NodeOutOfDisk error.
**What this PR does / why we need it**:
In https://github.com/kubernetes/kubernetes/pull/49932 , I moved node condition check into a predicates; but it return incorrect error :(.
We also need to add more cases to `TestNodeShouldRunDaemonPod` which is key function of DaemonSet.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50594
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 50033, 49988, 51132, 49674, 51207)
StatefulSet kubectl rollout command
**What this PR does / why we need it**: This PR implements StatefulSet kubectl rollout command, covering `history`, `status`, and `undo`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#49890
**Special notes for your reviewer**:
**Release note**:
```release-note
kubectl rollout `history`, `status`, and `undo` subcommands now support StatefulSets.
```
Automatic merge from submit-queue (batch tested with PRs 50213, 50707, 49502, 51230, 50848)
Fix comment of cronjob utils.go
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/50951
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51108, 51035, 50539, 51160, 50947)
Delete load balancers if the UIDs for services don't match.
An attempt to fix https://github.com/kubernetes/kubernetes/issues/43730
@thockin @djsly
Automatic merge from submit-queue
Allow attach of volumes to multiple nodes for vSphere
This is a fix for issue #50944 which doesn't allow a volume to be attached to a new node after the node is powered off where the volume was previously attached.
Current behaviour:
One of the cluster worker nodes was powered off in vCenter.
Pods running on this node have been rescheduled on different nodes but got stuck in ContainerCreating. It failed to attach the volume on the new node with error "Multi-Attach error for volume pvc-xxx, Volume is already exclusively attached to one node and can't be attached to another" and hence the application running in the pod has no data available because the volume is not attached to the new node. Since the volume is still attached to powered off node, any attempt to attach the volume on the new node failed with error "Multi-Attach error". It's stuck for 6 minutes until attach/detach controller forcefully tried to detach the volume on the powered off node. After the end of 6 minutes when volume is detached on powered off node, the volume is now successfully attached on the new node and application has now the data available.
What is expected to happen:
I would want the attach/detach controller to go ahead with the attach of the volume on new node where the pod got provisioned instead of waiting for the volume to be detached on the powered off node. It is ok to eventually delete the volume on the powered off node after 6 minutes. This way the application downtime is low and pods are up as soon as possible.
The current fix ignore, vSphere volumes/persistent volume to check for multi-attach scenario in attach/detach controller.
@jingxu97 @saad-ali : Can you please take a look at it.
@tusharnt @divyenpatel @rohitjogvmw @luomiao
```release-note
Allow attach of volumes to multiple nodes for vSphere
```
Automatic merge from submit-queue (batch tested with PRs 38947, 50239, 51115, 51094, 51116)
Fix comment and typos in node_controller
**What this PR does / why we need it**:
1. fix comment to more accurately
2. fix typos
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50980, 46902, 51051, 51062, 51020)
fix confusion in service_controller
**What this PR does / why we need it**:
Fix code and comment confusion in `service_controller`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51009
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 50980, 46902, 51051, 51062, 51020)
Fix swallowed errors in statefulset tests
**What this PR does / why we need it**: Fixes errors that were being swallowed in the tests of the statefulset package.
```release-note NONE
```
Automatic merge from submit-queue (batch tested with PRs 50806, 48789, 49922, 49935, 50438)
On AttachDetachController node status update, do not retry when node …
…doesn't exist but keep the node entry in cache.
**What this PR does / why we need it**: An alternative fix for https://github.com/kubernetes/kubernetes/issues/42438 which also fixes#50721.
Instead of removing the node entry entirely from the node status update cache (which prevents the node from ever being updated even when it recovers), here the node status updater does nothing, so that there won't be an update retry until the node is re-added, where the cache entry is set to true.
Will cherry pick to prior versions after this is merged.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50721
**Release Note**:
``` release-note
On AttachDetachController node status update, do not retry when node doesn't exist but keep the node entry in cache.
```
/assign @jingxu97
/cc @saad-ali
/sig storage
/release-note
A pod status change of unready -> ready results in a move from
the endpoint's unready endpoint addresses to its ready addresses
so if a pod update contains an unready -> ready status change,
the endpoint needs to be updated.
Automatic merge from submit-queue (batch tested with PRs 51102, 50712, 51037, 51044, 51059)
fix#51043
**What this PR does / why we need it**: The StatefulSet controller no longer attempts to mutate "hostname" or "subdomain" fields of the "pod.spec" to enforce the network identity of Pods in a StatefeulSet. Since these fields are set upon creation and immutable thereafter setting the annotations is no longer necessary.
fixes: #51043
Automatic merge from submit-queue (batch tested with PRs 46458, 50934, 50766, 50970, 47698)
Skip non-update endpoint updates
**What this PR does / why we need it**:
On large clusters, a large percentage of endpoint updates are actually non-updates that occur as a result of a change in an associated pod. This results in endpoint updates where the only field that has changed is the `TargetRef.ResourceVersion` in the endpoint address associated with the changed pod. Given enough of these non-updates, the endpoint controller's queue rate limit can be overwhelmed and legitimate updates can be delayed, resulting in (temporarily) broken services. We have clusters where we've seen endpoint updates take 9 minutes.
**Which issue this PR fixes** : fixes#50936
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
Prevent unneeded endpoint updates
```
Automatic merge from submit-queue (batch tested with PRs 46458, 50934, 50766, 50970, 47698)
Prepare VolumeHost for running mount tools in containers
This is the first part of implementation of https://github.com/kubernetes/features/issues/278 - running mount utilities in containers.
It updates `VolumeHost` interface:
* `GetMounter()` now requires volume plugin name, as it is going to return different mounter to different volume plugings, because mount utilities for these plugins can be on different places.
* New `GetExec()` method that should volume plugins use to execute any utilities. This new `Exec` interface will execute them on proper place.
* `SafeFormatAndMount` is updated to the new `Exec` interface.
This is just a preparation, `GetExec` right now leads to simple `os.Exec` and mount utilities are executed on the same place as before. Also, the volume plugins will be updated in subsequent PRs (split into separate PRs, some plugins required lot of changes).
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
@rootfs @gnufied
Automatic merge from submit-queue (batch tested with PRs 46512, 50146)
Make metav1.(Micro)?Time functions take pointers
Is there any reason for those functions not to be on pointers?
Automatic merge from submit-queue
Use CollisionCount for collision avoidance in StatefulSet controller
**What this PR does / why we need it**:
This PR uses the newly added `CollisionCount` in `StatefulSetStatus` for name collision avoidance when the `StatefulSet` controller creates `ControllerRevision`s. The `CreateControllerRevision` method of the `ControllerHistory` interface was augmented to use a user-specified `collisionCount` instead of the internal probe that always starts with 0. The `StatefulSet` controller uses the `CreateControllerRevision` to create `ControllerRevision`s and when it calls it, it passes in the `CollisionCount` of the `StatefulSet` it manipulates.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#49909.
**Special notes for your reviewer**:
/assign @kow3ns
**Release note**:
```release-note
Use CollisionCount for collision avoidance when creating ControllerRevisions in StatefulSet controller
```
Automatic merge from submit-queue
Add enj as reviewer to OWNERS
Adding myself as a reviewer for the following areas:
- API
- auth
- registry
- storage (etcd)
Signed-off-by: Monis Khan <mkhan@redhat.com>
**Release note**:
```release-note
NONE
```
@kubernetes/sig-api-machinery-pr-reviews
@kubernetes/sig-auth-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 50536, 50809, 50220, 50399, 50176)
Refactor statefulset test with sets.String
**What this PR does / why we need it**:
Delete redundant sort. These string slices only own one element.
There is no necessary to sort them.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49869, 47987, 50211, 50804, 50583)
Make endpoints controller update based on semantic equality
Fixes#50828
Split from https://github.com/kubernetes/kubernetes/pull/45294 for separate review
Currently, endpoints objects containing no subsets are decoded by the go client as subsets:[] (when requested individually) or as subsets:null (when requested in a list of endpoints).
Because the endpoints controller is fed via a lister/watcher, it gets the `subsets:null` version fed to it. The subsets computation then returns an empty slice, which fails reflect.DeepEqual, which triggers a write attempt.
This PR makes the comparison use semantic.DeepEqual to avoid spurious writes.
https://github.com/kubernetes/kubernetes/pull/45294 would remove the inconsistency between lists and individual gets.
Automatic merge from submit-queue (batch tested with PRs 49869, 47987, 50211, 50804, 50583)
Add ReclaimPolicy field to StorageClass
fix https://github.com/kubernetes/kubernetes/issues/38192, enough people want this imo so going ahead and adding it according to initial suggested design
some considerations:
* No Recycle allowed, Retain (& Delete) only.
* Do we need to gate the field.
* E2E test where a Retain PV is dynamically provisioned is TODO if we agree we want this & this is the way to do it.
* Need a feature repo issue to track docs and stuff for 1.8
**Release note**:
```release-note
StorageClass has a new field to configure reclaim policy of dynamically provisioned PVs.
```
Automatic merge from submit-queue (batch tested with PRs 46317, 48922, 50651, 50230, 47599)
fix the typo of errorf info
**What this PR does / why we need it**:
fix the error message of stateful_pod_control_test.go
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50589, 50558)
remove useless comments
**What this PR does / why we need it**:
remove useless comments
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#49103
**Release note**:
```release-note
None
```
Automatic merge from submit-queue
Fix error message of statefulset test
**What this PR does / why we need it**:
Fix error message of statefulset test
It should be 0 replocas in the error message.
And fix typo from Falied to Failed
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/50592
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```