Automatic merge from submit-queue (batch tested with PRs 42177, 42176, 44721)
Job: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings Job into full compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
This ensures that Job does not fight with other controllers over control of Pods.
Ref: #24433
**Special notes for your reviewer**:
**Release note**:
```release-note
Job controller now respects ControllerRef to avoid fighting over Pods.
```
cc @erictune @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42177, 42176, 44721)
CronJob: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings CronJob into compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
This ensures that other controllers do not fight over control of objects that a CronJob owns.
**Special notes for your reviewer**:
**Release note**:
```release-note
CronJob controller now respects ControllerRef to avoid fighting with other controllers.
```
cc @erictune @kubernetes/sig-apps-pr-reviews
When the attach/detach controller crashes and a pod with attached PV is deleted
afterwards the controller will never detach the pod's attached volumes. To
prevent this the controller should try to recover the state from the nodes
status.
Automatic merge from submit-queue
Fixes an issue in cide_set.go
Function getBeginingAndEndIndices may return
end index too big
**What this PR does / why we need it**:
Fixes getBeginingAndEndIndices() in cidr_set.go
End index is off by one when s.clusterMaskSize >= maskSize
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#44558
**Special notes for your reviewer**:
**Release note**:
```release-note
```
This should only happen if the Jobs were created by an older version
of the CronJob controller, since from now on we add ControllerRef upon
creation.
CronJob doesn't do actual adoption because it doesn't use label
selectors to find its Jobs. However, we should apply ControllerRef
for potential server-side cascading deletion, and to advise other
controllers we own these objects.
Automatic merge from submit-queue (batch tested with PRs 44469, 44566, 44467, 44526)
WaitForCacheSync before running attachdetach controller
@gnufied you wrote the test and @ncdc the TODO comment. Let's just run the pv and pvc informers, we do not care about them in this test. But we want to be able to stop the pod Informer at will, hence not just using informers.Start, is my understanding.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40055, 42085, 44509, 44568, 43956)
Fix gofmt errors
**What this PR does / why we need it**:
There were some gofmt errors on master. Ran the following to fix:
```
hack/verify-gofmt.sh | grep ^diff | awk '{ print $2 }' | xargs gofmt -w -s
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: none
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44362, 44421, 44468, 43878, 44480)
fix error message in ReplicaCalculator
**What this PR does / why we need it**: fixes spelling in an error message
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: have previously signed the CLA for minikube, not sure if that covers this repo also.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 44424, 44026, 43939, 44386, 42914)
ServiceAccountsController does not need nameIndexFunc to index ST
The ServiceAccountsController's Informer does not need nameIndexFunc.
Automatic merge from submit-queue (batch tested with PRs 44406, 41543, 44071, 44374, 44299)
Record a warning type event
A warning type event should be recorded when failed to calculate
the number of expected pods.
Automatic merge from submit-queue (batch tested with PRs 44447, 44456, 43277, 41779, 43942)
Clean up pre-ControllerRef compatibility logic
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43323
**Special notes for your reviewer**:
No
**Release note**:
```
NONE
```
Automatic merge from submit-queue
Reduce replication_controller log spam
Decrease verbosity and reword 'Observed updated replication controller
...' now that the issue it was added for has been fixed.
This was originally added to debug #31981, and it was fixed back in September 2016.
cc @gmarek @wojtek-t @kargakis @eparis @smarterclayton
Automatic merge from submit-queue
Exit from NewController() for PersistentVolumeController when InitPlugins() failed
Exit from NewController() for PersistentVolumeController when InitPlugins() failed just like NewAttachDetachController() does
**Release note**:
```release-note
NONE
```
@jsafrane @saad-ali PTAL. Thanks in advance
Automatic merge from submit-queue
Move api helpers.go to a subpackage
Part of https://github.com/kubernetes/kubernetes/issues/44065.
This PR moves the pkg/api/helpers.go to its own subpackage. It's mostly a mechanic move, except that
* I removed ConversionError in helpers.go, it's not used by anyone
* I moved the 3 methods of Taint and Toleration to pkg/api/methods.go, and left a TODO saying refactoring these methods to functions.
I'll send a few more PRs to make the k8s.io/kubernetes/pkg/api package only contains the code we want in the k8s.io/api repo, then we can run a [script](a0015fd1be (diff-7a2fbb4371972350ee414c6b88aee1c8)) to cut the new repo.
Automatic merge from submit-queue
Add support for IP aliases for pod IPs (GCP alpha feature)
```release-note
Adds support for allocation of pod IPs via IP aliases.
# Adds KUBE_GCE_ENABLE_IP_ALIASES flag to the cluster up scripts (`kube-{up,down}.sh`).
KUBE_GCE_ENABLE_IP_ALIASES=true will enable allocation of PodCIDR ips
using the ip alias mechanism rather than using routes. This feature is currently
only available on GCE.
## Usage
$ CLUSTER_IP_RANGE=10.100.0.0/16 KUBE_GCE_ENABLE_IP_ALIASES=true bash -x cluster/kube-up.sh
# Adds CloudAllocator to the node CIDR allocator (kubernetes-controller manager).
If CIDRAllocatorType is set to `CloudCIDRAllocator`, then allocation
of CIDR allocation instead is done by the external cloud provider and
the node controller is only responsible for reflecting the allocation
into the node spec.
- Splits off the rangeAllocator from the cidr_allocator.go file.
- Adds cloudCIDRAllocator, which is used when the cloud provider allocates
the CIDR ranges externally. (GCE support only)
- Updates RBAC permission for node controller to include PATCH
```
A warning type event should be recorded when failed to calculate
the number of expected pods.
And the same to daemoncontroller when failed to place pod.
Automatic merge from submit-queue
Remove alphaProvisioner in PVController and AlphaStorageClassAnnotation
remove alpha annotation and alphaProvisioner
**Release note**:
```release-note
NONE
```
If CIDRAllocatorType is set to `CloudCIDRAllocator`, then allocation
of CIDR allocation instead is done by the external cloud provider and
the node controller is only responsible for reflecting the allocation
into the node spec.
- Splits off the rangeAllocator from the cidr_allocator.go file.
- Adds cloudCIDRAllocator, which is used when the cloud provider allocates
the CIDR ranges externally. (GCE support only)
- Updates RBAC permission for node controller to include PATCH
Automatic merge from submit-queue (batch tested with PRs 43900, 44152, 44324)
make deployment unit tests need to respect subresources
Fixes#42569
I check all the unit test code related to `Matches` method, seems there's only one line we could change to not break previous testing logic
@kargakis ptal, thanks
/assign @kargakis
Automatic merge from submit-queue
add Stringer interface for eventType
**What this PR does / why we need it**:
fix invalid log outputs like
"graph_builder.go:429] GraphBuilder process object: v1/Endpoints, namespace kube-system, name kube-controller-manager, event type %!s(garbagecollector.eventType=1)"
Automatic merge from submit-queue (batch tested with PRs 41758, 44137)
Removed hostname/subdomain annotation.
fixes#44135
```release-note
Remove `pod.beta.kubernetes.io/hostname` and `pod.beta.kubernetes.io/subdomain` annotations.
Users should use `pod.spec.hostname` and `pod.spec.subdomain` instead.
```
Automatic merge from submit-queue
Improve event msg for PV controller when using external provisioner
Improve event msg for PV controller when using external provisioner
**Which issue this PR fixes** *:
Fixed part of #42121
**Special notes for your reviewer**:
@jsafrane, as many of our users are confused by the original message, can we fix the message first and then consider how to control the count of the events?
Automatic merge from submit-queue (batch tested with PRs 41775, 39678, 42629, 42524, 43028)
matchPredicate does not fit findByClaim()
matchPredicate has two args which are type of PV,and is not used in function findByClaim(),remove it
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43373, 41780, 44141, 43914, 44180)
Updated comments according to the logic.
Updated comments according to the logic: it'll return empty string instead of nil if failed to match the regex.
Automatic merge from submit-queue
small code improvements and fix some typos
Signed-off-by: bruceauyeung <ouyang.qinhua@zte.com.cn>
**What this PR does / why we need it**:
1. eliminate unnecessary extra scale up limit calculation
2. eliminate unnecessary extra pods length computation
3. fix some typos
Automatic merge from submit-queue (batch tested with PRs 42961, 44042)
Allow swapping NotReady and Unschedulable Taints
Fix#43444
cc @kubernetes/sig-scheduling-pr-reviews @davidopp @aveshagarwal @mdshuai
For cherrypick @ethernetdan
Automatic merge from submit-queue (batch tested with PRs 43963, 43965)
Update deployment retries to a saner count
It seems that the current retries sum up to no more than 0.2s so some transient errors may drop deployments out of the queue.
Automatic merge from submit-queue (batch tested with PRs 43963, 43965)
Wait for clean old RSs statuses in the middle of Recreate rollouts
After https://github.com/kubernetes/kubernetes/pull/43508 got merged, we started returning ReplicaSets with no pods but with stale statuses back to the rollout functions. As a consequence, one of our e2e tests that checks if a Recreate Deployment runs pods from different versions, started flakying because the Deployment status may be incorrect. This change simply waits for the statuses to get cleaned up before proceeding with scaling up the new RS.
Fixes https://github.com/kubernetes/kubernetes/issues/43864
@kubernetes/sig-apps-bugs
Automatic merge from submit-queue (batch tested with PRs 44084, 42964)
Updated AddOrUpdateTolerationInPod to return bool only.
Updated AddOrUpdateTolerationInPod to return bool only, as there's no case to generate error (the error was used for annotation, it'll not return error after moving to field); and also update admission & daemonset accordingly.
Automatic merge from submit-queue (batch tested with PRs 43453, 42992)
make sure that the volume satisfies the requirements of the claim before binding
check if the volume requested by the claim satisfies the requirements of the claim before binding when
syncUnboundClaim and claim.Spec.VolumeName is not set, although the volume is asked by user
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Adding gnufied as reviewer for volume controller
I have helped review several PRs and made new
PRs to this area.
cc @childsb @saad-ali
check if the volume requested by the claim satisfies the requirements of the claim before binding when
syncUnboundClaim and claim.Spec.VolumeName is not set
Automatic merge from submit-queue
Make Constants Public so that They Can Be Used in an Ext. Provisioner
Out-of-tree external provisioners have the same purpose as in-tree provisioners. As external provisioners work with PV and PVC datastructures it's an advantage to import certain Kubernetes packages instead of copy-pasting the Kubernetes code.
That's why the constants are made public so that they can be used in an external provisioner.
@jsafrane @kubernetes/sig-storage-pr-reviews
```
NONE
```
Automatic merge from submit-queue
update pkg/controller/volume/OWNER to add appropriate approvers for both volume controllers
Update pkg/controller/volume approvers so that the attach/detach and binding controllers have approvers.
Automatic merge from submit-queue
Attach/detach controller: fix potential race in constructor
**What this PR does / why we need it**:
There is a potential race condition in the Attach/detach controller: The "constructor" first installs informer event handlers and then creates and initializes the other data structures. However there is no guarantee an event cannot arrive before the data structures required by the event handlers are ready. This may result in nil pointer derefernces and potential crashes (e.g. the nodeAdd method calls adc.actualStateOfWorld.SetNodeStatusUpdateNeeded even though the actualStateOfWorld might be still nil).
It should be enough just to move the event handlers installation at the end of the constructor function.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
fix createProvisionedPV result err judgenment bug
When ctrl.kubeClient.Core().PersistentVolumes().Create(volume) returns no err, and storeVolumeUpdate() fails, we save PV sucessfully ,but here err is not nil,we should not run the codes next in block if err != nil {} to delete the storage asset.
same in the deletion retries below
change the err names to make it clear
**Release note**:
```release-note
NONE
```
@jsafrane @saad-ali PTAL. Thanks
Automatic merge from submit-queue
Fix PVC Annotations
annDynamicallyProvisioned is added to PV, while PVC has annStorageProvisioner.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43429, 43416, 43312, 43141, 43421)
remove duplicated import
**What this PR does / why we need it**:
**Which issue this PR fixes**:
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43378, 43216, 43384, 43083, 43428)
Remove unused argument of 'groupJobsByParent' in cronjob controller
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
When ctrl.kubeClient.Core().PersistentVolumes().Create(volume) returns no err, but storeVolumeUpdate() failed, we save PV sucessfully ,but here err is not nil,
we should not run the codes next in block if err != nil {}
same in the deletion retries below
Automatic merge from submit-queue (batch tested with PRs 42522, 42545, 42556, 42006, 42631)
optimize the binding logic of bindClaimToVolume
extract var shouldSetBoundByController and do not need to judge volumename twice
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
fix typos
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
small change to ControlleeExpectations result judgement
GetByKey() will never return err, so err != nil {} is redundant,remove it and remove the err return too
**Release note**:
```release-note
NONE
```
Out-of-tree external provisioners have the same purpose as in-tree provisioners. As external provisioners work with PV and PVC datastructures it's an advantage to import certain Kubernetes packages instead of copy-pasting the Kubernetes code.
That's why the constants are made public so that they can be used in an external provisioner.
Auto-generated via:
git grep -l [Ss]uccesfully | xargs sed -ri 's/([sS])uccesfully/\1uccessfully/g'
I noticed this when running kube-scheduler with --v4 and it is annoying.
Then manually reverted changed to the vendored bits.
Automatic merge from submit-queue (batch tested with PRs 43481, 43419, 42741, 43480)
controller: work around milliseconds skew in AddAfter
AddAfter is not requeueing precisely after the provided time and may
skew for some millieseconds. This is really important because controllers
don't relist often so a missed check because of ms difference is
essentially dropping the key. For example, in [1] the test requeues a
Deployment for a progress check after 10s[2] but the Deployment is synced
9ms earlier ending up in the controller not recognizing the Deployment as
failed thus dropping it from the queue w/o any error. The drop is fixed by
forcing the controller to resync the Deployment but we are going to resync
after the full duration.
@deads2k if you don't like this I am going to handle this on a case by case basis
[1] https://github.com/kubernetes/kubernetes/issues/39785#issuecomment-279959133
[2] c48b2cab0f/test/e2e/deployment.go (L1122)
Automatic merge from submit-queue
Rate limit HPA controller to sync period
Since the HPA controller pulls information from an external source that
makes no guarantees about consistency, it's possible for the HPA
to get into an infinite update loop -- if the metrics change with
every query, the HPA controller will run it's normal reconcilation,
post a status update, see that status update itself, fetch new metrics,
and if those metrics are different, post another status update, and
repeat. This can lead to continuously updating a single HPA.
By rate-limiting each HPA to once per sync interval, we prevent this
from happening.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Deflake TestSyncDeploymentDeletionRace
**What this PR does / why we need it**:
The cache was sometimes catching up while we were testing the case
where the cache is not yet caught up.
Before this fix, I could reproduce the failure with the following
command. After the fix, it passes.
```
go test -count 100000 -run TestSyncDeploymentDeletionRace
```
I checked the other controllers, and they all were already not starting informers for the deletion race test. I also checked that the deletion race tests for other controllers all pass with `-count 100000`.
**Which issue this PR fixes**:
Fixes#43390
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
The cache was sometimes catching up while we were testing the case
where the cache is not yet caught up.
Before this fix, I could reproduce the failure with the following
command. After the fix, it passes.
```
go test -count 100000 -run TestSyncDeploymentDeletionRace
```
Automatic merge from submit-queue
GC: Fix re-adoption race when orphaning dependents.
**What this PR does / why we need it**:
The GC expects that once it sees a controller with a non-nil
DeletionTimestamp, that controller will not attempt any adoption.
There was a known race condition that could cause a controller to
re-adopt something orphaned by the GC, because the controller is using a
cached value of its own spec from before DeletionTimestamp was set.
This fixes that race by doing an uncached quorum read of the controller
spec just before the first adoption attempt. It's important that this
read occurs after listing potential orphans. Note that this uncached
read is skipped if no adoptions are attempted (i.e. at steady state).
**Which issue this PR fixes**:
Fixes#42639
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
The GC expects that once it sees a controller with a non-nil
DeletionTimestamp, that controller will not attempt any adoption.
There was a known race condition that could cause a controller to
re-adopt something orphaned by the GC, because the controller is using a
cached value of its own spec from before DeletionTimestamp was set.
This fixes that race by doing an uncached quorum read of the controller
spec just before the first adoption attempt. It's important that this
read occurs after listing potential orphans. Note that this uncached
read is skipped if no adoptions are attempted (i.e. at steady state).
Automatic merge from submit-queue
kubectl: Use v1.5-compatible ownership logic when listing dependents.
**What this PR does / why we need it**:
This restores compatibility between kubectl 1.6 and clusters running Kubernetes 1.5.x. It introduces transitional ownership logic in which the client considers ControllerRef when it exists, but does not require it to exist.
If we were to ignore ControllerRef altogether (pre-1.6 client behavior), we would introduce a new failure mode in v1.6 because controllers that used to get stuck due to selector overlap will now make progress. For example, that means when reaping ReplicaSets of an overlapping Deployment, we would risk deleting ReplicaSets belonging to a different Deployment that we aren't about to delete.
This transitional logic avoids such surprises in 1.6 clusters, and does no worse than kubectl 1.5 did in 1.5 clusters. To prevent this when kubectl 1.5 is used against 1.6 clusters, we can cherrypick this change.
**Which issue this PR fixes**:
Fixes#43159
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Fix revision when SetDeploymentRevision
When some oldRSs be deleted or cleared(eg. revisionHistoryLimit set 0), the revision for SetDeploymentRevision is incorrect
In particular, we should not assume ControllerRefs are necessarily set.
However, we can still use ControllerRefs that do exist to avoid
interfering with controllers that do use it.
This effectively reverts the client-side changes in
cec3899b96.
We have to maintain the old behavior on the client side to support
version skew when talking to old servers that set the annotation.
However, the new server-side behavior is still to NOT set the
annotation.
Automatic merge from submit-queue
Fix Deployment upgrade test.
**What this PR does / why we need it**:
When the upgrade test operates on Deployments in a pre-1.6 cluster (i.e. during the Setup phase), it needs to use the v1.5 deployment/util logic. In particular, the v1.5 logic does not filter children to only those with a matching ControllerRef.
**Which issue this PR fixes**:
Fixes#42738
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
When the upgrade test operates on Deployments in a pre-1.6 cluster
(i.e. during the Setup phase), it needs to use the v1.5 deployment/util
logic. In particular, the v1.5 logic does not filter children to only
those with a matching ControllerRef.
Automatic merge from submit-queue
Fix taint based pod eviction for clusters where controller manager is not running with allocate-node-cidrs set
Fixes https://github.com/kubernetes/kubernetes/issues/42733
In my cluster, I have not set allocate-node-cidr, and It is causing taint based pod eviction to fail.
@gmarek @kubernetes/sig-scheduling-bugs @davidopp @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 38805, 42362, 42862)
Let GC print specific message for RESTMapping failure
Make the error messages reported in https://github.com/kubernetes/kubernetes/issues/39816 to be more specific, also only print the message once.
I'll also update the garbage collector's doc to clearly state we don't support tpr yet.
We'll wait for the watchable discovery feature (@sttts are you going to work on that?) to land in 1.7, and then enable the garbage collector to handle TPR.
cc @hongchaodeng @MikaelCluseau @djMax
Automatic merge from submit-queue (batch tested with PRs 42024, 42780, 42808, 42640)
kubectl: respect DaemonSet strategy parameters for rollout status
It handles "after-merge" comments from #41116
cc @kargakis @janetkuo
I will add one more e2e test later. I need to handle some in company stuff.
Automatic merge from submit-queue (batch tested with PRs 42024, 42780, 42808, 42640)
Node controller test flake 39975 with delay for try function
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#39975
/cc @ncdc @gmarek @liggitt
Since the HPA controller pulls information from an external source that
makes no guarantees about consistency, it's possible for the HPA
to get into an infinite update loop -- if the metrics change with
every query, the HPA controller will run it's normal reconcilation,
post a status update, see that status update itself, fetch new metrics,
and if those metrics are different, post another status update, and
repeat. This can lead to continuously updating a single HPA.
By rate-limiting each HPA to once per sync interval, we prevent this
from happening.
The design of DaemonSet requires a relist before each phase (manage,
update, status) because it does not short-circuit and requeue for each
action triggered.