Automatic merge from submit-queue
Continue on #30774: Change podNamespacer API
continue on #30774, credit to @wojtek-t, Ref #30759
I just fixed a test and converted IsActivePod to operate on *Pod.
Automatic merge from submit-queue
Expose flags for new NodeEviction logic in NodeController
Fix#28832
Last PR from the NodeController NodeEviction logic series.
cc @davidopp @lavalamp @mml
Automatic merge from submit-queue
Nodecontroller doesn't flip readiness on pods if kubeletVersion < 1.2.0
Older versions of the kubelet didn't know how to reconcile pod.Status, so the nodecontroller would mark pods NotReady on netsplit, and if the partition recovered in < 5m, the pods would never get marked Ready resulting in NotReady endpoints indefinitely (till kubelet restart/pod recreate etc).
Automatic merge from submit-queue
Fix default resource limits (node allocatable) for downward api volumes and env vars
@kubernetes/rh-cluster-infra @pmorie @derekwaynecarr
Automatic merge from submit-queue
Add NodeName to EndpointAddress object
Adding a new string type `nodeName` to api.EndpointAddress.
We could also do *ObjectReference to the api.Node object instead, which would be more precise for the future.
```
type ObjectReference struct {
Kind string `json:"kind,omitempty"`
Namespace string `json:"namespace,omitempty"`
Name string `json:"name,omitempty"`
UID types.UID `json:"uid,omitempty"`
APIVersion string `json:"apiVersion,omitempty"`
ResourceVersion string `json:"resourceVersion,omitempty"`
// Optional. If referring to a piece of an object instead of an entire object, this string
// should contain information to identify the sub-object. For example, if the object
// reference is to a container within a pod, this would take on a value like:
// "spec.containers{name}" (where "name" refers to the name of the container that triggered
// the event) or if no container name is specified "spec.containers[2]" (container with
// index 2 in this pod). This syntax is chosen only to have some well-defined way of
// referencing a part of an object.
// TODO: this design is not final and this field is subject to change in the future.
FieldPath string `json:"fieldPath,omitempty"`
}
```
Automatic merge from submit-queue
fix node controller event uid issue
Fix#29289. @smarterclayton ptal. This is not a very elegant fix, if we can use nodeName in log maybe we can set timedValue.Value to node.UID.
Automatic merge from submit-queue
Add volume reconstruct/cleanup logic in kubelet volume manager
Currently kubelet volume management works on the concept of desired
and actual world of states. The volume manager periodically compares the
two worlds and perform volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the cache of those two worlds are
gone. Although desired world can be recovered through apiserver, actual
world can not be recovered which may cause some volumes cannot be cleaned
up if their information is deleted by apiserver. This change adds the
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both desired
world and actual world if it cannot be found in either world. The rest
logic would be as same as before, desired world populator may clean up
the volume entry if it is no longer in apiserver, and then volume
manager should invoke unmount to clean it up.
Fixes https://github.com/kubernetes/kubernetes/issues/27653
Currently kubelet volume management works on the concept of desired
and actual world of states. The volume manager periodically compares the
two worlds and perform volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the cache of those two worlds are
gone. Although desired world can be recovered through apiserver, actual
world can not be recovered which may cause some volumes cannot be cleaned
up if their information is deleted by apiserver. This change adds the
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both desired
world and actual world if it cannot be found in either world. The rest
logic would be as same as before, desired world populator may clean up
the volume entry if it is no longer in apiserver, and then volume
manager should invoke unmount to clean it up.
Automatic merge from submit-queue
Move new etcd storage (low level storage) into cacher
In an effort for #29888, we are pushing forward this:
What?
- It changes creating etcd storage.Interface impl into creating config
- In creating cacher storage (StorageWithCacher), it passes config created above and new etcd storage inside.
Why?
- We want to expose the information of (etcd) kv client to cacher. Cacher storage uses this information to talk to remote storage.
Automatic merge from submit-queue
Endpoint controller logs errors during normal cluster behavior
The endpoint controller logs an error when its forbidden from creating new endpoints during namespace termination. This is normal cluster behavior, and therefore should not be logged. This confuses operators administrating the cluster.
Updated to log at a lower level in response to a forbidden message when performing a create operation. In case of an error on the API server side of the house, I continue to requeue the key. It should be ignored in a future syncService call once the service is deleted as part of namespace termination.
See https://bugzilla.redhat.com/show_bug.cgi?id=1347425
/cc @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Name jobs created by sj deterministically
```release-note
Name the job created by scheduledjob (sj) deterministically with sj's name and a hash of job's scheduled time.
```
@erictune @soltysh
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/30420)
<!-- Reviewable:end -->
Automatic merge from submit-queue
the observed usage should match those that have hard constraints
in the sync process, the quota will be replenished, the new observed usage will be sumed from each evaluator, if the previousUsed set is not be cleared, the new usage will be dirty, maybe some unusage resource still in , as the code below
newUsage = quota.Mask(newUsage, matchedResources)
for key, value := range newUsage {
usage.Status.Used[key] = value
}
so i think here shoul not set value previousUsed
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29653)
<!-- Reviewable:end -->
Automatic merge from submit-queue
[GarbageCollector] measure latency
First commit is #27600.
In e2e tests, I measure the average time an item spend in the eventQueue(~1.5 ms), dirtyQueue(~13ms), and orphanQueue(~37ms). There is no stress test in e2e yet, so the number may not be useful.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/28387)
<!-- Reviewable:end -->
Automatic merge from submit-queue
add metrics for workqueues
Adds prometheus metrics to work queues and enables them for the resourcequota controller. It would be easy to add this to all other workqueue based controllers and gather basic responsiveness metrics.
@kubernetes/rh-cluster-infra helps debug quota controller responsiveness problems.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/30296)
<!-- Reviewable:end -->
Automatic merge from submit-queue
pkg/controller/garbagecollector: simplify mutexes.
pkg/controller/garbagecollector: simplified synchronization and made idiomatic.
Similar to #29598, we can rely on the zero-value construction behavior
to embed `sync.Mutex` into parent structs.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29898)
<!-- Reviewable:end -->
Automatic merge from submit-queue
HPA: ignore scale targets whose replica count is 0
Disable HPA when the user (or another component) explicitly sets the replicas to 0.
Fixes#28603
@kubernetes/autoscaling @fgrzadkowski @kubernetes/rh-cluster-infra @smarterclayton @ncdc
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29212)
<!-- Reviewable:end -->
Also, fix unit tests to have the same claim and volume sizes in most of the
tests where we don't test matching based on size and test for a specific size
when we do actually test the matching.
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROC and running goimports, I noticed quite a lot of other files also needed a goimport format. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
Automatic merge from submit-queue
optimize conditions of ServiceReplenishmentUpdateFunc to replenish service
Originally, the replenishQuota method didn't focus on the third parameter object even if others transfered to it, i think the function is not efficient and perfect. then i use the third param to get MatchResources, it will be more exact. for example, if the old pod was quota tracked and the new was not, the replenishQuota only focus on usage resource of the old pod, still if the third parameter object is nil, the process will be same as before
Automatic merge from submit-queue
refactor quota calculation for re-use
Refactors quota calculation to allow reuse. This will allow us to do "punch through" calculation inside of admission if a particular quota needs usage stats and allows downstream re-use by moving calculation closer to the evaluators and separating "needs calculation" logic from "do calculation".
@derekwaynecarr
Automatic merge from submit-queue
Rewrite service controller to apply best controller pattern
This PR is a long term solution for #21625:
We apply the same pattern like replication controller to service controller to avoid the potential process order messes in service controller, the change includes:
1. introduce informer controller to watch service changes from kube-apiserver, so that every changes on same service will be kept in serviceStore as the only element.
2. put the service name to be processed to working queue
3. when process service, always get info from serviceStore to ensure the info is up-to-date
4. keep the retry mechanism, sleep for certain interval and add it back to queue.
5. remote the logic of reading last service info from kube-apiserver before processing the LB info as we trust the info from serviceStore.
The UT has been passed, manual test passed after I hardcode the cloud provider as FakeCloud, however I am not able to boot a k8s cluster with any available cloudprovider, so e2e test is not done.
Submit this PR first for review and for triggering a e2e test.
Automatic merge from submit-queue
Fix deployment e2e test: waitDeploymentStatus should error when entering an invalid state
Follow up #28162
1. We should check that max unavailable and max surge aren't violated at all times in e2e tests (didn't check this in deployment scaled rollout yet, but we should wait for it to become valid and then continue do the check until it finishes)
2. Fix some minor bugs in e2e tests
@kubernetes/deployment
Automatic merge from submit-queue
Stabilize volume unit tests by waiting for exact state
Wait for specific final state instead of waiting for specific number of
operations in controller unit tests. The tests are more readable and will survive
random goroutine ordering (PV and PVC controller have both their own
goroutine).
@kubernetes/sig-storage
Automatic merge from submit-queue
add union registry for quota
Adds the ability to combine multiple quota registries together. Kube needs this for other types.
@derekwaynecarr
Automatic merge from submit-queue
Error out when any RS has more available pods then its spec replicas
Fixes#29559 (hopefully, if not the bot will open new issues for us)
@kubernetes/deployment
Automatic merge from submit-queue
pkg/various: plug leaky time.New{Timer,Ticker}s
According to the documentation for Go package time, `time.Ticker` and
`time.Timer` are uncollectable by garbage collector finalizers. They
leak until otherwise stopped. This commit ensures that all remaining
instances are stopped upon departure from their relative scopes.
Similar efforts were incrementally done in #29439 and #29114.
```release-note
* pkg/various: plugged various time.Ticker and time.Timer leaks.
```
Automatic merge from submit-queue
pkg/controller/node/nodecontroller: simplify mutex
Similar to #29598, we can rely on the zero-value construction behavior
to embed `sync.Mutex` into parent structs.
/CC: @saad-ali
Automatic merge from submit-queue
make the resource prefix in etcd configurable for cohabitation
This looks big, its not as bad as it seems.
When you have different resources cohabiting, the resource name used for the etcd directory needs to be configurable. HPA in two different groups worked fine before. Now we're looking at something like RC<->RS. They normally store into two different etcd directories. This code allows them to be configured to store into the same location.
To maintain consistency across all resources, I allowed the `StorageFactory` to indicate which `ResourcePrefix` should be used inside `RESTOptions` which already contains storage information.
@lavalamp affects cohabitation.
@smarterclayton @mfojtik prereq for our rc<->rs and d<->dc story.
For non-attachable volumes, do not call GetVolumeName on the plugin and instead
generate a unique name based on the identity of the pod and the name of the volume
within the pod.
Automatic merge from submit-queue
Add an Azure CloudProvider Implementation
This PR adds `Azure` as a cloudprovider provider for Kubernetes. It specifically adds support for native pod networking (via Azure User Defined Routes) and L4 Load Balancing (via Azure Load Balancers).
I did have to add `clusterName` as a parameter to the `LoadBalancers` methods. This is because Azure only allows one "LoadBalancer" object per set of backend machines. This means a single "LoadBalancer" object must be shared across the cluster. The "LoadBalancer" is named via the `cluster-name` parameter passed to `kube-controller-manager` so as to enable multiple clusters per resource group if the user desires such a configuration.
There are few things that I'm a bit unsure about:
1. The implementation of the `Instances` interface. It's not extensively documented, it's not really clear what the different functions are used for, and my questions on the ML didn't get an answer.
2. Counter to the comments on the `LoadBalancers` Interface, I modify the `api.Service` object in `EnsureLoadBalancerDeleted`, but not with the intention of affecting Kube's view of the Service. I simply do it so that I can remove the `Port`s on the `Service` object and then re-use my reconciliation logic that can handle removing stale/deleted Ports.
3. The logging is a bit verbose. I'm looking for guidance on the appropriate log level to use for the chattier bits.
Due to the (current) lack of Instance Metadata Service and lack of Virtual Machine Identity in Azure, the user is required to do a few things to opt-in to this provider. These things are called-out as they are in contrast to AWS/GCE:
1. The user must provision an Azure Active Directory ServicePrincipal with `Contributor` level access to the resource group that the cluster is deployed in. This creation process is documented [by Hashicorp](https://www.packer.io/docs/builders/azure-setup.html) or [on the MSDN Blog](https://blogs.msdn.microsoft.com/arsen/2016/05/11/how-to-create-and-test-azure-service-principal-using-azure-cli/).
2. The user must place a JSON file somewhere on each Node that conforms to the `AzureConfig` struct defined in `azure.go`. (This is automatically done in the Azure flavor of [Kubernetes-Anywhere](https://github.com/kubernetes/kubernetes-anywhere).)
3. The user must specify `--cloud-config=/path/to/azure.json` as an option to `kube-apiserver` and `kube-controller-manager` similarly to how the user would need to pass `--cloud-provider=azure`.
I've been running approximately this code for a month and a half. I only encountered one bug which has since been fixed and covered by a unit test. I've just deployed a new cluster (and a Type=LoadBalancer nginx Service) using this code (via `kubernetes-anywhere`) and have posted [the `kube-controller-manager` logs](https://gist.github.com/colemickens/1bf6a26e7ef9484a72a30b1fcf9fc3cb) for anyone who is interested in seeing the logs of the logic.
If you're interested in this PR, you can use the instructions in my [`azure-kubernetes-demo` repository](https://github.com/colemickens/azure-kubernetes-demo) to deploy a cluster with minimal effort via [`kubernetes-anywhere`](https://github.com/kubernetes/kubernetes-anywhere). (There is currently [a pending PR in `kubernetes-anywhere` that is needed](https://github.com/kubernetes/kubernetes-anywhere/pull/172) in conjuncture with this PR). I also have a pre-built `hyperkube` image: `docker.io/colemickens/hyperkube-amd64:v1.4.0-alpha.0-azure`, which will be kept in sync with the branch this PR stems from.
I'm hoping this can land in the Kubernetes 1.4 timeframe.
CC (potential code reviewers from Azure): @ahmetalpbalkan @brendandixon @paulmey
CC (other interested Azure folk): @brendandburns @johngossman @anandramakrishna @jmspring @jimzim
CC (others who've expressed interest): @codefx9 @edevil @thockin @rootfs
According to the documentation for Go package time, `time.Ticker` and
`time.Timer` are uncollectable by garbage collector finalizers. They
leak until otherwise stopped. This commit ensures that all remaining
instances are stopped upon departure from their relative scopes.
Automatic merge from submit-queue
To break the loop when object found in removeOrphanFinalizer()
To break the loop when object found in removeOrphanFinalizer()
Automatic merge from submit-queue
Allow shareable resources for admission control plugins.
Changes allow admission control plugins to share resources. This is done via new PluginInitialization structure. The structure can be extended for other resources, for now it is an shared informer for namespace plugins (NamespiceLifecycle, NamespaceAutoProvisioning, NamespaceExists).
If a plugins needs some kind of shared resource e.g. client, the client shall be added to PluginInitializer and Wants methods implemented to every plugin which will use it.
Automatic merge from submit-queue
controller/service: minor cleanup
1. always handle short case first for if statement
2. do not capitalize error message
3. put the mutex before the fields it protects
4. prefer switch over if elseif.
Automatic merge from submit-queue
use a separate queue for initial quota calculation
When the quota controller gets backed up on resyncs, it can take a long time to observe the first usage stats which are needed by the admission plugin. This creates a second queue to prioritize the initial calculation.
Automatic merge from submit-queue
Fix "PVC Volume not detached if pod deleted via namespace deletion" issue
Fixes#29051: "PVC Volume not detached if pod deleted via namespace deletion"
This PR:
* Fixes a bug in `desired_state_of_the_world_populator.go` to check the value of `exists` returned by the `podInformer` so that it can delete pods even if the delete event is missed (or fails).
* Reduces the desired state of the world populators sleep period from 5 min to 1 min (reducing the amount of time a volume would remain attached if a volume delete event is missed or fails).
Automatic merge from submit-queue
Allow mounts to run in parallel for non-attachable volumes
This PR:
* Fixes https://github.com/kubernetes/kubernetes/issues/28616
* Enables mount volume operations to run in parallel for non-attachable volume plugins.
* Enables unmount volume operations to run in parallel for all volume plugins.
* Renames `GoRoutineMap` to `GoroutineMap`, resolving a long outstanding request from @thockin: `"Goroutine" is a noun`
When a new rollout with a different size than the previous size of the
deployment is initiated then only the new replica set will notice the
new size. Old replica sets are not updated by the rollout path.
Allow mount volume operations to run in parallel for non-attachable
volume plugins.
Allow unmount volume operations to run in parallel for all volume
plugins.
Automatic merge from submit-queue
Certificate signing controller for TLS bootstrap (alpha)
The controller handles generating and signing certificates when a CertificateSigningRequest has the "Approved" condition. Uses cfssl to support a wide set of possible keys and algorithms. Depends on PR #25562, only the last two commits are relevant to this PR.
cc @mikedanese
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Automatic merge from submit-queue
Improve quota controller performance by eliminating unneeded list calls
Previously, when syncing quota usage, we asked each registered `Evaluator` to determine the usage it knows to track associated with a `GroupKind` even if that particular `GroupKind` had no associated resources under quota.
This fix makes it that when we sync a quota that just had only `Pod` related compute resources, we do not also calculate the usage stats for things like `ConfigMap`, `Secret`, etc. per quota.
This should be a significant performance gain when running large numbers of `Namespace`'s each with `ResourceQuota` that tracks a subset of resources.
/cc @deads2k @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
[GarbageCollector] Let the RC manager set/remove ControllerRef
What's done:
* RC manager sets Controller Ref when creating new pods
* RC manager sets Controller Ref when adopting pods with matching labels but having no controller
* RC manager clears Controller Ref when pod labels change
* RC manager clears pods' Controller Ref when rc's selector changes
* RC manager stops adoption/creating/deleting pods when rc's DeletionTimestamp is set
* RC manager bumps up ObservedGeneration: The [original code](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replication/replication_controller_utils.go#L36) will do this.
* Integration tests:
* verifies that changing RC's selector or Pod's Labels triggers adoption/abandoning
* e2e tests (separated to #27151):
* verifies GC deletes the pods created by RC if DeleteOptions.OrphanDependents=false, and orphans the pods if DeleteOptions.OrphanDependents=true.
TODO:
- [x] we need to be able to select Pods that have a specific ControllerRef. Then each time we sync the RC, we will iterate through all the Pods that has a controllerRef pointing the RC, event if the labels of the Pod doesn't match the selector of RC anymore. This will prevent a Pod from stuck with a stale controllerRef, which could be caused by the race between abandoner (the goroutine that removes controllerRef) and worker the goroutine that add controllerRef to pods).
- [ ] use controllerRef instead of calling `getPodController`. This might be carried out by the control-plane team.
- [ ] according to the controllerRef proposal (#25256): "For debugging purposes we want to add an adoptionTime annotation prefixed with kubernetes.io/ which will keep the time of last controller ownership transfer." This might be carried out by the control-plane team.
cc @lavalamp @gmarek
Automatic merge from submit-queue
[garbage collector] add e2e test
This PR also includes some changes to plumb controller-manager's `--enable_garbage_collector` from the environment variable.
The e2e test will not be run by the core suite because it's marked `[Feature:GarbageCollector]`.
The corresponding jenkins job configuration PR is https://github.com/kubernetes/test-infra/pull/132.
Automatic merge from submit-queue
controller: wait for synced old replica sets on Recreate
Partially fixes https://github.com/kubernetes/kubernetes/issues/27362
Any other work on it should be handled in the replica set level (and/or kubelet if it's required)
@kubernetes/deployment PTAL
Automatic merge from submit-queue
Controllers doesn't take any actions when being deleted.
I started doing it for other controllers but it's not always clear to me how it should work. I'll be adding other ones as separate commits to this PR.
cc @caesarxuchao @lavalamp
Automatic merge from submit-queue
Add hooks for cluster health detection
Separate a function that decides if zone is healthy. First real commit for preventing massive pod eviction.
Ref. #28832
cc @davidopp
Automatic merge from submit-queue
Change storeToNodeConditionLister to return []*api.Node instead of api.NodeList for performance
Currently copies that are made while copying/creating api.NodeList are significant part of scheduler profile, and a bunch of them are made in places, that are not-parallelizable.
Ref #28590
Automatic merge from submit-queue
controller/volume: simplify sync logic in syncUnboundClaim
Remove all unnecessary branching logic. No actual logic changes. Code is more readable now.
Wait for specific final state instead of waiting for specific number of
operations in volume unit tests. The tests are more readable and will survive
random goroutine ordering (PV and PVC controller have both their own
goroutine).
Automatic merge from submit-queue
allow lock acquisition injection for quota admission
Allows for custom lock acquisition when composing the quota admission controller.
@derekwaynecarr I'm still experimenting to make sure this satisfies the need downstream, but looking for agreement in principle
Changes:
* moved waiting for synced caches before starting any work
* refactored worker() to really quit on quit
* changed queue to a ratelimiting queue and added retries on errors
* deep-copy deployments before mutating - we still need to deep-copy
replica sets and pods
Automatic merge from submit-queue
Move deployment functions to deployment/util.go not widely used
If the function is not used in multiple areas, move it to deployment/util.go
fixes#26750
Automatic merge from submit-queue
Some scheduler optimizations
Ref #28590
This PR doesn't do anything fancy - it is just reducing amount of memory allocations in scheduler, which in turn significantly speeds up scheduler.
Automatic merge from submit-queue
allow handler to join after the informer has started
This allows an event handler to join after a SharedInformer has started. It can't add any indexes, but it can add its reaction functions.
This works by
1. stopping the flow of events from the reflector (thus stopping updates to our store)
1. registering the new handler
1. sending synthetic "add" events to the new handler only
1. unblocking the flow of events
It would be possible to
1. block
1. list
1. add recorder
1. unblock
1. play list to as-yet unregistered handler
1. block
1. remove recorder
1. play recording
1. add new handler
1. unblock
But that is considerably more complicated. I'd rather not start there since this ought to be the exception rather than the rule.
@wojtek-t who requested this power in the initial review
@smarterclayton @liggitt I think this resolves our all-in-one ordering problem.
@hongchaodeng since this came up on the call
* rolling.go (has all the logic for rolling deployments)
* recreate.go (has all the logic for recreate deployments)
* sync.go (has all the logic for getting and scaling replica sets)
* rollback.go (has all the logic for rolling back a deployment)
* util.go (contains all the utilities used throughout the controller)
Leave back at deployment_controller.go all the necessary bits for
creating, setting up, and running the controller loop.
Also add package documentation.
Automatic merge from submit-queue
Use `CreatedByAnnotation` constant
A nit but didn't want the strings to get out of sync.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Automatic merge from submit-queue
Track object modifications in fake clientset
Fake clientset is used by unit tests extensively but it has some
shortcomings:
- no filtering on namespace and name: tests that want to test objects in
multiple namespaces end up getting all objects from this clientset,
as it doesn't perform any filtering based on name and namespace;
- updates and deletes don't modify the clientset state, so some tests
can get unexpected results if they modify/delete objects using the
clientset;
- it's possible to insert multiple objects with the same
kind/name/namespace, this leads to confusing behavior, as retrieval is
based on the insertion order, but anchors on the last added object as
long as no more objects are added.
This change changes core.ObjectRetriever implementation to track object
adds, updates and deletes.
Some unit tests were depending on the previous (and somewhat incorrect)
behavior. These are fixed in the following few commits.
Automatic merge from submit-queue
Kubelet should mark VolumeInUse before checking if it is Attached
Kubelet should mark VolumeInUse before checking if it is Attached.
Controller should fetch fresh copy of node object before detach instead of relying on node informer cache.
Fixes#27836
Ensure that kublet marks VolumeInUse before checking if it is Attached.
Also ensures that the attach/detach controller always fetches a fresh
copy of the node object before detach (instead ofKubelet relying on node
informer cache).
Using new fake clientset registry exposes the actual flow on pending
namespace finalization: get namespace, create finalizer, list pods and
delete namespace if there are no pods.
Since fake clientset now correctly tracks objects created by deployment
controller, it triggers different controller behavior: controller only
creates replica set once and updates deployment once.
Fake clientset no longer needs to be prepopulated with records: keeping
them in leads to the name conflict on creates. Also, since fake
clientset now respects namespaces, we need to correctly populate them.
Automatic merge from submit-queue
Convert service account token controller to use a work queue
Converts the service account token controller to use a work queue. This allows parallelization of token generation (useful when there are several simultaneous namespaces or service accounts being created). It also lets us requeue failures to be retried sooned than the next sync period (which can be very long).
Fixes an issue seen when a namespace is created with secrets quotaed, and the token controller tries to create a token secret prior to the quota status having been initialized. In that case, the secret is rejected at admission, and the token controller wasn't retrying until the resync period.
Automatic merge from submit-queue
Fix initialization of volume controller caches.
Fix `PersistentVolumeController.initializeCaches()` to pass pointers to volume or claim to `storeObjectUpdate()` and add extra functions to enforce that the right types are checked in the future.
Fixes#28076
Fix PersistentVolumeController.initializeCaches() to pass pointers to volume
or claim to storeObjectUpdate() and add extra functions to enforce that the
right types are checked in the future.
Fixes#28076
Automatic merge from submit-queue
[client-gen]Add Patch to clientset
* add the Patch() method to the clientset.
* I have to rename the existing Patch() method of `Event` to PatchWithEventNamespace() to avoid overriding.
* some minor changes to the fake Patch action.
cc @Random-Liu since he asked for the method
@kubernetes/sig-api-machinery
ref #26580
```release-note
Add the Patch method to the generated clientset.
```
Automatic merge from submit-queue
Fix startup type error in initializeCaches
The following error was getting logged:
PersistentVolumeController can't initialize caches, expected list of volumes, got:
&{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink:/api/v1/persistentvolumes ResourceVersion:11} Items:[]}
The tests make extensive use of NewFakeControllerSource which uses api.List
instead of api.PersistentVolumeList. So use reflect to help iterate over the
items then assert the item type.
fixes#27757
Automatic merge from submit-queue
daemon/controller.go: refactor worker
1. function name is better to be verb or verb+noun
2. remove unnecessary func call
Automatic merge from submit-queue
Proportionally scale paused and rolling deployments
Enable paused and rolling deployments to be proportionally scaled.
Also have cleanup policy work for paused deployments.
Fixes#20853Fixes#20966Fixes#20754
@bgrant0607 @janetkuo @ironcladlou @nikhiljindal
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/20273)
<!-- Reviewable:end -->
The following error was getting logged:
PersistentVolumeController can't initialize caches, expected list of volumes, got:
&{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink:/api/v1/persistentvolumes ResourceVersion:11} Items:[]}
Automatic merge from submit-queue
let dynamic client handle non-registered ListOptions
And register v1.ListOptions in the policy group.
Fix#27622
@lavalamp @smarterclayton @krousey
Automatic merge from submit-queue
Prevent detach before node status update
The PR prevents the attach/detach controller from start a detach operation before updating the node status (to remove the volume from the list of attached volumes).
Fixes https://github.com/kubernetes/kubernetes/issues/27836
Automatic merge from submit-queue
AWS/GCE: Spread PetSet volume creation across zones, create GCE volumes in non-master zones
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
This change implements the desiredStateOfWorld populator to sync up with
the pod informer. It periodically check each pod in the
desiredStateOfworld and verify whether it is still in pod informer
cache. If it not, remove it from the desiredStateOfWorld
Automatic merge from submit-queue
Deployment controller's cleanupUnhealthyReplicas should respect minReadySeconds
```release-note
Fixed an issue that Deployment may be scaled down further than allowed by maxUnavailable when minReadySeconds is set.
```
Fixes#26834
Detected by a flake in deployment rollover e2e test (the only test that specifies `minReadySeconds`).
cc @kubernetes/deployment @pwittrock
cc @mqliang who first added `cleanupUnhealthyReplicas` in deployment controller
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Automatic merge from submit-queue
Kubelet Volume Manager Wait For Attach Detach Controller and Backoff on Error
* Closes https://github.com/kubernetes/kubernetes/issues/27483
* Modified Attach/Detach controller to report `Node.Status.AttachedVolumes` on successful attach (unique volume name along with device path).
* Modified Kubelet Volume Manager wait for Attach/Detach controller to report success before proceeding with attach.
* Closes https://github.com/kubernetes/kubernetes/issues/27492
* Implemented an exponential backoff mechanism for for volume manager and attach/detach controller to prevent operations (attach/detach/mount/unmount/wait for controller attach/etc) from executing back to back unchecked.
* Closes https://github.com/kubernetes/kubernetes/issues/26679
* Modified volume `Attacher.WaitForAttach()` methods to uses the device path reported by the Attach/Detach controller in `Node.Status.AttachedVolumes` instead of calling out to cloud providers.
Modify attach/detach controller to keep track of volumes to report
attached in Node VolumeToAttach status.
Modify kubelet volume manager to wait for volume to show up in Node
VolumeToAttach status.
Implement exponential backoff for errors in volume manager and attach
detach controller
Automatic merge from submit-queue
Allow disabling of dynamic provisioning
Allow administrators to opt-out of dynamic provisioning. Provisioning is still on by default, which is the current behavior.
Per a conversation with @jsafrane, a boolean toggle was added and plumbed through into the controller. Deliberate disabling will simply return nil from `provisionClaim` whereas a misconfigured provisioner will continue on and generate error events for the PVC.
@kubernetes/rh-storage @saad-ali @thockin @abhgupta
Automatic merge from submit-queue
Fill PV.Status.Message with deleter/recycler errors.
Instead of empty `Message` `kubectl describe pv` now shows:
```
Name: nfs
Labels: <none>
Status: Failed
Claim: default/nfs
Reclaim Policy: Recycle
Access Modes: RWX
Capacity: 1Mi
Message: Recycler failed: Pod was active on the node longer than specified deadline
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 10.999.999.999
Path: /
ReadOnly: false
```
This is actually a regression since 1.2
@kubernetes/sig-storage
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
Fixes#27256
Automatic merge from submit-queue
fix updatePod() of RS and RC controllers
Fix updatePod of replication controller manager and replica set controller to handle pod label updates that match no RC or RS.
Fix#27405
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).
This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
Automatic merge from submit-queue
processor listener: fix locking in pop()
Currently the lock in processorListener is used to guard pendingNotifications. But in pop, it also locks around on select chan. This will block the goroutine with lock acquired.
This PR changes the lock to guard the correct section only.
Automatic merge from submit-queue
volume controller: Convert PersistentVolumes from Kubernetes 1.2
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV
In refactored volume provisioner, Kubernetes allocates the storage asset first and then creates a Kubernetes PV instance already with the correct pointer to the storage asset.
To support seamles upgrade from 1.2 to 1.3 we need to remove these unprovisioned template PVs. The new controller does not use them, it will see PVC for dynamic provisioning and create real PV instead.
See https://github.com/pmorie/pv-haxxz/pull/3 for pseudocode.
Automatic merge from submit-queue
Wait for all volumes/claims to get synced in unit test.
Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.
Fixes#26712
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a
claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV
In refactored volume provisioner, Kubernetes allocates the storage asset first
and then creates a Kubernetes PV instance already with the correct pointer
to the storage asset.
To support seamles upgrade from 1.2 to 1.3 we need to remove these
unprovisioned template PVs. The new controller does not use them, it will see
PVC for dynamic provisioning and create real PV instead.