Automatic merge from submit-queue
Make sure finalizers prevent deletion on storage that supports graceful deletion
Fixing bug:
Non-empty Finalizers fails to prevent a pod from being deleted, if deleteOptions.GracefulPeriod=0. See https://github.com/kubernetes/kubernetes/issues/32157#issuecomment-245778483
We didn't hit any issue with orphan finalizer because all our tests set finalizers on RC or RS, whose storage doesn't support graceful deletion.
cc @thockin @lavalamp
Automatic merge from submit-queue
api storage: Decouple Decorator from Filter
Continue #28249
What?
This PR decouples Decorator from Filter, i.e. remove Decorator in createFilter().
- For List, Decorator is called on returned list object.
- For Watch, we implement a new watcher to pipe through decorator. Error will be returned as a watch event.
Why?
- We want to change filter to SelectionPredicate struct. But Decorator is designed to be coupled with filtering.
- Per the discussion in #28249, decorator shouldn't be coupled to filter and error from Decorator should be returned instead of assuming false filtering.
Automatic merge from submit-queue
Split path validation into a separate library
This PR splits path segment validation into it's own package. This cuts off one of the restclient's dependency paths to some docker packages, and completely eliminates its dependency on go-restful swagger validation.
cc @kubernetes/sig-api-machinery
Automatic merge from submit-queue
return destroy func to clean up internal resources of storage
What?
Provide a destroy func to clean up internal resources of storage.
It changes **unit tests** to clean up resources. (Maybe fix integration test in another PR.)
Why?
Although apiserver is designed to be long running, there are some cases that it's not.
See https://github.com/kubernetes/kubernetes/issues/31262#issuecomment-242208771
We need to gracefully shutdown and clean up resources.
- Error from Decorator will be returned instead of assuming false filtering
- For List, Decorator is called on return list object
- For Watch, we implement a new watcher to pipe through decorator
Automatic merge from submit-queue
[GarbageCollector] Allow per-resource default garbage collection behavior
What's the bug:
When deleting an RC with `deleteOptions.OrphanDependents==nil`, garbage collector is supposed to treat it as `deleteOptions.OrphanDependents==true", and orphan the pods created by it. But the apiserver is not doing that.
What's in the pr:
Allow each resource to specify the default garbage collection behavior in the registry. For example, RC registry's default GC behavior is Orphan, and Pod registry's default GC behavior is CascadingDeletion.
Automatic merge from submit-queue
change all PredicateFunc to use SelectionPredicate
What?
- This PR changes all PredicateFunc in registry to return SelectionPredicate instead of Matcher interface.
Why?
- We want to pass SelectionPredicate to storage layer. Matcher interface did not expose enough information for indexing.
Automatic merge from submit-queue
Move new etcd storage (low level storage) into cacher
In an effort for #29888, we are pushing forward this:
What?
- It changes creating etcd storage.Interface impl into creating config
- In creating cacher storage (StorageWithCacher), it passes config created above and new etcd storage inside.
Why?
- We want to expose the information of (etcd) kv client to cacher. Cacher storage uses this information to talk to remote storage.
NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
The codec factory should support two distinct interfaces - negotiating
for a serializer with a client, vs reading or writing data to a storage
form (etcd, disk, etc). Make the EncodeForVersion and DecodeToVersion
methods only take Encoder and Decoder, and slight refactoring elsewhere.
In the storage factory, use a content type to control what serializer to
pick, and use the universal deserializer. This ensures that storage can
read JSON (which might be from older objects) while only writing
protobuf. Add exceptions for those resources that may not be able to
write to protobuf (specifically third party resources, but potentially
others in the future).
Automatic merge from submit-queue
Make etcd cache size configurable
Instead of the prior 50K limit, allow users to specify a more sensible size for their cluster.
I'm not sure what a sensible default is here. I'm still experimenting on my own clusters. 50 gives me a 270MB max footprint. 50K caused my apiserver to run out of memory as it exceeded >2GB. I believe that number is far too large for most people's use cases.
There are some other fundamental issues that I'm not addressing here:
- Old etcd items are cached and potentially never removed (it stores using modifiedIndex, and doesn't remove the old object when it gets updated)
- Cache isn't LRU, so there's no guarantee the cache remains hot. This makes its performance difficult to predict. More of an issue with a smaller cache size.
- 1.2 etcd entries seem to have a larger memory footprint (I never had an issue in 1.1, even though this cache existed there). I suspect that's due to image lists on the node status.
This is provided as a fix for #23323