# Watch in apiserver proposal

Wojciech Tyczynski 2015-07-02 09:56:54 +02:00
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.
<strong>
The latest 1.0.x release of this document can be found
[here](http://releases.k8s.io/release-1.0/docs/proposals/apiserver_watch.md).
Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->

## Abstract

In the current system, all watch requests sent to the apiserver are in
general redirected to etcd. This means that for every watch request to the
apiserver, the apiserver opens a separate watch on etcd.

The purpose of this proposal is to improve the overall performance of the
system by solving the following problems:

- too many open watches on etcd
- deserializing/converting the same objects multiple times in different
  watch results

In the future, we would also like to add an indexing mechanism to the watch.
Although Indexer is not part of this proposal, it is supposed to be compatible
with it - in the future Indexer should be incorporated into the proposed new
watch solution in apiserver without requiring any redesign.

## High level design

We are going to solve these problems by allowing many clients to watch the
same storage in the apiserver, without being redirected to etcd.

At the high level, the apiserver will have a single watch open to etcd,
watching all the objects (of a given type) without any filtering. The changes
delivered from etcd will then be stored in a cache in the apiserver. This
cache is in fact a "rolling history window" that will support clients having
some amount of latency between their list and watch calls. Thus it will have
a limited capacity, and whenever a new change comes from etcd while the cache
is full, the oldest change will be removed to make room for the new one.

When a client sends a watch request to the apiserver, instead of redirecting
it to etcd, the request will cause:

- registering a handler to receive all new changes coming from etcd
- iterating through the watch window, starting at the requested
  resourceVersion up to the head, and sending the filtered changes directly
  to the client, blocking the above until this iteration has caught up

This will be done by creating a go-routine per watcher that will be
responsible for performing the above.

The following section describes the proposal in more detail, analyzes some
corner cases and divides the whole design into more fine-grained steps.
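
The rolling-window-plus-replay idea can be sketched in Go. The names here
(`watchBroker`, `event`, `replaySince`) are illustrative only, not actual
Kubernetes identifiers:

```go
package main

import "fmt"

// event is a simplified watch event: a resource version plus the
// already-deserialized object it refers to.
type event struct {
	resourceVersion uint64
	object          string
}

// watchBroker keeps a rolling history window of events received from a
// single upstream (etcd) watch and fans them out to many clients.
type watchBroker struct {
	window   []event // rolling history, oldest first
	capacity int
}

func (b *watchBroker) add(e event) {
	if len(b.window) == b.capacity {
		// Window full: drop the oldest change to make room.
		b.window = b.window[1:]
	}
	b.window = append(b.window, e)
}

// replaySince returns all cached events newer than the given
// resourceVersion, i.e. what a new watcher must be sent before it can
// switch over to receiving live events.
func (b *watchBroker) replaySince(rv uint64) []event {
	var out []event
	for _, e := range b.window {
		if e.resourceVersion > rv {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	b := &watchBroker{capacity: 3}
	for i := 1; i <= 5; i++ {
		b.add(event{resourceVersion: uint64(i), object: fmt.Sprintf("pod-%d", i)})
	}
	// Only versions 3..5 are still in the window (capacity 3),
	// so a watcher resuming from resourceVersion 3 gets 4 and 5.
	for _, e := range b.replaySince(3) {
		fmt.Println(e.resourceVersion, e.object)
	}
}
```

A real implementation would additionally block the watcher's live stream
until this replay has caught up, as described above.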

## Proposal details

We would like the cache to be __per-resource-type__ and __optional__. Thanks
to that we will be able to:

- have different cache sizes for different resources (e.g. a bigger cache
  [= longer history] for pods, which can significantly affect performance)
- avoid any overhead for objects that are watched very rarely (e.g. events
  are almost never watched, but there are a lot of them)
- filter the cache for each watcher more effectively

If we decide to support watches spanning different resources in the future
and we have an efficient indexing mechanism, it should be relatively simple
to unify the cache to be common for all resources.

The rest of this section describes the concrete steps that need to be done
to implement the proposal.
1. Since we want the watch in apiserver to be optional for different resource
   types, it needs to be self-contained and hidden behind a well-defined API.
   This should be a layer very close to etcd - in particular, all registries
   in "pkg/registry/generic/etcd" should be built on top of it.
   We will solve this by extracting the interface of tools.EtcdHelper and
   treating that interface as the API - the whole watch mechanism in the
   apiserver will be hidden behind it.
   Thanks to that we will get an initial implementation for free and will
   just need to reimplement a few relevant functions (probably just Watch
   and List). Moreover, this will not require any changes in other parts of
   the code. This step is about extracting the interface of tools.EtcdHelper.
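
A minimal sketch of the extraction, assuming a hypothetical name
`StorageInterface` and a much-reduced method set (the real interface would
mirror tools.EtcdHelper's methods):

```go
package main

import "fmt"

// StorageInterface is a hypothetical stand-in for the interface extracted
// from tools.EtcdHelper: registries talk only to this interface, so a
// cached-watch implementation can later be swapped in per resource type.
type StorageInterface interface {
	List(key string) ([]string, error)
	Watch(key string, resourceVersion uint64) (<-chan string, error)
}

// rawEtcdStorage mimics the existing behavior: every call goes straight
// to etcd (simulated here by in-memory data), giving us the initial
// implementation "for free".
type rawEtcdStorage struct {
	data map[string][]string
}

func (s *rawEtcdStorage) List(key string) ([]string, error) {
	return s.data[key], nil
}

func (s *rawEtcdStorage) Watch(key string, resourceVersion uint64) (<-chan string, error) {
	ch := make(chan string)
	close(ch) // placeholder: a real implementation opens an etcd watch here
	return ch, nil
}

func main() {
	var storage StorageInterface = &rawEtcdStorage{
		data: map[string][]string{"/pods": {"pod-a", "pod-b"}},
	}
	items, _ := storage.List("/pods")
	fmt.Println(items)
}
```

Callers depend only on `StorageInterface`, so replacing `rawEtcdStorage`
with a caching implementation requires no changes elsewhere.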
2. Create a FIFO cache with a given capacity. In its "rolling history window"
   we will store two things:

   - the resourceVersion of the object (being an etcdIndex)
   - the object watched from etcd itself (in a deserialized form)

   This should be as simple as having an array and treating it as a cyclic
   buffer. Obviously the resourceVersions of objects watched from etcd will
   be increasing, and they are necessary for registering a new watcher that
   is interested in all the changes since a given etcdIndex.

   Additionally, we should support the LIST operation, since otherwise
   clients can never start watching at "now". We may consider passing lists
   through etcd, however this will not work once we have Indexer, so we will
   need that information in memory anyway.
   Thus, we should support the LIST operation from the "end of the history"
   - i.e. from the moment just after the newest cached watch event. It
   should be pretty simple to do, because we can incrementally update this
   list whenever a new watch event is received from etcd.
   We may consider reusing the existing structures cache.Store or
   cache.Indexer ("pkg/client/cache"), but this is not a hard requirement.
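
The cyclic buffer plus the incrementally maintained "list at the end of the
history" could look roughly like this; `historyWindow` and its fields are
illustrative names, and objects are reduced to strings:

```go
package main

import "fmt"

// cacheEntry pairs a resourceVersion (the etcdIndex) with the
// deserialized object, the two things stored in the history window.
type cacheEntry struct {
	resourceVersion uint64
	object          string
}

// historyWindow is a fixed-capacity cyclic buffer of watch events plus an
// incrementally maintained "current state" map that supports LIST from
// the end of the history without touching etcd.
type historyWindow struct {
	buf   []cacheEntry
	start int // index of the oldest entry
	size  int
	state map[string]uint64 // object -> latest resourceVersion
}

func newHistoryWindow(capacity int) *historyWindow {
	return &historyWindow{buf: make([]cacheEntry, capacity), state: map[string]uint64{}}
}

func (w *historyWindow) add(e cacheEntry) {
	if w.size == len(w.buf) {
		w.start = (w.start + 1) % len(w.buf) // overwrite the oldest entry
	} else {
		w.size++
	}
	w.buf[(w.start+w.size-1)%len(w.buf)] = e
	w.state[e.object] = e.resourceVersion // keep the LIST view up to date
}

// listNow returns the state just after the newest cached watch event.
func (w *historyWindow) listNow() map[string]uint64 {
	return w.state
}

func main() {
	w := newHistoryWindow(2)
	w.add(cacheEntry{1, "pod-a"})
	w.add(cacheEntry{2, "pod-b"})
	w.add(cacheEntry{3, "pod-a"}) // update; evicts the rv=1 entry
	fmt.Println(w.size, w.listNow()["pod-a"])
}
```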
3. Create a new implementation of the EtcdHelper interface that internally
   has a single watch open to etcd and stores the data received from etcd
   in the FIFO cache. This includes implementing registration of a new
   watcher, which will start a new go-routine responsible for iterating
   over the cache and sending the watcher all the objects it is interested
   in (by applying the filtering function).

4. Add support for processing the "error too old" error from etcd, which
   will require:

   - disconnecting all the watchers
   - clearing the internal cache and relisting all objects from etcd
   - accepting watchers again

5. Enable the watch in apiserver for some of the existing resource types -
   this should require changes only at the initialization level.

6. The next step will be to incorporate some indexing mechanism, but its
   details are TBD.
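
The "error too old" recovery described above can be sketched as follows;
`cachedWatch`, `watcher` and `resync` are illustrative names, not actual
Kubernetes types:

```go
package main

import (
	"fmt"
	"sync"
)

// watcher stands in for a registered client watch.
type watcher struct{ stopped bool }

// cachedWatch is a reduced model of the per-resource-type cache.
type cachedWatch struct {
	mu       sync.Mutex
	watchers []*watcher
	cache    []string
}

// resync handles "error too old" from etcd: stop all watchers, rebuild
// the cache from a fresh LIST, then accept watchers again.
func (c *cachedWatch) resync(listFromEtcd func() []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// 1. Disconnect all watchers; clients will re-list and re-watch.
	for _, w := range c.watchers {
		w.stopped = true
	}
	c.watchers = nil
	// 2. Clear the internal cache and relist all objects from etcd.
	c.cache = listFromEtcd()
	// 3. New watchers are accepted again once the lock is released.
}

func main() {
	c := &cachedWatch{watchers: []*watcher{{}, {}}, cache: []string{"stale"}}
	c.resync(func() []string { return []string{"pod-a", "pod-b"} })
	fmt.Println(len(c.watchers), c.cache)
}
```

Holding the lock for the whole sequence ensures no watcher registers
against the half-rebuilt cache.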

### Future optimizations:

1. The implementation of watch in apiserver will internally open a single
   watch to etcd, responsible for watching all the changes of objects of a
   given resource type. However, this watch can potentially expire at any
   time, and reconnecting can return a "too old resource version" error. In
   that case relisting is necessary. To avoid LIST requests coming from all
   watchers at the same time, we can introduce an additional etcd event
   type: [EtcdResync](../../pkg/tools/etcd_helper_watch.go#L36)

   Whenever relisting is done to refresh the internal watch to etcd, an
   EtcdResync event will be sent to all the watchers. It will contain the
   full list of all the objects the watcher is interested in (appropriately
   filtered) as the parameter of this watch event.
   Thus, we need to create the EtcdResync event, extend watch.Interface and
   its implementations to support it, and handle those events appropriately
   in places like [Reflector](../../pkg/client/cache/reflector.go).
   However, this might turn out to be an unnecessary optimization if the
   apiserver always keeps up (which is possible in the new design). We will
   work out all the necessary details at that point.
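
A sketch of what the extra event type and its consumer-side handling could
look like; `Resync`, `Event` and `apply` are illustrative simplifications
of the real watch package types, with objects reduced to a string set:

```go
package main

import "fmt"

// EventType mirrors the shape of watch event types; Resync plays the role
// of the proposed EtcdResync event.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
	Resync   EventType = "RESYNC" // carries the full filtered object list
)

type Event struct {
	Type    EventType
	Objects []string // for Resync: the complete filtered list
}

// apply updates a reflector-style local store from one event. On Resync
// the whole store is replaced, so the client never has to re-list from
// the apiserver itself.
func apply(store map[string]bool, e Event) {
	switch e.Type {
	case Resync:
		for k := range store {
			delete(store, k)
		}
		for _, o := range e.Objects {
			store[o] = true
		}
	case Added, Modified:
		for _, o := range e.Objects {
			store[o] = true
		}
	case Deleted:
		for _, o := range e.Objects {
			delete(store, o)
		}
	}
}

func main() {
	store := map[string]bool{"stale-pod": true}
	apply(store, Event{Type: Resync, Objects: []string{"pod-a"}})
	fmt.Println(len(store), store["pod-a"])
}
```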