2015-07-02 07:56:54 +00:00
|
|
|
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
|
|
|
|
|
2016-06-10 23:46:46 +00:00
|
|
|
<!-- BEGIN STRIP_FOR_RELEASE -->
|
|
|
|
|
2016-07-15 09:44:58 +00:00
|
|
|
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
2016-06-10 23:46:46 +00:00
|
|
|
width="25" height="25">
|
2016-07-15 09:44:58 +00:00
|
|
|
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
2016-06-10 23:46:46 +00:00
|
|
|
width="25" height="25">
|
2016-07-15 09:44:58 +00:00
|
|
|
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
2016-06-10 23:46:46 +00:00
|
|
|
width="25" height="25">
|
2016-07-15 09:44:58 +00:00
|
|
|
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
2016-06-10 23:46:46 +00:00
|
|
|
width="25" height="25">
|
2016-07-15 09:44:58 +00:00
|
|
|
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
2016-06-10 23:46:46 +00:00
|
|
|
width="25" height="25">
|
|
|
|
|
|
|
|
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
|
|
|
|
|
|
|
|
If you are using a released version of Kubernetes, you should
|
|
|
|
refer to the docs that go with that version.
|
|
|
|
|
|
|
|
<!-- TAG RELEASE_LINK, added by the munger automatically -->
|
|
|
|
<strong>
|
|
|
|
The latest release of this document can be found
|
2016-06-13 19:24:34 +00:00
|
|
|
[here](http://releases.k8s.io/release-1.3/docs/proposals/apiserver-watch.md).
|
2016-06-10 23:46:46 +00:00
|
|
|
|
|
|
|
Documentation for other releases can be found at
|
|
|
|
[releases.k8s.io](http://releases.k8s.io).
|
|
|
|
</strong>
|
|
|
|
--
|
|
|
|
|
|
|
|
<!-- END STRIP_FOR_RELEASE -->
|
2015-07-02 07:56:54 +00:00
|
|
|
|
|
|
|
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
|
|
|
|
|
|
|
## Abstract
|
|
|
|
|
2015-12-26 18:19:03 +00:00
|
|
|
In the current system, most watch requests sent to apiserver are redirected to
|
|
|
|
etcd. This means that for every watch request the apiserver opens a watch on
|
|
|
|
etcd.
|
2015-07-02 07:56:54 +00:00
|
|
|
|
|
|
|
The purpose of the proposal is to improve the overall performance of the system
|
|
|
|
by solving the following problems:
|
|
|
|
|
|
|
|
- having too many open watches on etcd
|
|
|
|
- avoiding deserializing/converting the same objects multiple times in different
|
|
|
|
watch results
|
|
|
|
|
|
|
|
In the future, we would also like to add an indexing mechanism to the watch.
|
|
|
|
Although Indexer is not part of this proposal, it is supposed to be compatible
|
|
|
|
with it - in the future Indexer should be incorporated into the proposed new
|
|
|
|
watch solution in apiserver without requiring any redesign.
|
|
|
|
|
|
|
|
|
|
|
|
## High level design
|
|
|
|
|
|
|
|
We are going to solve those problems by allowing many clients to watch the same
|
|
|
|
storage in the apiserver, without being redirected to etcd.
|
|
|
|
|
|
|
|
At the high level, apiserver will have a single watch open to etcd, watching all
|
|
|
|
the objects (of a given type) without any filtering. The changes delivered from
|
|
|
|
etcd will then be stored in a cache in apiserver. This cache is in fact a
|
|
|
|
"rolling history window" that will support clients having some amount of latency
|
|
|
|
between their list and watch calls. Thus it will have a limited capacity and
|
2015-08-25 14:47:58 +00:00
|
|
|
whenever a new change comes from etcd when a cache is full, the oldest change
|
2015-07-02 07:56:54 +00:00
|
|
|
will be remove to make place for the new one.
|
|
|
|
|
|
|
|
When a client sends a watch request to apiserver, instead of redirecting it to
|
|
|
|
etcd, it will cause:
|
|
|
|
|
|
|
|
- registering a handler to receive all new changes coming from etcd
|
2015-08-09 18:18:06 +00:00
|
|
|
- iterating though a watch window, starting at the requested resourceVersion
|
|
|
|
to the head and sending filtered changes directory to the client, blocking
|
2015-07-02 07:56:54 +00:00
|
|
|
the above until this iteration has caught up
|
|
|
|
|
|
|
|
This will be done be creating a go-routine per watcher that will be responsible
|
|
|
|
for performing the above.
|
|
|
|
|
2015-08-09 18:18:06 +00:00
|
|
|
The following section describes the proposal in more details, analyzes some
|
2015-07-02 07:56:54 +00:00
|
|
|
corner cases and divides the whole design in more fine-grained steps.
|
|
|
|
|
|
|
|
|
|
|
|
## Proposal details
|
|
|
|
|
|
|
|
We would like the cache to be __per-resource-type__ and __optional__. Thanks to
|
|
|
|
it we will be able to:
|
|
|
|
- have different cache sizes for different resources (e.g. bigger cache
|
|
|
|
[= longer history] for pods, which can significantly affect performance)
|
|
|
|
- avoid any overhead for objects that are watched very rarely (e.g. events
|
|
|
|
are almost not watched at all, but there are a lot of them)
|
|
|
|
- filter the cache for each watcher more effectively
|
|
|
|
|
|
|
|
If we decide to support watches spanning different resources in the future and
|
|
|
|
we have an efficient indexing mechanisms, it should be relatively simple to unify
|
|
|
|
the cache to be common for all the resources.
|
|
|
|
|
|
|
|
The rest of this section describes the concrete steps that need to be done
|
|
|
|
to implement the proposal.
|
|
|
|
|
|
|
|
1. Since we want the watch in apiserver to be optional for different resource
|
|
|
|
types, this needs to be self-contained and hidden behind a well defined API.
|
|
|
|
This should be a layer very close to etcd - in particular all registries:
|
2015-12-26 18:19:03 +00:00
|
|
|
"pkg/registry/generic/etcd" should be built on top of it.
|
2015-07-02 07:56:54 +00:00
|
|
|
We will solve it by turning tools.EtcdHelper by extracting its interface
|
|
|
|
and treating this interface as this API - the whole watch mechanisms in
|
|
|
|
apiserver will be hidden behind that interface.
|
|
|
|
Thanks to it we will get an initial implementation for free and we will just
|
|
|
|
need to reimplement few relevant functions (probably just Watch and List).
|
2016-07-05 11:39:48 +00:00
|
|
|
Moreover, this will not require any changes in other parts of the code.
|
2015-07-02 07:56:54 +00:00
|
|
|
This step is about extracting the interface of tools.EtcdHelper.
|
|
|
|
|
2015-07-31 07:54:05 +00:00
|
|
|
2. Create a FIFO cache with a given capacity. In its "rolling history window"
|
2015-07-02 07:56:54 +00:00
|
|
|
we will store two things:
|
|
|
|
|
|
|
|
- the resourceVersion of the object (being an etcdIndex)
|
|
|
|
- the object watched from etcd itself (in a deserialized form)
|
|
|
|
|
|
|
|
This should be as simple as having an array an treating it as a cyclic buffer.
|
|
|
|
Obviously resourceVersion of objects watched from etcd will be increasing, but
|
|
|
|
they are necessary for registering a new watcher that is interested in all the
|
2015-09-21 07:21:11 +00:00
|
|
|
changes since a given etcdIndex.
|
2015-07-02 07:56:54 +00:00
|
|
|
|
|
|
|
Additionally, we should support LIST operation, otherwise clients can never
|
|
|
|
start watching at now. We may consider passing lists through etcd, however
|
|
|
|
this will not work once we have Indexer, so we will need that information
|
|
|
|
in memory anyway.
|
|
|
|
Thus, we should support LIST operation from the "end of the history" - i.e.
|
|
|
|
from the moment just after the newest cached watched event. It should be
|
|
|
|
pretty simple to do, because we can incrementally update this list whenever
|
|
|
|
the new watch event is watched from etcd.
|
|
|
|
We may consider reusing existing structures cache.Store or cache.Indexer
|
|
|
|
("pkg/client/cache") but this is not a hard requirement.
|
|
|
|
|
2015-07-31 07:54:05 +00:00
|
|
|
3. Create the new implementation of the API, that will internally have a
|
2015-07-02 07:56:54 +00:00
|
|
|
single watch open to etcd and will store the data received from etcd in
|
|
|
|
the FIFO cache - this includes implementing registration of a new watcher
|
|
|
|
which will start a new go-routine responsible for iterating over the cache
|
|
|
|
and sending all the objects watcher is interested in (by applying filtering
|
|
|
|
function) to the watcher.
|
|
|
|
|
2015-07-31 07:54:05 +00:00
|
|
|
4. Add a support for processing "error too old" from etcd, which will require:
|
2015-07-02 07:56:54 +00:00
|
|
|
- disconnect all the watchers
|
|
|
|
- clear the internal cache and relist all objects from etcd
|
|
|
|
- start accepting watchers again
|
|
|
|
|
2015-07-31 07:54:05 +00:00
|
|
|
5. Enable watch in apiserver for some of the existing resource types - this
|
2015-07-02 07:56:54 +00:00
|
|
|
should require only changes at the initialization level.
|
|
|
|
|
2015-07-31 07:54:05 +00:00
|
|
|
6. The next step will be to incorporate some indexing mechanism, but details
|
2015-07-02 07:56:54 +00:00
|
|
|
of it are TBD.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Future optimizations:
|
|
|
|
|
|
|
|
1. The implementation of watch in apiserver internally will open a single
|
|
|
|
watch to etcd, responsible for watching all the changes of objects of a given
|
|
|
|
resource type. However, this watch can potentially expire at any time and
|
|
|
|
reconnecting can return "too old resource version". In that case relisting is
|
|
|
|
necessary. In such case, to avoid LIST requests coming from all watchers at
|
|
|
|
the same time, we can introduce an additional etcd event type:
|
2015-07-30 11:27:18 +00:00
|
|
|
[EtcdResync](../../pkg/storage/etcd/etcd_watcher.go#L36)
|
2015-07-02 07:56:54 +00:00
|
|
|
|
2015-08-25 14:47:58 +00:00
|
|
|
Whenever relisting will be done to refresh the internal watch to etcd,
|
2015-07-02 07:56:54 +00:00
|
|
|
EtcdResync event will be send to all the watchers. It will contain the
|
|
|
|
full list of all the objects the watcher is interested in (appropriately
|
|
|
|
filtered) as the parameter of this watch event.
|
|
|
|
Thus, we need to create the EtcdResync event, extend watch.Interface and
|
|
|
|
its implementations to support it and handle those events appropriately
|
|
|
|
in places like
|
2015-09-03 21:50:12 +00:00
|
|
|
[Reflector](../../pkg/client/cache/reflector.go)
|
2015-07-02 07:56:54 +00:00
|
|
|
|
2015-12-26 18:19:03 +00:00
|
|
|
However, this might turn out to be unnecessary optimization if apiserver
|
|
|
|
will always keep up (which is possible in the new design). We will work
|
2015-07-02 07:56:54 +00:00
|
|
|
out all necessary details at that point.
|
|
|
|
|
|
|
|
|
|
|
|
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
2015-07-31 07:54:05 +00:00
|
|
|
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/apiserver-watch.md?pixel)]()
|
2015-07-02 07:56:54 +00:00
|
|
|
<!-- END MUNGE: GENERATED_ANALYTICS -->
|