Commit Graph

92 Commits (d5055d49d3547bcdeea415a92da646db35defe91)

Author SHA1 Message Date
Wojciech Tyczynski 01699ef320 Proper fix for non-receiving watchers 2016-12-09 09:43:10 +01:00
Clayton Coleman 3454a8d52c
refactor: update bazel, codec, and gofmt 2016-12-03 19:10:53 -05:00
Clayton Coleman 5df8cc39c9
refactor: generated 2016-12-03 19:10:46 -05:00
Wojciech Tyczynski ec247315be Handle RV in Get calls to storage interface. 2016-12-03 10:18:43 +01:00
Kubernetes Submit Queue cd560926bd Merge pull request #36889 from wojtek-t/reuse_fields_and_labels
Automatic merge from submit-queue

Reuse fields and labels

This should significantly reduce memory allocations in apiserver in large cluster.
Explanation:
- every kubelet is refreshing watch every 5-10 minutes (this generally is not causing relist - it just renews watch)
- that means, in 5000-node cluster, we are issuing ~10 watches per second
- since we don't have "watch heartbets", the watch is issued from previously received resourceVersion
- to make some assumption, let's assume pods are evenly spread across pods, and writes for them are evenly spread - that means, that a given kubelet is interested in 1 per 5000 pod changes
- with that assumption, each watch, has to process 2500 (on average) previous watch events
- for each of such even, we are currently computing fields.

This PR is fixing this problem.
2016-12-02 21:49:43 -08:00
Wojciech Tyczynski 36e6cd19e1 Cache fields for filtering in watchCache. 2016-11-29 09:48:09 +01:00
Wojciech Tyczynski ac7b1065e7 Better waiting for watch event delivery in cacher 2016-11-28 09:25:33 +01:00
Chao Xu 4f3d0e3bde more dependencies packages:
pkg/metrics
pkg/credentialprovider
pkg/security
pkg/securitycontext
pkg/serviceaccount
pkg/storage
pkg/fieldpath
2016-11-23 15:53:09 -08:00
Kubernetes Submit Queue 6f80ec91d6 Merge pull request #35415 from wojtek-t/avoid_get
Automatic merge from submit-queue

Try to avoid Get to etcd in GuaranteedUpdate in Cacher
2016-10-26 16:15:06 -07:00
Wojciech Tyczynski 5d2062db9f Reduce amount of not-helping logs in apiserver 2016-10-26 13:20:07 +02:00
Wojciech Tyczynski a1090151ef Try to avoid Get to etcd in GuaranteedUpdate in Cacher 2016-10-25 21:59:02 +02:00
Wojciech Tyczynski 93c008f8a4 Support resourceVersion in GetToList - unify interface of List and GetToList 2016-10-21 10:09:23 +02:00
Kubernetes Submit Queue 5fcb9fd056 Merge pull request #35125 from wojtek-t/avoid_unnecessary_reallocations
Automatic merge from submit-queue

Avoid unnecessary reallocations of slice in Cacher
2016-10-19 20:33:13 -07:00
Wojciech Tyczynski 0ced3f43bf Avoid unnecessary reallocations of slice in Cacher 2016-10-19 19:33:33 +02:00
Wojciech Tyczynski 8040719d7f Avoid computing key func multiple times in cacher 2016-10-19 08:38:18 +02:00
Wojciech Tyczynski f10b0205e7 Store keys in watchCache store 2016-10-19 08:38:18 +02:00
Wojciech Tyczynski 9895f337ee Avoid unnecessary copies in cacher 2016-10-19 08:33:58 +02:00
Wojciech Tyczynski 0f2270698c Reduce amount of annoying logs in cacher 2016-10-17 16:15:24 +02:00
Wojciech Tyczynski 4d5ac91f88 Add tracing to listing in Cacher 2016-10-17 08:58:40 +02:00
Wojciech Tyczynski 2298e1746c Increase buffer sizes in cacher for watchers interested in all/many objects. 2016-10-13 16:40:33 +02:00
Wojciech Tyczynski c02df26ad6 Improve some logging in cacher 2016-10-07 15:04:08 +02:00
Wojciech Tyczynski 90bc19959d Extend logging in cacher to understand its bottleneck 2016-10-06 10:57:46 +02:00
Hongchao Deng 6f3ac807fd pass SelectionPredicate instead of Filter to storage layer 2016-09-26 09:47:19 -07:00
Lucas Käldström 06917531b3 Move HighWaterMark to the top of the struct in order to fix arm, second time 2016-09-23 20:58:28 +03:00
Wojciech Tyczynski e5b3f19638 Fix logging in cacher 2016-09-14 09:13:41 +02:00
Wojciech Tyczynski 949dd90593 Extend logging for performance debuggin 2016-09-12 12:46:19 +02:00
Wojciech Tyczynski 03a23aed09 Log water mark for incoming queue in cacher 2016-09-09 11:35:05 +02:00
Kubernetes Submit Queue 504ccc6f37 Merge pull request #32275 from wojtek-t/split_process_event
Automatic merge from submit-queue

Split dispatching to watchers in Cacher into separate goroutine.

Should help with #32257
2016-09-08 07:42:12 -07:00
Wojciech Tyczynski e750454c31 Fix allow for non-ready nodes in e2e framework 2016-09-08 14:22:08 +02:00
Wojciech Tyczynski 378cd81dbe Split dispatching to watchers in Cacher into separate goroutine. 2016-09-08 13:27:54 +02:00
Wojciech Tyczynski bd54c389f5 Extend logging for scalability tests debugging 2016-09-08 12:02:59 +02:00
Hongchao Deng a607a69f4a pkg/storage: cleanup Codec() from interface 2016-08-15 20:46:13 -07:00
Kubernetes Submit Queue a69054f9c3 Merge pull request #30368 from wojtek-t/log_terminating_all_watchers
Automatic merge from submit-queue

Log warning when terminating all watchers

Ref #30275

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/30368)
<!-- Reviewable:end -->
2016-08-10 09:26:07 -07:00
Wojciech Tyczynski 497f891cfb Log warning when terminating all watchers 2016-08-10 17:04:10 +02:00
Hongchao Deng 7f28eda9be storage interface: remove Backends() 2016-08-07 16:10:18 -07:00
Wojciech Tyczynski 33e612e101 Revert "cacher.go: embed storage.Interface into cacher" 2016-07-22 07:28:45 +02:00
Xiang Li 44c0a1190c cacher.go: embed storage.Interface into cacher 2016-07-16 23:25:48 -07:00
Jordan Liggitt 4fcd999c25
Fix watch cache filtering 2016-07-14 13:13:17 -04:00
Wojciech Tyczynski 1d9bc58328 Extend Filter interface with Trigger() and use it for pods and nodes 2016-07-13 08:45:18 +02:00
Wojciech Tyczynski 7f7ef0879f Change filter to interface in storage.Interface 2016-07-13 08:44:22 +02:00
Xiang Li aa472ff734 cacher: replace usable lock with conditional variable 2016-07-04 08:57:59 -07:00
David McMahon ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
k8s-merge-robot 00b5b548d6 Merge pull request #26854 from xiang90/cacher
Automatic merge from submit-queue

cacher.go: remove NewCacher func

NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
2016-06-25 11:10:06 -07:00
Xiang Li c530a5810a cacher: remove unnecessary initialzation 2016-06-04 22:49:45 -07:00
Xiang Li e2aab093aa cacher.go: remove NewCacher func
NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
2016-06-04 22:46:58 -07:00
Jordan Liggitt f80b59ba87 Return 'too old' errors from watch cache via watch stream 2016-05-10 10:59:53 -04:00
Russ Cox 6a19e46ed6 pkg/storage: cache timers
A previous change here replaced time.After with an explicit
timer that can be stopped, to avoid filling up the active timer list
with timers that are no longer needed. But an even better fix is to
reuse the timers across calls, to avoid filling the allocated heap
with work for the garbage collector. On top of that, try a quick
non-blocking send to avoid the timer entirely.

For the e2e 1000-node kubemark test, basically everything gets faster,
some things significantly so. The 90th and 99th percentile for LIST nodes
in particular are the worst case that has caused SLO/SLA problems
in the past, and this reduces 99th percentile by 10%.

name                               old ms/op  new ms/op   delta
LIST_nodes_p50                      127 ±16%    124 ±13%     ~     (p=0.136 n=29+29)
LIST_nodes_p90                      326 ±12%    278 ±15%  -14.85%  (p=0.000 n=29+29)
LIST_nodes_p99                      453 ±11%    405 ±19%  -10.70%  (p=0.000 n=29+28)
LIST_replicationcontrollers_p50    29.4 ±49%   26.6 ±43%     ~     (p=0.176 n=30+29)
LIST_replicationcontrollers_p90    83.0 ±78%   68.7 ±63%  -17.30%  (p=0.020 n=30+29)
LIST_replicationcontrollers_p99     216 ±43%    173 ±41%  -19.53%  (p=0.000 n=29+28)
DELETE_pods_p50                    24.5 ±14%   24.3 ±17%     ~     (p=0.562 n=30+28)
DELETE_pods_p90                    30.7 ± 1%   30.6 ± 0%   -0.44%  (p=0.000 n=29+28)
DELETE_pods_p99                    77.2 ±34%   56.3 ±27%  -26.99%  (p=0.000 n=30+28)
PUT_replicationcontrollers_p50     5.86 ±26%   5.83 ±36%     ~     (p=1.000 n=29+28)
PUT_replicationcontrollers_p90     15.8 ± 7%   15.9 ± 6%     ~     (p=0.936 n=29+28)
PUT_replicationcontrollers_p99     57.8 ±35%   56.7 ±41%     ~     (p=0.725 n=29+28)
PUT_nodes_p50                      14.9 ± 2%   14.9 ± 1%   -0.55%  (p=0.020 n=30+28)
PUT_nodes_p90                      16.5 ± 1%   16.4 ± 2%   -0.60%  (p=0.040 n=27+28)
PUT_nodes_p99                      57.9 ±47%   44.6 ±42%  -23.02%  (p=0.000 n=30+29)
POST_replicationcontrollers_p50    6.35 ±29%   6.33 ±23%     ~     (p=0.957 n=30+28)
POST_replicationcontrollers_p90    15.4 ± 5%   15.2 ± 6%   -1.14%  (p=0.034 n=29+28)
POST_replicationcontrollers_p99    52.2 ±71%   53.4 ±52%     ~     (p=0.720 n=29+27)
POST_pods_p50                      8.99 ±13%   9.33 ±13%   +3.79%  (p=0.023 n=30+29)
POST_pods_p90                      16.2 ± 4%   16.3 ± 4%     ~     (p=0.113 n=29+29)
POST_pods_p99                      30.9 ±21%   28.4 ±23%   -8.26%  (p=0.001 n=28+29)
POST_bindings_p50                  9.34 ±12%   8.98 ±17%     ~     (p=0.083 n=30+29)
POST_bindings_p90                  16.6 ± 1%   16.5 ± 2%   -0.76%  (p=0.000 n=28+26)
POST_bindings_p99                  23.5 ± 9%   21.4 ± 5%   -8.98%  (p=0.000 n=27+27)
PUT_pods_p50                       10.8 ±11%   10.3 ± 5%   -4.67%  (p=0.000 n=30+28)
PUT_pods_p90                       16.1 ± 1%   16.0 ± 1%   -0.55%  (p=0.003 n=29+29)
PUT_pods_p99                       23.4 ± 9%   21.6 ±14%   -8.03%  (p=0.000 n=28+28)
DELETE_replicationcontrollers_p50  2.42 ±16%   2.50 ±13%     ~     (p=0.072 n=29+29)
DELETE_replicationcontrollers_p90  11.5 ±12%   11.7 ±10%     ~     (p=0.190 n=30+28)
DELETE_replicationcontrollers_p99  19.5 ±21%   19.0 ±22%     ~     (p=0.298 n=29+28)
GET_nodes_p90                      1.20 ±16%   1.18 ±19%     ~     (p=0.626 n=28+29)
GET_nodes_p99                      11.4 ±48%    8.3 ±40%  -27.31%  (p=0.000 n=28+28)
GET_replicationcontrollers_p90     1.04 ±25%   1.03 ±21%     ~     (p=0.682 n=30+29)
GET_replicationcontrollers_p99     12.1 ±81%  10.0 ±123%     ~     (p=0.135 n=28+28)
GET_pods_p90                       1.06 ±19%   1.08 ±21%     ~     (p=0.597 n=29+29)
GET_pods_p99                       3.92 ±43%   2.81 ±39%  -28.39%  (p=0.000 n=27+28)
LIST_pods_p50                      68.0 ±16%   65.3 ±13%     ~     (p=0.066 n=29+29)
LIST_pods_p90                       119 ±19%    115 ±12%     ~     (p=0.091 n=28+27)
LIST_pods_p99                       230 ±18%    226 ±21%     ~     (p=0.251 n=27+28)
2016-04-21 15:53:47 -04:00
Andy Goldstein 049e63d253 Honor starting resourceVersion in watch cache
Compare the requested resourceVersion to each event's resourceVersion to ensure events that occurred
in the past are not sent to the client.
2016-04-14 09:37:22 -04:00
Daniel Smith 4c539bf082 Merge pull request #23490 from wojtek-t/remove_set_from_storage_interface
Remove Set() from storage.Interface.
2016-04-13 14:22:05 -07:00
Jordan Liggitt ada60236f7 Make watch cache behave like uncached watch 2016-04-12 10:14:07 -04:00