storage: clean up timer in cacheWatcher.add

In the e2e benchmarks, this timer is a significant source of garbage
and stale timers. Because the timer is not stopped after its use
in the select, it stays in the timer heap until it eventually fires
(5 seconds later). Under load, a lot of 5-second timers can pile up
before any start going away. The timer heap being large makes timer
operations take longer; the operations are O(log N) but N is still big.
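
To illustrate the old idiom in isolation (a minimal sketch with made-up names, not the actual cacheWatcher code): time.After allocates a fresh timer on every call, and nothing ever stops it, so it occupies the timer heap for the full 5 seconds even when the send succeeds immediately.

    package main

    import (
    	"fmt"
    	"time"
    )

    // trySend shows the idiom being replaced: the timer created by
    // time.After is never stopped, so it stays in the runtime's timer
    // heap until it fires 5 seconds later, even if the send on ch
    // succeeds right away.
    func trySend(ch chan<- int, v int) bool {
    	select {
    	case ch <- v:
    		return true
    	case <-time.After(5 * time.Second):
    		return false
    	}
    }

    func main() {
    	ch := make(chan int, 1)
    	fmt.Println(trySend(ch, 42)) // succeeds immediately, yet the timer lingers
    }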

The way to fix this in current versions of Go is to stop the underlying
timer explicitly, which this CL does for this one case.
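
Sketched the same way (illustrative names again, and reusing the time import from the sketch above), the pattern is to allocate the timer with time.NewTimer and stop it on the way out, so it leaves the timer heap as soon as the select completes rather than 5 seconds later.

    // trySend, rewritten the way this CL rewrites cacheWatcher.add:
    // the timer is created explicitly and stopped via defer, so it is
    // removed from the timer heap when the function returns instead of
    // lingering until it fires.
    func trySend(ch chan<- int, v int) bool {
    	t := time.NewTimer(5 * time.Second)
    	defer t.Stop()
    	select {
    	case ch <- v:
    		return true
    	case <-t.C:
    		return false
    	}
    }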

There are many other places in the code that use the same idiom,
but those do not show up on profiles of the e2e server.
I am investigating changes for Go 1.7's runtime that would make
the old code behave like this new code transparently, so I don't
think it's worth updating any uses of the idiom that are not in
hot spots found with profiling.

Measuring 'LIST nodes' latency in milliseconds during the e2e test
shows the benefit of this change.

Using Go 1.4.2:

BEFORE  p50: 148±7   p90: 328±19  p99: 513±29  n: 10
AFTER   p50: 151±8   p90: 339±19  p99: 479±20  n: 9

Using Go 1.6.0:

BEFORE  p50: 141±9   p90: 383±32  p99: 604±44  n: 11
AFTER   p50: 140±14  p90: 360±31  p99: 483±39  n: 10
Russ Cox 2016-03-11 13:35:43 -05:00
parent 4877c988c3
commit e4b369e1d7
1 changed file with 3 additions and 1 deletion


@@ -502,9 +502,11 @@ func (c *cacheWatcher) stop() {
 }
 
 func (c *cacheWatcher) add(event watchCacheEvent) {
+	t := time.NewTimer(5 * time.Second)
+	defer t.Stop()
 	select {
 	case c.input <- event:
-	case <-time.After(5 * time.Second):
+	case <-t.C:
 		// This means that we couldn't send event to that watcher.
 		// Since we don't want to block on it infinitely,
 		// we simply terminate it.