Commit Graph

533 Commits (d6ed1df4fba92c213e7f51dc9c56ba004f480a63)

Author SHA1 Message Date
Julius Volz 7e85711df0 Beginnings of a tiered index implementation.
This reintroduces a LevelDB-based metrics index.

Change-Id: I4111540301c52255a07b2f570761707a32f72c05
2014-11-25 17:02:00 +01:00
Julius Volz 8dfaa5ecd2 Remove use of freelists for chunk bufs.
Change-Id: Ib887fdb61e1d96da0cd32545817b925ba88831c1
2014-11-25 17:02:00 +01:00
Julius Volz 7b35e0f0b8 Use constants from math package instead of literals.
Change-Id: I55427ba32c2cbb32ee42ec1e3153160965ab8b3c
2014-11-25 17:02:00 +01:00
Julius Volz 15929eece2 Unpin any already loaded chunks upon preloading error.
Change-Id: Ib451136e3ef21bce8b814c21b66eaab727ab341b
2014-11-25 17:02:00 +01:00
Julius Volz fd01d07589 Check that chunk buffer length fits in 16 bit.
Change-Id: Id086a54aa8a1990c1979e747c1c02e53bed6d447
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 1ca7f24137 Remove float diff tolerance altogether.
Change-Id: I9ea9683a4665d5800fca75560bb4b8a8b4406d55
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein d742edfe0d Fix precision loss.
Large delta values often imply a difference between a large base value
and the large delta value, potentially resulting in small numbers with
a huge precision error. Since large delta values need 8 bytes anyway,
we are not even saving memory.

As a solution, always save the absoluto value rather than a delta once
8 bytes would be needed for the delta. Timestamps are then saved as 8
byte integers, while values are always saved as float64 in that case.

Change-Id: I01100d600515e16df58ce508b50982ffd762cc49
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein dc2e463a97 Improvements after review.
Change-Id: I484359282d4c7113518bbbb131f4f18383c08fdb
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 52c9dc43a3 Improve testing.
In particular, create a fuzz test for time series.

Change-Id: I523a17912405a0b6b46bd395c781d201dfe55036
2014-11-25 17:02:00 +01:00
Julius Volz 3b25867d61 Add chunk persistence tests, fix storage tests.
Change-Id: Id0b8f5382e99efa839cc0f826e92bbda985fe9a9
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein ecdf5ab14f Index-persistence switched from gob to a hand-coded solution.
Change-Id: Ib4ec42535bd08df16d34d4774bb638e35c5a1841
2014-11-25 17:02:00 +01:00
Julius Volz e7ed39c9a6 Initial experimental snapshot of next-gen storage.
Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d
2014-11-25 17:02:00 +01:00
Julius Volz c6e9f085a3 Update used Go version to 1.3.
Go downloads moved to a different URL and require following redirects
(curl's '-L' option) now.

Go 1.3 deliberately randomizes ranges over maps, which uncovered some
bugs in our tests. These are fixed too.

Change-Id: Id2d9e185d8d2379a9b7b8ad5ba680024565d15f4
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 1909686789 Make metrics exported by the Prometheus server itself more consistent.
- Always spell out the time unit (e.g. milliseconds instead of ms).

- Remove "_total" from the names of metrics that are not counters.

- Make use of the "Namespace" and "Subsystem" fields in the options.

- Removed the "capacity" facet from all metrics about channels/queues.
  These are all fixed via command line flags and will never change
  during the runtime of a process. Also, they should not be part of
  the same metric family. I have added separate metrics for the
  capacity of queues as convenience. (They will never change and are
  only set once.)

- I left "metric_disk_latency_microseconds" unchanged, although that
  metric measures the latency of the storage device, even if it is not
  a spinning disk. "SSD" is read by many as "solid state disk", so
  it's not too far off. (It should be "solid state drive", of course,
  but "metric_drive_latency_microseconds" is probably confusing.)

- Brian suggested to not mix "failure" and "success" outcome in the
  same metric family (distinguished by labels). For now, I left it as
  it is. We are touching some bigger issue here, especially as other
  parts in the Prometheus ecosystem are following the same
  principle. We still need to come to terms here and then change
  things consistently everywhere.

Change-Id: If799458b450d18f78500f05990301c12525197d3
2014-11-25 17:02:00 +01:00
Julius Volz 80b3d3bf34 Speed up disk flushes by removing unnecessary sort.
The first sort in groupByFingerprint already ensures that all resulting sample
lists contain only one fingerprint. We also already assume that all
samples passed into AppendSamples (and thus groupByFingerprint) are
chronologically sorted within each fingerprint.

The extra chronological sort is thus superfluous. Furthermore, this
second sort didn't only sort chronologically, but also compared all
metric fingerprints again (although we already know that we're only
sorting within samples for the same fingerprint). This caused a huge
memory and runtime overhead.

In a heavily loaded real Prometheus, this brought down disk flush times
from ~9 minutes to ~1 minute.

OLD:
BenchmarkLevelDBAppendRepeatingValues   5  331391808 ns/op  44542953 B/op   597788 allocs/op
BenchmarkLevelDBAppendsRepeatingValues  5  329893512 ns/op  46968288 B/op  3104373 allocs/op

NEW:
BenchmarkLevelDBAppendRepeatingValues   5  299298635 ns/op  43329497 B/op   567616 allocs/op
BenchmarkLevelDBAppendsRepeatingValues 20   92204601 ns/op   1779454 B/op    70975 allocs/op

Change-Id: Ie2d8db3569b0102a18010f9e106e391fda7f7883
2014-11-25 17:01:59 +01:00
Julius Volz 21cafe6cd7 Only evict memory series after they are on disk.
This fixes the problem where samples become temporarily unavailable for
queries while they are being flushed to disk. Although the entire
flushing code could use some major refactoring, I'm explicitly trying to
do the minimal change to fix the problem since there's a whole new
storage implementation in the pipeline.

Change-Id: I0f5393a30b88654c73567456aeaea62f8b3756d9
2014-11-25 17:01:59 +01:00
Bjoern Rabenstein 8956faeccb Migrate to new client_golang.
This change will only be submitted when the new client_golang has been
moved to the new version.

Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581
2014-11-25 17:01:59 +01:00
Brian Brazil e041c0cd46 Add console and alert templates with access to all data.
Move rulemanager to it's own package to break cicrular dependency.
Make NewTestTieredStorage available to tests, remove duplication.

Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
2014-05-30 16:24:56 +01:00
Bjoern Rabenstein ca6a4fccef Weed out our homegrown test.Tester.
The Go stdlib has testing.TB now, which fulfills the exact same
purpose.

Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c
2014-05-21 19:27:24 +02:00
Julius Volz 4df5c7ab18 Optimize label matcher memory and runtime behavior.
This optimizes the runtime and memory allocation behavior for label matchers
other than type "Equal". Instead of creating a new set for every union of
fingerprints, this simply adds new fingerprints to the existing set to achieve
the same effect.

The current behavior made a production Prometheus unresponsive when running a
NotEqual match against the "instance" label (a label with high value
cardinality).

BEFORE:
BenchmarkGetFingerprintsForNotEqualMatcher        10   170430297 ns/op  39229944 B/op    40709 allocs/op

AFTER:
BenchmarkGetFingerprintsForNotEqualMatcher      5000      706260 ns/op    217717 B/op     1116 allocs/op

Change-Id: Ifd78e81e7dfbf5d7249e50ad1903a5d9c42c347a
2014-05-05 11:29:17 -04:00
Bjoern Rabenstein de9a88b964 Ensure temporal order in streams.
BenchmarkAppendSample.* before this change:

BenchmarkAppendSample1   1000000              1142 ns/op
--- BENCH: BenchmarkAppendSample1
        memory_test.go:81: 1 cycles with 9992.000000 bytes per cycle, totalling 9992
        memory_test.go:81: 100 cycles with 250.399994 bytes per cycle, totalling 25040
        memory_test.go:81: 10000 cycles with 239.428802 bytes per cycle, totalling 2394288
        memory_test.go:81: 1000000 cycles with 255.504684 bytes per cycle, totalling 255504688
BenchmarkAppendSample10   500000              3823 ns/op
--- BENCH: BenchmarkAppendSample10
        memory_test.go:81: 1 cycles with 15536.000000 bytes per cycle, totalling 15536
        memory_test.go:81: 100 cycles with 662.239990 bytes per cycle, totalling 66224
        memory_test.go:81: 10000 cycles with 601.937622 bytes per cycle, totalling 6019376
        memory_test.go:81: 500000 cycles with 598.582764 bytes per cycle, totalling 299291408
BenchmarkAppendSample100           50000             41111 ns/op
--- BENCH: BenchmarkAppendSample100
        memory_test.go:81: 1 cycles with 79824.000000 bytes per cycle, totalling 79824
        memory_test.go:81: 100 cycles with 4924.479980 bytes per cycle, totalling 492448
        memory_test.go:81: 10000 cycles with 4278.019043 bytes per cycle, totalling 42780192
        memory_test.go:81: 50000 cycles with 4275.242676 bytes per cycle, totalling 213762144
BenchmarkAppendSample1000           5000            533933 ns/op
--- BENCH: BenchmarkAppendSample1000
        memory_test.go:81: 1 cycles with 840224.000000 bytes per cycle, totalling 840224
        memory_test.go:81: 100 cycles with 62789.281250 bytes per cycle, totalling 6278928
        memory_test.go:81: 5000 cycles with 55208.601562 bytes per cycle, totalling 276043008
ok      github.com/prometheus/prometheus/storage/metric/tiered  27.828s

BenchmarkAppendSample.* after this change:

BenchmarkAppendSample1   1000000              1109 ns/op
--- BENCH: BenchmarkAppendSample1
        memory_test.go:131: 1 cycles with 9992.000000 bytes per cycle, totalling 9992
        memory_test.go:131: 100 cycles with 250.399994 bytes per cycle, totalling 25040
        memory_test.go:131: 10000 cycles with 239.220795 bytes per cycle, totalling 2392208
        memory_test.go:131: 1000000 cycles with 255.492630 bytes per cycle, totalling 255492624
BenchmarkAppendSample10   500000              3663 ns/op
--- BENCH: BenchmarkAppendSample10
        memory_test.go:131: 1 cycles with 15536.000000 bytes per cycle, totalling 15536
        memory_test.go:131: 100 cycles with 662.239990 bytes per cycle, totalling 66224
        memory_test.go:131: 10000 cycles with 601.889587 bytes per cycle, totalling 6018896
        memory_test.go:131: 500000 cycles with 598.550903 bytes per cycle, totalling 299275472
BenchmarkAppendSample100           50000             40694 ns/op
--- BENCH: BenchmarkAppendSample100
        memory_test.go:131: 1 cycles with 78976.000000 bytes per cycle, totalling 78976
        memory_test.go:131: 100 cycles with 4928.319824 bytes per cycle, totalling 492832
        memory_test.go:131: 10000 cycles with 4277.961426 bytes per cycle, totalling 42779616
        memory_test.go:131: 50000 cycles with 4275.054199 bytes per cycle, totalling 213752720
BenchmarkAppendSample1000           5000            530744 ns/op
--- BENCH: BenchmarkAppendSample1000
        memory_test.go:131: 1 cycles with 842192.000000 bytes per cycle, totalling 842192
        memory_test.go:131: 100 cycles with 62765.441406 bytes per cycle, totalling 6276544
        memory_test.go:131: 5000 cycles with 55209.812500 bytes per cycle, totalling 276049056
ok      github.com/prometheus/prometheus/storage/metric/tiered  27.468s

Change-Id: Idaa339cd83539b5e4391614541a2c3a04002d66d
2014-04-22 15:22:54 +02:00
Julius Volz 1b29975865 Fix RWLock memory storage deadlock.
This fixes https://github.com/prometheus/prometheus/issues/390

The cause for the deadlock was a lock semantic in Go that wasn't
obvious to me when introducing this bug:

http://golang.org/pkg/sync/#RWMutex.Lock

Key phrase: "To ensure that the lock eventually becomes available, a
blocked Lock call excludes new readers from acquiring the lock."

In the memory series storage, we have one function
(GetFingerprintsForLabelMatchers) acquiring an RLock(), which calls
another function also acquiring the same RLock()
(GetLabelValuesForLabelName). That normally doesn't deadlock, unless a
Lock() call from another goroutine happens right in between the two
RLock() calls, blocking both the Lock() and the second RLock() call from
ever completing.

  GoRoutine 1          GoRoutine 2
  ======================================
  RLock()
  ...                  Lock() [DEADLOCK]
  RLock() [DEADLOCK]   Unlock()
  RUnlock()
  RUnlock()

Testing deadlocks is tricky, but the regression test I added does
reliably detect the deadlock in the original code on my machine within a
normal concurrent reader/writer run duration of 250ms.

Change-Id: Ib34c2bb8df1a80af44550cc2bf5007055cdef413
2014-04-17 13:43:13 +02:00
Julius Volz 01f652cb4c Separate storage implementation from interfaces.
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:

rule checker -> query layer -> tiered storage layer -> leveldb

This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:

- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
                         and LevelDB storage.

I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.

The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.

The rule_checker is now a static binary :)

Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
2014-04-16 13:30:19 +02:00
Matt T. Proud 3e969a8ca2 Parameterize the buffer for marshal/unmarshal.
We are not reusing buffers yet.  This could introduce problems,
so the behavior is disabled for now.

Cursory benchmark data:
- Marshal for 10,000 samples: -30% overhead.
- Unmarshal for 10,000 samples: -15% overhead.

Change-Id: Ib006bdc656af45dca2b92de08a8f905d8d728cac
2014-04-16 12:16:59 +02:00
Matt T. Proud 58ef638e72 Merge "Use idiomatic one-to-many one-time signal pattern." 2014-04-15 21:26:31 +02:00
Matt T. Proud 6ec72393c4 Correct size of unmarshalling destination buffer.
The format header size is not deducted from the size of the byte
stream when calculating the output buffer size for samples.  I have
yet to notice problems directly as a result of this, but it is good
to fix.

Change-Id: Icb07a0718366c04ddac975d738a6305687773af0
2014-04-15 11:55:44 +02:00
Matt T. Proud 81367893fd Use idiomatic one-to-many one-time signal pattern.
The idiomatic pattern for signalling a one-time message to multiple
consumers from a single producer is as follows:

```
  c := make(chan struct{})
  w := new(sync.WaitGroup)  // Boilerplate to ensure synchronization.

  for i := 0; i < 1000; i++ {
    w.Add(1)
    go func() {
      defer w.Done()

      for {
        select {
        case _, ok := <- c:
          if !ok {
            return
          }
        default:
          // Do something here.
        }
      }
    }()
  }

  close(c)  // Signal the one-to-many single-use message.
  w.Wait()

```

Change-Id: I755f73ba4c70a923afd342a4dea63365bdf2144b
2014-04-15 10:15:25 +02:00
Julius Volz c7c0b33d0b Add regex-matching support for labels.
There are four label-matching ops for selecting timeseries now:

- Equal: =
- NotEqual: !=
- RegexMatch: =~
- RegexNoMatch: !~

Instead of looking up labels by a simple clientmodel.LabelSet (basically
an equals op for every key/value pair in the set), timeseries
fingerprint selection is now done via a list of metric.LabelMatchers.

Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96
2014-04-01 14:24:53 +02:00
Julius Volz ae30453214 Add label names -> label values index.
Change-Id: Ie39b4044558afc4d1aa937de7dcf8df61f821fb4
2014-03-28 15:16:37 +01:00
Julius Volz 7a577b86b7 Fix interval op special case.
In the case that a getValuesAtIntervalOp's ExtractSamples() is called
with a current time after the last chunk time, we return without
extracting any further values beyond the last one in the chunk
(correct), but also without advancing the op's time (incorrect). This
leads to an infinite loop in renderView(), since the op is called
repeatedly without ever being advanced and consumed.

This adds handling for this special case. When detecting this case, we
immediately set the op to be consumed, since we would always get a value
after the current time passed in if there was one.

Change-Id: Id99149e07b5188d655331382b8b6a461b677005c
2014-03-26 13:29:03 +01:00
Bjoern Rabenstein 257b720e87 Fix typo.
Change-Id: I6e7edcb48ace7fe4d6de4ff16519da5bb326b6ce
2014-03-25 12:22:18 +01:00
Bjoern Rabenstein caf47b2fbc New encoding for OpenTSDB tag values (and metric names).
Change-Id: I0f4393f638c6e2bb2b2ce14e58e38b49ce456da8
2014-03-21 17:18:44 +01:00
Julius Volz 9d5c367745 Fix incorrect interval op advancement.
This fixes a bug where an interval op might advance too far past the end
of the currently extracted chunk, effectively skipping over relevant
(to-be-extracted) values in the subsequent chunk. The result: missing
samples at chunk boundaries in the resulting view.

Change-Id: Iebf5d086293a277d330039c69f78e1eaf084b3c8
2014-03-18 16:22:50 +01:00
Julius Volz cc04238a85 Switch to new "__name__" metric name label.
This also fixes the compaction test, which before worked only because
the input sample sorting was accidentally equal to the resulting on-disk
sample sorting.

Change-Id: I2a21c4b46ba562424b27058fc02eba84fa6a6006
2014-03-14 16:52:37 +01:00
Bjoern Rabenstein c3b282bd14 Add regression tests for 'loop until op is consumed' bug.
- Most of this is the actual regression test in tiered_test.go.

- Working on that regression tests uncovered problems in
  tiered_test.go that are fixed in this commit.

- The 'op.consumed = false' line added to freelist.go was actually not
  fixing a bug. Instead, there was no bug at all. So this commit
  removes that line again, but adds a regression test to make sure
  that the assumed bug is indeed not there (cf. freelist_test.go).

- Removed more code duplication in operation.go (following the same
  approach as before, i.e. embedding op type A into op type B if
  everything in A is the same as in B with the exception of String()
  and ExtractSample()). (This change make struct literals for ops more
  clunky, but that only affects tests. No code change whatsoever was
  necessary in the actual code after this refactoring.)

- Fix another op leak in tiered.go.

Change-Id: Ia165c52e33290ad4f6aba9c83d92318d4f583517
2014-03-12 18:40:24 +01:00
Julius Volz 86fc13a52e Convert metric.Values to slice of values.
The initial impetus for this was that it made unmarshalling sample
values much faster.

Other relevant benchmark changes in ns/op:

Benchmark                                 old        new   speedup
==================================================================
BenchmarkMarshal                       179170     127996     1.4x
BenchmarkUnmarshal                     404984     132186     3.1x

BenchmarkMemoryGetValueAtTime           57801      50050     1.2x
BenchmarkMemoryGetBoundaryValues        64496      53194     1.2x
BenchmarkMemoryGetRangeValues           66585      54065     1.2x

BenchmarkStreamAdd                       45.0       75.3     0.6x
BenchmarkAppendSample1                   1157       1587     0.7x
BenchmarkAppendSample10                  4090       4284     0.95x
BenchmarkAppendSample100                45660      44066     1.0x
BenchmarkAppendSample1000              579084     582380     1.0x
BenchmarkMemoryAppendRepeatingValues 22796594   22005502     1.0x

Overall, this gives us good speedups in the areas where they matter
most: decoding values from disk and accessing the memory storage (which
is also used for views).

Some of the smaller append examples take minimally longer, but the cost
seems to get amortized over larger appends, so I'm not worried about
these. Also, we're currently not bottlenecked on the write path and have
plenty of other optimizations available in that area if it becomes
necessary.

Memory allocations during appends don't change measurably at all.

Change-Id: I7dc7394edea09506976765551f35b138518db9e8
2014-03-11 18:23:37 +01:00
Julius Volz a7d0973fe3 Add version field to LevelDB sample format.
This doesn't add complex discriminator logic yet, but adds a single
version byte to the beginning of each samples chunk. If we ever need to
change the disk format again, this will make it easy to do so without
having to wipe the entire database.

Change-Id: I60c39274256f790bc2da83167a1effaa174588fe
2014-03-11 14:08:40 +01:00
Julius Volz 1eee448bc1 Store samples in custom binary encoding.
This has been shown to provide immense decoding speed benefits.

See also:

https://groups.google.com/forum/#!topic/prometheus-developers/FeGl_qzGrYs

Change-Id: I7d45b4650e44ddecaa91dad9d7fdb3cd0b9f15fe
2014-03-09 22:31:38 +01:00
Julius Volz c2a2a20f36 Remove obsolete scanjobs timer.
Change-Id: Ifb29b4d93c9c1c6cacb8b098d5237866925c9fac
2014-03-07 17:10:28 +01:00
Julius Volz dd4892dcad Ensure no ops are leaked in renderView().
Change-Id: I6970a9098be305fcd010d46443b040d864d9740a
2014-03-07 14:33:13 +01:00
Julius Volz 5745ce0a60 Fixups for single-op-per-fingerprint view rendering.
Change-Id: Ie496d4529b65a3819c6042f43d7cf99e0e1ac60b
2014-03-07 00:54:28 +01:00
Björn Rabenstein 8b43497002 Merge "Fix memory series indexing bug." 2014-03-06 11:53:10 +01:00
Björn Rabenstein 0bb33b6525 Merge "Remove unused labelname -> fingerprints index." 2014-03-06 11:40:09 +01:00
Julius Volz d6827b6898 Fix memory series indexing bug.
This fixes https://github.com/prometheus/prometheus/issues/381.

For any stale series we dropped from memory, this bug caused us to also drop
any other series from the labelpair->fingerprints memory index if they had any
label/value-pairs in common with the intentionally dropped series.

To fix this issue more easily, I converted the labelpair->fingerprints index
map values to a utility.Set of clientmodel.Fingerprints. This makes handling
this index much easier in general.

Change-Id: If5e81e202e8c542261bbd9797aa1257376c5c074
2014-03-06 01:23:22 +01:00
Julius Volz c6013ff309 Remove unused labelname -> fingerprints index.
Change-Id: Ie4ccea3a230532e670030ca64ede9435b1b3e506
2014-03-05 23:49:33 +01:00
Bjoern Rabenstein 9ea9189dd1 Remove the multi-op-per-fingerprint capability.
Currently, rendering a view is capable of handling multiple ops for
the same fingerprint efficiently. However, this capability requires a
lot of complexity in the code, which we are not using at all because
the way we assemble a viewRequest will never have more than one
operation per fingerprint.

This commit weeds out the said capability, along with all the code
needed for it. It is still possible to have more than one operation
for the same fingerprint, it will just be handled in a less efficient
way (as proven by the unit tests).

As a result, scanjob.go could be removed entirely.

This commit also contains a few related refactorings and removals of
dead code in operation.go, view,go, and freelist.go. Also, the
docstrings received some love.

Change-Id: I032b976e0880151c3f3fdb3234fb65e484f0e2e5
2014-03-04 16:29:56 +01:00
Bjoern Rabenstein e11e8c7a23 Unify LevelDB.*Options.
We have seven different types all called like LevelDB.*Options.  One
of them is the plain LevelDBOptions. All others are just wrapping that
type without adding anything except clunkier handling.

If there ever was a plan to add more specific options to the various
LevelDB.*Options types, history has proven that nothing like that is
going to happen anytime soon.

To keep the code a bit shorter and more focused on the real (quite
significant) complexities we have to deal with here, this commit
reduces all uses of LevelDBOptions to the actual LevelDBOptions type.

1576 fewer characters to read...

Change-Id: I3d7a2b7ffed78b337aa37f812c53c058329ecaa6
2014-02-27 16:03:58 +01:00
Bjoern Rabenstein 6bc083f38b Major code cleanup in storage.
- Mostly docstring fixed/additions.
  (Please review these carefully, since most of them were missing, I
  had to guess them from an outsider's perspective. (Which on the
  other hand proves how desperately required many of these docstrings
  are.))

- Removed all uses of new(...) to meet our own style guide (draft).

- Fixed all other 'go vet' and 'golint' issues (except those that are
  not fixable (i.e. caused by bugs in or by design of 'go vet' and
  'golint')).

- Some trivial refactorings, like reorder functions, minor renames, ...

- Some slightly less trivial refactoring, mostly to reduce code
  duplication by embedding types instead of writing many explicit
  forwarders.

- Cleaned up the interface structure a bit. (Most significant probably
  the removal of the View-like methods from MetricPersistenc. Now they
  are only in View and not duplicated anymore.)

- Removed dead code. (Probably not all of it, but it's a first
  step...)

- Fixed a leftover in storage/metric/end_to_end_test.go (that made
  some parts of the code never execute (incidentally, those parts
  were broken (and I fixed them, too))).

Change-Id: Ibcac069940d118a88f783314f5b4595dce6641d5
2014-02-27 15:22:37 +01:00
Björn Rabenstein 59febe771a Merge "Minor code cleanups." 2014-02-13 15:29:16 +01:00
Julius Volz c4adfc4f25 Minor code cleanups.
Change-Id: Ib3729cf38b107b7f2186ccf410a745e0472e3630
2014-02-13 15:24:43 +01:00
Julius Volz 8cadae6102 Merge "Fix LevelDB closing order." 2014-02-03 23:22:30 +01:00
Julius Volz 94666e20b7 Minor test error reporting cleanup.
Change-Id: Ie11c16b4e60de7c179c6d2a86e063f4432e2000f
2014-02-03 12:27:01 +01:00
Julius Volz fd2158e746 Store copy of metric during fingerprint caching
Problem description:
====================
If a rule evaluation referencing a metric/timeseries M happens at a time
when M doesn't have a memory timeseries yet, looking up the fingerprint
for M (via TieredStorage.GetMetricForFingerprint()) will create a new
Metric object for M which gets both: a) attached to a new empty memory
timeseries (so we don't have to ask disk for the Metric's fingerprint
next time), and b) returned to the rule evaluation layer. However, the
rule evaluation layer replaces the name label (and possibly other
labels) of the metric with the name of the recorded rule.  Since both
the rule evaluator and the memory storage share a reference to the same
Metric object, the original memory timeseries will now also be
incorrectly renamed.

Fix:
====
Instead of storing a reference to a shared metric object, take a copy of
the object when creating an empty memory timeseries for caching
purposes.

Change-Id: I9f2172696c16c10b377e6708553a46ef29390f1e
2014-02-02 17:11:08 +01:00
Julius Volz 718ad2224b Fix LevelDB closing order.
The storage itself should be closed before any of the objects passed into it
are closed (otherwise closing the storage can randomly freeze). Defers are
executed in reverse order, so closing the storage should be the last of the
defer statements.

Change-Id: Id920318b876f5b94767ed48c81221b3456770620
2014-01-28 15:16:06 +01:00
Bjoern Rabenstein c342ad33a0 Fix OperatorError.
This used to work with Go 1.1, but only because of a compiler bug.
The bug is fixed in Go 1.2, so we have to fix our code now.

Change-Id: I5a9f3a15878afd750e848be33e90b05f3aa055e1
2014-01-21 16:49:51 +01:00
Julius Volz d5ef0c64dc Merge "Add optional sample replication to OpenTSDB." 2014-01-08 17:45:08 +01:00
Julius Volz 61d26e8445 Add optional sample replication to OpenTSDB.
Prometheus needs long-term storage. Since we don't have enough resources
to build our own timeseries storage from scratch ontop of Riak,
Cassandra or a similar distributed datastore at the moment, we're
planning on using OpenTSDB as long-term storage for Prometheus. It's
data model is roughly compatible with that of Prometheus, with some
caveats.

As a first step, this adds write-only replication from Prometheus to
OpenTSDB, with the following things worth noting:

1)
I tried to keep the integration lightweight, meaning that anything
related to OpenTSDB is isolated to its own package and only main knows
about it (essentially it tees all samples to both the existing storage
and TSDB). It's not touching the existing TieredStorage at all to avoid
more complexity in that area. This might change in the future,
especially if we decide to implement a read path for OpenTSDB through
Prometheus as well.

2)
Backpressure while sending to OpenTSDB is handled by simply dropping
samples on the floor when the in-memory queue of samples destined for
OpenTSDB runs full.  Prometheus also only attempts to send samples once,
rather than implementing a complex retry algorithm. Thus, replication to
OpenTSDB is best-effort for now.  If needed, this may be extended in the
future.

3)
Samples are sent in batches of limited size to OpenTSDB. The optimal
batch size, timeout parameters, etc. may need to be adjusted in the
future.

4)
OpenTSDB has different rules for legal characters in tag (label) values.
While Prometheus allows any characters in label values, OpenTSDB limits
them to a to z, A to Z, 0 to 9, -, _, . and /. Currently any illegal
characters in Prometheus label values are simply replaced by an
underscore. Especially when integrating OpenTSDB with the read path in
Prometheus, we'll need to reconsider this: either we'll need to
introduce the same limitations for Prometheus labels or escape/encode
illegal characters in OpenTSDB in such a way that they are fully
decodable again when reading through Prometheus, so that corresponding
timeseries in both systems match in their labelsets.

Change-Id: I8394c9c55dbac3946a0fa497f566d5e6e2d600b5
2014-01-02 18:21:38 +01:00
Stuart Nelson 0c58e388f6 rename curation metrics to prometheus_curation
Change-Id: I6a0bf277e88ea8eb737670b7e865ae20f2cbfb91
2013-12-13 17:45:01 -05:00
Stuart Nelson 28f59edf16 Added telemetry for counting stored samples
Change-Id: I0f36f7c2738d070ca2f107fcb315f98e46803af3
2013-12-12 10:06:41 -05:00
Tobias Schmidt 6947ee9bc9 Try to create metrics root directory if missing
This change tries to be nice and create the metrics directoy first
before erroring out.

Change-Id: I72691cdc32469708cd671c6ef1fb7db55fe60430
2013-12-03 18:16:13 +07:00
Julius Volz 740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Julius Volz 6b7de31a3c Upgrade to LevelDB 1.14.0 to fix LevelDB bugs.
This tentatively fixes https://github.com/prometheus/prometheus/issues/368 due
to an upstream bugfix in snapshotted LevelDB iterator handling, which got fixed
in LevelDB 1.14.0:

https://code.google.com/p/leveldb/issues/detail?id=200

Change-Id: Ib0cc67b7d3dc33913a1c16736eff32ef702c63bf
2013-12-03 09:07:15 +01:00
Julius Volz db015de65b Comment and "go fmt" fixups in compaction tests.
Change-Id: Iaa0eda6a22a5caa0590bae87ff579f9ace21e80a
2013-10-30 17:06:17 +01:00
Julius Volz 51408bdfe8 Merge changes I3ffeb091,Idffefea4
* changes:
  Add chunk sanity checking to dumper tool.
  Add compaction regression tests.
2013-10-24 13:58:14 +02:00
Julius Volz 2162e57784 Merge "Fix watermarker default time / LevelDB key ordering bug." 2013-10-24 13:57:48 +02:00
Julius Volz 5e18255920 Merge "Fix chunk corruption compaction bug." 2013-10-24 13:57:31 +02:00
Julius Volz eb461a707d Add chunk sanity checking to dumper tool.
Also, move codecs/filters to common location so they can be used in subsequent
test.

Change-Id: I3ffeb09188b8f4552e42683cbc9279645f45b32e
2013-10-23 01:06:49 +02:00
Julius Volz 6ea22f2bf9 Add compaction regression tests.
This adds regression tests that catch the two error cases reported in

  https://github.com/prometheus/prometheus/issues/367

It also adds a commented-out test case for the crash in

  https://github.com/prometheus/prometheus/issues/368

but there's no fix for the latter crash yet.

Change-Id: Idffefea4ed7cc281caae660bcad2e3c13ec3bd17
2013-10-23 01:06:28 +02:00
Conor Hennessy 9a48010cec Add a check for metrics directory existence.
Previously on startup the program would just quit without stating
explicitly why.

Change-Id: I833b85eb74d2dd27cdc3f0f2e65d7bb1c42caa39
2013-10-22 20:54:34 +02:00
Julius Volz b5f6e3c90c Fix watermarker default time / LevelDB key ordering bug.
This fixes part 2) of https://github.com/prometheus/prometheus/issues/367
(uninitialized time.Time mapping to a higher LevelDB key than "normal"
timestamps).

Change-Id: Ib079974110a7b7c4757948f81fc47d3d29ae43c9
2013-10-21 14:32:21 +02:00
Julius Volz a1a97ed064 Fix chunk corruption compaction bug.
This fixes part 1) of https://github.com/prometheus/prometheus/issues/367 (the
storing of samples with the wrong fingerprint into a compacted chunk, thus
corrupting it).

Change-Id: I4c36d0d2e508e37a0aba90b8ca2ecc78ee03e3f1
2013-10-21 14:30:22 +02:00
Matt T. Proud 86fcbe5bde Retain DTO on each cycle.
Change-Id: Ifc6f68f98eacb01097771d0dbf043c98bba1d518
2013-09-05 10:14:34 +02:00
Matt T. Proud 4a87c002e8 Update low-level i'faces to reflect wireformats.
This commit fixes a critique of the old storage API design, whereby
the input parameters were always as raw bytes and never Protocol
Buffer messages that encapsulated the data, meaning every place a
read or mutation was conducted needed to manually perform said
translations on its own.  This is taxing.

Change-Id: I4786938d0d207cefb7782bd2bd96a517eead186f
2013-09-04 17:13:58 +02:00
Matt T. Proud 7910f6e863 Prevent total storage locking during memory flush.
While a hack, this change should allow us to serve queries
expeditiously during a flush operation.

Change-Id: I9a483fd1dd2b0638ab24ace960df08773c4a5079
2013-08-29 11:33:38 +02:00
Matt T. Proud 12d5e6ca5a Curation should not starve user-interactive ops.
The background curation should be staggered to ensure that disk
I/O yields to user-interactive operations in a timely manner. The
lack of routine prioritization necessitates this.

Change-Id: I9b498a74ccd933ffb856e06fedc167430e521d86
2013-08-26 19:40:55 +02:00
Matt T. Proud 2b42fd0068 Snapshot of no more frontier.
Change-Id: Icd52da3f52bfe4529829ea70b4865ed7c9f6c446
2013-08-23 17:13:58 +02:00
Matt T. Proud 7db518d3a0 Abstract high watermark cache into standard LRU.
Conflicts:
	storage/metric/memory.go
	storage/metric/tiered.go
	storage/metric/watermark.go

Change-Id: Iab2aedbd8f83dc4ce633421bd4a55990fa026b85
2013-08-19 12:26:55 +02:00
Matt T. Proud d74c2c54d4 Interfacification of stream.
Move the stream to an interface, for a number of additional changes
around it are underway.

Conflicts:
	storage/metric/memory.go

Change-Id: I4a5fc176f4a5274a64ebdb1cad52600954c463c3
2013-08-16 17:35:21 +02:00
Matt T. Proud c262907fec Kill interface cruft.
These pieces were never used and should be thusly removed.

Change-Id: I8dd151ec4c40b6d3ccffad1bb9b8b75a92e9ee37
2013-08-15 11:39:07 +02:00
Matt T. Proud b23acccea8 Kill AppendSample interface definition.
AppendSample will be repcated with AppendSamples, which will take
advantage of bulks appends.  This is a necessary step for indexing
pipeline decoupling.

Change-Id: Ia83811a87bcc89973d3b64d64b85a28710253ebc
2013-08-15 11:35:50 +02:00
Matt T. Proud aaaf3367d6 Include forgotten imports.
This fixes the build.

Change-Id: Id132f4342adb9ed20116191086f157ca7f7cf515
2013-08-14 18:52:55 +02:00
Matt T. Proud acf91f38bd Build layered indexers.
The indexers will be extracted in a short while and wrapped accordingly with
these types.

Change-Id: I4d1abda4e46117210babad5aa0d42f9ca1f6594f
2013-08-14 13:32:53 +02:00
Matt T. Proud 972e856d9b Kill the curation state channel.
The use of the channels for curation state were always unidiomatic.

Change-Id: I1cb1d7175ebfb4faf28dff84201066278d6a0d92
2013-08-13 17:20:22 +02:00
Matt T. Proud 1ceb25b701 Publication of LevelDBMetricPersistence Fields.
This will enable us to break down the onerous construction method.

Change-Id: Ia89337ba39d6745af6757180af2485ec8a990a3b
2013-08-13 00:36:12 +02:00
Julius Volz 0003027dce Add needed trailing spaces in logs. 2013-08-12 18:22:48 +02:00
Julius Volz aa5d251f8d Use github.com/golang/glog for all logging. 2013-08-12 17:54:36 +02:00
Matt T. Proud a5141e4d0a Depointerize storage conf. and chain ingester.
The storage builders need to work with the assumption that they have
a copy of the underlying configuration data if any mutations are made.
2013-08-12 17:07:03 +02:00
Matt T. Proud 820e551988 Code Review: Nits. 2013-08-07 13:29:10 +02:00
Matt T. Proud a3bf2efdd5 Replace index writes with wrapped interface.
This commit is the first of several and should not be regarded as the
desired end state for these cleanups.  What this one does it, however,
is wrap the query index writing behind an interface type that can be
injected into the storage stack and have its lifecycle managed
separately as needed.  It also would mean we can swap out underlying
implementations to support remote indexing, buffering, no-op indexing
very easily.

In the future, most of the individual index interface members in the
tiered storage will go away in favor of agents that can query and
resolve what they need from the datastore without the user knowing
how and why they work.
2013-08-07 12:15:48 +02:00
Matt T. Proud 52664f701a Hot Fix: Use extracted time. 2013-08-06 14:18:02 +02:00
Matt T. Proud 38dac35b3e Code Review: Short name consistency. 2013-08-06 12:38:35 +02:00
Matt T. Proud a00f18d78b Code Review: Manual re-alignment. 2013-08-06 12:23:06 +02:00
Matt T. Proud cc989c68e1 Replace direct curation table access with wrapper. 2013-08-06 12:02:52 +02:00
Matt T. Proud 07ac921aec Code Review: First pass. 2013-08-05 17:31:49 +02:00
Matt T. Proud d8792cfd86 Extract HighWatermarking.
Clean up the rest.
2013-08-05 11:03:03 +02:00
Matt T. Proud f4669a812c Extract index storage into separate types. 2013-08-04 15:31:52 +02:00
Matt T. Proud 772d3d6b11 Consolidate LevelDB storage construction.
There are too many parameters to constructing a LevelDB storage
instance for a construction method, so I've opted to take an
idiomatic approach of embedding them in a struct for easier
mediation and versioning.
2013-08-03 17:25:03 +02:00
Julius Volz e3415e953f Add notifications telemetry. 2013-07-31 12:40:56 +02:00
juliusv 927435d68e Merge pull request #333 from prometheus/round-time
Round time to nearest second in memory storage.
2013-07-16 05:52:31 -07:00
Julius Volz 5d88e8cc45 Round time to nearest second in memory storage.
When samples get flushed to disk, they lose sub-second precision anyways. By
already dropping sub-second precision, data fetched from memory vs. disk will
behave the same. Later, we should consider also storing a more compact
representation than time.Time in memory if we're not going to use its full
precision.
2013-07-16 14:51:54 +02:00
Matt T. Proud f7704af4f8 Code Review: Formatting comments. 2013-07-15 15:12:01 +02:00
Julius Volz a76a797f3f Always treat series without watermarks as too old.
Current series always get watermarks written out upon append now. This
drops support for old series without any watermarks by always reporting
them as too old (stale) during queries.
2013-06-27 17:10:06 +02:00
Julius Volz d2da21121c Implement getValueRangeAtIntervalOp for faster range queries.
This also short-circuits optimize() for now, since it is complex to implement
for the new operator, and ops generated by the query layer already fulfill the
needed invariants. We should still investigate later whether to completely
delete operator optimization code or extend it to support
getValueRangeAtIntervalOp operators.
2013-06-26 18:10:36 +02:00
Julius Volz e7f049c85b Fix expunging of empty memory series (loop var pointerization bug) 2013-06-26 18:00:47 +02:00
Julius Volz baa5b07829 Fix condition for dropping empty memory series. 2013-06-25 17:57:35 +02:00
Matt T. Proud 30b1cf80b5 WIP - Snapshot of Moving to Client Model. 2013-06-25 15:52:42 +02:00
juliusv 42198c1f1c Merge pull request #311 from prometheus/fix/watermarking/on-first-write
Ensure new metrics are watermarked early.
2013-06-25 03:13:58 -07:00
Matt T. Proud b811ccc161 Disable paranoid checks and expose max FDs option.
We shouldn't need paranoid checks now.  We also shouldn't need
too many FDs being open due to rule evaluator hitting in-memory
values stream.
2013-06-24 12:10:14 +02:00
Matt T. Proud 4137c75523 Shrink default LRU cache sizes.
Observing Prometheus in production confirms we can lower these values
safely.
2013-06-24 12:09:16 +02:00
Matt T. Proud ecb9c7bb9d Code Review: Swap ordering of elements. 2013-06-21 21:17:50 +02:00
Matt T. Proud 5daa0a09ea Code Review: Swap ordering of watermark getting.
A test for Julius.
2013-06-21 18:34:08 +02:00
Matt T. Proud ee840904d2 Code Review: !Before -> After. 2013-06-21 18:26:40 +02:00
Matt T. Proud 2d5de99fbf Regard in-memory series as new.
This commit ensures that series that exist only in-memory and not
on-disk are not regarded as too old for operation exclusion.
2013-06-21 18:26:39 +02:00
Matt T. Proud 81c406630a Merge pull request #312 from prometheus/fix/sample-append-logging
Log correct sample count when appending to disk.
2013-06-21 08:55:51 -07:00
Matt T. Proud a1a23fbaf8 Ensure new metrics are watermarked early.
With the checking of fingerprint freshness to cull stale metrics
from queries, we should write watermarks early to aid in more
accurate responses.
2013-06-21 16:38:46 +02:00
Julius Volz ba8c122147 Log correct sample count when appending to disk. 2013-06-21 12:23:27 +02:00
Julius Volz f2b4067b7b Speedup and clean up operation optimization. 2013-06-20 03:01:13 +02:00
Julius Volz 008bc09da8 Move check for empty memory series to separate method. 2013-06-19 14:19:53 +02:00
Julius Volz 16364eda37 Drop empty series from memory after flushing. 2013-06-19 12:14:23 +02:00
Julius Volz 71199e2c93 Cache disk fingerprint->metric lookups in memory. 2013-06-18 14:08:58 +02:00
Matt T. Proud a73f061d3c Persist solely Protocol Buffers.
An design question was open for me in the beginning was whether to
serialize other types to disk, but Protocol Buffers quickly won out,
which allows us to drop support for other types.  This is a good
start to cleaning up a lot of cruft in the storage stack and
can let us eventually decouple the various moving parts into
separate subsystems for easier reasoning.

This commit is not strictly required, but it is a start to making
the rest a lot more enjoyable to interact with.
2013-06-08 11:02:35 +02:00
juliusv 95400cb785 Merge pull request #290 from prometheus/fix/go-vet
Minor "go tool vet" cleanups
2013-06-07 06:52:48 -07:00
Julius Volz 558281890b Minor "go tool vet" cleanups 2013-06-07 15:34:41 +02:00
juliusv 615972dd01 Merge pull request #288 from prometheus/fix/curator/fallthrough-compaction-ordering
Fix fallthrough compaction value ordering.
2013-06-07 05:46:15 -07:00
Matt T. Proud 86f63b078b Fix fallthrough compaction value ordering.
We discovered a regression whereby data chunks could be appended out
of order if the fallthrough case was hit.
2013-06-07 14:41:00 +02:00
Julius Volz 7b9ee95030 Minor LevelDB watermark handling cleanups. 2013-06-06 23:56:31 +02:00
Julius Volz 84741b227d Use LRU cache to avoid querying stale series. 2013-06-06 23:56:19 +02:00
Julius Volz f98853d7b7 Fix type error in watermark list handling. 2013-06-06 23:56:14 +02:00
Matt T. Proud ef1d5fd8a2 Introduce semaphores for tiered storage.
This commit wraps the tiered storage access componnets in semaphores,
since we can handle several concurrent memory reads.
2013-06-06 18:16:18 +02:00
Matt T. Proud 819045541e Code Review: Make double-drain a panic. 2013-06-06 12:40:06 +02:00
Matt T. Proud e217a9fb41 Race Work: Make memory arena locks more coarse.
We can optimize these as needed later.
2013-06-06 12:08:20 +02:00
Matt T. Proud beaaf386e7 Add storage state guards and transition callbacks.
To ensure that we access tiered storage in the proper way, we have
guards now.
2013-06-06 11:52:09 +02:00
Matt T. Proud abb5353ade Merge pull request #283 from prometheus/feature/storage/consult-watermark
Include LRU cache for fingerprint watermarks.
2013-06-06 02:33:45 -07:00
Matt T. Proud 2c3df44af6 Ensure database access waits until it is started.
This commit introduces a channel message to ensure serving
state has been reached with the storage stack before anything attempts
to use it.
2013-06-06 10:42:21 +02:00
Matt T. Proud cbe2f3a7b1 Include LRU cache for fingerprint watermarks. 2013-06-06 10:13:18 +02:00
Julius Volz 51689d965d Add debug timers to instant and range queries.
This adds timers around several query-relevant code blocks. For now, the
query timer stats are only logged for queries initiated through the UI.
In other cases (rule evaluations), the stats are simply thrown away.

My hope is that this helps us understand where queries spend time,
especially in cases where they sometimes hang for unusual amounts of
time.
2013-06-05 18:32:54 +02:00
Matt T. Proud 8339a189cb Code Review: Fix seriesPresent scope.
The seriesPresent scope should be constrained to the scope of a
scanJob, since this is keyed to given series.
2013-06-04 13:16:59 +02:00
Matt T. Proud fe41ce0b19 Conditionalize disk initializations.
This commit conditionalizes the creation of the diskFrontier and
seriesFrontier along with the iterator such that they are provisioned
once something is actually required from disk.
2013-06-04 12:53:57 +02:00
Julius Volz a8468a2e5e Fix reversed disk flush cutoff behavior. 2013-05-28 16:14:30 +02:00
Julius Volz eb1f956909 Revert "Revert "Ensure that all extracted samples are added to view.""
This reverts commit 4b30fb86b4.
2013-05-28 14:36:03 +02:00
Matt T. Proud 4b30fb86b4 Revert "Ensure that all extracted samples are added to view."
This reverts commit 008314b5a8. By
running an automated git bisection described in
https://gist.github.com/matttproud-soundcloud/22a371a8d2cba382ea64
this commit was found.
2013-05-23 13:36:22 +02:00
Julius Volz 750f862d9a Use GetBoundaryValues() for non-counter deltas. 2013-05-22 19:13:47 +02:00
Julius Volz f2b48b8c4a Make getValuesAtIntervalOp consume all chunk data in one pass.
This is mainly a small performance improvement, since we skip past the last
extracted time immediately if it was also the last sample in the chunk, instead
of trying to extract non-existent values before the chunk end again and again
and only gradually approaching the end of the chunk.
2013-05-22 18:14:45 +02:00
Julius Volz 83d60bed89 extractValuesAroundTime() code simplification. 2013-05-22 18:14:45 +02:00
Julius Volz 008314b5a8 Ensure that all extracted samples are added to view.
The current behavior only adds those samples to the view that are extracted by
the last pass of the last processed op and throws other ones away. This is a
bug. We need to append all samples that are extracted by each op pass.

This also makes view.appendSamples() take an array of samples.
2013-05-22 18:14:37 +02:00
Matt T. Proud b586801830 Code Review: Fix to-disk queue infinite growth.
We discovered a bug while manually testing this branch on a live
instance, whereby the to-disk queue was never actually dumped to
disk.
2013-05-22 17:59:53 +02:00
Matt T. Proud 285a8b701b Code Review: Extend lock. 2013-05-22 17:59:53 +02:00
Matt T. Proud 2526ab8c81 Code Review: Extend lock scope for appending. 2013-05-22 17:59:53 +02:00
Matt T. Proud f994482d15 Code Review: Avenues for future improvemnet noted. 2013-05-22 17:59:53 +02:00
Matt T. Proud 298a90c143 Code Review: Initial arena size name. 2013-05-22 17:59:53 +02:00
Matt T. Proud c07abf8521 Initial move away from skiplist. 2013-05-22 17:59:53 +02:00
Matt T. Proud 74a66fd938 Spawn grouping of fingerprints with free semaphore.
The previous implementation spawned N goroutines to group samples
together and would not start work until the semaphore unblocked.
While this didn't leak, it polluted the scheduling space.  Thusly,
the routine only starts after a semaphore has been acquired.
2013-05-21 16:11:35 +02:00
Julius Volz 5b105c77fc Repointerize fingerprints. 2013-05-21 14:28:14 +02:00
Matt T. Proud ec5b5bae28 Fuck you, Travis. 2013-05-21 09:42:00 +02:00
Matt T. Proud e5ac91222b Benchmark memory arena; simplify map generation.
The one-off keys have been replaced with ``model.LabelPair``, which is
indexable.  The performance impact is negligible, but it represents
a cognitive simplification.
2013-05-21 09:39:12 +02:00
juliusv 360477f66c Merge pull request #257 from prometheus/feature/better-memory-behaviors
Pointerize memorySeriesArena.
2013-05-16 07:36:40 -07:00
Matt T. Proud e1f20de2e9 Pointerize memorySeriesArena. 2013-05-16 17:09:28 +03:00
Matt T. Proud 8f4c7ece92 Destroy naked returns in half of corpus.
The use of naked return values is frowned upon.  This is the first
of two bulk updates to remove them.
2013-05-16 10:53:25 +03:00
Matt T. Proud 4e0c932a4f Simplify Encoder's encoding signature.
The reality is that if we ever try to encode a Protocol Buffer and it
fails, it's likely that such an error is ultimately not a runtime error
and should be fixed forthwith.  Thusly, we should rename
``Encoder.Encode`` to ``Encoder.MustEncode`` and drop the error return
value.
2013-05-16 00:54:18 +03:00
juliusv 516101f015 Merge pull request #250 from prometheus/refactor/drop-unused-storage-setting
Drop unused writeMemoryInterval
2013-05-14 08:45:59 -07:00
juliusv 9ff00b651d Merge pull request #251 from prometheus/fix/memory-metric-mutability
Fix GetMetricForFingerprint() metric mutability.
2013-05-14 08:12:45 -07:00
Bernerd Schaefer 63d9988b9c Drop unused writeMemoryInterval 2013-05-14 17:03:03 +02:00
Julius Volz 83c60ad43a Fix GetMetricForFingerprint() metric mutability.
Some users of GetMetricForFingerprint() end up modifying the returned metric
labelset. Since the memory storage's implementation of
GetMetricForFingerprint() returned a pointer to the metric (and maps are
reference types anyways), the external mutation propagated back into the memory
storage.

The fix is to make a copy of the metric before returning it.
2013-05-14 16:46:30 +02:00
Bernerd Schaefer 428d91c86f Rename test helper files to helpers_test.go
This ensures that these files are properly included only in testing.
2013-05-14 16:30:47 +02:00
juliusv 98e512d755 Merge pull request #246 from prometheus/fix/interval-value-extraction
Fix and optimize getValuesAtIntervalOp data extraction.
2013-05-14 05:55:22 -07:00
Julius Volz 71a3172abb Fix and optimize getValuesAtIntervalOp data extraction.
- only the data extracted in the last loop iteration of ExtractSamples() was
  emitted as output
- if e.g. op interval < sample interval, there were situations where the same
  sample was added multiple times to the output
2013-05-14 13:55:17 +02:00
Matt T. Proud 244a4a9cdb Update to go1.1.
This commit updates the documentation, Makefiles, formatting, and
code semantics to support the 1.1. runtime, which includes ...

1. ``make advice``,

2. ``make format``, and

3. ``go fix`` on various targets.
2013-05-14 12:39:08 +02:00
Matt T. Proud b224251981 Simplify compaction and expose database sizes.
This commit simplifies the way that compactions across a database's
keyspace occur due to reading the LevelDB internals. Secondarily it
introduces the database size estimation mechanisms.

Include database health and help interfaces.

Add database statistics; remove status goroutines.

This commit kills the use of Go routines to expose status throughout
the web components of Prometheus. It also dumps raw LevelDB status
on a separate /databases endpoint.
2013-05-14 12:29:53 +02:00
juliusv 92ad65ff13 Merge pull request #232 from prometheus/optimize/granular-storage-locking
Synchronous memory appends and more fine-grained storage locks.
2013-05-13 10:11:57 -07:00
Matt T. Proud 1f7f89b4e3 Simplify compaction and expose database sizes.
This commit simplifies the way that compactions across a database's
keyspace occur due to reading the LevelDB internals.  Secondarily it
introduces the database size estimation mechanisms.
2013-05-13 13:15:35 +02:00
Matt T. Proud d538b0382f Include long-tail data deletion mechanism.
This commit introduces the long-tail deletion mechanism, which will
automatically cull old sample values.  It is an acceptable
hold-over until we get a resampling pipeline implemented.

Kill legacy OS X documentation, too.
2013-05-13 10:54:36 +02:00
Julius Volz ce1ee444f1 Synchronous memory appends and more fine-grained storage locks.
This does two things:

1) Make TieredStorage.AppendSamples() write directly to memory instead of
   buffering to a channel first. This is needed in cases where a rule might
   immediately need the data generated by a previous rule.

2) Replace the single storage mutex by two new ones:
   - memoryMutex - needs to be locked at any time that two concurrent
                   goroutines could be accessing (via read or write) the
                   TieredStorage memoryArena.
   - memoryDeleteMutex - used to prevent any deletion of samples from
                         memoryArena as long as renderView is running and
                         assembling data from it.
   The LevelDB disk storage does not need to be protected by a mutex when
   rendering a view since renderView works off a LevelDB snapshot.

The rationale against adding memoryMutex directly to the memory storage: taking
a mutex does come with a small inherent time cost, and taking it is only
required in few places. In fact, no locking is required for the memory storage
instance which is part of a view (and not the TieredStorage).
2013-05-10 17:15:52 +02:00
Matt T. Proud fa6a1f97d0 Expose interfaces for pruner and make pruner tool.
In order to run database cleanups and diagnostics, we should have
a means for pruning a database---even if LevelDB does this for us.
2013-05-10 17:07:03 +02:00
Matt T. Proud 161c8fbf9b Include deletion processor for long-tail values.
This commit extracts the model.Values truncation behavior into the actual
tiered storage, which uses it and behaves in a peculiar way—notably the
retention of previous elements if the chunk were to ever go empty.  This is
done to enable interpolation between sparse sample values in the evaluation
cycle.  Nothing necessarily new here—just an extraction.

Now, the model.Values TruncateBefore functionality would do what a user
would expect without any surprises, which is required for the
DeletionProcessor, which may decide to split a large chunk in two if it
determines that the chunk contains the cut-off time.
2013-05-10 12:19:12 +02:00
Matt Proud 7f0d816574 Schedule the background compactors to run.
This commit introduces three background compactors, which compact
sparse samples together.

1. Older than five minutes is grouped together into chunks of 50 every 30
   minutes.

2. Older than 60 minutes is grouped together into chunks of 250 every 50
   minutes.

3. Older than one day is grouped together into chunks of 5000 every 70
   minutes.
2013-05-07 17:14:04 +02:00
Julius Volz caab131ada Repointerize TieredStorage method receiver types. 2013-05-07 15:12:33 +02:00
juliusv 89de116ea9 Merge pull request #225 from prometheus/refactor/fmt-cleanups
Slice expression simplifications.
2013-05-07 04:27:27 -07:00
Julius Volz 05afa970d2 Slice expression simplifications. 2013-05-07 13:22:29 +02:00
Matt T. Proud f897164bcf Expose TieredStorage.DiskStorage. 2013-05-07 10:26:28 +02:00
Matt T. Proud ce45787dbf Storage interface to TieredStorage.
This commit drops the Storage interface and just replaces it with a
publicized TieredStorage type.  Storage had been anticipated to be
used as a wrapper for testability but just was not used due to
practicality.  Merely overengineered.  My bad.  Anyway, we will
eventually instantiate the TieredStorage dependencies in main.go and
pass them in for more intelligent lifecycle management.

These changes will pave the way for managing the curators without
Law of Demeter violations.
2013-05-03 15:54:14 +02:00
Bernerd Schaefer 5eb9840ed7 Fix goroutine leak in leveldb.AppendSamples
The error channels in AppendSamples need to be buffered, since in the
presence of errors their values may not be consumed.
2013-05-03 12:13:05 +02:00
Matt T. Proud a3f1d81e24 Publicize a few storage components for curation.
This commit introduces the publicization of Stop and other
components, which the compaction curator shall take advantage
of.
2013-05-02 13:16:04 +02:00
Matt T. Proud 4298bab2b0 Publicize Curator and Processors.
This commit publicizes the curation and processor frameworks for
purposes of making them available in the main processor loop.
2013-05-02 12:37:24 +02:00
Julius Volz 368a792dd2 Adjust memory queue size after change to send arrays over channel. 2013-04-30 13:41:04 +02:00
juliusv b02debd69c Merge pull request #205 from prometheus/julius-channel-arrays
Send sample arrays instead of single samples over channels.
2013-04-29 09:05:05 -07:00
Julius Volz d8110fcd9c Send sample arrays instead of single samples over channels. 2013-04-29 17:24:17 +02:00
Matt T. Proud 3362bf36e2 Include curator status in web heads-up-display. 2013-04-29 12:40:33 +02:00
Matt T. Proud 6fac20c8af Harden the tests against OOMs.
This commit employs explicit memory freeing for the in-memory storage
arenas.  Secondarily, we take advantage of smaller channel buffer sizes
in the test.
2013-04-29 11:46:01 +02:00
Matt T. Proud 66bc3711ea Merge pull request #197 from prometheus/feature/storage/curation-table
Add curation remark table and refactor error mgmt.
2013-04-29 01:01:33 -07:00
Matt T. Proud d46cd089b5 Merge pull request #199 from prometheus/refactor/telemetry/api-refresh
Refresh Prometheus client API usage.
2013-04-28 22:50:30 -07:00
Matt T. Proud 3fa260f180 Complete sentence. 2013-04-28 20:26:44 +02:00
Matt T. Proud e527941b6a Use tagged struct fields. 2013-04-28 20:09:30 +02:00
Matt T. Proud a48ab34dd0 Refresh Prometheus client API usage.
The client API has been updated per https://github.com/prometheus/client_golang/pull/9.
2013-04-28 19:40:30 +02:00
Matt T. Proud 561974308d Add curation remark table and refactor error mgmt.
The curator requires the existence of a curator remark table, which
stores the progress for a given curation policy.  The tests for the
curator create an ad hoc table, but core Prometheus presently lacks
said table, which this commit adds.

Secondarily, the error handling for the LevelDB lifecycle functions
in the metric persistence have been wrapped into an UncertaintyGroup,
which mirrors some of the functions of sync.WaitGroup but adds error
capturing capability to the mix.
2013-04-28 17:26:34 +02:00
Matt T. Proud b3e34c6658 Implement batch database sample curator.
This commit introduces to Prometheus a batch database sample curator,
which corroborates the high watermarks for sample series against the
curation watermark table to see whether a curator of a given type
needs to be run.

The curator is an abstract executor, which runs various curation
strategies across the database.  It remarks the progress for each
type of curation processor that runs for a given sample series.

A curation procesor is responsible for effectuating the underlying
batch changes that are request.  In this commit, we introduce the
CompactionProcessor, which takes several bits of runtime metadata and
combine sparse sample entries in the database together to form larger
groups.  For instance, for a given series it would be possible to
have the curator effectuate the following grouping:

- Samples Older than Two Weeks: Grouped into Bunches of 10000
- Samples Older than One Week: Grouped into Bunches of 1000
- Samples Older than One Day: Grouped into Bunches of 100
- Samples Older than One Hour: Grouped into Bunches of 10

The benefits hereof of such a compaction are 1. a smaller search
space in the database keyspace, 2. better employment of compression
for repetious values, and 3. reduced seek times.
2013-04-27 17:38:18 +02:00
Julius Volz 2202cd71c9 Track alerts over time and write out alert timeseries. 2013-04-26 14:35:21 +02:00
Johannes 'fish' Ziemke 1ad41d4c00 Call closer.Close() earlier. 2013-04-25 13:29:28 +02:00
Johannes 'fish' Ziemke 22da76e8ab Close of reportTicker to exit goroutine. 2013-04-25 13:29:22 +02:00
Johannes 'fish' Ziemke 5043c6fce7 Have goroutine exit on signal via defer block. 2013-04-25 12:14:38 +02:00
juliusv af7ddc36e2 Merge pull request #176 from prometheus/optimization/view-materialization/slice-chunking
Truncate irrelevant chunk values.
2013-04-24 05:19:54 -07:00
Julius Volz 9b8c671ec9 Fixes/cleanups to renderView() samples truncation. 2013-04-24 12:42:58 +02:00
Matt T. Proud 05504d3642 WIP - Truncate irrelevant chunk values.
This does not work with the view tests.
2013-04-24 11:07:22 +02:00
Matt T. Proud a32602140e Convert the TestInstant value into UTC.
For the forthcoming Curator, we don't record timezone information in
the samples, nor do we in the curation remarks.  All times are
recorded UTC.  That said, for the test environment to better match
production, the special instant should be in UTC.
2013-04-23 18:58:39 +02:00
Matt T. Proud b1a8e51b07 Extract dto.SampleValueSeries into model.Values. 2013-04-22 13:31:11 +02:00
Matt T. Proud 422003da8e Convert trailing float64s. 2013-04-21 20:52:21 +02:00
Matt T. Proud db4ffbb262 Wrap dto.SampleKey with business logic type.
The curator work can be done easier if dto.SampleKey is no longer
directly accessed but rather has a higher level type around it that
captures a certain modicum of business logic.  This doesn't look
terribly interesting today, but it will get more so.
2013-04-21 20:38:39 +02:00
Matt T. Proud f9e99bd08a Refresh SampleValue to 64-bit floating point.
We always knew that this needed to be fixed.
2013-04-21 20:31:50 +02:00
Matt T. Proud 092c7bd88e Stochastic test support plural SampleValueSeries.
After SampleValue was refactored into SampleValueSeries, which
involves plural values under a common super key, the stochastic
test was never refreshed to reflect this reality.  We had other
tests that validated the functionality, but this one was
insufficently forward-ported.
2013-04-21 20:31:32 +02:00
Julius Volz 99dcbe0f94 Integrate memory and disk layers in view rendering. 2013-04-19 16:01:27 +02:00
Julius Volz 63625bd244 Make view use memory persistence, remove obsolete code.
This makes the memory persistence the backing store for views and
adjusts the MetricPersistence interface accordingly. It also removes
unused Get* method implementations from the LevelDB persistence so they
don't need to be adapted to the new interface. In the future, we should
rethink these interfaces.

All staleness and interpolation handling is now removed from the storage
layer and will be handled only by the query layer in the future.
2013-04-18 22:26:29 +02:00
Matt T. Proud d468271e2f Fix append queue telemetry and parameterize sizes.
The original append queue telemetry never worked, because it was
updated only upon the exit of the select statement, which would
usually liberate the queues of contents.  This has been fixed to
be reported arbitrarily.

The queue sizes are now parameterizable via flags.
2013-04-16 17:13:29 +02:00
Julius Volz 95b081f9bc Stop serving tiered storage after draining it. 2013-04-15 13:30:03 +02:00
Matt T. Proud a55602df4a Validate diskFrontier domain for series candidate.
It is the case with the benchmark tool that we thought that we
generated multiple series and saved them to the disk as such, when
in reality, we overwrote the fields of the outgoing metrics via
Go map reference behavior.  This was accidental.  In the course of
diagnosing this, a few errors were found:

1. ``newSeriesFrontier`` should check to see if the candidate fingerprint is within the given domain of the ``diskFrontier``.  If not, as the contract in the docstring stipulates, a ``nil`` ``seriesFrontier`` should be emitted.

2. In the interests of aiding debugging, the raw LevelDB ``levigoIterator`` type now includes a helpful forensics ``String()`` method.

This work produced additional cleanups:

1. ``Close() error`` with the storage stack is technically incorrect, since nowhere in the bowels of it does an error actually occur.  The interface has been simplified to remove this for now.
2013-04-09 11:47:16 +02:00
Matt T. Proud d79c932a8e Merge pull request #120 from prometheus/feature/storage/compaction
Spin up curator run in the tests.
2013-04-05 04:55:59 -07:00
Matt T. Proud c3e3460ca6 Spin up curator run in the tests.
After this commit, we'll need to add validations that it does the
desired work, which we presently know that it doesn't.  Given the
changes I made with a plethora of renamings, I want to commit this
now before it gets even larger.
2013-04-05 13:55:11 +02:00
Matt T. Proud 461da0b3a8 Merge pull request #117 from prometheus/feature/storage/compaction
Spin up storage layers for made fixtures.
2013-04-03 04:41:52 -07:00
Matt T. Proud d0ad6cbeaa Spin up storage layers for made fixtures. 2013-04-03 12:09:05 +02:00
Julius Volz c59f3fc538 Fix formatting in tiered_test.go. 2013-03-28 12:16:31 +01:00
juliusv 39826d7335 Merge pull request #107 from prometheus/julius-fix-get-fingerprints
Fix bug in GetFingerprintsForLabelSet().
2013-03-27 10:54:17 -07:00
Julius Volz 2668700e54 Fix bug in GetFingerprintsForLabelSet(). 2013-03-27 18:50:30 +01:00
Matt T. Proud c53a72a894 Test data for the curator. 2013-03-27 18:13:43 +01:00
Matt T. Proud 6dcaa28806 Include LevelDB fixture generators for curator.
This will help reduce common boilerplate for our test process
with respect to LevelDB-related things.
2013-03-27 15:13:40 +01:00
Julius Volz 55ca65aa6e More userfriendly output when we fail to create the tiered storage. 2013-03-27 11:25:05 +01:00
Matt T. Proud c4e971d7d9 Merge pull request #101 from prometheus/refactor/test/directory-extraction
Create temporary directory handler.
2013-03-26 10:46:28 -07:00
Matt T. Proud b86b0ea41a Create temporary directory handler. 2013-03-26 18:09:25 +01:00
Julius Volz 8cf2af3923 Abort view job processing on timeout. 2013-03-26 17:18:51 +01:00
Julius Volz 2b8f0b2cc7 Constantize metric name label name. 2013-03-26 16:20:23 +01:00
Julius Volz e096896932 PR comment fixups. 2013-03-26 15:28:00 +01:00
Julius Volz dd67ab115b Change GetAllMetricNames() to GetAllValuesForLabel(). 2013-03-26 14:47:07 +01:00
Julius Volz 42bdf921d1 Fetch integrated memory/disk data for simple Get* functions. 2013-03-26 14:47:07 +01:00
Julius Volz 11bb94a7e5 Implement GetAllMetricNames() for memory storage. 2013-03-26 14:47:07 +01:00
Julius Volz 991dc68d78 Rename misnamed oldestSampleTimestamp variable. 2013-03-26 11:56:10 +01:00
Matt T. Proud 3e97a3630d Include nascent curator scaffolding.
The curator doesn't do anything yet; rather, this is the type
definition including the anciliary testing scaffold.

Improve Makefile and Git developer experience.

The top-level Makefile was a bit overloaded in terms of generation of
assets and their management.  This has been offloaded into separate
Makefiles.

The Git developer experience sucked due to lack of .gitignore
policies.

Also: Fix faulty skiplist naming from old merge.
2013-03-25 19:38:14 +01:00
Matt T. Proud b2e4c88b80 Wrap LevelDB iterator operations behind interface.
The LevelDB storage types return an interface type now that wraps
around the underlying iterator.  This both enhances testability but
improves upon, in my opinion, the interface design for the LevelDB
iterator.

Secondarily, the resource reaping behaviors for the LevelDB iterators
have been improved by dropping the externalized io.Closer object.

Finally, the iterator provisioning methods provide the option for
indicating whether one wants a snapshotted iterator or not.
2013-03-25 12:57:58 +01:00
Matt T. Proud f2a30cf20c Several important cleanups and deprecations.
EachFunc is deprecated.

Remove deprecated ``Pair`` and ``GetAll``.

These were originally used for forensic and the old gorest impl.
Nothing today in the user-facing path nor the tests uses them,
especially since the advent of the ForEach protocol in the
interface.
2013-03-25 08:38:21 +01:00
Matt T. Proud 70448711ec Merge pull request #95 from prometheus/feature/persistence/batching
Several interface cleanups.
2013-03-24 00:19:46 -07:00
Matt T. Proud 8f6b55be71 Several interface cleanups.
- Kill Close in Persistent and document interface.
 - Extract batching behavior into interface.
 - Kill IteratorManager, which was used for unknown reasons.
2013-03-24 07:35:43 +01:00
Julius Volz a33d2726bc Mark range op as consumed if it receives no data points in range. 2013-03-22 11:50:02 +01:00
Julius Volz 3c9d6cb66c Add several needed persistence proxy methods to tiered storage. 2013-03-21 18:16:43 +01:00
Julius Volz 081d250929 Fix view's GetRangeValues() reverse iteration behavior. 2013-03-21 18:16:31 +01:00
Julius Volz 0be0aa59c2 Wait until storage is drained before closing the underlying leveldb. 2013-03-21 18:16:07 +01:00
Julius Volz becc278eb6 Fix two bugs in range op time advancement. 2013-03-21 18:15:52 +01:00
Matt T. Proud ceb6611957 Fix regression in subsequent range op. compactions.
We have an anomaly whereby subsequent range operations fail to be
compacted into one single range operation.  This fixes such
behavior.
2013-03-21 18:11:04 +01:00
Matt T. Proud 669abdfefe ``make format`` invocation. 2013-03-21 18:11:04 +01:00
Julius Volz bdb067b47f Implement remaining View Get* methods. 2013-03-21 18:11:04 +01:00
Julius Volz 1f42364733 Fix typo in comment. 2013-03-21 18:11:03 +01:00
Matt T. Proud 758a3f0764 Add documentation and cull junk. 2013-03-21 18:11:03 +01:00
Matt T. Proud bd8bb0edfd One additional reduction. 2013-03-21 18:11:03 +01:00
Matt T. Proud 73b463e814 Additional simplifications. 2013-03-21 18:11:03 +01:00
Matt T. Proud fd47ac570f Implied simplifications. 2013-03-21 18:11:03 +01:00