Commit Graph

52 Commits (22890b1eb3eafb64a8fc37aa39e05aad1c10a676)

Author SHA1 Message Date
Marco Pracucci 501bc6419e
Add ShardedPostings() support to TSDB (#10421)
This PR is a reference implementation of the proposal described in #10420.

In addition to what described in #10420, in this PR I've introduced labels.StableHash(). The idea is to offer an hashing function which doesn't change over time, and that's used by query sharding in order to get a stable behaviour over time. The implementation of labels.StableHash() is the hashing function used by Prometheus before stringlabels, and what's used by Grafana Mimir for query sharding (because built before stringlabels was a thing).

Follow up work
As mentioned in #10420, if this PR is accepted I'm also open to upload another foundamental piece used by Grafana Mimir query sharding to accelerate the query execution: an optional, configurable and fast in-memory cache for the series hashes.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 11:57:27 +00:00
Bryan Boreham 3f30ad3cc2
Merge pull request #13015 from bboreham/smaller-txring
tsdb: make transaction isolation data structures smaller
2024-01-25 10:48:15 +00:00
Matthieu MOREL 8f6cf3aabb tsdb: use Go standard errors
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-11 12:18:54 +00:00
Fiona Liao 5bee0cfce2
Change `ChunkReader.Chunk()` to `ChunkOrIterable()`
The ChunkReader interface's Chunk() has been changed to ChunkOrIterable(). 

This is a precursor to OOO native histogram support - with OOO native histograms, the chunks.Meta passed to Chunk() can result in multiple chunks being returned rather than just a single chunk (e.g. if oooMergedChunk has a counter reset in the middle). 

To support this, ChunkOrIterable() requires either a single chunk or an iterable to be returned. If an iterable is returned, the caller has the responsibility of converting the samples from the iterable into possibly multiple chunks. The OOOHeadChunkReader now returns an iterable rather than a chunk to prepare for the native histograms case. Also as a beneficial side effect, oooMergedChunk and boundedChunk has been simplified as they only need to implement the Iterable interface now, not the full Chunk interface.

---------

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2023-11-28 11:14:29 +01:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Bryan Boreham 6fe8217ce4 tsdb: shrink txRing with smaller integers
4 billion active transactions ought to be enough for anyone.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-10-21 12:44:34 +00:00
Goutham Veeramachaneni 86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Arve Knudsen 156222cc50
Add context argument to LabelQuerier.LabelValues (#12665)
Add context argument to LabelQuerier.LabelValues and
LabelQuerier.SortedLabelValues.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-14 16:02:04 +02:00
Arve Knudsen a964349e97
Add context argument to LabelQuerier.LabelNames (#12666)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-14 10:39:51 +02:00
Arve Knudsen 4451ba10b4
Add context argument to IndexReader.Postings (#12667)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-13 17:45:06 +02:00
Łukasz Mierzwa 3c80963e81
Use a linked list for memSeries.headChunk (#11818)
Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks.
When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call.
If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending
our sample to it.

Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed.
When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes.
Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait
for it to be mmapped.
If there's enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting
queries and scrapes.
Queries might timeout, since by default they have a 2 minute timeout set.
Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries
or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything.

To avoid this we need to remove mmapping from append path, since mmapping is blocking.
But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later.
This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples,
while older, yet to be mmapped, chunks are linked to it.
Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger
it manually, which reduces the risk that it will have to compete for mmap locks with other chunks.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2023-07-31 11:10:24 +02:00
Julien Pivotto bf5bf1a4b3 TSDB: Remove usused import of sort
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-07-11 14:29:31 +02:00
Julien Pivotto 0f85e4f41d
Merge pull request #12539 from bboreham/slices-sorts
Replace sort.Slice with faster slices.SortFunc
2023-07-11 13:09:02 +02:00
Bryan Boreham ce153e3fff Replace sort.Sort with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-10 09:43:45 +00:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
George Krajcsovits 92d6980360
Fix populateWithDelChunkSeriesIterator and gauge histograms (#12330)
Use AppendableGauge to detect corrupt chunk with gauge histograms.
Detect if first sample is a gauge but the chunk is not set up to contain
gauge histograms.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2023-05-19 10:24:06 +02:00
beorn7 5b53aa1108 style: Replace `else if` cascades with `switch`
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
Ganesh Vernekar 1b7d973f14
tsdb: Fix a comment in tsdb/head_read.go
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-21 15:15:36 +05:30
Ganesh Vernekar 0c0c2af7f5
Do not re-encode head chunk in ChunkQuerier
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-15 17:58:01 +05:30
Ganesh Vernekar d504c950a2
Remove unnecessary chunk fetch in Head queries
`safeChunk` is only obtained from the `headChunkReader.Chunk` call where
the chunk is already fetched and stored with the `safeChunk`. So, when
getting the iterator for the `safeChunk`, we don't need to get the chunk again.

Also removed a couple of unnecessary fields from `safeChunk` as a part of this.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-02-22 12:21:12 +05:30
Ganesh Vernekar cb2be6e62f
Merge pull request #11779 from codesome/memseries-ooo
tsdb: Only initialise out-of-order fields when required
2023-01-16 10:58:05 +05:30
Ganesh Vernekar 38fa151a7c
tsdb: Only initialise out-of-order fields when required
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-12 20:29:16 +05:30
Oleg Zaytsev de93a279a0
Shortcut postings for matchers when empty postings are selected (#11813)
* Add more benchmark cases
* Add shortcuts for empty postings

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-01-10 15:21:49 +05:30
Bryan Boreham 10b27dfb84 Simplify IndexReader.Series interface
Instead of passing in a `ScratchBuilder` and `Labels`, just pass the
builder and the caller can extract labels from it. In many cases the
caller didn't use the Labels value anyway.

Now in `Labels.ScratchBuilder` we need a slightly different API: one
to assign what will be the result, instead of overwriting some other
`Labels`. This is safer and easier to reason about.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham 543c318ec2 Update package tsdb for new labels.Labels type
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham d3d96ec887 tsdb/index: use ScratchBuilder to create Labels
This necessitates a change to the `tsdb.IndexReader` interface:
`index.Reader` is used from multiple goroutines concurrently, so we
can't have state in it.

We do retain a `ScratchBuilder` in `blockBaseSeriesSet` which is
iterator-like.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham 463f5cafdd storage: re-use iterators to save garbage
Re-use previous memory if it is already of the correct type.

In `NewListSeries` we hoist the conversion to an interface value out
so it only allocates once.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-15 18:32:45 +00:00
Signed-off-by: Jesus Vazquez 3362bf6d79
Fix merge conflicts
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-11 22:53:37 +05:30
Jesus Vazquez e934d0f011 Merge 'main' into sparsehistogram
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2022-10-05 22:14:49 +02:00
Bryan Boreham 3330d85ba8
Replace sort.Strings and sort.Ints with faster slices.Sort (#11318)
Use new experimental package `golang.org/x/exp/slices`.

slices.Sort works on values that are directly comparable, like ints,
so avoids the overhad of an interface call to `.Less()`.

Left tests unchanged, because they don't need the speed and it may be
a cross-check that slices.Sort gives the same answer.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-30 20:03:56 +05:30
Bryan Boreham d166da7b59
tsdb: stop saving a copy of last 4 samples in memSeries (#11296)
* TSDB chunks: remove race between writing and reading

Because the data is stored as a bit-stream, the last byte in the stream
could change if the stream is appended to after an Iterator is obtained.
Copy the last byte when the Iterator is created, so we don't have to
read it later.

Clarify in comments that concurrent Iterator and Appender are allowed,
but the chunk must not be modified while an Iterator is created.
(This was already the case, in order to copy the bstream slice header.)

* TSDB: stop saving last 4 samples in memSeries

This extra copy of the last 4 samples was introduced to avoid a race
condition between reading the last byte of the chunk and writing to it.

But now we have fixed that by having `bstreamReader` copy the last byte,
we don't need to copy the last 4 samples.

This change saves 56 bytes per series, which is very worthwhile when
you have millions or tens of millions of series.

* TSDB: tidy up stopIterator re-use

Previous changes have left this code duplicating some lines; pull
them out to a separate function and tidy up.

* TSDB head_test: stop checking when iterators are wrapped

The behaviour has changed so chunk iterators are only wrapped when
transaction isolation requires them to stop short of the end.
This makes tests fail which are checking the type.

Tests should check the observable behaviour, not the type.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-09-27 19:32:05 +05:30
Bryan Boreham ff00dee262
tsdb: turn off transaction isolation for head compaction (#11317)
* tsdb: add a basic test for read/write isolation

* tsdb: store the min time with isolationAppender
So that we can see when appending has moved past a certain point in time.

* tsdb: allow RangeHead to have isolation disabled
This will be used when for head compaction.

* tsdb: do head compaction with isolation disabled
This saves a lot of work tracking appends done while compaction is ongoing.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-27 19:31:23 +05:30
Jesus Vazquez c1b669bf9b
Add out-of-order sample support to the TSDB (#11075)
* Introduce out-of-order TSDB support

This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing

This commit adds support to accept out-of-order ("OOO") sample into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping querying
are automatically enabled.

Most of the additions have been borrowed from
https://github.com/grafana/mimir-prometheus/
Here is the list ist of the original commits cherry picked
from mimir-prometheus into this branch:
- 4b2198d7ec
- 2836e5513f
- 00b379c3a5
- ff0dc75758
- a632c73352
- c6f3d4ab33
- 5e8406a1d4
- abde1e0ba1
- e70e769889
- df59320886

Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* gofumpt files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add license header to missing files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO tests due to existing chunk disk mapper implementation

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix truncate int overflow

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add Sync method to the WAL and update tests

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* remove useless sync

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Update minOOOTime after truncating Head

* Update minOOOTime after truncating Head

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Add a unit test

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Load OutOfOrderTimeWindow only once per appender

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO Head LabelValues and PostingsForMatchers

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix replay of OOO mmap chunks

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Remove unnecessary err check

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Prevent panic with ApplyConfig

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run OOO compaction after restart if there is OOO data from WBL

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Apply Bartek's suggestions

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Refactor OOO compaction

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address comments and TODOs

- Added a comment explaining why we need the allow overlapping
  compaction toggle
- Clarified TSDBConfig OutOfOrderTimeWindow doc
- Added an owner to all the TODOs in the code

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run go format

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix remaining review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix tests

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix TestWBLAndMmapReplay test failure on windows

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address most of the feedback

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Refactor the block meta for out of order

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix windows error

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2022-09-20 22:35:50 +05:30
Bryan Boreham d2701be53a
tsdb: remove chunk pool from memSeries (#11280)
The chunk pool belongs to the head not to the series. Pass it down where
required, and remove the copy of the pointer that `memSeries` was
holding.

`safeChunk` also needs to hold it, because in scenarios where it is used
we don't have a reference to the head. However it was already holding
`chunkDiskMapper` for the same reason, so no big change.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-15 13:22:09 +05:30
beorn7 4210aac74a Merge branch 'main' into sparsehistogram 2022-03-22 14:47:42 +01:00
Oleg Zaytsev 3947238ce0
Label values with matchers by intersecting postings (#9907)
* LabelValues w/matchers by intersecting postings

Instead of iterating all matched series to find the values, this
checks if each one of the label values is present in the matched series
(postings).

Pending to be benchmarked.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Benchmark labelValuesWithMatchers

name                                                                   old time/op    new time/op
Querier/Head/labelValuesWithMatchers/i_with_n="1"                         157ms ± 0%        48ms ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="^.+$"                      1.80s ± 0%       0.46s ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",j!="foo"                144ms ± 0%        57ms ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"      304ms ± 0%       111ms ± 0%
Querier/Head/labelValuesWithMatchers/n_with_j!="foo"                      761ms ± 0%       164ms ± 0%
Querier/Head/labelValuesWithMatchers/n_with_i="1"                        6.11µs ± 0%      6.62µs ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1"                        117ms ± 0%        62ms ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="^.+$"                     1.44s ± 0%       0.24s ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",j!="foo"              92.1ms ± 0%      70.3ms ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"     196ms ± 0%       115ms ± 0%
Querier/Block/labelValuesWithMatchers/n_with_j!="foo"                     1.23s ± 0%       0.21s ± 0%
Querier/Block/labelValuesWithMatchers/n_with_i="1"                       1.06ms ± 0%      0.88ms ± 0%

name                                                                   old alloc/op   new alloc/op
Querier/Head/labelValuesWithMatchers/i_with_n="1"                        29.5MB ± 0%      26.9MB ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="^.+$"                     46.8MB ± 0%     251.5MB ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",j!="foo"               29.5MB ± 0%      22.3MB ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"     46.8MB ± 0%      23.9MB ± 0%
Querier/Head/labelValuesWithMatchers/n_with_j!="foo"                     10.3kB ± 0%  138535.2kB ± 0%
Querier/Head/labelValuesWithMatchers/n_with_i="1"                        5.54kB ± 0%      7.09kB ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1"                       39.1MB ± 0%      28.5MB ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="^.+$"                     287MB ± 0%       253MB ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",j!="foo"              34.3MB ± 0%      23.9MB ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"    51.6MB ± 0%      25.5MB ± 0%
Querier/Block/labelValuesWithMatchers/n_with_j!="foo"                     144MB ± 0%       139MB ± 0%
Querier/Block/labelValuesWithMatchers/n_with_i="1"                       6.43kB ± 0%      8.66kB ± 0%

name                                                                   old allocs/op  new allocs/op
Querier/Head/labelValuesWithMatchers/i_with_n="1"                          104k ± 0%        500k ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="^.+$"                       204k ± 0%        600k ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",j!="foo"                 104k ± 0%        500k ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"       204k ± 0%        500k ± 0%
Querier/Head/labelValuesWithMatchers/n_with_j!="foo"                       66.0 ± 0%       255.0 ± 0%
Querier/Head/labelValuesWithMatchers/n_with_i="1"                          61.0 ± 0%       205.0 ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1"                         304k ± 0%        600k ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="^.+$"                     5.20M ± 0%       0.70M ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",j!="foo"                204k ± 0%        600k ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"      304k ± 0%        600k ± 0%
Querier/Block/labelValuesWithMatchers/n_with_j!="foo"                     3.00M ± 0%       0.00M ± 0%
Querier/Block/labelValuesWithMatchers/n_with_i="1"                         61.0 ± 0%       247.0 ± 0%

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Don't expand postings to intersect them

Using a min heap we can check whether matched postings intersect with
each one of the label values postings. This avoid expanding postings
(and thus having all of them in memory at any point).

Slightly slower than the expanding postings version for some cases, but
definitely pays the price once the cardinality grows.

Still offers 10x latency improvement where previous latencies were
reaching 1s.

Benchmark results:

name \ time/op                                                         old.txt      intersect.txt    intersect_noexpand.txt
Querier/Head/labelValuesWithMatchers/i_with_n="1"                       157ms ± 0%        48ms ± 0%              110ms ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="^.+$"                    1.80s ± 0%       0.46s ± 0%              0.18s ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",j!="foo"              144ms ± 0%        57ms ± 0%              125ms ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"    304ms ± 0%       111ms ± 0%              177ms ± 0%
Querier/Head/labelValuesWithMatchers/n_with_j!="foo"                    761ms ± 0%       164ms ± 0%              134ms ± 0%
Querier/Head/labelValuesWithMatchers/n_with_i="1"                      6.11µs ± 0%      6.62µs ± 0%             4.29µs ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1"                      117ms ± 0%        62ms ± 0%              120ms ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="^.+$"                   1.44s ± 0%       0.24s ± 0%              0.15s ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",j!="foo"            92.1ms ± 0%      70.3ms ± 0%            125.4ms ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"   196ms ± 0%       115ms ± 0%              170ms ± 0%
Querier/Block/labelValuesWithMatchers/n_with_j!="foo"                   1.23s ± 0%       0.21s ± 0%              0.14s ± 0%
Querier/Block/labelValuesWithMatchers/n_with_i="1"                     1.06ms ± 0%      0.88ms ± 0%             0.92ms ± 0%

name \ alloc/op                                                        old.txt      intersect.txt    intersect_noexpand.txt
Querier/Head/labelValuesWithMatchers/i_with_n="1"                      29.5MB ± 0%      26.9MB ± 0%             19.1MB ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="^.+$"                   46.8MB ± 0%     251.5MB ± 0%             36.3MB ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",j!="foo"             29.5MB ± 0%      22.3MB ± 0%             19.1MB ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"   46.8MB ± 0%      23.9MB ± 0%             20.7MB ± 0%
Querier/Head/labelValuesWithMatchers/n_with_j!="foo"                   10.3kB ± 0%  138535.2kB ± 0%              6.4kB ± 0%
Querier/Head/labelValuesWithMatchers/n_with_i="1"                      5.54kB ± 0%      7.09kB ± 0%             4.30kB ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1"                     39.1MB ± 0%      28.5MB ± 0%             20.7MB ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="^.+$"                   287MB ± 0%       253MB ± 0%               38MB ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",j!="foo"            34.3MB ± 0%      23.9MB ± 0%             20.7MB ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"  51.6MB ± 0%      25.5MB ± 0%             22.3MB ± 0%
Querier/Block/labelValuesWithMatchers/n_with_j!="foo"                   144MB ± 0%       139MB ± 0%                0MB ± 0%
Querier/Block/labelValuesWithMatchers/n_with_i="1"                     6.43kB ± 0%      8.66kB ± 0%             5.86kB ± 0%

name \ allocs/op                                                       old.txt      intersect.txt    intersect_noexpand.txt
Querier/Head/labelValuesWithMatchers/i_with_n="1"                        104k ± 0%        500k ± 0%               300k ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="^.+$"                     204k ± 0%        600k ± 0%               400k ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",j!="foo"               104k ± 0%        500k ± 0%               300k ± 0%
Querier/Head/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"     204k ± 0%        500k ± 0%               300k ± 0%
Querier/Head/labelValuesWithMatchers/n_with_j!="foo"                     66.0 ± 0%       255.0 ± 0%              139.0 ± 0%
Querier/Head/labelValuesWithMatchers/n_with_i="1"                        61.0 ± 0%       205.0 ± 0%               87.0 ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1"                       304k ± 0%        600k ± 0%               400k ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="^.+$"                   5.20M ± 0%       0.70M ± 0%              0.50M ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",j!="foo"              204k ± 0%        600k ± 0%               400k ± 0%
Querier/Block/labelValuesWithMatchers/i_with_n="1",i=~"^.*$",j!="foo"    304k ± 0%        600k ± 0%               400k ± 0%
Querier/Block/labelValuesWithMatchers/n_with_j!="foo"                   3.00M ± 0%       0.00M ± 0%              0.00M ± 0%
Querier/Block/labelValuesWithMatchers/n_with_i="1"                       61.0 ± 0%       247.0 ± 0%              129.0 ± 0%

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Apply comment suggestions from the code review

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* Change else { if } to else if

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Remove sorting of label values

We were not sorting them before, so no need to sort them now

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-12-28 15:59:03 +01:00
beorn7 e4e24453fa Merge branch 'main' into beorn7/merge2 2021-11-30 17:19:06 +01:00
Björn Rabenstein 7e42acd3b1
tsdb: Rework iterators (#9877)
- Pick At... method via return value of Next/Seek.
- Do not clobber returned buckets.
- Add partial FloatHistogram suppert.

Note that the promql package is now _only_ dealing with
FloatHistograms, following the idea that PromQL only knows float
values.

As a byproduct, I have removed the histogramSeries metric. In my
understanding, series can have both float and histogram samples, so
that metric doesn't make sense anymore.

As another byproduct, I have converged the sampleBuf and the
histogramSampleBuf in memSeries into one. The sample type stored in
the sampleBuf has been extended to also contain histograms even before
this commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-29 13:24:23 +05:30
Ganesh Vernekar 26c0a433f5
Support appending different sample types to the same series (#9705)
* Support appending different sample types to the same series

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix build

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-26 17:43:27 +05:30
Darshan Chaudhary 9dcf8b2208
Add the ability to disable tsdb isolation (#9270)
* Disable isolation in isolation struct

Signed-off-by: darshanime <deathbullet@gmail.com>

* Run tsdb tests with isolation disabled

Signed-off-by: darshanime <deathbullet@gmail.com>

* Check for isolation disabled in isoState.Close()

Signed-off-by: darshanime <deathbullet@gmail.com>

* use t.Skip to skip isolation tests when disabled

Signed-off-by: darshanime <deathbullet@gmail.com>

* address review comments

Signed-off-by: darshanime <deathbullet@gmail.com>

* fix test for defaultIsolationState

Signed-off-by: darshanime <deathbullet@gmail.com>

* Change flag name. Set flag in DB. Do not init txRing. Close isoState.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Test disabled isolation in CircleCI test_go

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Skip isolation related tests in db_test.go

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-19 15:41:32 +05:30
beorn7 5d4db805ac Merge branch 'main' into sparsehistogram 2021-11-17 19:57:31 +01:00
Dieter Plaetinck 067efc3725
clarify HeadChunkID type and usage (#9726)
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-11-17 18:35:10 +05:30
Dieter Plaetinck 0fac9bb859
Add basic initial developer docs for TSDB (#9451)
* Add basic initial developer docs for TSDB

There's a decent amount of content already out there (blog posts,
conference talks, etc), but:
* when they get stale, they don't tend to get updated
* they still leave me with questions that I'ld like to answer
  for developers (like me) who want to use, or work with, TSDB

What I propose is developer docs inside the prometheus
repository.  Easy to find and harness the power of the community
to expand it and keep it up to date.

* perfect is the enemy of good.  Let's have a base and incrementally improve
* Markdown docs should be broad but not too deep.  Source code comments
  can complement them, and are the ideal place for implementation details.

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* use example code that works out of the box

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Apply suggestions from code review

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* PR feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* more docs

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* PR feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Apply suggestions from code review

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Apply suggestions from code review

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Update tsdb/docs/usage.md

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* final tweaks

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* workaround docs versioning issue

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Move example code to real executable, testable example.

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* cleanup example test and make sure it always reproduces

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* obtain temp dir in a way that works with older Go versions

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Fix Ganesh's comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-17 15:51:27 +05:30
beorn7 4c28d9fac7 Move to histogram.Histogram pointers
This is to avoid copying the many fields of a histogram.Histogram all
the time.

This also fixes a bunch of formerly broken tests.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-12 23:17:35 +01:00
beorn7 c954cd9d1d Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related
packages over there:
  exemplar labels relabel rulefmt textparse timestamp value

All the others are more or less utilities and have been moved to `util`:
  gate logging modetimevfs pool runtime

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-09 08:03:10 +01:00
Dieter Plaetinck cda025b5b5
TSDB: demistify SeriesRefs and ChunkRefs (#9536)
* TSDB: demistify seriesRefs and ChunkRefs

The TSDB package contains many types of series and chunk references,
all shrouded in uint types.  Often the same uint value may
actually mean one of different types, in non-obvious ways.

This PR aims to clarify the code and help navigating to relevant docs,
usage, etc much quicker.

Concretely:

* Use appropriately named types and document their semantics and
  relations.
* Make multiplexing and demuxing of types explicit
  (on the boundaries between concrete implementations and generic
  interfaces).
* Casting between different types should be free.  None of the changes
  should have any impact on how the code runs.

TODO: Implement BlockSeriesRef where appropriate (for a future PR)

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* agent: demistify seriesRefs and ChunkRefs

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-11-06 15:40:04 +05:30
Ganesh Vernekar c8b267efd6
Get histograms from TSDB to the rate() function implementation
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-03 19:04:18 +05:30
beorn7 7a8bb8222c Style cleanup of all the changes in sparsehistogram so far
A lot of this code was hacked together, literally during a
hackathon. This commit intends not to change the code substantially,
but just make the code obey the usual style practices.

A (possibly incomplete) list of areas:

* Generally address linter warnings.

* The `pgk` directory is deprecated as per dev-summit. No new packages should
  be added to it. I moved the new `pkg/histogram` package to `model`
  anticipating what's proposed in #9478.

* Make the naming of the Sparse Histogram more consistent. Including
  abbreviations, there were just too many names for it: SparseHistogram,
  Histogram, Histo, hist, his, shs, h. The idea is to call it "Histogram" in
  general. Only add "Sparse" if it is needed to avoid confusion with
  conventional Histograms (which is rare because the TSDB really has no notion
  of conventional Histograms). Use abbreviations only in local scope, and then
  really abbreviate (not just removing three out of seven letters like in
  "Histo"). This is in the spirit of
  https://github.com/golang/go/wiki/CodeReviewComments#variable-names

* Several other minor name changes.

* A lot of formatting of doc comments. For one, following
  https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
  , but also layout question, anticipating how things will look like
  when rendered by `godoc` (even where `godoc` doesn't render them
  right now because they are for unexported types or not a doc comment
  at all but just a normal code comment - consistency is queen!).

* Re-enabled `TestQueryLog` and `TestEndopints` (they pass now,
  leaving them disabled was presumably an oversight).

* Bucket iterator for histogram.Histogram is now created with a
  method.

* HistogramChunk.iterator now allows iterator recycling. (I think
  @dieterbe only commented it out because he was confused by the
  question in the comment.)

* HistogramAppender.Append panics now because we decided to treat
  staleness marker differently.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-11 13:02:03 +02:00
beorn7 fd5ea4e0b5 Merge branch 'main' into sparsehistogram 2021-10-07 23:16:42 +02:00
Bryan Boreham 87d909df4a
Remove symbols map from TSDB head (#9301)
This saves memory, effort and locking.

Since every symbol is also added to postings, `Symbols()` can be
implemented there instead. This now has to build a map for
deduplication, but `Symbols()` is only called for compaction, and `gc()`
used to rebuild the symbols map after every compaction so not an
additional cost.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-09-08 14:48:48 +05:30