This reverts commit 50ef0dc954.
Memory allocation goes so high in Prombench that the system is unusable.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Tests for Mempostings.{Add,Get} data race
* Fix MemPostings.{Add,Get} data race
We can't modify the postings list that are held in MemPostings as they
might already be in use by some readers.
* Modify BenchmarkHeadStripeSeriesCreate to have common labels
If there are no common labels on the series, we don't excercise the
ordering part of MemSeries, as we're just creating slices of one element
for each label value.
---------
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
We are still seeing lock contention on MemPostings.mtx, and MemPostings.Delete() is by far the most expensive operation on that mutex.
This adds parallelism to that method, trying to reduce the amount of time we spend with the mutex held.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Pass affected labels to MemPostings.Delete
As suggested by @bboreham, we can track the labels of the deleted series
and avoid iterating through all the label/value combinations.
This looks much faster on the MemPostings.Delete call. We don't have a
benchmark on stripeSeries.gc() where we'll pay the price of iterating
the labels of each one of the deleted series.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* MemPostings.PostingsForLabelMatching: let mutex go
This changes the `MemPostings.PostingsForLabelMatching` implementation
to stop holding the read mutex while matching the label values.
We've seen that this method can be slow when the matcher is expensive,
that's why we even added a context expiration check.
However, there are critical process that might be waiting on this mutex:
writes (adding new series) and compaction (deleting the
garbage-collected ones), so we should avoid holding it for a long period
of time.
Given that we've copied the values to a slice anyway, there's no need to
hold the lock while matching.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* MemPostings: reduce locking/unlocking
MemPostings.Delete is called from Head.gc(), i.e. it gets the IDs of the
series that have churned.
I'd assume that many label values aren't affected by that churn at all,
so it doesn't make sense to touch the lock while checking them.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Follow up on https://github.com/prometheus/prometheus/pull/14096
As promised, I bring a benchmark, which shows a very small improvement
if context is checked every 128 iterations of label instead of every
100.
It's much easier for a computer to check modulo 128 than modulo 100.
This is a very small 0-2% improvement but I'd say this is one of the
hottest paths of the app so this is still relevant.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* tsdb: check for context cancel before regex matching postings
Regex matching can be heavy if the regex takes a lot of cycles to
evaluate and we can get stuck evaluating postings for a long time
without this fix. The constant checkContextEveryNIterations=100
may be changed later.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Reverts change from https://github.com/prometheus/prometheus/pull/12906
The benchmarks show that it's slower when intersecting, which is a
common usage for ListPostings (when intersecting matchers from Head)
(old is before #12906, new is #12906):
│ old │ new │
│ sec/op │ sec/op vs base │
Intersect/LongPostings1-16 20.54µ ± 1% 21.11µ ± 1% +2.76% (p=0.000 n=20)
Intersect/LongPostings2-16 51.03m ± 1% 52.40m ± 2% +2.69% (p=0.000 n=20)
Intersect/ManyPostings-16 194.2m ± 3% 332.1m ± 1% +71.00% (p=0.000 n=20)
geomean 5.882m 7.161m +21.74%
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
The Next() call of ListPostings() was updating two values, while we can
just update the position. This is up to 30% faster for high number of
Postings.
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb/index
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
│ old │ new │
│ sec/op │ sec/op vs base │
ListPostings/count=100-16 819.2n ± 0% 732.6n ± 0% -10.58% (p=0.000 n=20)
ListPostings/count=1000-16 2.685µ ± 1% 2.017µ ± 0% -24.88% (p=0.000 n=20)
ListPostings/count=10000-16 21.43µ ± 1% 14.81µ ± 0% -30.91% (p=0.000 n=20)
ListPostings/count=100000-16 209.4µ ± 1% 143.3µ ± 0% -31.55% (p=0.000 n=20)
ListPostings/count=1000000-16 2.086m ± 1% 1.436m ± 1% -31.18% (p=0.000 n=20)
geomean 29.02µ 21.41µ -26.22%
We're talking about microseconds here, but they just keep adding.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Problem:
LabelValueStats - This will provide a list of the label names and memory used in bytes.
It is calculated by adding the length of all values for a given label name.
But internally Prometheus stores the name and the value independently for each series.
Solution:
MemPostings struct maintains the values to seriesRef map which is used
to get the number of series which contains the label values.
Using that LabelValueStats is calculated as: seriesCnt * len(value
name)
Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>
This allocates memory for all the returned values, which skews the
result. We aren't trying to benchmark `ExpandPostings`, so just step
through all the values without storing them to consume them.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Previously all the postings constructed were consumed on the first
iteration, so subsequent iterations did no work.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Unexported postingsWithIndexHeap's methods that don't need to be
exported, and added detailed comments.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
See this comment for detailed explanation:
https://github.com/prometheus/prometheus/pull/9907#issuecomment-1002189932
TL;DR: if we don't call Pop() on the heap implementation, we don't need
to return our param as an `interface{}` so we save an allocation.
This would be popped for every label value, so it can be thousands of
saved allocations here (see benchmarks).
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This creates a new `model` directory and moves all data-model related
packages over there:
exemplar labels relabel rulefmt textparse timestamp value
All the others are more or less utilities and have been moved to `util`:
gate logging modetimevfs pool runtime
Signed-off-by: beorn7 <beorn@grafana.com>
* TSDB: demistify seriesRefs and ChunkRefs
The TSDB package contains many types of series and chunk references,
all shrouded in uint types. Often the same uint value may
actually mean one of different types, in non-obvious ways.
This PR aims to clarify the code and help navigating to relevant docs,
usage, etc much quicker.
Concretely:
* Use appropriately named types and document their semantics and
relations.
* Make multiplexing and demuxing of types explicit
(on the boundaries between concrete implementations and generic
interfaces).
* Casting between different types should be free. None of the changes
should have any impact on how the code runs.
TODO: Implement BlockSeriesRef where appropriate (for a future PR)
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* feedback
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* agent: demistify seriesRefs and ChunkRefs
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
The previous code re-used series IDs 1-1000 many times over, so a lot
of time was spent ensuring the lists of series were in ascending order.
The intended use of `MemPostings.Add()` is that all series IDs are
unique, and changing the benchmark to do this lets it finish ten times
faster.
(It doesn't affect the benchmark result, just the setup code)
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Testify: move to require
Moving testify to require to fail tests early in case of errors.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* More moves
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Refactor test assertions
This pull request gets rid of assert.True where possible to use
fine-grained assertions.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Adding TSDB Head Stats like cardinality to Status Page
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Moving mutx to Head
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Renaming variabls
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Renaming variabls and html
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Removing unwanted whitespaces
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Adding Tests, Banchmarks and Max Heap for Postings Stats
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Adding more tests for postingstats and web handler
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Adding more tests for postingstats and web handler
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Remove generated asset file that is no longer used
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Changing comment and variable name for more readability
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Using time.Duration in postings status function and removing refresh button from web page
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
This change also uses the latest staticcheck version which comes with
new verifications, hence some clean up in the code.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Avoid a tree of merge objects, which can result in
what I suspect is n^2 calls to Seek when using Without.
With 100k metrics, and a regex of ^$ in BenchmarkHeadPostingForMatchers:
Before:
BenchmarkHeadPostingForMatchers-8 1 51633185216 ns/op 29745528 B/op 200357 allocs/op
After:
BenchmarkHeadPostingForMatchers-8 10 108924996 ns/op 25715025 B/op 101748 allocs/op
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
This saves memory, about a quarter of the size of the postings map
itself with high-cardinality labels (not including the post ids).
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>