This is to avoid copying the many fields of a histogram.Histogram all
the time.
This also fixes a bunch of formerly broken tests.
Signed-off-by: beorn7 <beorn@grafana.com>
This creates a new `model` directory and moves all data-model related
packages over there:
exemplar labels relabel rulefmt textparse timestamp value
All the others are more or less utilities and have been moved to `util`:
gate logging modetimevfs pool runtime
Signed-off-by: beorn7 <beorn@grafana.com>
* TSDB: demystify seriesRefs and ChunkRefs
The TSDB package contains many types of series and chunk references,
all shrouded in uint types. Often the same uint value may
actually mean one of different types, in non-obvious ways.
This PR aims to clarify the code and help navigating to relevant docs,
usage, etc much quicker.
Concretely:
* Use appropriately named types and document their semantics and
relations.
* Make multiplexing and demuxing of types explicit
(on the boundaries between concrete implementations and generic
interfaces).
* Casting between different types should be free. None of the changes
should have any impact on how the code runs.
TODO: Implement BlockSeriesRef where appropriate (for a future PR)
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
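A minimal sketch of the idea behind the commit above, not the types it actually introduces: the names `SeriesRef`, `ChunkRef`, `NewChunkRef`, and the 24-bit split are assumptions chosen for illustration. It shows named types over a plain uint64 and an explicit multiplex/demultiplex step, with conversions that cost nothing at run time.

```go
package main

import "fmt"

// SeriesRef and ChunkRef are hypothetical names for this sketch. Both are
// plain uint64 values underneath, so converting between them and uint64
// is free at run time.
type SeriesRef uint64
type ChunkRef uint64

// chunkIDBits is an assumed layout: the low 24 bits hold a per-series chunk
// index and the remaining high bits hold the series reference.
const chunkIDBits = 24

// NewChunkRef makes the multiplexing of two values into one reference explicit.
func NewChunkRef(s SeriesRef, chunkID uint64) ChunkRef {
	return ChunkRef(uint64(s)<<chunkIDBits | chunkID&(1<<chunkIDBits-1))
}

// Unpack is the matching demultiplexing step.
func (r ChunkRef) Unpack() (SeriesRef, uint64) {
	return SeriesRef(uint64(r) >> chunkIDBits), uint64(r) & (1<<chunkIDBits - 1)
}

func main() {
	ref := NewChunkRef(SeriesRef(7), 3)
	s, id := ref.Unpack()
	fmt.Println(s, id)
}
```

Because the compiler rejects passing a `SeriesRef` where a `ChunkRef` is expected, accidental mix-ups surface at build time rather than as wrong lookups.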
* feedback
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* agent: demystify seriesRefs and ChunkRefs
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* Use dedicated Ref type
Throughout the code base, there are reference types masked as
regular integers. Let's use dedicated types. They are
equivalent, but clearer semantically.
This also makes it trivial to find where they are used,
and from uses, find the centralized docs.
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* postpone some work until after possible return
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* clarify
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* rename feedback
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* skip header is up to caller
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
A lot of this code was hacked together, literally during a
hackathon. This commit intends not to change the code substantially,
but just make the code obey the usual style practices.
A (possibly incomplete) list of areas:
* Generally address linter warnings.
* The `pkg` directory is deprecated as per dev-summit. No new packages should
be added to it. I moved the new `pkg/histogram` package to `model`
anticipating what's proposed in #9478.
* Make the naming of the Sparse Histogram more consistent. Including
abbreviations, there were just too many names for it: SparseHistogram,
Histogram, Histo, hist, his, shs, h. The idea is to call it "Histogram" in
general. Only add "Sparse" if it is needed to avoid confusion with
conventional Histograms (which is rare because the TSDB really has no notion
of conventional Histograms). Use abbreviations only in local scope, and then
really abbreviate (not just removing three out of seven letters like in
"Histo"). This is in the spirit of
https://github.com/golang/go/wiki/CodeReviewComments#variable-names
* Several other minor name changes.
* A lot of formatting of doc comments. For one, following
https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
, but also layout questions, anticipating how things will look
when rendered by `godoc` (even where `godoc` doesn't render them
right now because they are for unexported types or not a doc comment
at all but just a normal code comment - consistency is queen!).
* Re-enabled `TestQueryLog` and `TestEndpoints` (they pass now,
leaving them disabled was presumably an oversight).
* Bucket iterator for histogram.Histogram is now created with a
method.
* HistogramChunk.iterator now allows iterator recycling. (I think
@dieterbe only commented it out because he was confused by the
question in the comment.)
* HistogramAppender.Append panics now because we decided to treat
staleness marker differently.
Signed-off-by: beorn7 <beorn@grafana.com>
* Call delete on head if interval overlaps
Signed-off-by: darshanime <deathbullet@gmail.com>
* Garbage collect tombstones during head gc
Signed-off-by: darshanime <deathbullet@gmail.com>
* Truncate tombstones before min time during head gc
Signed-off-by: darshanime <deathbullet@gmail.com>
* Lock less by deleting all keys in a single pass
Signed-off-by: darshanime <deathbullet@gmail.com>
* Pass map to DeleteTombstones
Signed-off-by: darshanime <deathbullet@gmail.com>
* Create new slice to replace old one
Signed-off-by: darshanime <deathbullet@gmail.com>
This saves memory, effort and locking.
Since every symbol is also added to postings, `Symbols()` can be
implemented there instead. This now has to build a map for
deduplication, but `Symbols()` is only called for compaction, and `gc()`
used to rebuild the symbols map after every compaction so not an
additional cost.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
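As a rough illustration of the commit above, the following sketch derives the symbol list from a postings-like structure and deduplicates through a map. The `postings` type here is a simplified, hypothetical stand-in, not the real in-memory postings index.

```go
package main

import (
	"fmt"
	"sort"
)

// postings is a simplified, hypothetical stand-in for an in-memory postings
// index: label name -> label value -> series references.
type postings map[string]map[string][]uint64

// symbols collects every label name and value, deduplicated through a map
// and returned in sorted order.
func (p postings) symbols() []string {
	set := make(map[string]struct{})
	for name, values := range p {
		set[name] = struct{}{}
		for value := range values {
			set[value] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for s := range set {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	p := postings{
		"job":      {"api": {1, 2}, "db": {3}},
		"instance": {"a": {1}, "api": {2}},
	}
	fmt.Println(p.symbols())
}
```

The map is built only when `symbols()` is called, so nothing extra has to be kept in memory or locked between calls.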
* Remove query hacks in the API and fix metrics
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Tests for the metrics
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Better way to count series on restart
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* PromQL: Fix start and end keywords masking label and metric names
This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* Add in additional tests for metrics and/or labels called start/end.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* *: Cut 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* VERSION: bump to 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Remove experimental wording on size-based retention
Followup of #9004
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix PR reference in changelog
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zone IDs at most once per refresh (#9142)
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zones at most once per SD load
Closes #9142.
Signed-off-by: George Brighton <george@gebn.co.uk>
* Incorporate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Integrate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Add a compatibility note for macOS users.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* *: Cut v2.29.0-rc.1
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* *: cut v2.29.0-rc.2
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* bump codemirror-promql to 0.17.0
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* *: cut v2.29.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* tsdb: align atomically accessed int64 (#9192)
This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUG
Fixed #9190
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Release 2.29.1 (#9193)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* Snapshot in-memory chunks on shutdown for faster restarts (#7229)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Remove Individual Data Type Caps in Per-shard Buffering for Remote Write (#8921)
* Moved everything to nPending buffer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Simplify exemplar capacity addition
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added pre-allocation
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Don't allocate if not sending exemplars
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Avoid deadlock when processing duplicate series record (#9170)
* Avoid deadlock when processing duplicate series record
`processWALSamples()` needs to be able to send on its output channel
before it can read the input channel, so read from the output channel to
allow this in case the output channel is full.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* processWALSamples: update comment
Previous text seems to relate to an earlier implementation.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Optimise WAL loading by removing extra map and caching min-time (#9160)
* BenchmarkLoadWAL: close WAL after use
So that goroutines are stopped and resources released
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* BenchmarkLoadWAL: make series IDs co-prime with #workers
Series are distributed across workers by taking the modulus of the
ID with the number of workers, so multiples of 100 are a poor choice.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
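To see why multiples of 100 shard poorly, here is a small sketch that counts how many workers receive any series at all. The worker count of 8 and the alternative stride of 101 are assumptions picked for the example, not values taken from the benchmark.

```go
package main

import "fmt"

// workersUsed reports how many distinct workers receive at least one series
// when n series IDs spaced by stride are sharded as id % workers.
func workersUsed(stride, n, workers uint64) int {
	seen := make(map[uint64]struct{})
	for i := uint64(0); i < n; i++ {
		seen[(i*stride)%workers] = struct{}{}
	}
	return len(seen)
}

func main() {
	fmt.Println(workersUsed(100, 1000, 8)) // stride shares a factor with 8
	fmt.Println(workersUsed(101, 1000, 8)) // stride is co-prime with 8
}
```

With 8 workers, a stride of 100 only ever lands on workers 0 and 4, while a co-prime stride reaches all of them.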
* BenchmarkLoadWAL: simulate mmapped chunks
Real Prometheus cuts chunks every 120 samples, then skips those samples
when re-reading the WAL. Simulate this by creating a single mapped chunk
for each series, since the max time is all the reader looks at.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Fix comment
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Remove series map from processWALSamples()
The comment says the map is there to reduce lock contention, but those
locks are now sharded 32,000 ways, so won't be contended. Removing the map
saves memory and goes just as fast.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* loadWAL: Cache the last mmapped chunk time
So we can skip calling append() for samples it will reject.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Improvements from code review
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Full stops and capitals on comments
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Cache max time in both places mmappedChunks is updated
Including refactor to extract function `setMMappedChunks`, to reduce
code duplication.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Update head min/max time when mmapped chunks added
This ensures we have the correct values if no WAL samples are added for
that series.
Note that `mSeries.maxTime()` was always `math.MinInt64` before, since
that function doesn't consider mmapped chunks.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Split Go and React Tests (#8897)
* Added go-ci and react-ci
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Remove search keymap from new expression editor (#9184)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
I was struggling to understand the purpose of this method until I
tweaked the tests, so I decided to write down my observations.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Push the matchers for LabelNames all the way into the index.
NB This doesn't actually implement it in the index, just plumbs it through for now...
Signed-off-by: Tom Wilkie <tom@grafana.com>
* Hack it up. Does not work.
Signed-off-by: Tom Wilkie <tom@grafana.com>
* Revert changes I don't understand
Can't see why we need to hold a mutex on symbols, or the purpose of
the LabelNamesFor method.
Maybe I'll need to re-add this later.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Implement LabelNamesFor
This method provides the label names that appear in the postings
provided. We do that deeper than the label values because we know
beforehand that most of the label names will be the same across
different postings, and we don't want to go down and up looking up the
same symbols for all different series.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Mutex on symbols should be unlocked
However, I still don't understand why we need a mutex here.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix head.LabelNamesFor
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Implement mockIndex LabelNames with matchers
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Nitpick on slice initialisation
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Add tests for LabelNamesWithMatchers
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix the mutex mess on head.LabelValues/LabelNames
I still don't see why we need to grab that unrelated mutex, but at least
now we're grabbing it consistently
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Check error after iterating postings
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use the error from postings when there was an error in postings
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Update storage/interface.go comment
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Update tsdb/index/index.go comment
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Update tsdb/index/index.go wrapped error msg
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Update tsdb/index/index.go wrapped error msg
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Update tsdb/index/index.go wrapped error msg
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Remove unneeded comment
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Add testcases for LabelNames w/matchers in api.go
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use t.Cleanup() instead of defer in tests
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Tom Wilkie <tom@grafana.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Create experimental circular buffer resize method, benchmarks
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Optimize exemplar resize to only replay as many exemplars as needed
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* More comments, benchmark AddExemplar
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* optimizations
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* comment
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Slight refactor of resize benchmark + make use of resize via runtime
reloadable storage config.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Some more config related changes.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address some review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address more review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Refactor to remove usage of noopExemplarStorage and avoid race condition
when resizing from Head code.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix or add comments to clarify some of the new behaviour.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* fix potential panics related to negative exemplar buffer lengths
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Callum Styan <callumstyan@gmail.com>
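The resize optimisation in the commits above can be sketched as follows, assuming a hypothetical `ring` type holding ints rather than exemplars. Only the newest entries that fit into the new buffer are replayed, and a negative size is clamped to zero so it cannot cause a panic.

```go
package main

import "fmt"

// ring is a hypothetical fixed-size circular buffer for this sketch.
type ring struct {
	buf   []int
	next  int // index of the slot the next add will write
	count int // number of valid entries, at most len(buf)
}

func newRing(size int) *ring { return &ring{buf: make([]int, size)} }

func (r *ring) add(v int) {
	if len(r.buf) == 0 {
		return
	}
	r.buf[r.next] = v
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// items returns the valid entries from oldest to newest.
func (r *ring) items() []int {
	out := make([]int, 0, r.count)
	for i := 0; i < r.count; i++ {
		out = append(out, r.buf[(r.next-r.count+i+len(r.buf))%len(r.buf)])
	}
	return out
}

// resize returns a ring of the new size holding the most recent entries.
// Only min(count, size) entries are replayed; older ones are skipped.
func (r *ring) resize(size int) *ring {
	if size < 0 {
		size = 0 // guard against negative lengths
	}
	out := newRing(size)
	replay := r.count
	if replay > size {
		replay = size
	}
	for _, v := range r.items()[r.count-replay:] {
		out.add(v)
	}
	return out
}

func main() {
	r := newRing(5)
	for v := 1; v <= 7; v++ {
		r.add(v)
	}
	fmt.Println(r.items(), r.resize(3).items(), r.resize(10).items())
}
```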
Fetch the low watermark value under the same lock as we need for the
appender, rather than releasing then re-acquiring a lock on the same
Mutex.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Do not panic on histoAppender.Append
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* M-map all chunks on shutdown
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Support negative schema for querying
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Append sparse histograms into the Head block
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Add AtHistogram() to Iterator interface. Make HistoChunk conform to Chunk interface.
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* integer types and timestamp separation
1) Unify types to int64, as suggested by beorn. We want to support
counters going down (resets), even if for now we plan to create new
chunks in that case.
2) The histogram type doesn't know its own timestamp. Include it separately
in appending and iteration.
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* correction: count and zeroCount to remain unsigned
to make api more resilient and that's what we use in protobuf anyway
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* temp hack. Ganesh will fix
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* Added walreplay API endpoint
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added starting page to react-ui
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Documented the new endpoint
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed typos
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Removed logo
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed isResponding to isUnexpected
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed width of progress bar
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed width of progress bar
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added DB stats object
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Updated starting page to work with new fields
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 2)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 3)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (and also implementing a method this time) (pt. 4)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (and also implementing a method this time) (pt. 5)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed const to let
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 6)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Remove SetStats method
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added comma
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed api
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed to triple equals
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed data response types
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Don't return pointer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed version
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed interface issue
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed pointer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed copying lock value error
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Write exemplars to the WAL and send them over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Update example for exemplars, print data in a more obvious format.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add metrics for remote write of exemplars.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix incorrect slices passed to send in remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* We need to unregister the new metrics.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Order of exemplar append vs write exemplar to WAL needs to change.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Condense sample/exemplar delivery tests to parameterized sub-tests
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename test methods for clarity now that they also handle exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename counter variable. Fix instances where metrics were not updated correctly
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Add exemplars to LoadWAL benchmark
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* last exemplars timestamp metric needs to convert value to seconds with
ms precision
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Process exemplar records in a separate goroutine when loading the WAL.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Regenerate types proto with comments, update protoc version again.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Put remote write of exemplars behind a feature flag.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address some of Ganesh's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Move exemplar remote write feature flag to a config file field.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address Bartek's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address more review comments from Ganesh.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add exemplar total label length check.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address a few last review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
* Add range query test cases
This includes a couple of failing ones that double count some points due
to the iterator seek bug.
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Add Seek() implementation for memSafeIterator
Previously, calling memSafeIterator.Seek() would call the Seek() method
on its embedded iterator. This was causing the embedded iterator and the
memSafeIterator to get out of sync because when the embedded Seek()
moved to the next element of the embedded iterator, memSafeIterator
didn't "know" about it. memSafeIterator has to "know" when the embedded
iterator has moved to be able to work out when it should be reading from
its buffer rather than the embedded iterator.
Used the same logic as xorIterator.Seek() (which at runtime is used as
the embedded iterator) - return false if the iterator has an error and
try to move to next element if the required time hasn't been reached, or
if no elements have been read yet. The memSafeIterator.Next() method is
being called so memSafeIterator.i is always accurate.
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Add tsdb package test
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
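The Seek() fix described above can be illustrated with a stripped-down wrapper. `sliceIterator` and `countingIterator` are hypothetical stand-ins for the embedded iterator and memSafeIterator, and error handling is left out for brevity. The point is that Seek advances through the wrapper's own Next, so the wrapper's read counter cannot fall behind the embedded iterator.

```go
package main

import "fmt"

// sample and sliceIterator are hypothetical stand-ins for a chunk iterator.
type sample struct {
	t int64
	v float64
}

type sliceIterator struct {
	samples []sample
	pos     int // index of the current sample, -1 before the first Next
}

func (it *sliceIterator) Next() bool {
	if it.pos+1 >= len(it.samples) {
		return false
	}
	it.pos++
	return true
}

func (it *sliceIterator) At() (int64, float64) {
	s := it.samples[it.pos]
	return s.t, s.v
}

// countingIterator wraps an inner iterator and tracks how many samples it
// has read.
type countingIterator struct {
	inner *sliceIterator
	read  int
}

func (it *countingIterator) Next() bool {
	if !it.inner.Next() {
		return false
	}
	it.read++
	return true
}

// Seek moves forward via the wrapper's Next until the current sample has a
// timestamp >= t, so the read counter always reflects the inner position.
func (it *countingIterator) Seek(t int64) bool {
	if it.read == 0 && !it.Next() {
		return false
	}
	for ts, _ := it.inner.At(); ts < t; ts, _ = it.inner.At() {
		if !it.Next() {
			return false
		}
	}
	return true
}

func main() {
	it := &countingIterator{inner: &sliceIterator{samples: []sample{{1, 1}, {5, 2}, {9, 3}}, pos: -1}}
	fmt.Println(it.Seek(4), it.read)
}
```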
The purpose of GetRef() is to allow Append() to be called without
the caller needing to copy the labels. To avoid a race where a series
is removed from TSDB between the calls to GetRef() and Append(), we
return TSDB's copy of the labels.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add method to get reference number for TSDB Appender
In situations where we need to copy labels before calling Add(),
GetRef() allows checking first, then calling AddFast() in the case that the
series is already known.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add explicit interface for GetRef() method
Suggested in code review by @bwplotka
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Rename OptionalGetRef to GetRef
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Simplify return value of GetRef()
0 can be relied on to mean 'no reference'
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
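The GetRef() flow from these commits might look roughly like the toy below. The `store` and `labels` types and all method bodies are hypothetical; they only illustrate the contract that a zero reference means "no reference" and that the store hands back its own copy of the labels.

```go
package main

import (
	"fmt"
	"strings"
)

// labels is a hypothetical flattened list of name/value pairs.
type labels []string

func (l labels) key() string { return strings.Join(l, "\xff") }

// store is a toy stand-in for a TSDB head; all names here are illustrative.
type store struct {
	refs    map[string]uint64
	series  map[uint64]labels
	samples map[uint64][]float64
	lastRef uint64
}

func newStore() *store {
	return &store{refs: map[string]uint64{}, series: map[uint64]labels{}, samples: map[uint64][]float64{}}
}

// GetRef returns the reference for a known series together with the store's
// own copy of the labels. A zero reference means "no reference".
func (s *store) GetRef(l labels) (uint64, labels) {
	ref := s.refs[l.key()]
	return ref, s.series[ref]
}

// Append takes the fast path when ref is known. Otherwise it looks the
// series up by labels and, if it is new, copies the labels to create it.
func (s *store) Append(ref uint64, l labels, v float64) uint64 {
	if _, ok := s.series[ref]; !ok {
		if ref = s.refs[l.key()]; ref == 0 {
			s.lastRef++
			ref = s.lastRef
			own := append(labels(nil), l...)
			s.refs[own.key()] = ref
			s.series[ref] = own
		}
	}
	s.samples[ref] = append(s.samples[ref], v)
	return ref
}

func main() {
	s := newStore()
	l := labels{"job", "api"}
	ref, _ := s.GetRef(l) // unknown series, so ref is 0
	ref = s.Append(ref, l, 1)
	ref, own := s.GetRef(l)
	s.Append(ref, nil, 2) // fast path: no labels needed
	fmt.Println(ref, own, s.samples[ref])
}
```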
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.
This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
In the previous version, 1.18.0, the "megacheck" linter paid attention
to the '//lint:ignore' comment, but that is no longer there.
Newer versions pay attention to '//nolint:<linter>,<linter>,...'
comments, optionally followed by a "second" comment introduced by '//'.
Update the directives to use this style.
This is related to prometheus/blackbox_exporter#738 and
prometheus/blackbox_exporter#745.
Signed-off-by: Marcelo E. Magallon <marcelo.magallon@grafana.com>
* Fix TSDB head struct dump on querier error
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added mint/maxt to RangeHead.String()
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Set the min time of Head properly after truncation
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Fix lint
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Enhance compaction plan logic for completely deleted small block
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Fix review comments
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* MultiError: Refactored MultiError for more concise and safe usage.
* Fewer lines
* Goland IDE was marking every usage of old MultiError "potential nil" error
* It was easy to forget using Err() when error was returned, now it's safely assured at compile time.
NOTE: Potentially I would rename package to merrors. (: In different PR.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed review comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fix after rebase.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
As we're looking to expand what's in the WAL,
having old Prometheus servers ignore the new record types
rather than treating them as corruption allows for better
upgrade/downgrade paths.
Adjust some tests accordingly, so they're still testing what they're
meant to test.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
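The skip-unknown behaviour described above can be sketched with a toy replay loop. The record layout (first byte is the type) and the constants are assumptions for this example, not the actual WAL encoding.

```go
package main

import "fmt"

// recordType values here are hypothetical; only the skip-unknown pattern
// matters for this sketch.
type recordType byte

const (
	recSeries  recordType = 1
	recSamples recordType = 2
)

type counts struct{ series, samples, skipped int }

// replay treats the first byte of each record as its type. Unknown types are
// counted and skipped instead of being reported as corruption.
func replay(records [][]byte) (counts, error) {
	var c counts
	for i, rec := range records {
		if len(rec) == 0 {
			return c, fmt.Errorf("record %d: empty record", i)
		}
		switch recordType(rec[0]) {
		case recSeries:
			c.series++
		case recSamples:
			c.samples++
		default:
			c.skipped++ // written by a newer version; ignore it
		}
	}
	return c, nil
}

func main() {
	c, err := replay([][]byte{{1}, {2, 9}, {42}})
	fmt.Println(c, err)
}
```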
* tsdb: Added ChunkQueryable implementations to db; unified compactor, querier and fanout block iterating.
Chained to https://github.com/prometheus/prometheus/pull/7059
* NewMerge(Chunk)Querier now takes multiple primaries allowing tsdb DB code to use it.
* Added single SeriesEntry / ChunkEntry for all series implementations.
* Unified all vertical, and non vertical for compact and querying to single
merge series / chunk sets by reusing VerticalSeriesMergeFunc for overlapping algorithm (same logic as before)
* Added block (Base/Chunk/)Querier for block querying. We then use populateAndTomb(Base/Chunk/) to iterate over chunks or samples.
* Refactored endpoint tests and querier tests to include subtests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed comments from Brian and Beorn.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed snapshot test and added chunk iterator support for DBReadOnly.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed race when iterating over Ats first.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed tests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed populate block tests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed endpoints test.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed test.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added test & fixed case of head open chunk.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed DBReadOnly tests and bug producing 1 sample chunks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added cases for partial block overlap for multiple full chunks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added extra tests for chunk meta after compaction.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed small vertical merge bug and added more tests for that.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* tsdb/chunks: Replace sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* tsdb/head: Replace sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* vendor: Make go.uber.org/atomic a direct dependency
There are no modifications to go.sum and vendor/ because
it was already vendored.
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* tsdb: Remove comments referring to the sync/atomic alignment bug
Related: https://golang.org/pkg/sync/atomic/#pkg-note-BUG
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* no panic the head memseries has chunks in it
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
* fix a panic when querying after a wal corruption.
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
* review nits
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
* Add test for reading the data after a wal corruption.
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Update tsdb/db_test.go
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Update tsdb/db_test.go
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
* spellings
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Fix race during head compaction
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Comment out the test
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Skip test instead of commenting it out
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Track open appenders in doubly-linked list to make lowWatermark O(1).
* Use RW locks.
* Added BenchmarkIsolationWithState.
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
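A minimal sketch of the idea in the commit above, not the actual TSDB code. It assumes appender IDs are handed out in increasing order, so a doubly-linked list of open appenders always has the lowest open ID at its front. The `isolation` type, its fields, and its method names are hypothetical and chosen only for illustration.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// isolation tracks open appenders. Hypothetical type for illustration only.
type isolation struct {
	mtx    sync.RWMutex
	nextID uint64
	open   *list.List               // open appender IDs in increasing order
	elems  map[uint64]*list.Element // ID -> list node, for O(1) removal
}

func newIsolation() *isolation {
	return &isolation{open: list.New(), elems: map[uint64]*list.Element{}}
}

// newAppendID registers a new open appender and returns its ID.
func (i *isolation) newAppendID() uint64 {
	i.mtx.Lock()
	defer i.mtx.Unlock()
	i.nextID++
	i.elems[i.nextID] = i.open.PushBack(i.nextID)
	return i.nextID
}

// closeAppend removes an appender from the open list.
func (i *isolation) closeAppend(id uint64) {
	i.mtx.Lock()
	defer i.mtx.Unlock()
	if e, ok := i.elems[id]; ok {
		i.open.Remove(e)
		delete(i.elems, id)
	}
}

// lowWatermark returns the lowest open appender ID without scanning, or the
// last issued ID if no appender is open. Readers only need the read lock.
func (i *isolation) lowWatermark() uint64 {
	i.mtx.RLock()
	defer i.mtx.RUnlock()
	if front := i.open.Front(); front != nil {
		return front.Value.(uint64)
	}
	return i.nextID
}

func main() {
	iso := newIsolation()
	a, b := iso.newAppendID(), iso.newAppendID()
	fmt.Println(iso.lowWatermark()) // lowest open ID is a
	iso.closeAppend(a)
	fmt.Println(iso.lowWatermark()) // lowest open ID is now b
	iso.closeAppend(b)
}
```

The map from ID to list element is what keeps removal constant-time; without it, closing an appender in the middle of the list would need a scan.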
* add time range params to labelNames api
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* evaluate min/max time range when reading labels from the head
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* add time range params to labelValues api
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* fix test, add docs
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* add a test for head min max range
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* fix test to match comment
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* address CR comments
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* combine vars only used once
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* fix test
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* restart ci
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* use range expectedLabelNames instead of range actualLabelNames in test
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
* Callbacks for lifecycle of series in TSDB
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Add more comments
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
When appending to the head, a chunk that becomes full is flushed to disk and m-mapped (memory-mapped) to free up memory.
Prometheus startup now happens in these stages:
- Iterate the m-mapped chunks from disk and keep a map of series reference to its slice of m-mapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for its m-mapped chunks in the map created before and add them to that series.
If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption is recovered from the existing WAL, which means that a corruption in m-mapped files results in NO data loss.
[M-mapped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md) - the main difference is that the chunk for m-mapping now also includes the series reference, because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index, which includes the offsets for the chunks in the chunks file. For example, the chunks of a series ID have offsets 200, 500, etc. in the chunk files.
In the case of m-mapped chunks, the offsets are stored in memory and accessed from there. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above, matching the series ID present in the chunk header with the offset of that chunk in that file.
**Prombench results**
_WAL Replay_
1h WAL replay time
30% less WAL replay time - 4m31 vs 3m36
2h WAL replay time
20% less WAL replay time - 8m16 vs 7m
_Memory During WAL Replay_
High Churn:
10-15% less RAM - 32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM - 23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb
Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
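The two startup stages described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the TSDB implementation: `diskChunk`, `mmappedChunk`, `series`, `loadMmappedChunks`, and `replaySeriesRecords` are hypothetical names, and the WAL is reduced to a list of series references.

```go
package main

import "fmt"

// mmappedChunk is a hypothetical in-memory record of one m-mapped chunk:
// where it lives on disk and the time range it covers.
type mmappedChunk struct {
	ref              uint64 // offset-style reference into the chunk files
	minTime, maxTime int64
}

// diskChunk mimics what iterating the m-mapped files yields. The series
// reference comes from the chunk header, since there is no index.
type diskChunk struct {
	seriesRef uint64
	chunk     mmappedChunk
}

type series struct {
	ref     uint64
	mmapped []mmappedChunk
}

// loadMmappedChunks is stage 1: group the chunks found on disk by series.
func loadMmappedChunks(onDisk []diskChunk) map[uint64][]mmappedChunk {
	byRef := make(map[uint64][]mmappedChunk)
	for _, c := range onDisk {
		byRef[c.seriesRef] = append(byRef[c.seriesRef], c.chunk)
	}
	return byRef
}

// replaySeriesRecords is stage 2: walk the series records of the WAL and
// attach the chunks collected in stage 1 whenever a series is created.
func replaySeriesRecords(walRefs []uint64, byRef map[uint64][]mmappedChunk) map[uint64]*series {
	created := make(map[uint64]*series)
	for _, ref := range walRefs {
		if _, ok := created[ref]; ok {
			continue // series already created by an earlier record
		}
		created[ref] = &series{ref: ref, mmapped: byRef[ref]}
	}
	return created
}

func main() {
	onDisk := []diskChunk{
		{seriesRef: 1, chunk: mmappedChunk{ref: 8, minTime: 0, maxTime: 99}},
		{seriesRef: 2, chunk: mmappedChunk{ref: 200, minTime: 0, maxTime: 99}},
		{seriesRef: 1, chunk: mmappedChunk{ref: 500, minTime: 100, maxTime: 199}},
	}
	byRef := loadMmappedChunks(onDisk)
	for ref, s := range replaySeriesRecords([]uint64{1, 2, 3}, byRef) {
		fmt.Printf("series %d: %d m-mapped chunks\n", ref, len(s.mmapped))
	}
}
```

A series that only exists in the WAL (reference 3 in the example) simply ends up with no m-mapped chunks and gets its samples from the WAL alone.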
Prior to this commit we could have situations where we are creating an
appenderId but never creating an appender to go with it, therefore
blocking the low watermark.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Previously we were keeping up to around 6 hours of WAL around by
removing 1/3 every hours. This was excessive, so switch to removing 2/3
which will keep up to around 3 hours of WAL around.
This will roughly halve the size of the WAL and halve startup time for
those who are I/O bound. This may increase the checkpoint size for
those with certain churn patterns, but by much less than we're saving
from the segments.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
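The "roughly halve" claim above can be checked with a small simulation. This is only a sketch: `peakWALSize` is a hypothetical helper, and the 2-hour gap between truncations used in `main` is an assumption, not something the commit states. Whatever the gap, the WAL settles at the gap divided by the fraction removed, so moving from 1/3 to 2/3 halves the peak size.

```go
package main

import "fmt"

// peakWALSize simulates repeated truncation. Each round, the given fraction
// of the existing WAL is removed and `interval` worth of new data is written.
// It returns the WAL size just before the next truncation after `rounds`
// rounds, in the same unit as interval.
func peakWALSize(interval, fraction float64, rounds int) float64 {
	size := 0.0
	for i := 0; i < rounds; i++ {
		size = size*(1-fraction) + interval
	}
	return size
}

func main() {
	const interval = 2.0 // hours between truncations (assumed for illustration)
	fmt.Printf("removing 1/3: %.1f hours\n", peakWALSize(interval, 1.0/3.0, 100))
	fmt.Printf("removing 2/3: %.1f hours\n", peakWALSize(interval, 2.0/3.0, 100))
}
```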
This fixes #6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed until we
looked at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
With defer having less of a performance penalty, there is no reason
not to do those crucial operations via defer.
Context: With isolation in place, if we forget to Commit/Rollback, the
low watermark will get stuck forever.
The current code should not have any bugs, but moving to defer helps
to avoid future bugs.
This is also moving the `closeAppend` in the `Commit` implementation
itself to defer. If logging to the WAL fails, we would have missed the
`closeAppend`.
Signed-off-by: beorn7 <beorn@grafana.com>
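As an illustration of the defer pattern discussed in the commit above, here is a minimal sketch. The `appender` type is a hypothetical stand-in that only records how it was closed, and `appendAll` is an invented helper. The point is that exactly one of Commit or Rollback runs on every return path.

```go
package main

import (
	"errors"
	"fmt"
)

// appender is a hypothetical, minimal stand-in for a storage appender.
type appender struct {
	committed, rolledBack bool
}

// Add rejects negative values so that the example has a failing path.
func (a *appender) Add(v float64) error {
	if v < 0 {
		return errors.New("negative sample rejected")
	}
	return nil
}

func (a *appender) Commit() error   { a.committed = true; return nil }
func (a *appender) Rollback() error { a.rolledBack = true; return nil }

// appendAll adds all values and closes the appender in a deferred function,
// so an early return cannot leave the appender open.
func appendAll(a *appender, vals []float64) (err error) {
	defer func() {
		if err != nil {
			_ = a.Rollback() // report the original error, not the rollback's
			return
		}
		err = a.Commit()
	}()
	for _, v := range vals {
		if err = a.Add(v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ok, bad := &appender{}, &appender{}
	fmt.Println(appendAll(ok, []float64{1, 2}), ok.committed, ok.rolledBack)
	fmt.Println(appendAll(bad, []float64{1, -1}), bad.committed, bad.rolledBack)
}
```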
I think the previous behavior is problematic as it will leave
`memSeries` around that still have `pendingCommit` set to `true`.
The only case where this can happen in this code path is a failure to
write to the WAL, in which case we are probably in trouble anyway. I
believe, however, we should still try to do the right thing and do the
full rollback. This will implicitly try to write to the WAL again, but
this time without samples, which may even succeed. (But we propagate
the previous error in any case.)
This also adds `a.head.putSeriesBuffer(a.sampleSeries)` to Rollback,
which was previously missing.
Signed-off-by: beorn7 <beorn@grafana.com>
This is taken from #6918. Since we probably won't merge #6918 before
the release, we have to do this bit of it as it fixes an actual bug
(iso.closeAppend is not called if the append fails because of an error
logging to the WAL).
Signed-off-by: beorn7 <beorn@grafana.com>
* tsdb: don't allow ingesting empty labelsets
When we ingest an empty labelset in the head, further blocks can not be
compacted, with the error:
```
level=error ts=2020-02-27T21:26:58.379Z caller=db.go:659 component=tsdb
msg="compaction failed" err="persist head block: write compaction:
add series: out-of-order series added with label set \"{}\" / prev:
\"{}\""
```
We should therefore reject those invalid empty labelsets upfront.
This can be reproduced with the following:
```
cat << END > prometheus.yml
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 1s
    basic_auth:
      username: test
      password: test
    metric_relabel_configs:
      - regex: ".*"
        action: labeldrop
    static_configs:
      - targets:
          - 127.0.1.1:9090
END
./prometheus --storage.tsdb.min-block-duration=1m
```
And wait a few minutes.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Series() will fetch all the metadata for a series,
even if it's going to be filtered later due to time ranges.
For 1M series we save ~1.1s if you only needed some of the data, but take an
extra ~0.2s if you did want everything.
benchmark old ns/op new ns/op delta
BenchmarkHeadSeries/1of1000000-4 1443715987 131553480 -90.89%
BenchmarkHeadSeries/10of1000000-4 1433394040 130730596 -90.88%
BenchmarkHeadSeries/100of1000000-4 1437444672 131360813 -90.86%
BenchmarkHeadSeries/1000of1000000-4 1438958659 132573137 -90.79%
BenchmarkHeadSeries/10000of1000000-4 1438061766 145742377 -89.87%
BenchmarkHeadSeries/100000of1000000-4 1455060948 281659416 -80.64%
BenchmarkHeadSeries/1000000of1000000-4 1633524504 1803550153 +10.41%
benchmark old allocs new allocs delta
BenchmarkHeadSeries/1of1000000-4 4000055 28 -100.00%
BenchmarkHeadSeries/10of1000000-4 4000073 87 -100.00%
BenchmarkHeadSeries/100of1000000-4 4000253 630 -99.98%
BenchmarkHeadSeries/1000of1000000-4 4002053 6036 -99.85%
BenchmarkHeadSeries/10000of1000000-4 4020053 60054 -98.51%
BenchmarkHeadSeries/100000of1000000-4 4200053 600074 -85.71%
BenchmarkHeadSeries/1000000of1000000-4 6000053 6000094 +0.00%
benchmark old bytes new bytes delta
BenchmarkHeadSeries/1of1000000-4 229192184 2488 -100.00%
BenchmarkHeadSeries/10of1000000-4 229193336 5568 -100.00%
BenchmarkHeadSeries/100of1000000-4 229204856 35536 -99.98%
BenchmarkHeadSeries/1000of1000000-4 229320056 345104 -99.85%
BenchmarkHeadSeries/10000of1000000-4 230472056 3894673 -98.31%
BenchmarkHeadSeries/100000of1000000-4 241992056 40511632 -83.26%
BenchmarkHeadSeries/1000000of1000000-4 357192056 402380440 +12.65%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.
* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only an unnecessary glue with incompatible structs.
No logic was changed, only different source of abstractions, so no need for benchmarks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Rather than buffer up symbols in RAM, do it one by one
during compaction. Then use the reader's symbol handling
for symbol lookups during the rest of the index write.
There is some slowdown in compaction, due to having to look through a file
rather than a hash lookup. This is noise to the overall cost of compacting
series with thousands of samples though.
benchmark old ns/op new ns/op delta
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 539917175 675341565 +25.08%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 2441815993 2477453524 +1.46%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 3978543559 3922909687 -1.40%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 8430219716 8586610007 +1.86%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 1786424591 1909552782 +6.89%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 5328998202 6020839950 +12.98%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 10085059958 11085278690 +9.92%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 25497010155 27018079806 +5.97%
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4 2427391406 2817217987 +16.06%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4 2592965497 2538805050 -2.09%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4 2437388343 2668012858 +9.46%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4 2317095324 2787423966 +20.30%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4 2600239857 2096973860 -19.35%
benchmark old allocs new allocs delta
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 500851 470794 -6.00%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 821527 791451 -3.66%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 1141562 1111508 -2.63%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 2141576 2111504 -1.40%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 871466 841424 -3.45%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 1941428 1911415 -1.55%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 3071573 3041510 -0.98%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 6771648 6741509 -0.45%
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4 731493 824888 +12.77%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4 793918 887311 +11.76%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4 811842 905204 +11.50%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4 832244 925081 +11.16%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4 921553 1019162 +10.59%
benchmark old bytes new bytes delta
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 40532648 35698276 -11.93%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 60340216 53409568 -11.49%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 81087336 72065552 -11.13%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 142485576 120878544 -15.16%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 208661368 203831136 -2.31%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 347345904 340484696 -1.98%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 585185856 576244648 -1.53%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 1357641792 1358966528 +0.10%
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4 126486664 119666744 -5.39%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4 122323192 115117224 -5.89%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4 126404504 119469864 -5.49%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4 119047832 112230408 -5.73%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4 136576016 116634800 -14.60%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Rather than keeping the entire symbol table in memory, keep every nth
offset and walk from there to the entry we need. This ends up slightly
slower, ~360ms per 1M series returned from PostingsForMatchers which is
not much considering the rest of the CPU such a query would go on to
use.
Make LabelValues use the postings tables, rather than having
to do symbol lookups. Use yoloString, as PostingsForMatchers
doesn't need the strings to stick around and adjust the API
call to keep the Querier open until it's all marshalled.
Remove allocatedSymbols memory optimisation, we no longer keep all the
symbol strings in heap memory. Remove LabelValuesFor and LabelIndices,
they're dead code. Ensure we've still tests for label indices,
and add missing test that we can work with old V1 Format index files.
PostingsForMatchers performance is slightly better, with a big drop in
allocation counts due to using yoloString for LabelValues:
benchmark old ns/op new ns/op delta
BenchmarkPostingsForMatchers/Block/n="1"-4 36698 36681 -0.05%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 522786 560887 +7.29%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 511652 537680 +5.09%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 522102 564239 +8.07%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 113689911 111795919 -1.67%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 135825572 132871085 -2.18%
BenchmarkPostingsForMatchers/Block/i=~""-4 40782628 38038181 -6.73%
BenchmarkPostingsForMatchers/Block/i!=""-4 31267869 29194327 -6.63%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 112733329 111568823 -1.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 112868153 111232029 -1.45%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 31338257 29349446 -6.35%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 32054482 29972436 -6.50%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 136504654 133968442 -1.86%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 27960350 27264997 -2.49%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 136765564 133860724 -2.12%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 163714583 159453668 -2.60%
benchmark old allocs new allocs delta
BenchmarkPostingsForMatchers/Block/n="1"-4 6 6 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 11 11 +0.00%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 11 11 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 17 15 -11.76%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 100012 12 -99.99%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 200040 100040 -49.99%
BenchmarkPostingsForMatchers/Block/i=~""-4 200045 100045 -49.99%
BenchmarkPostingsForMatchers/Block/i!=""-4 200041 100041 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 100017 17 -99.98%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 100023 23 -99.98%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 200046 100046 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 200050 100050 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 200049 100049 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 111150 11150 -89.97%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 200055 100055 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 311238 111238 -64.26%
benchmark old bytes new bytes delta
BenchmarkPostingsForMatchers/Block/n="1"-4 296 296 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 424 424 +0.00%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 424 424 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 552 1544 +179.71%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 1600482 1606125 +0.35%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 17259065 17264709 +0.03%
BenchmarkPostingsForMatchers/Block/i=~""-4 17259150 17264780 +0.03%
BenchmarkPostingsForMatchers/Block/i!=""-4 17259048 17264680 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 1600610 1606242 +0.35%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 1600813 1606434 +0.35%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 17259176 17264808 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 17259304 17264936 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 17259333 17264965 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 3142628 3148262 +0.18%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 17259509 17265141 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 20405680 20416944 +0.06%
However overall Select performance is down and involves more allocs, due to
having to do more than a simple map lookup to resolve a symbol and that all the strings
returned are allocated:
benchmark old ns/op new ns/op delta
BenchmarkQuerierSelect/Block/1of1000000-4 506092636 862678244 +70.46%
BenchmarkQuerierSelect/Block/10of1000000-4 505638968 860917636 +70.26%
BenchmarkQuerierSelect/Block/100of1000000-4 505229450 882150048 +74.60%
BenchmarkQuerierSelect/Block/1000of1000000-4 515905414 862241115 +67.13%
BenchmarkQuerierSelect/Block/10000of1000000-4 516785354 874841110 +69.29%
BenchmarkQuerierSelect/Block/100000of1000000-4 540742808 907030187 +67.74%
BenchmarkQuerierSelect/Block/1000000of1000000-4 815224288 1181236903 +44.90%
benchmark old allocs new allocs delta
BenchmarkQuerierSelect/Block/1of1000000-4 4000020 6000020 +50.00%
BenchmarkQuerierSelect/Block/10of1000000-4 4000038 6000038 +50.00%
BenchmarkQuerierSelect/Block/100of1000000-4 4000218 6000218 +50.00%
BenchmarkQuerierSelect/Block/1000of1000000-4 4002018 6002018 +49.97%
BenchmarkQuerierSelect/Block/10000of1000000-4 4020018 6020018 +49.75%
BenchmarkQuerierSelect/Block/100000of1000000-4 4200018 6200018 +47.62%
BenchmarkQuerierSelect/Block/1000000of1000000-4 6000018 8000019 +33.33%
benchmark old bytes new bytes delta
BenchmarkQuerierSelect/Block/1of1000000-4 176001468 227201476 +29.09%
BenchmarkQuerierSelect/Block/10of1000000-4 176002620 227202628 +29.09%
BenchmarkQuerierSelect/Block/100of1000000-4 176014140 227214148 +29.09%
BenchmarkQuerierSelect/Block/1000of1000000-4 176129340 227329348 +29.07%
BenchmarkQuerierSelect/Block/10000of1000000-4 177281340 228481348 +28.88%
BenchmarkQuerierSelect/Block/100000of1000000-4 188801340 240001348 +27.12%
BenchmarkQuerierSelect/Block/1000000of1000000-4 304001340 355201616 +16.84%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
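The "keep every nth offset and walk from there" lookup in the commit above can be sketched as follows. This is not the index reader's code: the `symbols` type, the uvarint length-prefixed layout, and the sampling factor of 4 are assumptions made to keep the example small.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const sampleEvery = 4 // hypothetical sampling factor

// symbols stores strings back to back and remembers only some offsets.
type symbols struct {
	data    []byte // uvarint length-prefixed strings
	offsets []int  // offsets of symbols 0, sampleEvery, 2*sampleEvery, ...
	count   int
}

func newSymbols(strs []string) *symbols {
	s := &symbols{count: len(strs)}
	var lenBuf [binary.MaxVarintLen64]byte
	for i, str := range strs {
		if i%sampleEvery == 0 {
			s.offsets = append(s.offsets, len(s.data))
		}
		n := binary.PutUvarint(lenBuf[:], uint64(len(str)))
		s.data = append(s.data, lenBuf[:n]...)
		s.data = append(s.data, str...)
	}
	return s
}

// lookup jumps to the nearest sampled offset at or before symbol i and then
// walks forward over at most sampleEvery-1 entries.
func (s *symbols) lookup(i int) (string, error) {
	if i < 0 || i >= s.count {
		return "", fmt.Errorf("unknown symbol index %d", i)
	}
	pos := s.offsets[i/sampleEvery]
	for skip := i % sampleEvery; ; skip-- {
		l, n := binary.Uvarint(s.data[pos:])
		pos += n
		if skip == 0 {
			// The string conversion allocates, which is the cost the
			// commit message mentions for Select.
			return string(s.data[pos : pos+int(l)]), nil
		}
		pos += int(l)
	}
}

func main() {
	s := newSymbols([]string{"__name__", "instance", "job", "localhost:9090", "prometheus", "up"})
	sym, err := s.lookup(4)
	fmt.Println(sym, err, len(s.offsets))
}
```

The trade-off is the one the commit describes: memory for the offsets shrinks by the sampling factor, while each lookup pays for a short forward walk instead of a single map access.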
Rather than keeping the offset of each postings list, instead
keep only every nth entry of the postings offset table. As postings
list offsets have always been sorted, we can then get to the closest
entry before the one we want and iterate forwards.
I haven't done much tuning on the 32 number, it was chosen to try
not to read through more than a 4k page of data.
Switch to a bulk interface for fetching postings. Use it to avoid having
to re-read parts of the posting offset table when querying lots of it.
For an index like the one BenchmarkHeadPostingForMatchers uses, RAM
for r.postings drops from 3.79MB to 80.19kB, or about 48x.
Bytes allocated go down by 30%, and surprisingly CPU usage drops by
4-6% for typical queries too.
benchmark old ns/op new ns/op delta
BenchmarkPostingsForMatchers/Block/n="1"-4 35231 36673 +4.09%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 563380 540627 -4.04%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 536782 534186 -0.48%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 533990 541550 +1.42%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 113374598 117969608 +4.05%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 146329884 139651442 -4.56%
BenchmarkPostingsForMatchers/Block/i=~""-4 50346510 44961127 -10.70%
BenchmarkPostingsForMatchers/Block/i!=""-4 41261550 35356165 -14.31%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 112544418 116904010 +3.87%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 112487086 116864918 +3.89%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 41094758 35457904 -13.72%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 41906372 36151473 -13.73%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 147262414 140424800 -4.64%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 28615629 27872072 -2.60%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 147117177 140462403 -4.52%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 175096826 167902298 -4.11%
benchmark old allocs new allocs delta
BenchmarkPostingsForMatchers/Block/n="1"-4 4 6 +50.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 7 11 +57.14%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 7 11 +57.14%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 15 17 +13.33%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 100010 100012 +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 200069 200040 -0.01%
BenchmarkPostingsForMatchers/Block/i=~""-4 200072 200045 -0.01%
BenchmarkPostingsForMatchers/Block/i!=""-4 200070 200041 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 100013 100017 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 100017 100023 +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 200073 200046 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 200075 200050 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 200074 200049 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 111165 111150 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 200078 200055 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 311282 311238 -0.01%
benchmark old bytes new bytes delta
BenchmarkPostingsForMatchers/Block/n="1"-4 264 296 +12.12%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 360 424 +17.78%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 360 424 +17.78%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 520 552 +6.15%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 1600461 1600482 +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 24900801 17259077 -30.69%
BenchmarkPostingsForMatchers/Block/i=~""-4 24900836 17259151 -30.69%
BenchmarkPostingsForMatchers/Block/i!=""-4 24900760 17259048 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 1600557 1600621 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 1600717 1600813 +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 24900856 17259176 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 24900952 17259304 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 24900993 17259333 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 3788311 3142630 -17.04%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 24901137 17259509 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 28693086 20405680 -28.88%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make WAL replay benchmark more representative
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Move decoding records from the WAL into goroutine
Decoding the WAL records accounts for a significant amount of time on
startup, and can be done in parallel with creating series/samples to
speed up startup. However, records still must be handled in order, so
only a single goroutine can do the decoding.
benchmark old ns/op new ns/op delta
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200-8 481607033 391971490 -18.61%
BenchmarkLoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50-8 836394378 629067006 -24.79%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480-8 348238658 234218667 -32.74%
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
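The single-decoder arrangement described above can be sketched with a channel. This is an assumption-laden illustration rather than the WAL loader itself: records are plain decimal strings, and `decoded`, `decodeAll`, and `replay` are hypothetical names. One goroutine decodes in order while the caller applies the results, so ordering is preserved and the two steps overlap.

```go
package main

import (
	"fmt"
	"strconv"
)

// decoded is a hypothetical decoded WAL record, or the error that stopped
// decoding.
type decoded struct {
	value int
	err   error
}

// decodeAll decodes raw records in a single goroutine, which keeps them in
// order, and streams the results over a buffered channel.
func decodeAll(raw []string) <-chan decoded {
	out := make(chan decoded, 16)
	go func() {
		defer close(out)
		for _, r := range raw {
			v, err := strconv.Atoi(r)
			out <- decoded{value: v, err: err}
			if err != nil {
				return // stop at the first undecodable record
			}
		}
	}()
	return out
}

// replay applies decoded records in order while decoding continues in the
// background.
func replay(raw []string) ([]int, error) {
	var applied []int
	for d := range decodeAll(raw) {
		if d.err != nil {
			return applied, d.err
		}
		applied = append(applied, d.value)
	}
	return applied, nil
}

func main() {
	applied, err := replay([]string{"1", "2", "3"})
	fmt.Println(applied, err)
}
```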
* Adding TSDB Head Stats like cardinality to Status Page
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Moving mutex to Head
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Renaming variables
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Renaming variables and html
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Removing unwanted whitespaces
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Adding Tests, Benchmarks and Max Heap for Postings Stats
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Adding more tests for postingstats and web handler
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Remove generated asset file that is no longer used
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Changing comment and variable name for more readability
Signed-off-by: Sharad Gaur <sgaur@splunk.com>
* Using time.Duration in postings status function and removing refresh button from web page
Signed-off-by: Sharad Gaur <sgaur@splunk.com>