This is a bit more conservative than we could be. As long as a chunk
isn't the first in a block, we can be pretty sure that the previous
chunk won't disappear. However, the incremental gain of returning
NotCounterReset in these cases is probably very small and might not be
worth the code complications.
Wwith this, we now also pay attention to an explicitly set counter
reset during ingestion. While the case doesn't show up in practice
yet, there could be scenarios where the metric source knows there was
a counter reset even if it might not be visible from the values in the
histogram. It is also useful for testing.
Signed-off-by: beorn7 <beorn@grafana.com>
- Adjust doc comments to go1.19 style.
- Break down some overly long lines.
- Minor doc comment tweaks and fixes.
- Some renaming.
Some rationales for the last point:
I have renamed “interjections” into “inserts”, mostly because it is
shorter, and the word shows up a lot by now (and the concept is
cryptic enough to not obfuscate it even more with abbreviations).
I have also tried to find more descriptive naming for the “compare
spans” functions.
Signed-off-by: beorn7 <beorn@grafana.com>
This is an optimization on the existing append in OOOChunk.
What we've been doing so far is find the place inside the out-of-order
slice where the new sample should go in and then place it there and move
any samples to the right if necessary. This is OK but requires a binary
search every time the slice is bigger than 0.
The optimization is opinionated and suggests that although out-of-order
samples can be out-of-order amongst themselves they'll probably be in
order thus we can probably optimistically append at the end and if not
do the binary search.
OOOChunks are capped to 30 samples by default so this is a small
optimization but everything adds up, specially if you handle many active
timeseries with out-of-order samples.
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
* adapt code.go and write_handler.go to support float histograms
* adapt watcher.go to support float histograms
* wip adapt queue_manager.go to support float histograms
* address comments for metrics in queue_manager.go
* set test cases for queue manager
* use same counts for histograms and float histograms
* refactor createHistograms tests
* fix float histograms ref in watcher_test.go
* address PR comments
Signed-off-by: Marc Tuduri <marctc@protonmail.com>
This is to check if a gauge histogram can be appended to the given chunk.
If not, it tells what changes to make to the chunk and the histogram
if possible.
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
In the head and in v1 postings on disk, it makes no difference whether
postings are sorted. Only for v2 does the code step through in order.
So, move the sorting to where it is required, and thus skip it entirely
in the head.
Label values in on-disk blocks are already sorted, but `slices.Sort` is
very fast on already-sorted data so we don't bother checking.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This allocates memory for all the returned values, which skews the
result. We aren't trying to benchmark `ExpandPostings`, so just step
through all the values without storing them to consume them.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Previously all the postings constructed were consumed on the first
iteration, so subsequent iterations did no work.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Extends Appender.AppendHistogram function to accept the FloatHistogram. TSDB supports appending, querying, WAL replay, for this new type of histogram.
Signed-off-by: Marc Tudurí <marctc@protonmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Instead of passing in a `ScratchBuilder` and `Labels`, just pass the
builder and the caller can extract labels from it. In many cases the
caller didn't use the Labels value anyway.
Now in `Labels.ScratchBuilder` we need a slightly different API: one
to assign what will be the result, instead of overwriting some other
`Labels`. This is safer and easier to reason about.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Implement decoding via labels.ScratchBuilder, which we retain and re-use
to reduce memory allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This necessitates a change to the `tsdb.IndexReader` interface:
`index.Reader` is used from multiple goroutines concurrently, so we
can't have state in it.
We do retain a `ScratchBuilder` in `blockBaseSeriesSet` which is
iterator-like.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Use simpler utility function to create Labels objects, making fewer
assumptions about the data structure.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Re-use previous memory if it is already of the correct type.
In `NewListSeries` we hoist the conversion to an interface value out
so it only allocates once.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Re-use previous memory if it is already of the correct type.
Also turn two levels of function closure into a single object that
holds the required data.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Patterned after `Chunk.Iterator()`: pass the old iterator in so it
can be re-used to avoid allocating a new object.
(This commit does not do any re-use; it is just changing all the method
signatures so re-use is possible in later commits.)
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This adds negative buckets and access of float histograms to
TestHistogramChunkSameBuckets and TestHistogramChunkBucketChanges.
It also exercises a specific pattern of reusing an iterator (one where
no access has happened).
This exposes two bugs (where entries for positive buckets where used
where the corresponding entries for negative buckets should have been
used). One was fixed in #11627 (not merged), which triggered the work
in this commit.
This commit fixes both issues, so #11627 can be closed.
It also simplifies the code in the histogramIterator.Next method that
aims to recycle existing slice capacity.
Furthermore, this is on top of the release-2.40 branch because we
should probably cut a bugfix release for this.
Signed-off-by: beorn7 <beorn@grafana.com>
Typical parameters are one hour by 1 minute step, where the
function would allocate a slice of 3.6 million samples instead of 60.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Switch from 'sanity' to more inclusive lanuage
"Removing ableist language in code is important; it helps to create and
maintain an environment that welcomes all developers of all backgrounds,
while emphasizing that we as developers select the most articulate,
precise, descriptive language we can rather than relying on metaphors.
The phrase sanity check is ableist, and unnecessarily references mental
health in our code bases. It denotes that people with mental illnesses
are inferior, wrong, or incorrect, and the phrase sanity continues to be
used by employers and other individuals to discriminate against these
people."
From https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Inverting the test for chunks deleted by tombstones makes all three
rejections consistent, and also avoids the case where a chunk is
excluded but still causes `trimFront` or `trimBack` to be set.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This reduces garbage, hence goes faster, when a short time range is
required compared to the amount of chunks in the block. For example
recording rules and alerts often look only at the last few minutes.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add BenchmarkOpenBlock
* Use specific types when reading offset table
Instead of reading a generic-ish []string, we can read a generic type
which would be specifically labels.Label.
This avoid allocating a slice that escapes to the heap, making it both
faster and more efficient in terms of memory management.
* Update error message for unexpected number of keys
* s/posting offset table/postings offset table/
* Remove useless lastKey assignment
* Use two []bytes vars, simplify
Applied PR feedback: removed generics, moved the label indices reading
to that specific test as we're not using it in production anyway, we're
just testing what we've just built.
Also using two []bytes variables for name and value that use the backing
buffer instead of using strings, this reduces allocations a lot as we
only copy them when we store them (this is optimized by the compiler).
* Fix the dumb bug
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Function arguments in defer evaluated during definition of defer, not
during execution
Signed-off-by: Slavik Panasovets <slavik@google.com>
Signed-off-by: Slavik Panasovets <slavik@google.com>
With these changes, the "happy path" when the leading and trailing
number of bits don't need an update, fewer operations are needed.
The change is probably very marginal (no change in the benchmark added
here, but the benchmark also doesn't cover non-changing values), and
an argument could me made that avoiding pointers also has its
benefits.
However, I think that reducing the number of return values improves
readability. Which convinced me that I should at least propose this.
Signed-off-by: beorn7 <beorn@grafana.com>
The wlog.WL type can now be used to create a Write Ahead Log or a Write
Behind Log.
Before the prefix for wbl metrics was
'prometheus_tsdb_out_of_order_wal_' and has been replaced with
'prometheus_tsdb_out_of_order_wbl_'.
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
This call was added by PR #11075 merged before #11318 which changed all
similar calls to `sort.Sort` into a faster one.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Use new experimental package `golang.org/x/exp/slices`.
slices.Sort works on values that are directly comparable, like ints,
so avoids the overhad of an interface call to `.Less()`.
Left tests unchanged, because they don't need the speed and it may be
a cross-check that slices.Sort gives the same answer.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Use new experimental package `golang.org/x/exp/slices`.
Some of the speedup comes from comparing SeriesRef (which is an int64)
directly rather than through an interface `.Less()` call; some comes
from exp/slices using "pattern-defeating quicksort(pdqsort)".
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* tsdb/agent: fix application of defaults
MaxTS was being incorrectly constrained to the truncation interval
* add more tests to check validation
* force MaxWALTime = MinWALTime if min > max
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* TSDB chunks: remove race between writing and reading
Because the data is stored as a bit-stream, the last byte in the stream
could change if the stream is appended to after an Iterator is obtained.
Copy the last byte when the Iterator is created, so we don't have to
read it later.
Clarify in comments that concurrent Iterator and Appender are allowed,
but the chunk must not be modified while an Iterator is created.
(This was already the case, in order to copy the bstream slice header.)
* TSDB: stop saving last 4 samples in memSeries
This extra copy of the last 4 samples was introduced to avoid a race
condition between reading the last byte of the chunk and writing to it.
But now we have fixed that by having `bstreamReader` copy the last byte,
we don't need to copy the last 4 samples.
This change saves 56 bytes per series, which is very worthwhile when
you have millions or tens of millions of series.
* TSDB: tidy up stopIterator re-use
Previous changes have left this code duplicating some lines; pull
them out to a separate function and tidy up.
* TSDB head_test: stop checking when iterators are wrapped
The behaviour has changed so chunk iterators are only wrapped when
transaction isolation requires them to stop short of the end.
This makes tests fail which are checking the type.
Tests should check the observable behaviour, not the type.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
* tsdb: add a basic test for read/write isolation
* tsdb: store the min time with isolationAppender
So that we can see when appending has moved past a certain point in time.
* tsdb: allow RangeHead to have isolation disabled
This will be used when for head compaction.
* tsdb: do head compaction with isolation disabled
This saves a lot of work tracking appends done while compaction is ongoing.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* tsdb: remove chunkRange from memSeries
chunkRange is the (oddly-named) configured duration for the head block.
We don't need a copy of this value per series. Pass it down where
required, and remove the copy.
The value in `Head` is only updated in `resetInMemoryState()`, which
also discards all `memSeries`.
* tsdb: remove oooCapMax from memSeries
oooCapMax is the configured maximum capacity for an out-of-order chunk.
Storing it per-series uses extra memory, and has surprising behaviour
if users change the value in config - series created before the change
will keep their old value.
Instead, pass it down where required, and remove the per-series value.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Introduce out-of-order TSDB support
This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing
This commit adds support to accept out-of-order ("OOO") sample into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping querying
are automatically enabled.
Most of the additions have been borrowed from
https://github.com/grafana/mimir-prometheus/
Here is the list ist of the original commits cherry picked
from mimir-prometheus into this branch:
- 4b2198d7ec
- 2836e5513f
- 00b379c3a5
- ff0dc75758
- a632c73352
- c6f3d4ab33
- 5e8406a1d4
- abde1e0ba1
- e70e769889
- df59320886
Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* gofumpt files
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Add license header to missing files
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix OOO tests due to existing chunk disk mapper implementation
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix truncate int overflow
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Add Sync method to the WAL and update tests
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* remove useless sync
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Update minOOOTime after truncating Head
* Update minOOOTime after truncating Head
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix lint
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Add a unit test
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Load OutOfOrderTimeWindow only once per appender
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix OOO Head LabelValues and PostingsForMatchers
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix replay of OOO mmap chunks
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Remove unnecessary err check
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Prevent panic with ApplyConfig
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Run OOO compaction after restart if there is OOO data from WBL
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Apply Bartek's suggestions
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Refactor OOO compaction
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Address comments and TODOs
- Added a comment explaining why we need the allow overlapping
compaction toggle
- Clarified TSDBConfig OutOfOrderTimeWindow doc
- Added an owner to all the TODOs in the code
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Run go format
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix remaining review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix tests
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix TestWBLAndMmapReplay test failure on windows
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Address most of the feedback
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Refactor the block meta for out of order
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix windows error
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
If some shards did not get any samples mapped, the buffer will be empty
so sending it over the chan to `processWALSamples()` is a waste of time.
This is especially likely now we are checking `minValidTime` before
sending.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The chunk pool belongs to the head not to the series. Pass it down where
required, and remove the copy of the pointer that `memSeries` was
holding.
`safeChunk` also needs to hold it, because in scenarios where it is used
we don't have a reference to the head. However it was already holding
`chunkDiskMapper` for the same reason, so no big change.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
There's no point splitting a sample onto the right shard and checking
if the series needs to be re-mapped, if we're only going to discard it
once it arrives at `ProcessWALSamples()`. Simply discard it earlier.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add runtime config to control native histogram ingestion
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Make the config into a CLI flag
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
I find it useful to know why a restriction exists, to check whether that
reason still applies, or in which other places it might apply.
This is based on the note here:
https://github.com/prometheus/prometheus/pull/9270#pullrequestreview-743820956
on the PR where the original comment was added.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Update go to 1.19, set min version to 1.18
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
* Update golangci-lint
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
Metadata was added recently but doesn't seem to be used much, at least as far as I could identify.
Yet it's part of memSeries struct and so even when empty takes 48 bytes,
which is a lot given that without it memSeries requires 224 bytes.
This change turns it into a pointer on the struct, that get set only when metadata is actually set of given series.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
* Add histogram validation
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Correct negative offset validation
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Address review comments
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Validation benchmark
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Add more checks
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Attempt to fix tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fix stuff
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* tsdb/record: Extract functions to encode and decode labels
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* tsdb: make use of Encode/Decode Labels
Simplify the code by re-using routines from tsdb/record.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Append metadata to the WAL
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Remove extra whitespace; Reword some docstrings and comments
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Use RLock() for hasNewMetadata check
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Use single byte for metric type in RefMetadata
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Update proposed WAL format for single-byte type metadata
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Implementa MetadataAppender interface for the Agent
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Address first round of review comments
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Amend description of metadata in wal.md
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Correct key used to retrieve metadata from cache
When we're setting metadata entries in the scrapeCace, we're using the
p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and
use it as the cache key. When checking for cache entries though, we used
p.Series() as the key, which included the metric name _with_ its labels.
That meant that we were never actually hitting the cache. We're fixing
this by utiling the __name__ internal label for correctly getting the
cache entries after they've been set by setHelp(), setType() or
setUnit().
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Put feature behind a feature flag
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix AppendMetadata docstring
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Reorder WAL format document
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Change error message of AppendMetadata; Fix access of s.meta in AppendMetadata
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Reuse temporary buffer in Metadata encoder
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Only keep latest metadata for each refID during checkpointing
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix test that's referencing decoding metadata
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Avoid creating metadata block if no new metadata are present
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Add tests for corrupt metadata block and relevant record type
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix CR comments
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Extract logic about changing metadata in an anonymous function
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Implement new proposed WAL format and amend relevant tests
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Use 'const' for metadata field names
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Apply metadata to head memSeries in Commit, not in AppendMetadata
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Add docstring and rename extracted helper in scrape.go
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Add tests for tsdb-related cases
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix linter issues vol1
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix linter issues vol2
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix Windows test by closing WAL reader files
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Use switch instead of two if statements in metadata decoding
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix review comments around TestMetadata* tests
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Add code for replaying WAL; test correctness of in-memory data after a replay
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Remove scrape-loop related code from PR
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Address first round of comments
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Simplify tests by sorting slices before comparison
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix test to use separate transactions
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Empty out buffer and record slices after encoding latest metadata
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix linting issue
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Update calculation for DroppedMetadata metric
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Rename MetadataAppender interface and AppendMetadata method to MetadataUpdater/UpdateMetadata
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Reuse buffer when encoding latest metadata for each series
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Fix review comments; Check all returned error values using two helpers
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Simplify use of helpers
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Satisfy linter
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
* Add exemplar record case
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* recordType() -> Type.String()
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Previously, the maxTime wasn't updated properly in case of a recoding
happening.
My apologies for reformatting many lines for line length. During the
bug hunt, I tried to make things more readable in a reasonably wide
editor window.
Signed-off-by: beorn7 <beorn@grafana.com>
Previously, the maxTime wasn't updated properly in case of a recoding
happening.
My apologies for reformatting many lines for line length. During the
bug hunt, I tried to make things more readable in a reasonably wide
editor window.
Signed-off-by: beorn7 <beorn@grafana.com>
Relates to @bboreham optimization in https://github.com/prometheus/prometheus/pull/10859
Bryan did reduce the sleep time improving the deltas on the benchmark by
quite a lot. However I've been working on a similar implementation for
out of order and I noticed that we actually get into this method
thousands of times.
@ywwg had the brilliant idea of not always sleeping before the select
but actually make it a case in the select so we only sleep if we need
to.
The benchmark deltas are amazing
```
❯ benchstat old_implementation.txt new_implementation_using_time_after.txt
name old time/op new time/op delta
LoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=0,mmappedChunkT=0-8 521ms ±25% 253ms ± 6% -51.47% (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=36,mmappedChunkT=0-8 773ms ± 3% 369ms ±31% -52.23% (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=72,mmappedChunkT=0-8 592ms ±28% 297ms ±28% -49.80% (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=360,mmappedChunkT=0-8 547ms ± 2% 999ms ±187% ~ (p=0.690 n=5+5)
LoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50,exemplarsPerSeries=0,mmappedChunkT=0-8 11.3s ± 4% 1.3s ±44% -88.48% (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50,exemplarsPerSeries=2,mmappedChunkT=0-8 11.1s ± 1% 1.2s ±20% -89.08% (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=0,mmappedChunkT=0-8 1.24s ± 3% 0.18s ± 7% -85.76% (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=2,mmappedChunkT=0-8 1.24s ± 2% 0.18s ± 5% -85.24% (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=5,mmappedChunkT=0-8 1.23s ± 5% 0.27s ±33% -77.73% (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=24,mmappedChunkT=0-8 1.28s ± 1% 0.36s ± 7% -71.51% (p=0.008 n=5+5)
LoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=0,mmappedChunkT=3800-8 12.1s ± 1% 3.1s ± 6% -74.33% (p=0.008 n=5+5)
LoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=2,mmappedChunkT=3800-8 12.1s ± 1% 3.4s ± 4% -71.94% (p=0.008 n=5+5)
LoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=5,mmappedChunkT=3800-8 12.1s ± 1% 3.8s ±17% -68.35% (p=0.008 n=5+5)
LoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=24,mmappedChunkT=3800-8 12.4s ± 1% 4.0s ±18% -67.71% (p=0.008 n=5+5)
```
Benchmarked on Linux
```
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
```
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Job queue
This PR reimplements chan chunkWriteJob with custom buffered queue that should use less memory, because it doesn't preallocate entire buffer for maximum queue size at once. Instead it allocates individual "segments" with smaller size.
As elements are added to the queue, they fill individual segments. When elements are removed from the queue (and segments), empty segments can be thrown away. This doesn't change memory usage of the queue when it's full, but should decrease its memory footprint when it's empty (queue will keep max 1 segment in such case).
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Modify test to work with low resolution timer.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Improve comments.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* dont waste space on the chunkRefMap
* add time factor
* add comments
* better readability
* add instrumentation and more comments
* formatting
* uppercase comments
* Address review feedback. Renamed "free" to "shrink" everywhere, updated comments and threshold to 1000.
* double space
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>
To avoid building up data in memory, commit and make a new appender
periodically.
The number `commitAfter = 10000` was chosen arbitrarily; testing with
10x more or less gives slightly worse results.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
When restarting Prometheus I sometimes see:
caller=db.go:832 level=error component=tsdb msg="compaction failed" err="compact head: persist head block: 2 errors: populate block: context canceled; context canceled"
And prometheus_tsdb_compactions_failed_total metric gets incremented.
This makes it more difficult to write alerts based on
prometheus_tsdb_compactions_failed_total metric since any restart can
trigger it.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
The code sleeps for a short time to allow goroutines to finish, however
it seems the duration can be reduced a lot, speeding up the reading
process.
I checked using some WAL data from production, and the queue is almost
always empty at the time we enter `waitForIdle()` so there is no danger
of spinning in the tight loop.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Instead of creating a new hashing object every time, call `crc32.Checksum`
which computes the answer without allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Rename walDir parameter to dir
Signed-off-by: Matej Gera <matejgera@gmail.com>
* Improve NewQueueManager comment
Signed-off-by: Matej Gera <matejgera@gmail.com>
This commit fixes a typo when reporting an error that the the symbols
table size has been exceeded.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
On macOS, the TestTombstoneCleanRetentionLimitsRace performs very
poorly. It takes more than a second to write out one block, and as it
writes 400 of them, we run into the 10-minute test timeout frequently.
While this doesn't fix the actual performance issue, breaking each
iteration into a subtest makes the test pass reliably (because each
iteration comfortably finishes in under a minute).
Related report: https://groups.google.com/g/prometheus-developers/c/jxQ6Ayg6VJ4/m/03H_DS9PDAAJ
Signed-off-by: Matthias Rampke <matthias@prometheus.io>