* Write exemplars to the WAL and send them over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Update example for exemplars, print data in a more obvious format.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add metrics for remote write of exemplars.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix incorrect slices passed to send in remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* We need to unregister the new metrics.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Order of exemplar append vs write exemplar to WAL needs to change.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Condense sample/exemplar delivery tests to parameterized sub-tests
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename test methods for clarity now that they also handle exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename counter variable. Fix instances where metrics were not updated correctly
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Add exemplars to LoadWAL benchmark
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* last exemplars timestamp metric needs to convert value to seconds with
ms precision
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Process exemplar records in a separate go routine when loading the WAL.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Regenerate types proto with comments, update protoc version again.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Put remote write of exemplars behind a feature flag.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address some of Ganesh's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Move exemplar remote write feature flag to a config file field.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address Bartek's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address more review comments from Ganesh.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add exemplar total label length check.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address a few last review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
* Added test to reproduce panic on TSDB head chunks truncated while querying
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added test for Querier too
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Stop the bleed on mmap-ed head chunks panic
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Lower memory pressure in tests to ensure it doesn't OOM
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Skip TestQuerier_ShouldNotPanicIfHeadChunkIsTruncatedWhileReadingQueriedChunks
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Experiment to not trigger runtime.GC() continuously
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Try to fix test in CI
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Do not call runtime.GC() at all
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* I have no idea why it's failing in CI, skipping tests
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Snappy cannot encode records larger than ~3.7 GB and will panic if an
encoding is attempted. Check to make sure that the record is smaller
than this before encoding.
In the future, we could improve this behavior to still compress large
records (or break them up into smaller records), but this avoids the
panic for users with very large single scrape targets.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
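The size check described above can be sketched as follows. This is a minimal illustration, not the actual WAL code: `maxEncodedLen` and `shouldCompress` are hypothetical names, and the bound is written out by hand on the assumption that it matches the one Snappy's Go implementation uses (32 + n + n/6 must fit in 32 bits), which is where the ~3.7 GB limit comes from.

```go
package main

import "fmt"

// maxEncodedLen returns the worst-case compressed size for srcLen input
// bytes, or -1 if the input is too large to encode. Assumption: this
// mirrors the bound used by the Snappy Go package.
func maxEncodedLen(srcLen uint64) int64 {
	const limit = 0xffffffff
	if srcLen > limit {
		return -1
	}
	n := 32 + srcLen + srcLen/6
	if n > limit {
		return -1
	}
	return int64(n)
}

// shouldCompress reports whether a record of recLen bytes may be passed to
// the encoder. Records that are too large are written uncompressed instead
// of triggering a panic.
func shouldCompress(recLen uint64, compressionEnabled bool) bool {
	return compressionEnabled && recLen > 0 && maxEncodedLen(recLen) >= 0
}

func main() {
	fmt.Println(shouldCompress(1<<20, true))      // true
	fmt.Println(shouldCompress(4000000000, true)) // false
}
```

With this guard, an oversized record is simply written as-is; compressing or splitting it remains future work, as noted above.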
* Add range query test cases
This includes a couple of failing ones that double count some points due
to the iterator seek bug.
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Add Seek() implementation for memSafeIterator
Previously, calling memSafeIterator.Seek() would call the Seek() method
on its embedded iterator. This was causing the embedded iterator and the
memSafeIterator to get out of sync because when the embedded Seek()
moved to the next element of the embedded iterator, memSafeIterator
didn't "know" about it. memSafeIterator has to "know" when the embedded
iterator has moved to be able to work out when it should be reading from
its buffer rather than the embedded iterator.
Used same logic as for xorIterator.Seek() (which in runtime is used as
the embedded iterator) - return false if the iterator has an error and
try to move to next element if the required time hasn't been reached, or
if no elements have been read yet. The memSafeIterator.Next() method is
being called so memSafeIterator.i is always accurate.
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
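The idea behind the Seek() change can be shown with a simplified sketch. The types below (`sliceIterator`, `countingIterator`) are hypothetical stand-ins, not the real xorIterator or memSafeIterator: the buffer handling and error check are omitted, and only the synchronisation principle is kept. The wrapper implements Seek() in terms of its own Next(), so its counter always reflects how far the embedded iterator has moved.

```go
package main

import "fmt"

type sample struct {
	t int64
	v float64
}

// sliceIterator is a stand-in for the embedded chunk iterator.
type sliceIterator struct {
	samples []sample
	pos     int // index of the current sample, -1 before the first Next()
}

func (it *sliceIterator) Next() bool {
	if it.pos+1 >= len(it.samples) {
		return false
	}
	it.pos++
	return true
}

func (it *sliceIterator) At() (int64, float64) {
	s := it.samples[it.pos]
	return s.t, s.v
}

// countingIterator wraps sliceIterator and tracks how many samples were read.
type countingIterator struct {
	inner *sliceIterator
	read  int
}

func newCountingIterator(samples []sample) *countingIterator {
	return &countingIterator{inner: &sliceIterator{samples: samples, pos: -1}}
}

func (it *countingIterator) Next() bool {
	if !it.inner.Next() {
		return false
	}
	it.read++
	return true
}

func (it *countingIterator) At() (int64, float64) { return it.inner.At() }

// Seek advances through the wrapper's own Next(), never the embedded Seek(),
// so that it.read stays accurate.
func (it *countingIterator) Seek(t int64) bool {
	if it.read == 0 && !it.Next() {
		return false
	}
	for {
		if ts, _ := it.At(); ts >= t {
			return true
		}
		if !it.Next() {
			return false
		}
	}
}

func main() {
	it := newCountingIterator([]sample{{1, 1}, {5, 5}, {10, 10}})
	ok := it.Seek(4)
	ts, _ := it.At()
	fmt.Println(ok, ts, it.read) // true 5 2
}
```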
* Add tsdb package test
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
The purpose of GetRef() is to allow Append() to be called without
the caller needing to copy the labels. To avoid a race where a series
is removed from TSDB between the calls to GetRef() and Append(), we
return TSDB's copy of the labels.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
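A rough sketch of the GetRef() pattern, using a toy in-memory store rather than the real TSDB types. All names here (`head`, `getRef`, `add`, `refFor`) are hypothetical. The point is that the caller only copies its scratch labels when the series is not yet known, and that the lookup hands back the store's own copy of the labels, with 0 meaning 'no reference'.

```go
package main

import (
	"fmt"
	"strings"
)

type label struct{ name, value string }

// key builds a lookup key for a label set (hypothetical helper).
func key(ls []label) string {
	var b strings.Builder
	for _, l := range ls {
		b.WriteString(l.name)
		b.WriteByte(0xff)
		b.WriteString(l.value)
		b.WriteByte(0xff)
	}
	return b.String()
}

type series struct {
	ref  uint64
	lset []label
}

// head is a toy stand-in for the TSDB head.
type head struct {
	byKey map[string]*series
	last  uint64
}

// getRef returns the reference and the store's own copy of the labels,
// or 0 and nil if the series is unknown.
func (h *head) getRef(ls []label) (uint64, []label) {
	if s, ok := h.byKey[key(ls)]; ok {
		return s.ref, s.lset
	}
	return 0, nil
}

// add registers a new series; the store keeps ls, so the caller must not reuse it.
func (h *head) add(ls []label) uint64 {
	h.last++
	h.byKey[key(ls)] = &series{ref: h.last, lset: ls}
	return h.last
}

// refFor returns the series reference and whether the labels had to be copied.
func refFor(h *head, scratch []label) (uint64, bool) {
	if ref, _ := h.getRef(scratch); ref != 0 {
		return ref, false // known series: no copy needed
	}
	owned := append([]label(nil), scratch...)
	return h.add(owned), true
}

func main() {
	h := &head{byKey: map[string]*series{}}
	scratch := []label{{"__name__", "up"}, {"job", "node"}}
	ref1, copied1 := refFor(h, scratch)
	ref2, copied2 := refFor(h, scratch)
	fmt.Println(ref1, copied1, ref2, copied2) // 1 true 1 false
}
```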
* Add method to get reference number for TSDB Appender
In situations where we need to copy labels before calling Add(),
GetRef() allows checking first and then calling AddFast() if the series
is already known.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add explicit interface for GetRef() method
Suggested in code review by @bwplotka
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Rename OptionalGetRef to GetRef
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Simplify return value of GetRef()
0 can be relied on to mean 'no reference'
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The main branch tests are not passing due to the fact that #8489 was not
rebased on top of #8007.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.
This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Right now a new segment might be created unnecessarily if the
uncompressed record would not fit, but after compression (which
typically halves the record size) it would.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* CleanupTombstones refactored, now reloading blocks after every compaction.
The goal is to remove deletable blocks after every compaction and, thus, decrease disk space used when cleaning tombstones.
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Protect DB against parallel reloads
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
* Fix typos
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
In the previous version, 1.18.0, the "megacheck" linter paid attention
to the '//lint:ignore' comment, but that is no longer there.
Newer versions pay attention to '//nolint:<linter>,<linter>,...'
comments, optionally followed by a "second" comment introduced by '//'.
Update the directives to use this style.
This is related to prometheus/blackbox_exporter#738 and
prometheus/blackbox_exporter#745.
Signed-off-by: Marcelo E. Magallon <marcelo.magallon@grafana.com>
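For illustration, the new directive style looks roughly like this. The sketch assumes the linter runner in question is golangci-lint and uses its `errcheck` linter as the example; the function name and path are made up.

```go
package main

import (
	"fmt"
	"os"
)

// removeQuietly deletes a file and deliberately ignores the returned error.
func removeQuietly(path string) {
	// Old directive style, which per the message above is no longer honored:
	//   //lint:ignore <check> <reason>
	// New style: the directive, then an optional second comment with the reason.
	os.Remove(path) //nolint:errcheck // Best-effort cleanup; a failure here is harmless.
}

func main() {
	removeQuietly("/nonexistent/example.tmp")
	fmt.Println("done")
}
```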
We're seeing compactions in Cortex that take hours, which this is
missing. While that is not common in Prometheus, I am pretty sure
there are setups where compaction takes longer than 512s. On our own
Prometheus the average compaction duration is 566s.
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* Fix TSDB head struct dump on querier error
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added mint/maxt to RangeHead.String()
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* test: cleanup tempdir for TestBlockWriter
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
* test: cleanup tempdir for TestLogPartialWrite
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
* fix: remove pre-2.21 tmp blocks on start
Signed-off-by: Nguyen Le Vu Long <vulongvn98@gmail.com>
* fix: commenting
Signed-off-by: Nguyen Le Vu Long <vulongvn98@gmail.com>
* tsdb: Expose total number of label pairs in head
Signed-off-by: Nguyen Le Vu Long <vulongvn98@gmail.com>
* fix: add comment for NumLabelPairs
Signed-off-by: Nguyen Le Vu Long <vulongvn98@gmail.com>
* fix: remove comment
Signed-off-by: Nguyen Le Vu Long <vulongvn98@gmail.com>
* Logging added for when compaction takes more than the block time range
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Log only if no errors were already logged
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Log duration as human readable string
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Move logging from compactHead() to Compact()
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Compute duration of all head compactions plus wal truncation
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Remove named return added in the first commits
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Address nits
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Change milliseconds to seconds to make fuzzit tests happy
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
* Set the min time of Head properly after truncation
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Fix lint
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Enhance compaction plan logic for completely deleted small block
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Fix review comments
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Testify: move to require
Moving testify to require to fail tests early in case of errors.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* More moves
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* MultiError: Refactored MultiError for more concise and safe usage.
* Fewer lines
* GoLand IDE was flagging every usage of the old MultiError as a "potential nil" error
* It was easy to forget to call Err() when returning an error; this is now enforced at compile time.
NOTE: Potentially I would rename package to merrors. (: In different PR.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed review comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fix after rebase.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Don't use returned DB to close resources on TSDB startup error
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Add unit test and fix another panic
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Fix review comment
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Refactor test assertions
This pull request gets rid of assert.True where possible to use
fine-grained assertions.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Close resources after failing to startup TSDB
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Return close error instead of logging
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Change named return's name
Signed-off-by: arthursens <arthursens2005@gmail.com>
This is how much memory we use to load in the on-disk
symbol tables, not the size of the tables themselves.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
As we're looking to expand what's in the WAL,
having old Prometheus servers ignore the new record types
rather than treating them as corruption allows for better
upgrade/downgrade paths.
Adjust some tests accordingly, so they're still testing what they're
meant to test.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
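The behaviour described above can be sketched as a replay loop that skips record types it does not recognise instead of failing. This is a simplified illustration: the `recordType` constants, the `counts` struct and `replay` are hypothetical, and the numeric values are only illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

type recordType uint8

// Record types known to this (older) reader. Values are illustrative.
const (
	recordSeries     recordType = 1
	recordSamples    recordType = 2
	recordTombstones recordType = 3
)

type counts struct {
	series, samples, tombstones, skipped int
}

// replay walks the records and dispatches on the first byte. Unknown types
// are counted and skipped rather than treated as corruption, so a WAL
// written by a newer version can still be read.
func replay(records [][]byte) (counts, error) {
	var c counts
	for _, rec := range records {
		if len(rec) == 0 {
			return c, errors.New("empty record")
		}
		switch recordType(rec[0]) {
		case recordSeries:
			c.series++
		case recordSamples:
			c.samples++
		case recordTombstones:
			c.tombstones++
		default:
			c.skipped++ // unknown record type: ignore and continue
		}
	}
	return c, nil
}

func main() {
	c, err := replay([][]byte{{1}, {2}, {42}, {3}})
	fmt.Println(c.series, c.samples, c.tombstones, c.skipped, err) // 1 1 1 1 <nil>
}
```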
Direct syscalls using syscall.Syscall(SYS_*, ...) should no longer be
used on darwin, see [1]. Instead, use the FcntlFstore libSystem wrapper
provided by the golang.org/x/sys/unix package to implement
preallocFixed.
[1] https://golang.org/doc/go1.12#darwin
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
* Chore: Log segment number when segment read failed
To manually fix the WAL files, it is good to know where the corruption
happened, so we should log the segment number when the read fails.
Related Issue #7506
Signed-off-by: gaston.qiu <gaston.qiu@umbocv.com>
* tsdb: Fix bug in deletions that are continued after a crash; added more tests.
Additionally: Added log line for block removal.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed comment.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>