This adds a workaround to avoid deadlocks for inconsistent write lock
order across headBlocks.
Things keep working if transactions only append data for the same
timestamp, which is generally the case for Prometheus.
Full behavior should be restored in a subsequent change.
GC is triggered rarely, which may cause unnecessarily high memory
spikes when running several compaction cycles in a row. Explicitly run
GC so we don't have idle bytes marked as used from the previous cycle.
This fixes different race condition encoutnered when running Prometheus.
It reduces the overall performance in the synthetic benchmark a fair bit
but has no indiciations of impacting a real-world setup notably.
This adds the Queryable interface to the Block interface. Head and
persisted blocks now implement their own Querier() method and thus
isolate customization (e.g. remapPostings) more cleanly.
This adds more lower-leve interfaces which are used to compose
to different Block interfaces.
The DB only uses interfaces instead of explicit persistedBlock and
headBlock. The headBlock generation property is dropped as the use-case
can be implemented using block sequence numbers.
With hundreds of concurrent appenders the locking order between the
headBlocks on instantiating appenders and write locking the headmtx
is hard to impossible to get consistent.
Just never instantiate appenders while holding the headmtx lock in any
way.
The position mapper was intended to pre-computed "expensive" ordering
of label sets. It was expensive to update and caused a lot of trouble.
Skipping this optimization entirely actually revelead it was pointless
and even harmful from the e2e perspective.
This adds handling for various corruption scenarios of the WAL.
If corruption is encountered, we truncate the WAL after the last valid
entry transparently and continue appending after the offset.
This adds handling for various corruption scenarios of the WAL.
If corruption is encountered, we truncate the WAL after the last valid
entry transparently and continue appending after the offset.
This has been a common source of hard to debug issues. Its a premature
and unbenchmarked optimization and semantically, we want ChunkMetas to
be references in all changed cases.
This fixes a bug where the last WAL file was closed after consuming it
instead of being left open for further writes.
Reloading of blocks on startup considers loading head blocks now.
This simplifies some of the iterators by loading chunks from the
ChunkReader earlier, filtering of chunks vs filtering or series is
split into separate iterators for easier testing
Introduce a seperate mutex for the head blocks to avoid a race where
a post-compaction reload may run between switching the DB's base mutex
to create a new head block in an appender.