This adds various new locks to replace the single big lock on
the head. All parts now must be COW as they may be held by clients
after initial retrieval.
Series by ID and hashes are now held in a stripe lock to reduce
contention and total holding time during GC. This should reduce
starvation of readers.
This changes the structure to a single WAL backed by a single head
block.
Parts of the head block can be compacted. This relieves us from any head
amangement and greatly simplifies any consistency and isolation concerns
by just having a single head.
This fixes the case where between block creations no compaction
plans are ran. We were not compacting anything in these
cases since the on creation the most recent head block always had
a high timestamp of 0.
This is because we cut a new block from where the snapshotted block ends
if we restore from backups and highTimestamp would be where we should be
starting from.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
Let older head blocks be compacted once the newest once has samples at
50% of its total range. This allows the memory of the compacted blocks
to be released and garbage collected before a new head block gets
created. Thereby the number of head blocks is 1 or 2 instead of 2 or 3
and memory spikes are reduced.
* Make sure no reads happen on the block when delete is in progress.
* Fix bugs in compaction.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
This parallelizes commits to prevent deadlocks across inconsistently
locked heads. As commits are currently not fully atomic across
heads, this does decrease our guarantees.
This adds a workaround to avoid deadlocks for inconsistent write lock
order across headBlocks.
Things keep working if transactions only append data for the same
timestamp, which is generally the case for Prometheus.
Full behavior should be restored in a subsequent change.
GC is triggered rarely, which may cause unnecessarily high memory
spikes when running several compaction cycles in a row. Explicitly run
GC so we don't have idle bytes marked as used from the previous cycle.
This fixes different race condition encoutnered when running Prometheus.
It reduces the overall performance in the synthetic benchmark a fair bit
but has no indiciations of impacting a real-world setup notably.
This adds the Queryable interface to the Block interface. Head and
persisted blocks now implement their own Querier() method and thus
isolate customization (e.g. remapPostings) more cleanly.
This adds more lower-leve interfaces which are used to compose
to different Block interfaces.
The DB only uses interfaces instead of explicit persistedBlock and
headBlock. The headBlock generation property is dropped as the use-case
can be implemented using block sequence numbers.
With hundreds of concurrent appenders the locking order between the
headBlocks on instantiating appenders and write locking the headmtx
is hard to impossible to get consistent.
Just never instantiate appenders while holding the headmtx lock in any
way.
This fixes a bug where the last WAL file was closed after consuming it
instead of being left open for further writes.
Reloading of blocks on startup considers loading head blocks now.
Introduce a seperate mutex for the head blocks to avoid a race where
a post-compaction reload may run between switching the DB's base mutex
to create a new head block in an appender.
This addresses an issue where the compaction triggered on cutting
a new block doesn't find anything as the writers are still active on the
block that should be ready for compaction.
This adds write path support for segmented chunk data files.
Files of 512MB are pre-allocated and written to. If the file size
is exceeded, the next file is started. On completion, files
are truncated to their final size.
File locks have a multitude of problems that make them hard to use
correctly. As they are just advisory, they are only meaningful to
prevent accidents like running the same process twice.
A simple PID file lock works reliably in those cases and is simpler.