This is done to prevent the latter operation from blocking/starving the former. Previously, the `tsets` channel was consumed by the same goroutine that consumes and feeds the buffered `n.more` channel; the `tsets` channel was less likely to be ready, as it's unbuffered and only fed every `SDManager.updatert` seconds.
See https://github.com/prometheus/prometheus/issues/13676 and https://github.com/prometheus/prometheus/issues/8768
The synchronization with the sendLoop goroutine is managed through the n.mtx mutex.
This uses an approach similar to the scrape manager's: efbd6e41c5/scrape/manager.go (L115-L117)
The old TestHangingNotifier was replaced by the new one to more closely reflect reality.
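
A minimal sketch of the goroutine split described above, for illustration only. The `notifier` type, its fields, and the `demo` helper are hypothetical stand-ins rather than the real notifier code; the only things taken from this message are the unbuffered `tsets` channel, the buffered `more` channel, and the mutex shared by the two loops.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// notifier is a hypothetical, stripped-down stand-in for the real notifier.
type notifier struct {
	mtx     sync.RWMutex
	targets []string      // guarded by mtx
	sent    []string      // guarded by mtx; records the targets each send used
	more    chan struct{} // buffered: signals that alerts are queued
}

// targetUpdateLoop consumes target updates in its own goroutine, so a busy
// send loop can no longer starve the unbuffered tsets channel.
func (n *notifier) targetUpdateLoop(tsets <-chan []string, done <-chan struct{}) {
	for {
		select {
		case <-done:
			return
		case ts := <-tsets:
			n.mtx.Lock()
			n.targets = ts
			n.mtx.Unlock()
		}
	}
}

// sendLoop only deals with the queue signal; it shares state with
// targetUpdateLoop exclusively through mtx.
func (n *notifier) sendLoop(done <-chan struct{}) {
	for {
		select {
		case <-done:
			return
		case <-n.more:
			n.mtx.Lock()
			n.sent = append(n.sent, n.targets...)
			n.mtx.Unlock()
		}
	}
}

// counts reports how many targets are known and how many sends were recorded.
func (n *notifier) counts() (int, int) {
	n.mtx.RLock()
	defer n.mtx.RUnlock()
	return len(n.targets), len(n.sent)
}

// demo pushes one target update, then one queue signal, and returns what
// the send loop recorded.
func demo() []string {
	n := &notifier{more: make(chan struct{}, 1)}
	tsets := make(chan []string) // unbuffered, like the real tsets channel
	done := make(chan struct{})

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); n.targetUpdateLoop(tsets, done) }()
	go func() { defer wg.Done(); n.sendLoop(done) }()

	tsets <- []string{"am-1"}
	for {
		if t, _ := n.counts(); t == 1 {
			break
		}
		time.Sleep(time.Millisecond)
	}

	n.more <- struct{}{}
	for {
		if _, s := n.counts(); s == 1 {
			break
		}
		time.Sleep(time.Millisecond)
	}

	close(done)
	wg.Wait()
	return n.sent
}

func main() {
	fmt.Println(demo())
}
```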
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
to show "targets groups update" starvation when the notifications queue is full and an Alertmanager
is down.
The existing `TestHangingNotifier` that was added in https://github.com/prometheus/prometheus/pull/10948 doesn't really reflect reality, as the SD changes are manually fed into `syncCh` continuously, whereas in reality updates are only resent every `updatert`.
The test added here sets up an SD manager and links it to the notifier. The SD changes will be triggered by that manager as it's done in reality.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Ethan Hunter <ehunter@hudson-trading.com>
* clarify backup requirements for storage
After reading this (again) recently, I was under the impression that our backup strategy ("just throw Bacula at it") was just not good enough and that our backups were inconsistent. I filed [an issue internally][41627] about this because of that concern.
But reading a conversation with @SuperQ on IRC, I came under the impression that only the WAL files would be lost. This is an attempt at documenting this more clearly.
[41627]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41627
---------
Signed-off-by: anarcat <anarcat@users.noreply.github.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
* Pass affected labels to MemPostings.Delete
As suggested by @bboreham, we can track the labels of the deleted series
and avoid iterating through all the label/value combinations.
This looks much faster on the MemPostings.Delete call. We don't have a
benchmark on stripeSeries.gc() where we'll pay the price of iterating
the labels of each one of the deleted series.
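
To illustrate the idea, here is a rough sketch under simplifying assumptions: `postings` is a hypothetical nested map standing in for MemPostings, there is no locking, and the `delete` method and `demo` helper are invented for this example. It only shows that the deletion visits the label/value pairs carried by the deleted series, not every pair in the index.

```go
package main

import "fmt"

// label is a hypothetical name/value pair.
type label struct{ name, value string }

// postings maps label name -> label value -> series IDs. It is a simplified
// stand-in for MemPostings, without any locking.
type postings map[string]map[string][]uint64

// delete removes the deleted series IDs, visiting only the label/value
// pairs listed in affected.
func (p postings) delete(deleted map[uint64]struct{}, affected map[label]struct{}) {
	for l := range affected {
		ids, ok := p[l.name][l.value]
		if !ok {
			continue
		}
		kept := make([]uint64, 0, len(ids))
		for _, id := range ids {
			if _, gone := deleted[id]; !gone {
				kept = append(kept, id)
			}
		}
		if len(kept) > 0 {
			p[l.name][l.value] = kept
			continue
		}
		// No series left for this value: drop it, and the name if it is empty.
		delete(p[l.name], l.value)
		if len(p[l.name]) == 0 {
			delete(p, l.name)
		}
	}
}

// demo deletes series 3, which carries job="b" and env="prod".
func demo() postings {
	p := postings{
		"job": {"a": {1, 2}, "b": {3}},
		"env": {"prod": {1, 3}},
	}
	deleted := map[uint64]struct{}{3: {}}
	affected := map[label]struct{}{
		{"job", "b"}:    {},
		{"env", "prod"}: {},
	}
	p.delete(deleted, affected)
	return p
}

func main() {
	fmt.Println(demo())
}
```

In this sketch the `job="a"` entry is never read, which is the saving the message refers to. The cost moves to the caller, which has to collect the labels of each deleted series.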
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Reduce the flakiness of TestAsyncRuleEvaluation
This test sleeps for 15 milliseconds per rule group, and then compares
the entire execution time to be smaller than a multiple of that delay.
The ruleCount is 6, so it assumes that the test will come to the
assertions in less than 90ms.
Meanwhile, GitHub's Windows runner:
- ...Huh, oh? What? How much time? milliwhat? Sorry I don't speak that.
TL;DR, this increases the delay to 250 milliseconds. This won't prevent
the test from being flaky, but will reduce the flakiness by several
orders of magnitude and hopefully won't be an issue anymore.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Make tests parallel
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
---------
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* add hook to allow head compaction to create multiple output blocks
Signed-off-by: Ben Ye <benye@amazon.com>
* change Compact interface; remove BlockPopulator changes
Signed-off-by: Ben Ye <benye@amazon.com>
* rebase main
Signed-off-by: Ben Ye <benye@amazon.com>
* fix lint
Signed-off-by: Ben Ye <benye@amazon.com>
* fix unit test
Signed-off-by: Ben Ye <benye@amazon.com>
* address feedback; add unit test
Signed-off-by: Ben Ye <benye@amazon.com>
* Apply suggestions from code review
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Update tsdb/compact_test.go
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
---------
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
The only call we have to LabelValuesFor() has an index.Postings, and we
expand it to pass to this method, which will iterate over the values.
That's a waste of resources: we can iterate on the index.Postings
directly.
If there's any downstream implementation that has a slice of series,
they can always do an index.ListPostings from them: doing that is
cheaper than expanding an abstract index.Postings.
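
As a sketch of the difference, assuming a reduced `Postings` interface (the real index.Postings has more methods) and a hypothetical in-memory `series` map, the function below walks the iterator directly without first expanding it into a slice. `listPostings` here is a toy version of what index.ListPostings provides for callers that already hold a slice.

```go
package main

import (
	"fmt"
	"sort"
)

// Postings is a reduced, hypothetical version of an abstract postings iterator.
type Postings interface {
	Next() bool
	At() uint64
}

// listPostings wraps a slice of series IDs as a Postings iterator.
type listPostings struct {
	ids []uint64
	cur uint64
}

func (p *listPostings) Next() bool {
	if len(p.ids) == 0 {
		return false
	}
	p.cur, p.ids = p.ids[0], p.ids[1:]
	return true
}

func (p *listPostings) At() uint64 { return p.cur }

// labelValuesFor collects the distinct values of the label called name over
// the series in p, consuming the iterator directly with no intermediate slice.
func labelValuesFor(p Postings, name string, series map[uint64]map[string]string) []string {
	seen := map[string]struct{}{}
	for p.Next() {
		if v, ok := series[p.At()][name]; ok {
			seen[v] = struct{}{}
		}
	}
	vals := make([]string, 0, len(seen))
	for v := range seen {
		vals = append(vals, v)
	}
	sort.Strings(vals)
	return vals
}

// demo looks up the "job" values for series 1, 2 and 4.
func demo() []string {
	series := map[uint64]map[string]string{
		1: {"job": "a"},
		2: {"job": "b"},
		3: {"job": "c"},
		4: {"env": "x"},
	}
	return labelValuesFor(&listPostings{ids: []uint64{1, 2, 4}}, "job", series)
}

func main() {
	fmt.Println(demo())
}
```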
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Converted string to standardized form
* Added golang.org/x/text in Go dependencies
* Added test cases for FastRegexMatcher
* Added benchmark for toNormalizedLower
Signed-off-by: RA <ranveeravhad777@gmail.com>
* MemPostings.PostingsForLabelMatching: let mutex go
This changes the `MemPostings.PostingsForLabelMatching` implementation
to stop holding the read mutex while matching the label values.
We've seen that this method can be slow when the matcher is expensive,
that's why we even added a context expiration check.
However, there are critical processes that might be waiting on this mutex:
writes (adding new series) and compaction (deleting the
garbage-collected ones), so we should avoid holding it for a long period
of time.
Given that we've copied the values to a slice anyway, there's no need to
hold the lock while matching.
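
The pattern can be sketched roughly as follows. This is an illustration, not the actual implementation: the `memPostings` struct, its map layout, and the second short read-lock used to fetch the postings of the matched values are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// memPostings is a hypothetical, simplified stand-in for MemPostings.
type memPostings struct {
	mtx sync.RWMutex
	m   map[string]map[string][]uint64 // label name -> value -> series IDs
}

// postingsForLabelMatching copies the label values under the read lock,
// releases it, runs the (possibly expensive) matcher without holding any
// lock, and only then re-takes the read lock briefly to collect the IDs.
func (p *memPostings) postingsForLabelMatching(name string, match func(string) bool) []uint64 {
	p.mtx.RLock()
	vals := make([]string, 0, len(p.m[name]))
	for v := range p.m[name] {
		vals = append(vals, v)
	}
	p.mtx.RUnlock()

	// Writers and compaction can take the lock while this loop runs.
	matched := vals[:0]
	for _, v := range vals {
		if match(v) {
			matched = append(matched, v)
		}
	}

	p.mtx.RLock()
	var ids []uint64
	for _, v := range matched {
		ids = append(ids, p.m[name][v]...)
	}
	p.mtx.RUnlock()

	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// demo matches every "job" value starting with "api".
func demo() []uint64 {
	p := &memPostings{m: map[string]map[string][]uint64{
		"job": {"api-1": {1}, "api-2": {2, 3}, "db": {4}},
	}}
	return p.postingsForLabelMatching("job", func(v string) bool {
		return strings.HasPrefix(v, "api")
	})
}

func main() {
	fmt.Println(demo())
}
```

One consequence of this design is that a value may disappear between the two locked sections; in the sketch that just yields no IDs for it.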
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* MemPostings: reduce locking/unlocking
MemPostings.Delete is called from Head.gc(), i.e. it gets the IDs of the
series that have churned.
I'd assume that many label values aren't affected by that churn at all,
so it doesn't make sense to touch the lock while checking them.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
It's quite common during the compaction cycle to hold series IDs for
series that aren't in the TSDB head anymore.
We shouldn't fail if that happens, as the caller has no way to figure
out which one of the IDs doesn't exist.
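
A small sketch of the intended behaviour, with hypothetical names: `labelNamesFor` and the `head` map are invented for this example and are not the actual method touched by the fix. IDs that are no longer present are skipped instead of turning the whole call into an error.

```go
package main

import (
	"fmt"
	"sort"
)

// labelNamesFor returns the sorted, distinct label names of the given series
// IDs. head is a hypothetical map from series ID to that series' label names.
// IDs missing from head are skipped rather than reported as an error.
func labelNamesFor(head map[uint64][]string, ids ...uint64) []string {
	seen := map[string]struct{}{}
	for _, id := range ids {
		names, ok := head[id]
		if !ok {
			// The series was garbage-collected in the meantime; ignore it.
			continue
		}
		for _, n := range names {
			seen[n] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for n := range seen {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

// demo asks for series 1, 2 and 3 while series 2 is already gone.
func demo() []string {
	head := map[uint64][]string{
		1: {"job", "env"},
		3: {"job", "zone"},
	}
	return labelNamesFor(head, 1, 2, 3)
}

func main() {
	fmt.Println(demo())
}
```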
Fixes https://github.com/prometheus/prometheus/issues/14278
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Add ability to assert that a query fails with a particular error message
This also adds documentation for the test scripting language in general,
including the new feature.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
---------
Signed-off-by: Charles Korn <charles.korn@grafana.com>
This also exercises the "fast path" (only decoding count and sum),
i.e. where the counter reset isn't visible at all in the decoded data.
Signed-off-by: beorn7 <beorn@grafana.com>
Implement histogram statistics decoder
This commit speeds up histogram_count and histogram_sum
functions on native histograms. The idea is to have separate decoders which can be
used by the engine to only read count/sum values from histogram objects. This should help
with reducing allocations when decoding histograms, as well as with speeding up aggregations
like sum since they will be done on floats and not on histogram objects.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
---------
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>