Resolves: #15433
When I converted Prometheus to use slog in #14906, I updated both the
`QueryLogger` interface and how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and its implementation of the `With()`
method updates the logger in place, adding the new keyvals
onto the underlying `slog.Logger`, which means they are inherited by
everything logged afterwards. All subsequent calls to `With()`, even in later
queries, would then keep appending more and more keyvals for the
various params and fields built up in the logger. In turn, this caused
unbounded growth of the logger, leading to increased memory usage, and
in at least one report it was the likely cause of an OOM kill. More
information can be found in the issue and the linked Slack thread.
This commit does a few things:
- Feedback on #14906 noted that it would have been better not to
change the `QueryLogger` interface if possible; this PR
proposes changes that bring it closer to the pre-3.0
`QueryLogger` interface contract.
- reverts `promql.Engine.exec()`'s usage of the query logger to the
pattern of building up a slice of args to pass all at once to the final log
call. Avoiding the repeated calls to `.With()` is what resolves the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the updated `QueryLogger`
methods in the contract.
- updates tests accordingly
- cleans up unused methods
Builds and passes tests successfully. Tested locally and confirmed that
the issue no longer reproduces.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Instead of allocating ListPostings pointers one by one, allocate a slice
and take pointers from that. It's faster, and also generates less
garbage (NewListPostings is one of the top offenders in number of
allocations).
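The allocation pattern can be sketched in Go. `listPostings`, `newListPostingsOneByOne` and `newListPostingsBatched` are hypothetical, simplified names for illustration, not the actual tsdb code; the point is only the difference between one heap allocation per struct and a single backing slice that hands out pointers.

```go
package main

// listPostings is a hypothetical, simplified iterator over a sorted list of
// series references.
type listPostings struct {
	list []uint64
	cur  uint64
}

// newListPostingsOneByOne allocates every iterator separately:
// one heap allocation per list.
func newListPostingsOneByOne(lists [][]uint64) []*listPostings {
	out := make([]*listPostings, 0, len(lists))
	for _, l := range lists {
		out = append(out, &listPostings{list: l})
	}
	return out
}

// newListPostingsBatched allocates all iterators in one backing slice and
// returns pointers into it, so the structs cost a single allocation.
func newListPostingsBatched(lists [][]uint64) []*listPostings {
	backing := make([]listPostings, len(lists))
	out := make([]*listPostings, len(lists))
	for i, l := range lists {
		backing[i] = listPostings{list: l}
		out[i] = &backing[i]
	}
	return out
}
```

One trade-off of the batched form: as long as any returned pointer is reachable, the whole backing slice stays alive.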
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This converts `TestNativeHistogram_SubOperator` to the promql testing framework. It also removes `TestNativeHistogram_Sum_Count_Add_AvgOperator`, which was converted earlier.
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
* tests(promql/testdata): add regression test for and-on
I'd like to use queries of the form "x and on() (vector(y)==1)" to be
able to include and exclude series for dashboards. This helps migration
to native histograms in dashboards by using a dashboard variable to
set "y" to either -1 or 1 to exclude or include the result.
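Why this works can be sketched in Go. With an empty `on()` list every left-hand sample has the same (empty) matching signature, so `x and on() (vector(y)==1)` keeps all of `x` when the right-hand side has a sample and nothing when it is empty; `vector(-1)==1` is empty because a comparison without `bool` filters out non-matching samples. The `sample` type and helper functions are hypothetical models of these semantics, not the PromQL engine's code.

```go
package main

// sample is a hypothetical, minimal PromQL sample: a label set and a float value.
type sample struct {
	labels map[string]string
	value  float64
}

// vectorOf models vector(y): a single sample with no labels.
func vectorOf(y float64) []sample {
	return []sample{{labels: map[string]string{}, value: y}}
}

// eqFilter models the filtering comparison `v == c` (no bool modifier):
// samples that fail the comparison are dropped.
func eqFilter(v []sample, c float64) []sample {
	var out []sample
	for _, s := range v {
		if s.value == c {
			out = append(out, s)
		}
	}
	return out
}

// andOnNothing models `lhs and on() rhs`: all lhs samples share the empty
// signature, so they are kept if and only if rhs is non-empty.
func andOnNothing(lhs, rhs []sample) []sample {
	if len(rhs) == 0 {
		return nil
	}
	return lhs
}
```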
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Simple follow-up to #13620. Modify `tsdb.PostingsForMatchers` to also use the optimized `tsdb.IndexReader.PostingsForLabelMatching` method for inverse matching.
Introduce the method `PostingsForAllLabelValues` to avoid changing the existing method.
Performance is much improved for a subset of the cases, with up to
~60% CPU gains and a ~12.5% reduction in memory usage.
Remove `TestReader_InversePostingsForMatcherHonorsContextCancel` since
`inversePostingsForMatcher` only passes `ctx` to `IndexReader` implementations now.
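Inverse matching through a predicate-based lookup can be sketched as follows. `postingsForLabelMatching` and `inversePostings` are hypothetical in-memory stand-ins (a map from label value to sorted series references), not the `tsdb.IndexReader` API: the negated matcher is passed as a predicate to a single lookup, which merges the postings of the rejected values.

```go
package main

import "sort"

// postingsForLabelMatching is a hypothetical, simplified lookup: given an
// in-memory index (label value -> sorted series refs), it returns the sorted
// union of the postings for every value accepted by match.
func postingsForLabelMatching(index map[string][]uint64, match func(string) bool) []uint64 {
	seen := map[uint64]struct{}{}
	for value, refs := range index {
		if !match(value) {
			continue
		}
		for _, r := range refs {
			seen[r] = struct{}{}
		}
	}
	out := make([]uint64, 0, len(seen))
	for r := range seen {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// inversePostings returns the postings of series whose label value the
// matcher rejects, by negating the predicate instead of filtering afterwards.
func inversePostings(index map[string][]uint64, matches func(string) bool) []uint64 {
	return postingsForLabelMatching(index, func(v string) bool { return !matches(v) })
}
```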
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Fix issue where comparison operations with the `bool` modifier and native histograms return histograms rather than 0 or 1
* Don't emit anything for comparisons between floats and histograms when the `bool` modifier is used
* Don't emit anything for comparisons between floats and histograms when the `bool` modifier is used between a vector and a scalar
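The intended semantics of `==` with the `bool` modifier can be sketched in Go. `value`, `histogram` and `equalBool` are hypothetical, heavily simplified types (a real native histogram has buckets, and its equality check is more involved); the sketch only shows that the result is always a float 0 or 1, and that nothing is emitted when a float is compared with a histogram.

```go
package main

// histogram is a deliberately tiny, hypothetical stand-in for a native histogram.
type histogram struct {
	count, sum float64
}

// value is a hypothetical sample value holding either a float or a histogram
// (h must be non-nil when isHistogram is true).
type value struct {
	isHistogram bool
	f           float64
	h           *histogram
}

// equalBool models `lhs == bool rhs`. ok == false means no output sample.
func equalBool(lhs, rhs value) (result float64, ok bool) {
	switch {
	case lhs.isHistogram != rhs.isHistogram:
		// Float compared with histogram: nothing is emitted.
		return 0, false
	case lhs.isHistogram:
		// Histogram compared with histogram: a float 0 or 1, never a histogram.
		if *lhs.h == *rhs.h {
			return 1, true
		}
		return 0, true
	default:
		if lhs.f == rhs.f {
			return 1, true
		}
		return 0, true
	}
}
```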
---------
Signed-off-by: Charles Korn <charles.korn@grafana.com>
When a remote-write request is sent to a host name that resolves to multiple IP addresses, this PR makes it possible to force new connections for the remote-write request to be created to a randomly chosen IP address among those the host name resolves to. The default behavior remains unchanged, i.e., the IP address used for creating the connection is still the one chosen by Go.
This is an experimental feature and is disabled by default.
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
PromQL: Correct the behaviour of some operators and aggregators with Native Histograms
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
Always return unknown hint for first sample in non-gauge histogram chunk
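The rule in the title can be sketched in Go. The `counterResetHint` constants are hypothetical, loosely modelled on native histogram counter reset hints, and returning `notCounterReset` for later samples in a non-gauge chunk is an assumption about the surrounding behaviour, not something stated here.

```go
package main

// counterResetHint is a hypothetical enum loosely modelled on the counter
// reset hints a native histogram chunk can carry.
type counterResetHint byte

const (
	unknownCounterReset counterResetHint = iota
	counterReset
	notCounterReset
	gaugeType
)

// hintForSample returns the hint for the sample at sampleIndex in a chunk
// whose header carries headerHint. In a non-gauge chunk the first sample is
// always unknownCounterReset, whatever the header says; later samples are
// assumed (for this sketch) to be notCounterReset.
func hintForSample(headerHint counterResetHint, sampleIndex int) counterResetHint {
	if headerHint == gaugeType {
		return gaugeType
	}
	if sampleIndex == 0 {
		return unknownCounterReset
	}
	return notCounterReset
}
```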
---------
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Enable the `auto-gomaxprocs` feature flag by default.
* Add command line flag `--no-auto-gomaxprocs` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
When the config file had a syntax error and the old file was then
restored, we did not re-trigger the config reload, so the config reload
metric kept reporting that the config reload was unsuccessful.
I added a helper to handle logs in cmd/prometheus.
For now it is in a separate file so we can backport this easily.
I will generalize the helper in another PR.
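The reload condition can be sketched in Go. `reloader` and `maybeReload` are hypothetical names and the checksum-based design is an assumption for illustration: a reload is skipped only when the file is unchanged since the last successful apply and the most recent attempt succeeded, so restoring the old file after a failed reload triggers a new, successful reload.

```go
package main

import "crypto/sha256"

// reloader is hypothetical auto-reload state: the checksum of the last
// successfully applied config, and whether the most recent reload succeeded
// (the value a "last reload successful" metric would expose).
type reloader struct {
	appliedSum     [sha256.Size]byte
	lastSuccessful bool
}

// maybeReload reloads when the content differs from the last successfully
// applied config, and also when the previous attempt failed. Without the
// lastSuccessful check, restoring the old file after a syntax error would
// match appliedSum, no reload would run, and lastSuccessful would stay false.
func (r *reloader) maybeReload(content []byte, apply func([]byte) error) {
	sum := sha256.Sum256(content)
	if sum == r.appliedSum && r.lastSuccessful {
		return
	}
	if err := apply(content); err != nil {
		r.lastSuccessful = false
		return
	}
	r.appliedSum = sum
	r.lastSuccessful = true
}
```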
Signed-off-by: Julien <roidelapluie@o11y.eu>