It does nothing for standard Prometheus builds with -tags stringlabels,
but the check to see if labels are being replaced has a cost.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Same as #15427 but for the new method added in #14144
Instead of allocating each ListPostings one by one, allocate them all in
one go.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Previously, prometheus_notifications_errors_total was incremented by
one whenever a batch of alerts was affected by an error during sending
to a specific alertmanager. However, the corresponding metric
prometheus_notifications_sent_total, counting all alerts that were
sent (including those where the sent ended in error), is incremented
by the batch size, i.e. the number of alerts.
Therefore, the ratio used in the mixin for the
PrometheusErrorSendingAlertsToSomeAlertmanagers alert is inconsistent.
This commit changes the increment of
prometheus_notifications_errors_total to the number of alerts that
were sent in the attempt that ended in an error. It also adjusts the
metrics help string accordingly and makes the wording in the alert in
the mixin more precise.
Signed-off-by: beorn7 <beorn@grafana.com>
* CHANGELOG - scraping change introduced in 2.52.0
Change introduced in #12933; old behavior partially recoved in #14685
Signed-off-by: Konrad <zuo.zp8@gmail.com>
Previously, the api was evaluating this regex to determine if the label
name was valid or not:
14bac55a99/model/labels.go (L94)
However, I believe that the `IsValid()` function is what ought to be
used in the brave new utf8 era.
**Before**
```
$ curl localhost:9090/api/v1/label/host.name/values
{"status":"error","errorType":"bad_data","error":"invalid label name: \"host.name\""}
```
**After**
```
$ curl localhost:9090/api/v1/label/host.name/values
{"status":"success","data":["localhost"]}
```
It's very likely that I'm missing something here or you were already
planning to do this at some point but I just encountered this issue and
figured I'd give it a go.
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Reduce string manipulation by just cutting off the histogram suffixes from
the series name label once.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Resolves: #15433
When I converted prometheus to use slog in #14906, I update both the
`QueryLogger` interface, as well as how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and it's implementation of the `With()`
method updates the logger the logger in place with the new keyvals added
onto the underlying slog.Logger, which means they get inherited onto
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.
This commit does a few things:
- It was referenced in feedback in #14906 that it would've been better
to not change the `QueryLogger` interface if possible, this PR
proposes changes that bring it closer to alignment with the pre-3.0
`QueryLogger` interface contract
- reverts `promql.Engine.exec()`'s usage of the query logger to the
pattern of building up an array of args to pass at once to the end log
call. Avoiding the repetitious calls to `.With()` are what resolve the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the update `QueryLogger`
methods in the contract.
- updates tests accordingly
- cleans up unused methods
Builds and passes tests successfully. Tested locally and confirmed I
could no longer reproduce the issue/it resolved the issue.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Previously, we managed to get rid of the sample on the left bound
later, so the problem didn't show up in the framework tests. But the
subqueries were still evaluation with the sample on the left bound,
taking space and showing up if returning the subquery result directly
(without further processing through PromQL like in all the framework
tests).
Signed-off-by: beorn7 <beorn@grafana.com>
promqltest: Complete the tests for info annotations
So far, we did not test for the _absence_ of an info annotation
(because many tests triggered info annotations, which we haven't taken
into account so far).
The test for info annotations was also missed for range queries.
This completes the tests for info annotations (and refactors the many
`if` statements into a somewhat more compact `switch` statement).
It fixes most tests to not emit an info annotation anymore. Or it
changes the `eval` to `eval_info` where we actually want to test for
the info annotation.
It also fixes a few spelling errors in comments.
---------
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
Instead of allocating ListPostings pointers one by one, allocate a slice
and take pointers from that. It's faster, and also generates less
garbage (NewListPostings is one of the top offenders in number of
allocations).
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This converts `TestNativeHistogram_SubOperator` to the promql testing framework. It also removes `TestNativeHistogram_Sum_Count_Add_AvgOperator`, which got converted earlier.
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
* tests(promql/testdata): add regression test for and-on
I'd like to use queries of the form "x and on() (vector(y)==1)" to be
able to include and exclude series for dashboards. This helps migration
to native histograms in dashboards by using a dashboard variable to
set "y" to either -1 or 1 to exclude or include the result.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>