promql: fix some aggregations for histograms
This PR fixes the behaviour of `topk`,`bottomk`, `limitk` and `limit_ratio` with histograms. The fixed behaviour are as follows:
- For `topk` and `bottomk` histograms are ignored and add info annotations added.
- For `limitk` and `limit_ratio` histograms are included in the results(if applicable).
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
Resolves: #15433
When I converted prometheus to use slog in #14906, I update both the
`QueryLogger` interface, as well as how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and it's implementation of the `With()`
method updates the logger the logger in place with the new keyvals added
onto the underlying slog.Logger, which means they get inherited onto
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.
This commit does a few things:
- It was referenced in feedback in #14906 that it would've been better
to not change the `QueryLogger` interface if possible, this PR
proposes changes that bring it closer to alignment with the pre-3.0
`QueryLogger` interface contract
- reverts `promql.Engine.exec()`'s usage of the query logger to the
pattern of building up an array of args to pass at once to the end log
call. Avoiding the repetitious calls to `.With()` are what resolve the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the update `QueryLogger`
methods in the contract.
- updates tests accordingly
- cleans up unused methods
Builds and passes tests successfully. Tested locally and confirmed I
could no longer reproduce the issue/it resolved the issue.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Previously, we managed to get rid of the sample on the left bound
later, so the problem didn't show up in the framework tests. But the
subqueries were still evaluation with the sample on the left bound,
taking space and showing up if returning the subquery result directly
(without further processing through PromQL like in all the framework
tests).
Signed-off-by: beorn7 <beorn@grafana.com>
* Fix issue where comparison operations with `bool` modifier and native histograms return histograms rather than 0 or 1
* Don't emit anything for comparisons between floats and histograms when `bool` modifier is used
* Don't emit anything for comparisons between floats and histograms when `bool` modifier is used between a vector and a scalar
---------
Signed-off-by: Charles Korn <charles.korn@grafana.com>
PromQL: Correct the behaviour of some operator and aggregators with Native Histograms
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
promql: Fix stddev/stdvar when aggregating histograms, NaNs, and Infs
Native histograms are ignored when calculating stddev or stdvar.
However, for the first series of each group, a `groupedAggregation` is
always created. If the first series that was encountered is a histogram
then it acts as the equivalent of a 0 point.
This change creates the first `groupedAggregation` with the `seen` field set to `false` if the point is a
histogram, thus ignoring it like the rest of the aggregation function does. A new `groupedAggregation`
will then be created once an actual float value is encountered.
This commit also sets the `floatValue` field of the `groupedAggregation` to `NaN`, if the first
float value of a group is `NaN` or `±Inf`, so that the outcome is consistently `NaN` once those
values are in the mix.
(The added tests fail without this change).
Signed-off-by: Joshua Hesketh <josh@nitrotech.org>
Signed-off-by: beorn7 <beorn@grafana.com>
---------
Signed-off-by: Joshua Hesketh <josh@nitrotech.org>
Signed-off-by: beorn7 <beorn@grafana.com>
Co-authored-by: beorn7 <beorn@grafana.com>
The `info` function is an experiment to improve UX
around including labels from info metrics.
`info` has to be enabled via the feature flag `--enable-feature=promql-experimental-functions`.
This MVP of info simplifies the implementation by assuming:
* Only support for the target_info metric
* That target_info's identifying labels are job and instance
Also:
* Encode info samples' original timestamp as sample value
* Deduce info series select hints from top-most VectorSelector
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Ying WANG <ying.wang@grafana.com>
Co-authored-by: Augustin Husson <augustin.husson@amadeus.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
promql: corrects binary operators functioning for mixed sample with histogram and float
For invalid pairings of sample types, an annotation is added now.
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* PromQL.Engine: Refactor Matrix expansion into a method
Add utility method promql.evaluator.expandSeriesToMatrix, for expanding a slice
of storage.Series into a promql.Matrix.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Rename to generateMatrix
Rename evaluator.expandSeriesToMatrix into generateMatrix, while also dropping
the start, end, interval arguments since they are evaluator fields.
Write more extensive method documentation.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Rename to evalVectorSelector
Rename to evalVectorSelector after discussing with @michahoffmann.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
promql: correctly handle unary negation of native histograms and add tests for multiplication and division of native histograms by negative scalars
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Make rate possible non-counter annotation consistent
Previously a PossibleNonCounterInfo annotation would be left in cases
where a range-vector selects 1 float data point, even if no more points
are selected in order to calculate a rate.
This change ensures an output float exists before emitting such an
annotation.
This fixes an inconsistency where a series with mixed data (ie, a float
and a native histogram) would emit an annotation without any points.
For example,
```
load 1m
series{label="a"} 1 {{schema:1 sum:10 count:5 buckets:[1 2 3]}}
eval instant at 1m rate(series[1m1s])
```
Would have a PossibleNonCounterInfo annotation.
Wheras
```
load 1m
series{label="a"} {{schema:1 sum:10 count:5 buckets:[1 2 3]}} {{schema:1 sum:15 count:10 buckets:[1 2 3]}}
eval instant at 1m rate(series[1m1s])
```
Would not.
---------
Signed-off-by: Joshua Hesketh <josh@nitrotech.org>
Conflicts:
cmd/prometheus/main.go
docs/command-line/prometheus.md
docs/feature_flags.md
web/ui/build_ui.sh
web/web.go
Resolved by dropping the UTF-8 feature flag and adding the
`auto-reload-config` feature flag.
For the new web ui pick all changes from `main`.
For aggregates, operators, calls, show what operation is performed.
Also add an event when series are expanded, typically time spent
accessing TSDB.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation
- This change allows optionally preserving the `__name__` label via the `label_replace` and `label_join` functions, and helps prevent the dreaded "vector cannot contain metrics with the same labelset" error.
- The implementation extends the `Series` and `Sample` structs with a boolean flag indicating whether the `__name__` label should be deleted at the end of the query evaluation.
- The `label_replace` and `label_join` functions can still access the value of the `__name__` label, even if it has been previously marked for deletion. If `__name__` is used as target label, it won't be dropped at the end of the query evaluation.
- Fixes https://github.com/prometheus/prometheus/issues/11397
- See https://github.com/jcreixell/prometheus/pull/2 for previous discussion, including the decision to create this PR and benchmark it before considering other alternatives (like refactoring `labels.Labels`).
- See https://github.com/jcreixell/prometheus/pull/1 for an alternative implementation using a special label instead of boolean flags.
- Note: a feature flag `promql-delayed-name-removal` has been added as it changes the behavior of some "weird" queries (see https://github.com/prometheus/prometheus/issues/11397#issuecomment-1451998792)
Example (this always fails, as `__name__` is being dropped by `count_over_time`):
```
count_over_time({__name__!=""}[1m])
=> Error executing query: vector cannot contain metrics with the same labelset
```
Before:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=> Error executing query: vector cannot contain metrics with the same labelset
```
After:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=>
count_go_gc_cycles_automatic_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
count_go_gc_cycles_forced_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
...
```
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
---------
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Several things done here:
- Set `max-issues-per-linter` to 0 so that we actually see all linter
warnings and not just 50 per linter. (As we also set
`max-same-issues` to 0, I assume this was the intention from the
beginning.)
- Stop using the golangci-lint default excludes (by setting
`exclude-use-default: false`. Those are too generous and don't match
our style conventions. (I have re-added some of the excludes
explicitly in this commit. See below.)
- Re-add the `errcheck` exclusion we have used so far via the
defaults.
- Exclude the signature requirement `govet` has for `Seek` methods
because we use non-standard `Seek` methods a lot. (But we keep other
requirements, while the default excludes completely disabled the
check for common method segnatures.)
- Exclude warnings about missing doc comments on exported symbols. (We
used to be pretty adamant about doc comments, but stopped that at
some point in the past. By now, we have about 500 missing doc
comments. We may consider reintroducing this check, but that's
outside of the scope of this commit. The default excludes of
golangci-lint essentially ignore doc comments completely.)
- By stop using the default excludes, we now get warnings back on
malformed doc comments. That's the most impactful change in this
commit. It does not enforce doc comments (again), but _if_ there is
a doc comment, it has to have the recommended form. (Most of the
changes in this commit are fixing this form.)
- Improve wording/spelling of some comments in .golangci.yml, and
remove an outdated comment.
- Leave `package-comments` inactive, but add a TODO asking if we
should change that.
- Add a new sub-linter `comment-spacings` (and fix corresponding
comments), which avoids missing spaces after the leading `//`.
Signed-off-by: beorn7 <beorn@grafana.com>