promql: correctly handle unary negation of native histograms and add tests for multiplication and division of native histograms by negative scalars
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
PromQL: add test for mul and div operator
Also, remove the converted test from the engine_test.go file.
This also includes an extension of the test framework to allow NaN/Inf in histogram buckets.
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation
- This change allows optionally preserving the `__name__` label via the `label_replace` and `label_join` functions, and helps prevent the dreaded "vector cannot contain metrics with the same labelset" error.
- The implementation extends the `Series` and `Sample` structs with a boolean flag indicating whether the `__name__` label should be deleted at the end of the query evaluation.
- The `label_replace` and `label_join` functions can still access the value of the `__name__` label, even if it has been previously marked for deletion. If `__name__` is used as target label, it won't be dropped at the end of the query evaluation.
- Fixes https://github.com/prometheus/prometheus/issues/11397
- See https://github.com/jcreixell/prometheus/pull/2 for previous discussion, including the decision to create this PR and benchmark it before considering other alternatives (like refactoring `labels.Labels`).
- See https://github.com/jcreixell/prometheus/pull/1 for an alternative implementation using a special label instead of boolean flags.
- Note: a feature flag `promql-delayed-name-removal` has been added as it changes the behavior of some "weird" queries (see https://github.com/prometheus/prometheus/issues/11397#issuecomment-1451998792)
Example (this always fails, as `__name__` is being dropped by `count_over_time`):
```
count_over_time({__name__!=""}[1m])
=> Error executing query: vector cannot contain metrics with the same labelset
```
Before:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=> Error executing query: vector cannot contain metrics with the same labelset
```
After:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=>
count_go_gc_cycles_automatic_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
count_go_gc_cycles_forced_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
...
```
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
---------
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Histogram quantile returns NaN in this case, which might be
surprising, so add a unit test that clarifies that this is
intentional.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Same idea as for the avg aggregator before: Most of the time, there is
no overflow, so we don't have to revert to the more expensive and less
precise incremental calculation of the mean value.
Signed-off-by: beorn7 <beorn@grafana.com>
The basic idea here is that the previous code was always doing
incremental calculation of the mean value, which is more costly and
can be less precise. It protects against overflows, but in most cases,
an overflow doesn't happen anyway.
The other idea applied here is to expand on #14074, where Kahan
summation was applied to sum().
With this commit, the average is calculated in a conventional way
(adding everything up and divide in the end) as long as the sum isn't
overflowing float64. This is combined with Kahan summation so that the
avg aggregation, in most cases, is really equivalent to the sum
aggregation with a following division (which is the user's expectation
as avg is supposed to be syntactic sugar for sum with a following
divison).
If the sum hits ±Inf, the calculation reverts to incremental
calculation of the mean value. Kahan summation is also applied here,
although it cannot fully compensate for the numerical errors
introduced by the incremental mean calculation. (The tests added in
this commit would fail if incremental mean calculation was always
used.)
Signed-off-by: beorn7 <beorn@grafana.com>
The optimizer which detects cases where histogram buckets can be skipped
does not take into account binary expressions. This can lead to buckets
not being decoded if a metric is used with both histogram_fraction/quantile and
histogram_sum/count in the same expression.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
This also exercises the "fast path" (only decoding count and sum),
i.e. where the counter reset isn't visible at all in the decoded data.
Signed-off-by: beorn7 <beorn@grafana.com>
This can give a more precise result, by keeping a separate running
compensation value to accumulate small errors.
See https://en.wikipedia.org/wiki/Kahan_summation_algorithm
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>