Some test queries need their interval adjusted to account for
https://github.com/prometheus/prometheus/pull/13904. Otherwise the
queries don't return enough samples.
promql/engine_test.go:TestHistogramCopyFromIteratorRegression needed the
same, but also the result needed a fix since `increase` interpolates
over the full range.
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Histogram quantile returns NaN in this case, which might be
surprising, so add a unit test that clarifies that this is
intentional.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
chunkenc.Iterator.AtFloatHistogram may do a shallow copy if
it receives nil as input pointer. This can in turn share the
span slice with multiple histograms in the matrixSelectorHPool,
leading to unexpected errors.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The histogram stats iterator does not fully clear the histogram object
and is not resilient to new fields being added to the histogram type.
To resolve the issue, the commit uses the CopyTo methods which should
be future proof to new fields being added.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
The histogram stats decoder keeps track of the last seen histogram sample
in order to properly detect counter resets. We are seeing an issue where
a histogram with UnknownResetHint gets treated as a counter reset when it follows
a stale histogram sample.
I believe that this is incorrect since stale samples should be completely ignored
in PromQL. As a result, they should not be stored in the histogram stats iterator
and the counter reset detection needs to be done against the last non-stale sample.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
chunkenc.Iterator.AtFloatHistogram may do a shallow copy if
it receives nil as input pointer. This can in turn share the
span slice with multiple histograms in the matrixSelectorHPool,
leading to unexpected errors.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The histogram stats iterator does not fully clear the histogram object
and is not resilient to new fields being added to the histogram type.
To resolve the issue, the commit uses the CopyTo methods which should
be future proof to new fields being added.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
The histogram stats decoder keeps track of the last seen histogram sample
in order to properly detect counter resets. We are seeing an issue where
a histogram with UnknownResetHint gets treated as a counter reset when it follows
a stale histogram sample.
I believe that this is incorrect since stale samples should be completely ignored
in PromQL. As a result, they should not be stored in the histogram stats iterator
and the counter reset detection needs to be done against the last non-stale sample.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
The RunBuiltinTests function accepts a concrete type which makes
it hard to exclude certain tests from the suite. It would be great
if we could skip tests which might not be critical in order to unblock
updates.
By accepting an interface instead, we can inject a custom implementation
which would skips select test cases.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Same idea as for the avg aggregator before: Most of the time, there is
no overflow, so we don't have to revert to the more expensive and less
precise incremental calculation of the mean value.
Signed-off-by: beorn7 <beorn@grafana.com>
The calculation of the mean value in avg_over_time is performed in an
incremental fashion. This introduces additional numerical errors that
even Kahan summation cannot compensate, but at least we can use the
Kahan-corrected mean value when we use the intermediate mean value in
the calculation.
Signed-off-by: beorn7 <beorn@grafana.com>
The basic idea here is that the previous code was always doing
incremental calculation of the mean value, which is more costly and
can be less precise. It protects against overflows, but in most cases,
an overflow doesn't happen anyway.
The other idea applied here is to expand on #14074, where Kahan
summation was applied to sum().
With this commit, the average is calculated in a conventional way
(adding everything up and divide in the end) as long as the sum isn't
overflowing float64. This is combined with Kahan summation so that the
avg aggregation, in most cases, is really equivalent to the sum
aggregation with a following division (which is the user's expectation
as avg is supposed to be syntactic sugar for sum with a following
divison).
If the sum hits ±Inf, the calculation reverts to incremental
calculation of the mean value. Kahan summation is also applied here,
although it cannot fully compensate for the numerical errors
introduced by the incremental mean calculation. (The tests added in
this commit would fail if incremental mean calculation was always
used.)
Signed-off-by: beorn7 <beorn@grafana.com>
The optimizer which detects cases where histogram buckets can be skipped
does not take into account binary expressions. This can lead to buckets
not being decoded if a metric is used with both histogram_fraction/quantile and
histogram_sum/count in the same expression.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>