Return annotations (warnings and infos) from PromQL queries
This generalizes the warnings we have already used before (but only for problems with remote read) as "annotations".
Annotations can be warnings or infos (the latter could be false positives). We do not treat them different in the API for now and return them all as "warnings". It would be easy to distinguish them and return infos separately, should that appear useful in the future.
The new annotations are then used to create a lot of warnings or infos during PromQL evaluations. Partially these are things we have wanted for a long time (e.g. inform the user that they have applied `rate` to a metric that doesn't look like a counter), but the new native histograms have created even more needs for those annotations (e.g. if a query tries to aggregate float numbers with histograms).
The annotations added here are not yet complete. A prominent example would be a warning about a range too short for a rate calculation. But such a warnings is more tricky to create with good fidelity and we will tackle it later.
Another TODO is to take annotations into account when evaluating recording rules.
---------
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Make it more likely that contributors will run the benchmark suite.
count_values needs more than 2GB at 1,000 steps, so just run it for 100.
And remove 10-step variant because it doesn't add much to 100 and
1000-step benchmarks.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Otherwise we have a highly unusual situation of over 100 chunks
in the headChunks list of each series, which heavily skews
performance.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
promql: Extend testing framework to support native histograms
This includes both the internal testing framework as well as the rules unit test feature of promtool.
This also adds a bunch of basic tests. Many of the code level tests can now be converted to tests within the framework, and more tests can be added easily.
---------
Signed-off-by: Harold Dost <h.dost@criteo.com>
Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
Signed-off-by: Stephen Lang <stephen.lang@grafana.com>
Co-authored-by: Harold Dost <h.dost@criteo.com>
Co-authored-by: Stephen Lang <stephen.lang@grafana.com>
Co-authored-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
So far, `ValidateHistogram` would not detect if the count did not
include the count in the zero bucket. This commit fixes the problem
and updates all the tests that have been undetected offenders so far.
Note that this problem would only ever create false negatives, so we
never falsely rejected to store a histogram because of it.
On the other hand, `ValidateFloatHistogram` has been to strict with
the count being at least as large as the sum of the counts in all the
buckets. Float precision issues could create false positives here, see
products of PromQL evaluations, it's actually quite hard to put an
upper limit no the floating point imprecision. Users could produce the
weirdest expressions, maxing out float precision problems. Therefore,
this commit simply removes that particular check from
`ValidateFloatHistogram`.
Signed-off-by: beorn7 <beorn@grafana.com>
`ContainsSameLabelset` constructs a map with the same hash key as
the one used to compile the output of `rangeEval`, so we can use that
one and save work.
Need to hold the timestamp so we can be sure we saw the same series
in the same evaluation.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The operator changes the meaning of the metric, so the metric name should
be dropped. Technically this would be a breaking change, but it's also very
obviously a bug and not likely that anyone depends on it.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Convert QueryOpts to an interface so that downstream projects like
https://github.com/thanos-community/promql-engine could extend the query
options with engine specific options that are not in the original
engine.
Will be used to enable query analysis per-query.
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Handle more arithmetic operators and aggregators for native histograms
This includes operators for multiplication (formerly known as scaling), division, and subtraction. Plus aggregations for average and the avg_over_time function.
Stdvar and stddev will (for now) ignore histograms properly (rather than counting them but adding a 0 for them).
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
So far, if a target exposes a histogram with both classic and native
buckets, a native-histogram enabled Prometheus would ignore the
classic buckets. With the new scrape config option
`scrape_classic_histograms` set, both buckets will be ingested,
creating all the series of a classic histogram in parallel to the
native histogram series. For example, a histogram `foo` would create a
native histogram series `foo` and classic series called `foo_sum`,
`foo_count`, and `foo_bucket`.
This feature can be used in a migration strategy from classic to
native histograms, where it is desired to have a transition period
during which both native and classic histograms are present.
Note that two bugs in classic histogram parsing were found and fixed
as a byproduct of testing the new feature:
1. Series created from classic _gauge_ histograms didn't get the
_sum/_count/_bucket prefix set.
2. Values of classic _float_ histograms weren't parsed properly.
Signed-off-by: beorn7 <beorn@grafana.com>
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.
The exceptions that I have found in our codebase are just these two:
* The `if else` is followed by an additional statement before the next
condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
used. In this case, using `switch` would require tagging the `for`
loop, which probably tips the balance.
Why are `switch` statements more readable?
For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.
I'm sure the aforemention wise coders can list even more reasons.
In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.
Signed-off-by: beorn7 <beorn@grafana.com>
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.
There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).
I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.
I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.
Signed-off-by: beorn7 <beorn@grafana.com>