Commit Graph

70 Commits (b2a8657d589331a598c75b3e51b860d0437f20b7)

Author SHA1 Message Date
gotjosh 37b408c6cd
Feature: Allow configuration of a rule evaluation delay (#14061)
* [PATCH] Allow having evaluation delay for rule groups

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Move the option to ManagerOptions

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Include evaluation_delay in the group config

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix comments

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add a server configuration option.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Appease the linter #1

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add the new server flag documentation

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve documentation of the new flag and configuration

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Use named parameters for clarity on the `Rule` interface

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add `initial` to the flag help

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Change the CHANGELOG area from `ruler` to `rules`

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Rename evaluation_delay to `rule_query_offset`/`query_offset` and make it a global configuration option.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

E Your branch is up to date with 'origin/gotjosh/evaluation-delay'.

* more docs

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve wording on CHANGELOG

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add `RuleQueryOffset` to the default config in tests in case it changes

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Update docs/configuration/recording_rules.md

Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Rename `RuleQueryOffset` to `QueryOffset` when in the group context.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve docstring and documentation on the `rule_query_offset`

Signed-off-by: gotjosh <josue.abreu@gmail.com>

---------

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
2024-05-30 11:49:50 +01:00
Bryan Boreham 8fd96241ab test: add promqltest package references
To packages outside of promql.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-08 16:08:04 +01:00
gotjosh 63b09944b8
Use labels.Len() instead of manually counting the labels
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 12:25:48 +01:00
gotjosh ccfafae36d
Rename QueryforStateSeries to QueryForStateSeries
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 12:19:18 +01:00
gotjosh 151f6e0ed6
Add an assertion on the count of alerts before adding an active alert
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 12:17:56 +01:00
gotjosh cc2207148e
fix typo
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 19:20:57 +01:00
gotjosh 2de2fee035
Allow the result map for the series set before hand with a hint.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 19:10:34 +01:00
gotjosh 276201598c
Fix tests and a bug with the series lookup logic.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 18:46:05 +01:00
suntala 44f385fd51 Support expansion of native histogram values in alert templates
Co-authored-by: Aleks Fazlieva <britishrum@users.noreply.github.com>
Signed-off-by: suntala <arati.rana@grafana.com>
2024-03-26 22:30:01 +01:00
Bryan Boreham 17f48f2b3b Tests: use replacement DeepEquals in more places
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:32:33 +00:00
Bryan Boreham 39af788dbd Tests: use replacement DeepEquals using go-cmp
Use DeepEqual replacement using go-cmp, which is more flexible.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:30:20 +00:00
Marco Pracucci 5ee3fbe825
Decouple ruler dependency controller from concurrency controller
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-02-02 10:06:37 +01:00
Matthieu MOREL 9c4782f1cc
golangci-lint: enable testifylint linter (#13254)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-07 11:35:01 +00:00
Arve Knudsen 6daee89e5f
Add context argument to Querier.Select (#12660)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-12 12:37:38 +02:00
Michael Hoffmann 4d8e380269
promql: allow tests to be imported (#12050)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2023-08-18 20:48:59 +02:00
beorn7 5b53aa1108 style: Replace `else if` cascades with `switch`
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
beorn7 c0879d64cf promql: Separate `Point` into `FPoint` and `HPoint`
In other words: Instead of having a “polymorphous” `Point` that can
either contain a float value or a histogram value, use an `FPoint` for
floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout
the codebase.

The idea here is to avoid the increase in size of `Point` arrays that
happened after native histograms had been added.

The higher-level data structures (`Sample`, `Series`, etc.) are still
“polymorphous”. The same idea could be applied to them, but at each
step the trade-offs needed to be evaluated.

The idea with this change is to do the minimum necessary to get back
to pre-histogram performance for functions that do not touch
histograms. Here are comparisons for the `changes` function. The test
data doesn't include histograms yet. Ideally, there would be no change
in the benchmark result at all.

First runtime v2.39 compared to directly prior to this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then runtime v2.39 compared to after this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for
queries with just a few steps. For queries with many steps, this
commit essentially reinstates the old performance. This is good
because the many-step queries are the one that matter most (longest
absolute runtime).

In terms of allocations, though, this commit doesn't make a dent at
all (numbers not shown). The reason is that most of the allocations
happen in the sampleRingIterator (in the storage package), which has
to be addressed in a separate commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:16 +02:00
Bryan Boreham 3f7ba22bde rules: two places need to call EmptyLabels
Can't assume nil is a valid value.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-02-22 15:14:07 +00:00
Danny Kopping 98c70e1817
Correcting NewAlertingRule args
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-26 13:21:50 +02:00
Danny Kopping df078e0a84
Merge branch 'main' into dannykopping/rule-eval
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-26 13:10:18 +02:00
Julien Pivotto c0724f4e62 New test
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-19 11:56:04 +01:00
Julien Pivotto 5ad74e6e71 Add tests
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-19 10:36:01 +01:00
Julien Pivotto ce55e5074d Add 'keep_firing_for' field to alerting rules
This commit adds a new 'keep_firing_for' field to Prometheus alerting
rules. The 'resolve_delay' field specifies the minimum amount of time
that an alert should remain firing, even if the expression does not
return any results.

This feature was discussed at a previous dev summit, and it was
determined that a feature like this would be useful in order to allow
the expression time to stabilize and prevent confusing resolved messages
from being propagated through Alertmanager.

This approach is simpler than having two PromQL queries, as was
sometimes discussed, and it should be easy to implement.

This commit does not include tests for the 'resolve_delay' field.  This
is intentional, as the purpose of this commit is to gather comments on
the proposed design of the 'resolve_delay' field before implementing
tests. Once the design of the 'resolve_delay' field has been finalized,
a follow-up commit will be submitted with tests."

See https://github.com/prometheus/prometheus/issues/11570

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-13 12:11:39 +01:00
Danny Kopping 72527b5f12
Refactoring for simplicity
Include labels

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-09 11:01:46 +02:00
Danny Kopping 79300340af
Adding recording/alerting rule origin context
This will allow correlation of executed rule queries with their associated rule names and type

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-09 11:01:24 +02:00
Ganesh Vernekar 648be89822
Merge remote-tracking branch 'upstream/main' into fix-conflict
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 14:20:02 +05:30
Ganesh Vernekar 46b26c4f09
Fix notifier relabel changing the labels of active alerts (#11427)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-07 20:28:17 +05:30
Jesus Vazquez e934d0f011 Merge 'main' into sparsehistogram
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2022-10-05 22:14:49 +02:00
Bryan Boreham 8297f5cb6b rules: in tests use labels.FromStrings
And a number of `EmptyLabels()` instead of `nil`.
Replacing code which assumes the internal structure of `Labels`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-09 13:34:49 +02:00
beorn7 c9fd3c235d Merge branch 'main' into sparsehistogram 2022-08-10 17:54:37 +02:00
Jimmie Han a5fea2cdd0
Use atomic field avoid (*AlertingRule).mtx wait when template expanding (#10858)
* Use atomic field avoid (*AlertingRule).mtx wait when template expanding (#10703)

Signed-off-by: hanjm <hanjinming@outlook.com>
2022-07-19 12:58:37 +02:00
beorn7 28f028e938 Merge branch 'main' into sparsehistogram 2022-07-12 19:07:13 +02:00
Matthieu MOREL ddfa9a7cc5
refactor (rules): move from github.com/pkg/errors to 'errors' and 'fmt' (#10855)
* refactor (rules): move from github.com/pkg/errors to 'errors' and 'fmt'

Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>
2022-06-17 09:54:25 +02:00
beorn7 40ad5e284a Merge branch 'main' into beorn7/sparsehistogram 2022-06-09 20:50:30 +02:00
Julien Pivotto 0d94cdf107
rules: remove classic UI code (#10730)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-05-23 16:21:50 +02:00
beorn7 7ee1836ef5 Merge branch 'main' into sparsehistogram 2022-04-05 18:31:19 +02:00
Wilbert Guo 83a2e52bc2
Add SyncForState Implementation for Ruler HA (#10070)
* continuously syncing activeAt for alerts

Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* add import

Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Refactor SyncForState and add unit tests

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Format code

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Add hook for syncForState

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix go lint

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Refactor syncForState override implementation

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Add syncForState override func as argument to Update()

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix go formatting

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix circleci test errors

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Remove overrideFunc as argument to run()

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* remove the syncForState

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* use the override function to decide if need to replace the activeAt or not

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix test case

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix format

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Trigger build

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fixing comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* return the result of map of alerts instead of single one

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* upper case the QueryforStateSeries

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* use a more generic rule group post process function type

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix indentation

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix gofmt

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix lint

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fixing naming

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* add the lastEvalTimestamp as parameter

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fmt

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* change funcType to func

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

Co-authored-by: Yijie Qin <qinyijie@amazon.com>
Co-authored-by: Yijie Qin <63399121+qinxx108@users.noreply.github.com>
2022-03-29 02:16:46 +02:00
beorn7 5d4db805ac Merge branch 'main' into sparsehistogram 2021-11-17 19:57:31 +01:00
beorn7 c954cd9d1d Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related
packages over there:
  exemplar labels relabel rulefmt textparse timestamp value

All the others are more or less utilities and have been moved to `util`:
  gate logging modetimevfs pool runtime

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-09 08:03:10 +01:00
Levi Harrison d81bbe154d
Rule alerts/series limit updates (#9541)
* Add docs and do not limit inactive alerts.

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-10-21 23:14:17 +02:00
Levi Harrison dc2f1993d8
Limit number of alerts or series produced by a rule (#9260)
* Add limit to rules

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-09-15 09:48:26 +02:00
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
Levi Harrison 26274527df
Updated/added tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-30 23:35:41 -04:00
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Guangwen Feng 487f1e07ff
Add unit test case for func State in alerting.go (#7476)
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2020-06-29 13:16:52 +01:00
Tobias Guggenmos 6c00f2ffcb Comment fixes
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
2020-02-17 16:09:23 +01:00
Julien Pivotto 9adad8ad30 Remove MaxConcurrent from the PromQL engine opts (#6712)
Since we use ActiveQueryTracker to check for concurrency in
d992c36b3a it does not make sense to keep
the MaxConcurrent value as an option of the PromQL engine.

This pull request removes it from the PromQL engine options, sets the
max concurrent metric to -1 if there is no active query tracker, and use
the value of the active query tracker otherwise.

It removes dead code and also will inform people who import the promql
package that we made that change, as it breaks the EngineOpts struct.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-28 20:38:49 +00:00
Julien Pivotto cf42888e4d Fix order of testutil.Equals (#6695)
Equals takes the expected value as first parameter, and the actual value
as second parameter.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 12:21:59 +00:00