prometheus

Commit Graph

Author	SHA1	Message	Date
beorn7	c0879d64cf	promql: Separate `Point` into `FPoint` and `HPoint` In other words: Instead of having a “polymorphous” `Point` that can either contain a float value or a histogram value, use an `FPoint` for floats and an `HPoint` for histograms. This seemingly small change has a _lot_ of repercussions throughout the codebase. The idea here is to avoid the increase in size of `Point` arrays that happened after native histograms had been added. The higher-level data structures (`Sample`, `Series`, etc.) are still “polymorphous”. The same idea could be applied to them, but at each step the trade-offs needed to be evaluated. The idea with this change is to do the minimum necessary to get back to pre-histogram performance for functions that do not touch histograms. Here are comparisons for the `changes` function. The test data doesn't include histograms yet. Ideally, there would be no change in the benchmark result at all. First runtime v2.39 compared to directly prior to this commit: ``` name old time/op new time/op delta RangeQuery/expr=changes(a_one[1d]),steps=1-16 391µs ± 2% 542µs ± 1% +38.58% (p=0.000 n=9+8) RangeQuery/expr=changes(a_one[1d]),steps=10-16 452µs ± 2% 617µs ± 2% +36.48% (p=0.000 n=10+10) RangeQuery/expr=changes(a_one[1d]),steps=100-16 1.12ms ± 1% 1.36ms ± 2% +21.58% (p=0.000 n=8+10) RangeQuery/expr=changes(a_one[1d]),steps=1000-16 7.83ms ± 1% 8.94ms ± 1% +14.21% (p=0.000 n=10+10) RangeQuery/expr=changes(a_ten[1d]),steps=1-16 2.98ms ± 0% 3.30ms ± 1% +10.67% (p=0.000 n=9+10) RangeQuery/expr=changes(a_ten[1d]),steps=10-16 3.66ms ± 1% 4.10ms ± 1% +11.82% (p=0.000 n=10+10) RangeQuery/expr=changes(a_ten[1d]),steps=100-16 10.5ms ± 0% 11.8ms ± 1% +12.50% (p=0.000 n=8+10) RangeQuery/expr=changes(a_ten[1d]),steps=1000-16 77.6ms ± 1% 87.4ms ± 1% +12.63% (p=0.000 n=9+9) RangeQuery/expr=changes(a_hundred[1d]),steps=1-16 30.4ms ± 2% 32.8ms ± 1% +8.01% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=10-16 37.1ms ± 2% 40.6ms ± 2% +9.64% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=100-16 105ms ± 1% 117ms ± 1% +11.69% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16 783ms ± 3% 876ms ± 1% +11.83% (p=0.000 n=9+10) ``` And then runtime v2.39 compared to after this commit: ``` name old time/op new time/op delta RangeQuery/expr=changes(a_one[1d]),steps=1-16 391µs ± 2% 547µs ± 1% +39.84% (p=0.000 n=9+8) RangeQuery/expr=changes(a_one[1d]),steps=10-16 452µs ± 2% 616µs ± 2% +36.15% (p=0.000 n=10+10) RangeQuery/expr=changes(a_one[1d]),steps=100-16 1.12ms ± 1% 1.26ms ± 1% +12.20% (p=0.000 n=8+10) RangeQuery/expr=changes(a_one[1d]),steps=1000-16 7.83ms ± 1% 7.95ms ± 1% +1.59% (p=0.000 n=10+8) RangeQuery/expr=changes(a_ten[1d]),steps=1-16 2.98ms ± 0% 3.38ms ± 2% +13.49% (p=0.000 n=9+10) RangeQuery/expr=changes(a_ten[1d]),steps=10-16 3.66ms ± 1% 4.02ms ± 1% +9.80% (p=0.000 n=10+9) RangeQuery/expr=changes(a_ten[1d]),steps=100-16 10.5ms ± 0% 10.8ms ± 1% +3.08% (p=0.000 n=8+10) RangeQuery/expr=changes(a_ten[1d]),steps=1000-16 77.6ms ± 1% 78.1ms ± 1% +0.58% (p=0.035 n=9+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1-16 30.4ms ± 2% 33.5ms ± 4% +10.18% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=10-16 37.1ms ± 2% 40.0ms ± 1% +7.98% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=100-16 105ms ± 1% 107ms ± 1% +1.92% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16 783ms ± 3% 775ms ± 1% -1.02% (p=0.019 n=9+9) ``` In summary, the runtime doesn't really improve with this change for queries with just a few steps. For queries with many steps, this commit essentially reinstates the old performance. This is good because the many-step queries are the one that matter most (longest absolute runtime). In terms of allocations, though, this commit doesn't make a dent at all (numbers not shown). The reason is that most of the allocations happen in the sampleRingIterator (in the storage package), which has to be addressed in a separate commit. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:16 +02:00
Soon-Ping	6cecb87941	Generalized rule group iteration evaluation hook (#11885 ) Signed-off-by: Soon-Ping Phang <soonping@amazon.com>	2023-04-04 20:21:13 +02:00
Trevor Whitney	c3e0a83725	rules: no longer force CounterResetHint to Gauge Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>	2023-03-14 14:22:07 -06:00
Justin Lei	af1d9e01c7	Refactor tsdbutil for tests/native histograms (#11948 ) * Add float histograms to ChunkFromSamplesGeneric Signed-off-by: Justin Lei <justin.lei@grafana.com> * Add GenerateSamples functions to tsdbutil Signed-off-by: Justin Lei <justin.lei@grafana.com> PR responses Signed-off-by: Justin Lei <justin.lei@grafana.com> --------- Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-02-10 17:09:33 +05:30
Julien Pivotto	ce55e5074d	Add 'keep_firing_for' field to alerting rules This commit adds a new 'keep_firing_for' field to Prometheus alerting rules. The 'resolve_delay' field specifies the minimum amount of time that an alert should remain firing, even if the expression does not return any results. This feature was discussed at a previous dev summit, and it was determined that a feature like this would be useful in order to allow the expression time to stabilize and prevent confusing resolved messages from being propagated through Alertmanager. This approach is simpler than having two PromQL queries, as was sometimes discussed, and it should be easy to implement. This commit does not include tests for the 'resolve_delay' field. This is intentional, as the purpose of this commit is to gather comments on the proposed design of the 'resolve_delay' field before implementing tests. Once the design of the 'resolve_delay' field has been finalized, a follow-up commit will be submitted with tests." See https://github.com/prometheus/prometheus/issues/11570 Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2023-01-13 12:11:39 +01:00
Ganesh Vernekar	98a0523e4a	rules: Test native histograms in recording rules Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-01-11 18:27:57 +05:30
Bryan Boreham	3c7de69059	storage: allow re-use of iterators Patterned after `Chunk.Iterator()`: pass the old iterator in so it can be re-used to avoid allocating a new object. (This commit does not do any re-use; it is just changing all the method signatures so re-use is possible in later commits.) Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-15 18:32:45 +00:00
Jesus Vazquez	e934d0f011	Merge 'main' into sparsehistogram Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-10-05 22:14:49 +02:00
Bryan Boreham	8297f5cb6b	rules: in tests use labels.FromStrings And a number of `EmptyLabels()` instead of `nil`. Replacing code which assumes the internal structure of `Labels`. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-09-09 13:34:49 +02:00
Cosrider	bef6556ca5	delete redundant alias (#11180 ) Signed-off-by: Cosrider <cosrider7@gmail.com> Signed-off-by: Cosrider <cosrider7@gmail.com>	2022-08-31 15:50:38 +02:00
beorn7	c9fd3c235d	Merge branch 'main' into sparsehistogram	2022-08-10 17:54:37 +02:00
Jimmie Han	a5fea2cdd0	Use atomic field avoid (AlertingRule).mtx wait when template expanding (#10858 ) Use atomic field avoid (*AlertingRule).mtx wait when template expanding (#10703) Signed-off-by: hanjm <hanjinming@outlook.com>	2022-07-19 12:58:37 +02:00
beorn7	3bc711e333	Merge branch 'main' into sparsehistogram	2022-05-04 13:37:13 +02:00
Matthieu MOREL	e2ede285a2	refactor: move from io/ioutil to io and os packages (#10528 ) * refactor: move from io/ioutil to io and os packages * use fs.DirEntry instead of os.FileInfo after os.ReadDir Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>	2022-04-27 11:24:36 +02:00
beorn7	7ee1836ef5	Merge branch 'main' into sparsehistogram	2022-04-05 18:31:19 +02:00
Wilbert Guo	83a2e52bc2	Add SyncForState Implementation for Ruler HA (#10070 ) * continuously syncing activeAt for alerts Signed-off-by: Yijie Qin <qinyijie@amazon.com> Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * add import Signed-off-by: Yijie Qin <qinyijie@amazon.com> Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Refactor SyncForState and add unit tests Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Format code Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Add hook for syncForState Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix go lint Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Refactor syncForState override implementation Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Add syncForState override func as argument to Update() Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix go formatting Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix circleci test errors Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Remove overrideFunc as argument to run() Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * remove the syncForState Signed-off-by: Yijie Qin <qinyijie@amazon.com> * use the override function to decide if need to replace the activeAt or not Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix test case Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix format Signed-off-by: Yijie Qin <qinyijie@amazon.com> * Trigger build Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fixing comments Signed-off-by: Yijie Qin <qinyijie@amazon.com> * return the result of map of alerts instead of single one Signed-off-by: Yijie Qin <qinyijie@amazon.com> * upper case the QueryforStateSeries Signed-off-by: Yijie Qin <qinyijie@amazon.com> * use a more generic rule group post process function type Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix indentation Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix gofmt Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix lint Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fixing naming Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix comments Signed-off-by: Yijie Qin <qinyijie@amazon.com> * add the lastEvalTimestamp as parameter Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fmt Signed-off-by: Yijie Qin <qinyijie@amazon.com> * change funcType to func Signed-off-by: Yijie Qin <qinyijie@amazon.com> Co-authored-by: Yijie Qin <qinyijie@amazon.com> Co-authored-by: Yijie Qin <63399121+qinxx108@users.noreply.github.com>	2022-03-29 02:16:46 +02:00
Björn Rabenstein	7e42acd3b1	tsdb: Rework iterators (#9877 ) - Pick At... method via return value of Next/Seek. - Do not clobber returned buckets. - Add partial FloatHistogram suppert. Note that the promql package is now _only_ dealing with FloatHistograms, following the idea that PromQL only knows float values. As a byproduct, I have removed the histogramSeries metric. In my understanding, series can have both float and histogram samples, so that metric doesn't make sense anymore. As another byproduct, I have converged the sampleBuf and the histogramSampleBuf in memSeries into one. The sample type stored in the sampleBuf has been extended to also contain histograms even before this commit. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 13:24:23 +05:30
beorn7	5d4db805ac	Merge branch 'main' into sparsehistogram	2021-11-17 19:57:31 +01:00
beorn7	c954cd9d1d	Move packages out of deprecated pkg directory This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-09 08:03:10 +01:00
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Levi Harrison	dc2f1993d8	Limit number of alerts or series produced by a rule (#9260 ) * Add limit to rules Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-09-15 09:48:26 +02:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
Levi Harrison	26274527df	Updated/added tests Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-05-30 23:35:41 -04:00
Goutham Veeramachaneni	4b5ab80ca6	[rule] Update rule health for append/commit fails (#8619 ) * [rule] Update rule health for append/commit fails Similar to https://github.com/prometheus/prometheus/pull/8410 will provide more context. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Add test for updating health on append fails Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2021-03-18 15:44:33 +01:00
Tom Wilkie	7369561305	Combine Appender.Add and AddFast into a single Append method. (#8489 ) This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-18 17:37:00 +05:30
Julien Pivotto	6c56a1faaa	Testify: move to require (#8122 ) * Testify: move to require Moving testify to require to fail tests early in case of errors. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * More moves Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-29 09:43:23 +00:00
Julien Pivotto	1282d1b39c	Refactor test assertions (#8110 ) * Refactor test assertions This pull request gets rid of assert.True where possible to use fine-grained assertions. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-27 11:06:53 +01:00
Julien Pivotto	4e5b1722b3	Move away from testutil, refactor imports (#8087 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-22 11:00:08 +02:00
johncming	2f2a51a43a	web/api/v1: make names consistent. (#7841 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-08-25 11:38:06 +01:00
johncming	362080ba28	rules: add evaluationTimestamp when copy state. (#7775 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-08-14 09:42:13 +01:00
Annanay	9bba8a6eae	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:43:18 +05:30
Julien Pivotto	e76c436e9c	Goleak in discoveries, scrape, rules (#7662 ) * Add go leak tests for discoveries with goroutines Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Add go leak tests in rules Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Add go leak tests in scrape tests Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-27 09:38:08 +01:00
Annanay	7f98a744e5	Add context to Appender interface Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-24 19:40:51 +05:30
Julien Pivotto	b83cbacbdd	Rule manager: remove blocking channel in mail (#7631 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-22 00:13:24 +02:00
Guangwen Feng	ad9449d1f1	Add unit test case for func HasAlertingRules in manager.go (#7502 ) Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>	2020-07-06 10:35:16 +01:00
Frederic Branczyk	d17d88935c	rules: Use narrower interface for rule manager loading of for state (#7472 ) To load ALERT_FOR_STATE only `storage.Queryable` interface is required, so this patch uses this narrower interface for to perform this. Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>	2020-06-26 19:06:36 +01:00
Kemal Akkoyun	66dfb951c4	: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251 ) Add errors and Warnings to SeriesSet Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Change Querier interface and refactor accordingly Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor promql/engine to propagate warnings at eval stage Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Make sure all the series from all Selects are pre-advanced Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Separate merge series sets Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Clean Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor merge querier failure handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactored and simplified fanout with improvements from incoming chunk iterator PRs. * Secondary logic is hidden, instead of weird failed series set logic we had. * Fanout is well commented * Fanout closing record all errors * MergeQuerier improved API (clearer) * deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false). Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix formatting Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix CI issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Added final tests for error handling. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. * Moved hints in populate to be allocated only when needed. * Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic. * Select after first Next is done will panic. NOTE: in lazySeriesSet in theory we could just panic, I think however we can totally just return error, it will panic in expand anyway. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Utilize errWithWarnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix recently introduced expansion issue Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add tests for secondary querier error handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Implement lazy merge Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add name to test cases Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Reorganize Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Remove redundant warnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix rebase mistake Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-06-09 17:57:31 +01:00
ZouYu	2b7437d60e	Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109 ) Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>	2020-04-15 11:17:41 +01:00
Ben Ye	00730bfee7	add rule_group label to rule evaluation metrics (#7094 ) Signed-off-by: yeya24 <yb532204897@gmail.com>	2020-04-08 23:21:37 +02:00
Bartlomiej Plotka	fe802f29c9	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet which rely on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for not good reason. I think I found a good balance between convenience and readability with just one method. Smaller the interface = better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-13 13:06:25 +00:00
Tobias Guggenmos	4835bbf376	Merge branch 'master' into split_parser	2020-02-19 15:18:13 +01:00
Bartlomiej Plotka	34426766d8	Unify Iterator interfaces. All point to storage now. This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All todos I added will be fixed in follow up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only a unnecessary glue with incompatible structs. No logic was changed, only different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:54 +00:00
Tobias Guggenmos	6c00f2ffcb	Comment fixes Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>	2020-02-17 16:09:23 +01:00
Julien Pivotto	135cc30063	rules: Make deleted rule series as stale after a reload (#6745 ) * rules: Make deleted rule series as stale after a reload Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-02-12 16:22:18 +01:00
Julien Pivotto	9adad8ad30	Remove MaxConcurrent from the PromQL engine opts (#6712 ) Since we use ActiveQueryTracker to check for concurrency in `d992c36b3a` it does not make sense to keep the MaxConcurrent value as an option of the PromQL engine. This pull request removes it from the PromQL engine options, sets the max concurrent metric to -1 if there is no active query tracker, and use the value of the active query tracker otherwise. It removes dead code and also will inform people who import the promql package that we made that change, as it breaks the EngineOpts struct. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-28 20:38:49 +00:00
Julien Pivotto	56ebd5afde	Delete prometheus_rule_group metrics when groups are removed (#6693 ) * Delete prometheus_rule_group metrics when groups are removed Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-27 12:41:32 +00:00
Harkishen Singh	84e6459c4d	Adds support for line-column numbers for invalid rules, promtool (#6533 ) Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>	2020-01-15 18:07:54 +00:00
Julien Pivotto	e079c9ed45	manager: add full stops on comments Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2019-12-19 11:46:22 +01:00
alburthoffman	156dcb8cca	avoid stopping rule groups if new rule groups are as same as old rule… (#6450 ) * avoid stopping rule groups if new rule groups are as same as old rule groups Signed-off-by: alburthoffman <alburthoffman@gmail.com>	2019-12-19 10:41:11 +00:00
Chris Marchbanks	0685eb5395	Refactor testutil.NewStorage into a new package This avoids a circular dependency between the testutil and storage packages. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-08 19:43:04 -06:00
Brian Brazil	2184b79763	Mark deleted rule's series as stale on next evaluation. (#5759 ) Fixes #5755 Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2019-08-07 16:11:05 +01:00
Simon Pasquier	45506841e6	*: enable all default linters (#5504 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-05-03 15:11:28 +02:00
Bjoern Rabenstein	38d518c0fe	Rework #5009 after comments Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>	2019-04-17 01:40:10 +02:00
James Ravn	e15d8c5802	reload: copy state on both name and labels (#5368 ) * reload: copy state on both name and labels Fix https://github.com/prometheus/prometheus/issues/5193 Using just name causes the linked issue - if new rules are inserted with the same name (but different labels), the reordering will cause stale markers to be inserted in the next eval for all shifted rules, despite them not being stale. Ideally we want to avoid stale markers for time series that still exist in the new rules, with name and labels being the unique identifer. This change adds labels to the internal map when copying the old rule data to the new rule data. This prevents the problem of staling rules that simply shifted order. If labels change, it is a new time series and the old series will stale regardless. So it should be safe to always match on name and labels when copying state. Signed-off-by: James Ravn <james@r-vn.org>	2019-03-15 15:23:36 +00:00
Matt Layher	302148fd69	*: apply gofmt -s Signed-off-by: Matt Layher <mdlayher@gmail.com>	2019-01-16 17:28:14 -05:00
Tom Wilkie	6e08029b56	Move err to be the last return value from storage.Select. (#5054 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-01-02 11:10:13 +00:00
mknapphrt	f0e9196dca	Return warnings on a remote read fail (#4832 ) Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>	2018-11-30 14:27:12 +00:00
Ben Kochie	c6399296dc	Fix spelling/typos (#4921 ) * Fix spelling/typos Fix spelling/typos reported by codespell/misspell. * UK -> US spelling changes. Signed-off-by: Ben Kochie <superq@gmail.com>	2018-11-27 17:44:29 +01:00
Callum Styan	9bca041285	WIP: keep track of samples per query, set a max # of samples (#4513 ) * keep track of samples per query, set a max # of samples that can be in memory at once Signed-off-by: Callum Styan <callumstyan@gmail.com>	2018-10-02 12:59:19 +01:00
Chris Marchbanks	63ed9d1b70	Send EndsAt along with alerts (#4550 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2018-08-28 16:05:00 +01:00
Chris Marchbanks	87f1dad16d	throttle resends of alerts to 1 minute by default (#4538 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2018-08-27 17:41:42 +01:00
Ganesh Vernekar	c663477688	Fixed TestUpdate in rules/manager_test.go (#4516 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-08-20 18:21:05 +05:30
Ganesh Vernekar	a0a9e7df91	Fix TestForStateRestore (#4476 ) (#4512 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-08-16 22:56:15 +05:30
Julius Volz	90521a65f8	Remove error return value from NotifyFunc() (#4459 ) It's always nil and we also forgot to check it. Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-08-04 21:31:12 +02:00
Ganesh Vernekar	f1db699dff	Persist alert 'for' state across restarts (#4061 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-08-02 11:18:24 +01:00
Bryan Boreham	afdb66dfac	Expose Group.CopyState() (#4304 ) This makes the `rules` package more useful to projects that use Prometheus as a library. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2018-07-18 15:14:38 +02:00
Julius Volz	9e3171f6e3	rules: Minor naming/comment cleanups (#4328 ) Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-07-18 04:54:33 +01:00
Bryan Boreham	2bd510a63e	Make TestUpdate() do some work (#4306 ) Previously it would set no preconditions and check no postconditions, as the `groups` member was empty. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2018-06-22 15:21:04 +01:00
Warren Fernandes	58e2a31db8	Cleans up test by removing unused function (#3969 )	2018-03-15 08:59:19 +00:00
Fabian Reinartz	7ccd4b39b8	*: implement query params This adds a parameter to the storage selection interface which allows query engine(s) to pass information about the operations surrounding a data selection. This can for example be used by remote storage backends to infer the correct downsampling aggregates that need to be provided.	2018-02-13 12:17:22 +01:00
Brian Brazil	0a42a9fc8f	Copy over rule group duration on reload. This is currently getting lost, this will soon be in a metric and we don't want it dropping to 0 on every reload.	2017-12-04 11:44:38 +00:00
Fabian Reinartz	62461379b7	rules: decouple notifier packages The dependency on the notifier packages caused a transitive dependency on discovery and with that all client libraries our service discovery uses.	2017-11-27 16:38:14 +01:00
Fabian Reinartz	bd9f7460eb	rules: remove config package dependency	2017-11-24 07:57:54 +01:00
Fabian Reinartz	2d0e3746ac	rules: remove dependency on promql.Engine	2017-11-24 07:57:54 +01:00
Fabian Reinartz	83cd270ea4	*: adapt to storage interface changes	2017-11-23 19:05:04 +01:00
Jorge Hernández	6cd0f63eb1	Use testutil in rules subpackage (#3278 ) * Use testutil in rules subpackage * Fix manager test * Use testutil in rules subpackage * Fix manager test * Fix rebase * Change to testutil for applyConfig tests	2017-11-11 11:29:47 +01:00
Krasi Georgiev	e86d82ad2d	Fix regression of alert rules state loss on config reload. (#3382 ) * incorrect map name for the group prevented copying state from existing alert rules on config reload * applyConfig test * few nits * nits 2	2017-11-01 12:58:00 +01:00
Brian Brazil	cc5499fcad	Only close after checking for err.	2017-10-09 19:44:03 +01:00
Julius Volz	f7e8348a88	Re-add contexts to storage.Storage.Querier() (#3230 ) * Re-add contexts to storage.Storage.Querier() These are needed when replacing the storage by a multi-tenant implementation where the tenant is stored in the context. The 1.x query interfaces already had contexts, but they got lost in 2.x. * Convert promql.Engine to use native contexts	2017-10-04 21:04:15 +02:00
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	2017-09-08 22:01:51 +05:30
Goutham Veeramachaneni	37e7b69f56	Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-19 16:34:55 +05:30
Goutham Veeramachaneni	507790a357	Rework logging to use explicitly passed logger Mostly cleaned up the global logger use. Still some uses in discovery package. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 15:52:44 +05:30
Goutham Veeramachaneni	8cca666cf2	Add file name to group. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-14 15:18:39 +05:30
Brian Brazil	dcea3e4773	Don't append a 0 when alert is no longer pending/firing With staleness we no longer need this behaviour.	2017-05-24 13:52:45 +01:00
Brian Brazil	cc867dae60	Copy previous series and alert state more intelligently. Usually rules don't more around, and if they do it's likely that rules/alerts with the same name stay in the same order. If rules/alerts with the same name are added/removed this could cause a blip for one cycle, but this is unavoidable without requiring rule and alert names to be unique - which we don't want to do.	2017-05-24 13:52:45 +01:00
Brian Brazil	0451d6d31b	Add unittest for rule staleness, and rules generally.	2017-05-24 13:52:45 +01:00
Fabian Reinartz	06c2b76cd4	Merge branch 'master' into uptsdb	2017-05-16 16:48:37 +02:00
Julius Volz	ac203ef0ee	Add externalURL template function (#2716 ) This allows users to e.g. add links back to the generating Prometheus right in their alert templates.	2017-05-13 15:47:04 +02:00
Fabian Reinartz	e94b0899ee	rules: fix tests, remove model types	2016-12-29 17:31:14 +01:00
Jonathan Lange	2a2da40223	Make rule evaluation publicly available Means that a third-party can parse rules and run them with their own execution model.	2016-11-18 16:12:50 +00:00
Julius Volz	c187308366	storage: Contextify storage interfaces. This is based on https://github.com/prometheus/prometheus/pull/1997. This adds contexts to the relevant Storage methods and already passes PromQL's new per-query context into the storage's query methods. The immediate motivation supporting multi-tenancy in Frankenstein, but this could also be used by Prometheus's normal local storage to support cancellations and timeouts at some point.	2016-09-19 16:29:07 +02:00
Julius Volz	ed5a0f0abe	promql: Allow per-query contexts. For Weaveworks' Frankenstein, we need to support multitenancy. In Frankenstein, we initially solved this without modifying the promql package at all: we constructed a new promql.Engine for every query and injected a storage implementation into that engine which would be primed to only collect data for a given user. This is problematic to upstream, however. Prometheus assumes that there is only one engine: the query concurrency gate is part of the engine, and the engine contains one central cancellable context to shut down all queries. Also, creating a new engine for every query seems like overkill. Thus, we want to be able to pass per-query contexts into a single engine. This change gets rid of the promql.Engine's built-in base context and allows passing in a per-query context instead. Central cancellation of all queries is still possible by deriving all passed-in contexts from one central one, but this is now the responsibility of the caller. The central query context is now created in main() and passed into the relevant components (web handler / API, rule manager). In a next step, the per-query context would have to be passed to the storage implementation, so that the storage can implement multi-tenancy or other features based on the contextual information.	2016-09-19 15:38:17 +02:00
Brian Brazil	6fc88d4b4d	Remove __name__ from alerts sent to AM. Fixes #1861	2016-08-01 23:32:41 +01:00
Brian Brazil	0509b0f2db	Expand alert templates at eval time. Fixes #1678 #1677	2016-07-12 17:13:55 +01:00
Tobias Schmidt	f1f8317fa5	Fix detection of flapping alerts Alerts in the resolve retention period must be transitioned to the active state again when their condition is met.	2016-02-04 23:55:12 -05:00
Fabian Reinartz	62075aa037	Reduce noisy no-alertmanager warning	2015-12-17 15:42:26 +01:00
Fabian Reinartz	52e5224f5a	Refactor rules/ package	2015-12-17 15:42:25 +01:00
Fabian Reinartz	7c90db22ed	Use annotation based alerts in rules/ This commit breaks the previously used alert format.	2015-12-14 10:16:07 +01:00
Fabian Reinartz	306e8468a0	Switch from client_golang/model to common/model	2015-08-21 13:33:38 +02:00
Fabian Reinartz	f06cf664e1	rules: cleanup alerting test	2015-06-30 18:22:24 +02:00

1 2 3 4

151 Commits (48287b15d4fe3a2c9578982bdffeccdc0e14d1ef)