prometheus

Commit Graph

Author	SHA1	Message	Date
Kemal Akkoyun	66dfb951c4	*: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251 ) * Add errors and Warnings to SeriesSet Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Change Querier interface and refactor accordingly Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor promql/engine to propagate warnings at eval stage Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Make sure all the series from all Selects are pre-advanced Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Separate merge series sets Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Clean Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor merge querier failure handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactored and simplified fanout with improvements from incoming chunk iterator PRs. * Secondary logic is hidden, instead of weird failed series set logic we had. * Fanout is well commented * Fanout closing record all errors * MergeQuerier improved API (clearer) * deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false). Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix formatting Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix CI issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Added final tests for error handling. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. * Moved hints in populate to be allocated only when needed. * Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic. * Select after first Next is done will panic. NOTE: in lazySeriesSet in theory we could just panic, I think however we can totally just return error, it will panic in expand anyway. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Utilize errWithWarnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix recently introduced expansion issue Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add tests for secondary querier error handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Implement lazy merge Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add name to test cases Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Reorganize Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Remove redundant warnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix rebase mistake Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	5 years ago
Brian Brazil	5368066b58	Give a bit more slack for alertmanager send failures. (#7228 ) Fixes #5277 Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	5 years ago
Julien Pivotto	fc3fb3265a	Merge pull request #7145 from prometheus/release-2.17 Backport release 2.17 into master	5 years ago
Chris Marchbanks	a7b449320d	Fix updating rule manager never finishing (#7138 ) Rather than sending a value to the done channel on a group to indicate whether or not to add stale markers to a closing rule group use an explicit boolean. This allows more functions than just run() to read from the done channel and fixes an issue where Eval() could consume the channel during an update, causing run() to never return. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
ZouYu	2b7437d60e	Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109 ) Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>	5 years ago
Marek Slabicki	8224ddec23	Capitalizing first letter of all log lines (#7043 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	5 years ago
Ben Ye	00730bfee7	add rule_group label to rule evaluation metrics (#7094 ) Signed-off-by: yeya24 <yb532204897@gmail.com>	5 years ago
Muhammad Falak R Wani	2d1a80aa82	rules: manager: clarify doc string for NewGroupMetrics (#7084 ) * rules: manager: clarify doc string for NewGroupMetrics Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>	5 years ago
Brian Brazil	7646cbca32	Use .UTC everywhere we use time.Unix (#7066 ) time.Unix attaches the local timezone, which can then leak out (e.g. in the alert json). While this is harmless, we should be consistent. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	5 years ago
Bartlomiej Plotka	c4eefd1b3a	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet which rely on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for not good reason. I think I found a good balance between convenience and readability with just one method. Smaller the interface = better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	5 years ago
Björn Rabenstein	1da83305be	Merge pull request #7009 from prometheus/release-2.17 Merge release-2.17 into master	5 years ago
Julien Pivotto	8907ba6235	Make TSDB use storage errors This fixes #6992, which was introduced by #6777. There was an intermediate component which translated TSDB errors into storage errors, but that component was deleted and this bug went unnoticed, until we were watching at the Prombench results. Without this, scrape will fail instead of dropping samples or using "Add" when the series have been garbage collected. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Björn Rabenstein	d80b0810c1	Move crucial actions to defer (#6918 ) With defer having less of a performance penalty, there is no reason not to do those crucial operations via defer. Context: With isolation in place, if we forget to Commit/Rollback, the low watermark will get stuck forever. The current code should not have any bugs, but moving to defer helps to avoid future bugs. This is also moving the `closeAppend` in the `Commit` implementation itself to defer. If logging to the WAL fails, we would have missed the `closeAppend`. Signed-off-by: beorn7 <beorn@grafana.com>	5 years ago
Bartlomiej Plotka	fe802f29c9	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet which rely on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for not good reason. I think I found a good balance between convenience and readability with just one method. Smaller the interface = better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	5 years ago
Julien Pivotto	2051ae2e6a	Revert "Fix race condition in Rule manager.Update() function" This reverts commit `8b11d2cfb6`. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
fuling	8b11d2cfb6	Fix race condition in Rule manager.Update() function Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	5 years ago
Tobias Guggenmos	4835bbf376	Merge branch 'master' into split_parser	5 years ago
Bartlomiej Plotka	34426766d8	Unify Iterator interfaces. All point to storage now. This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All todos I added will be fixed in follow up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only a unnecessary glue with incompatible structs. No logic was changed, only different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	5 years ago
Tobias Guggenmos	6c00f2ffcb	Comment fixes Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>	5 years ago
Tobias Guggenmos	20b1f596f6	Fix build errors in rest of prometheus Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>	5 years ago
Julien Pivotto	135cc30063	rules: Make deleted rule series as stale after a reload (#6745 ) * rules: Make deleted rule series as stale after a reload Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Julien Pivotto	0e912faf4f	rules: Cleanup unused function alert.Duration (#6734 ) The function HoldDuration and Duration did the exact same thing. Let's only keep HoldDuration() as Duration() is more confusing. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Julien Pivotto	f82d55e79f	Refactor and simplify rule_group_interval_seconds (#6711 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Julien Pivotto	9adad8ad30	Remove MaxConcurrent from the PromQL engine opts (#6712 ) Since we use ActiveQueryTracker to check for concurrency in `d992c36b3a` it does not make sense to keep the MaxConcurrent value as an option of the PromQL engine. This pull request removes it from the PromQL engine options, sets the max concurrent metric to -1 if there is no active query tracker, and use the value of the active query tracker otherwise. It removes dead code and also will inform people who import the promql package that we made that change, as it breaks the EngineOpts struct. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Julien Pivotto	56ebd5afde	Delete prometheus_rule_group metrics when groups are removed (#6693 ) * Delete prometheus_rule_group metrics when groups are removed Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Julien Pivotto	cf42888e4d	Fix order of testutil.Equals (#6695 ) Equals takes the expected value as first parameter, and the actual value as second parameter. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Julien Pivotto	5f27ac3583	Refactor query log fields (#6694 ) * Refactor query log fields Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Harkishen Singh	84e6459c4d	Adds support for line-column numbers for invalid rules, promtool (#6533 ) Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>	5 years ago
Julien Pivotto	0011bba19b	Query Log: add origin of the rules (#6592 ) * Query Log: add origin of the rules We don't set rule name and rule kind because the added value would be quite low, given we have now the file, the group and the query. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Julien Pivotto	e079c9ed45	manager: add full stops on comments Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
alburthoffman	156dcb8cca	avoid stopping rule groups if new rule groups are as same as old rule… (#6450 ) * avoid stopping rule groups if new rule groups are as same as old rule groups Signed-off-by: alburthoffman <alburthoffman@gmail.com>	5 years ago
Julien Pivotto	2d7c8069d0	Check that rules don't contain metrics with the same labelset (#6469 ) Closes #5529 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Simon Pasquier	06066a3619	*: improve error messages when parsing bad rules (#5965 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	5 years ago
Brian Brazil	e62f30d497	Correctly handle empty labels from alert templates. (#5845 ) Fixes https://github.com/prometheus/common/issues/36 Move logic handling this into the labels package, so all the cases are handled in one place and we're less likely to have this come up again. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	5 years ago
Chris Marchbanks	0685eb5395	Refactor testutil.NewStorage into a new package This avoids a circular dependency between the testutil and storage packages. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Brian Brazil	2184b79763	Mark deleted rule's series as stale on next evaluation. (#5759 ) Fixes #5755 Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	5 years ago
AllenZMC	5c2c9a03e9	fix word 'consequentally' to 'consequently' (#5827 ) Signed-off-by: czm <zhongming.chang@daocloud.io>	5 years ago
beorn7	dd81912554	Add objectives to Summaries With the next release of client_golang, Summaries will not have objectives by default. To not lose the objectives we have right now, explicitly state the current default objectives. Signed-off-by: beorn7 <beorn@grafana.com>	6 years ago
pbhudiaBAE	43953b105b	Sorting alerts by group name in /alerts (#5448 ) * Working group name Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com> * Working categorised by group name Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com> * Changed group sorting in web Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com> * Fixed group sorting and comments Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com> * Fixed group sorting and comments with gofmt Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com> * Added file and group name Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com> * reverted back to full path to yml file Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>	6 years ago
Yao Zengzeng	5544cb252a	fix some mistakes in comments (#5533 ) Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>	6 years ago
Simon Pasquier	45506841e6	*: enable all default linters (#5504 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Bjoern Rabenstein	76102d570c	Add test for external labels in label template Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>	6 years ago
Bjoern Rabenstein	38d518c0fe	Rework #5009 after comments Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>	6 years ago
Sylvain Rabot	335a34486e	Add external labels to template expansion This affects the expansion of templates in alert labels and annotations and console templates. Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>	6 years ago
Tariq Ibrahim	8fdfa8abea	refine error handling in prometheus (#5388 ) i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com>	6 years ago
James Ravn	e15d8c5802	reload: copy state on both name and labels (#5368 ) * reload: copy state on both name and labels Fix https://github.com/prometheus/prometheus/issues/5193 Using just name causes the linked issue - if new rules are inserted with the same name (but different labels), the reordering will cause stale markers to be inserted in the next eval for all shifted rules, despite them not being stale. Ideally we want to avoid stale markers for time series that still exist in the new rules, with name and labels being the unique identifer. This change adds labels to the internal map when copying the old rule data to the new rule data. This prevents the problem of staling rules that simply shifted order. If labels change, it is a new time series and the old series will stale regardless. So it should be safe to always match on name and labels when copying state. Signed-off-by: James Ravn <james@r-vn.org>	6 years ago
David Symonds	46361a7c85	rules: Fix sorting of result from (*Manager).RuleGroups (#5260 ) The previous code was defective in that it never sorted groups within a file due to doing a multi-key sort incorrectly. Signed-off-by: David Symonds <dsymonds@gmail.com>	6 years ago
beorn7	2db1eeb4ec	Fix prometheus_rule_group_last_evaluation_timestamp_seconds It should be a unix timestamp, not the seconds in the minute. Signed-off-by: beorn7 <beorn@soundcloud.com>	6 years ago
Ganesh Vernekar	787eb1e904	Set rule_group_last_duration_seconds to seconds (#5153 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	6 years ago
Matt Layher	302148fd69	*: apply gofmt -s Signed-off-by: Matt Layher <mdlayher@gmail.com>	6 years ago
Vishnunarayan K I	fd3ef6ba34	Add metric rule_group_rules_loaded to get the number of rules loaded (#5090 ) Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>	6 years ago
Simon Pasquier	f678e27eb6	*: use latest release of staticcheck (#5057 ) *: use latest release of staticcheck It also fixes a couple of things in the code flagged by the additional checks. Signed-off-by: Simon Pasquier <spasquie@redhat.com> Use official release of staticcheck Also run 'go list' before staticcheck to avoid failures when downloading packages. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Tom Wilkie	121603c417	Expose rules.NewGroupMetrics and rules.Metrics. (#5059 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	6e08029b56	Move err to be the last return value from storage.Select. (#5054 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Bartek Płotka	de213d4a5e	rule manager: Moved metric registration to custom registerer which is already available. (#4961 ) Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	6 years ago
AixesHunter	fb8479a677	Variable 'labels' collides with imported package name (#5012 ) Signed-off-by: aixeshunter <aixeshunter@gmail.com>	6 years ago
mknapphrt	f0e9196dca	Return warnings on a remote read fail (#4832 ) Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>	6 years ago
Krasi Georgiev	0754e5334b	querier for RestoreForState not closed. (#4922 ) Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	6 years ago
Ben Kochie	c6399296dc	Fix spelling/typos (#4921 ) * Fix spelling/typos Fix spelling/typos reported by codespell/misspell. * UK -> US spelling changes. Signed-off-by: Ben Kochie <superq@gmail.com>	6 years ago
Wei Guo	e329cbf673	Add metric prometheus_rule_group_last_evaluation for recording and alerting (#4852 ) * add metric prometheus_rule_group_last_evaluation for recording and alerting Signed-off-by: Wei Guo <me@imkira.com> * fix issues from comments Signed-off-by: Wei Guo <me@imkira.com>	6 years ago
Will Hegedus	193ebe7e34	Updates to /targets and /rules (scrape duration, last evaluation time) (#4722 ) * Add evaluationTimestamp (Last Evaluation) column to display on /rules Signed-off-by: Will Hegedus <wbhegedus@liberty.edu> * Add lastScrapeDuration ("Scrape Duration") to display on /targets Signed-off-by: Will Hegedus <wbhegedus@liberty.edu> * Updates based on Julius' feedback Signed-off-by: Will Hegedus <wbhegedus@liberty.edu> * Update to set timestamp to when eval started (after eval completes) Signed-off-by: Will Hegedus <wbhegedus@liberty.edu> * Update /rules to display time since last evaluation Signed-off-by: Will Hegedus <wbhegedus@liberty.edu> * Re-order Last Eval/Eval Time to be consistent with targets page Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>	6 years ago
Callum Styan	9bca041285	WIP: keep track of samples per query, set a max # of samples (#4513 ) * keep track of samples per query, set a max # of samples that can be in memory at once Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Ganesh Vernekar	5790d23fd8	Unit testing for rules (#4350 ) * Unit testing for rules * Specifying order of group evaluation in unit tests Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	6 years ago
Ganesh Vernekar	05726c5ea2	Test template expansion while loading groups (#4537 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	6 years ago
Chris Marchbanks	63ed9d1b70	Send EndsAt along with alerts (#4550 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Chris Marchbanks	87f1dad16d	throttle resends of alerts to 1 minute by default (#4538 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Goutham Veeramachaneni	f3b7c22827	rules: add comment about lock taking (#4525 ) Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	6 years ago
Ganesh Vernekar	c663477688	Fixed TestUpdate in rules/manager_test.go (#4516 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	6 years ago
Julius Volz	8fbe1b5133	Handle a bunch of unchecked errors (#4461 ) There are many more (mostly finalizers like Close/Stop/etc.), but most of the others seemed like one couldn't do much about them anyway. Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Ganesh Vernekar	a0a9e7df91	Fix TestForStateRestore (#4476 ) (#4512 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	6 years ago
Julien Pivotto	0b4d22b245	rules/manager: remove a no-longer-relevant comment (#4503 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	6 years ago
Chris Marchbanks	11155c7028	Existing alert labels will update based on templates (#4500 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Fabian Reinartz	b7e2f407de	rules: Fix double-locking of mutex Signed-off-by: Fabian Reinartz <freinartz@google.com>	6 years ago
Benji Visser	8bb6e0dd6e	Show rule evaluation errors on rules page (#4457 ) * adding information about the health and errors for Rules adding Health() and LastError() to the Rule interface. This will allow us to easily surface information about rules. Signed-off-by: noqcks <benny@noqcks.io> * updating rules.html with fields for Rule errors and health state Signed-off-by: noqcks <benny@noqcks.io> * fix code comment grammar & access Rule health/error info using a mutex Signed-off-by: noqcks <benny@noqcks.io> * s/Errors/Error/ in rules.html to remain consistent with targets.html Signed-off-by: noqcks <benny@noqcks.io> * adding periods to code comments in reporting/alerting Signed-off-by: noqcks <benny@noqcks.io> * putting health/error below mutex in struct field Signed-off-by: noqcks <benny@noqcks.io>	6 years ago
Julius Volz	2b8fc062a8	rules: HTML-escape rule YAML marshal errors (#4464 ) This was pointed out by `gosec`. Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Julius Volz	90521a65f8	Remove error return value from NotifyFunc() (#4459 ) It's always nil and we also forgot to check it. Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Ganesh Vernekar	f1db699dff	Persist alert 'for' state across restarts (#4061 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	6 years ago
Max Leonard Inden	71fafad099	api/v1: Coninue work exposing rules and alerts Signed-off-by: Max Leonard Inden <IndenML@gmail.com>	6 years ago
mg03	31f8ca0dfb	api v1 alerts/rules json endpoint Signed-off-by: mg03 <mgeng03@gmail.com>	6 years ago
Bryan Boreham	afdb66dfac	Expose Group.CopyState() (#4304 ) This makes the `rules` package more useful to projects that use Prometheus as a library. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	6 years ago
Julius Volz	9e3171f6e3	rules: Minor naming/comment cleanups (#4328 ) Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Bryan Boreham	2bd510a63e	Make TestUpdate() do some work (#4306 ) Previously it would set no preconditions and check no postconditions, as the `groups` member was empty. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	7 years ago
Alin Sinpalean	9dc763cc03	Run rule evaluation with timestamps precisely evaluation_interval apart (#4201 ) * Run rule evaluation with timestamps precisely evaluation_interval apart from one another. Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	7 years ago
Mario Trangoni	464e747f1e	fix some comments typos (#4059 )	7 years ago
Bryan Boreham	93494d8b7e	Add an OpenTracing span for each rule (#4027 ) * Add an OpenTracing span for each rule So that tags and child spans can be traced back to the rule that they refer to.	7 years ago
ferhat elmas	ec8e4d8a7c	all: remove unnecessary type conversions (#3992 ) excep promql due to not to create conflict with #3966.	7 years ago
Warren Fernandes	58e2a31db8	Cleans up test by removing unused function (#3969 )	7 years ago
ferhat elmas	ffa673f7d8	General simplifications (#3887 ) Another try as in #1516	7 years ago
Fabian Reinartz	7ccd4b39b8	*: implement query params This adds a parameter to the storage selection interface which allows query engine(s) to pass information about the operations surrounding a data selection. This can for example be used by remote storage backends to infer the correct downsampling aggregates that need to be provided.	7 years ago
Simon Pasquier	81c0ab69e0	Don't reset FiredAt for inactive alerts Otherwise AlertManager receives resolved alerts where StartsAt is zero which fails the validation.	7 years ago
Brian Brazil	30b4439bbd	Remove rule_type label from rule metrics. This is not really needed now that we have rule groups to distinguish rules.	7 years ago
Brian Brazil	b97f4cf48c	Add metrics for rule group interval and last duration.	7 years ago
Brian Brazil	0a42a9fc8f	Copy over rule group duration on reload. This is currently getting lost, this will soon be in a metric and we don't want it dropping to 0 on every reload.	7 years ago
Brian Brazil	aa370fa568	Clarify metric names around rule groups. Make it clear they're about overall rule groups.	7 years ago
Fabian Reinartz	62461379b7	rules: decouple notifier packages The dependency on the notifier packages caused a transitive dependency on discovery and with that all client libraries our service discovery uses.	7 years ago
Fabian Reinartz	4d964a0a0d	rules: make glob expansion a concern of main	7 years ago
Fabian Reinartz	bd9f7460eb	rules: remove config package dependency	7 years ago
Fabian Reinartz	2d0e3746ac	rules: remove dependency on promql.Engine	7 years ago
Fabian Reinartz	2ec5965b75	Merge pull request #3508 from prometheus/uptsdb update TSDB	7 years ago
Fabian Reinartz	83cd270ea4	*: adapt to storage interface changes	7 years ago
Goutham Veeramachaneni	a880c86375	Fix unexported method on exported interface. Also move to model.Duration Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	7 years ago
conorbroderick	55aaece116	Add rule evaluation time	7 years ago
Goutham Veeramachaneni	e1117715fe	rules: remove skipped iterations cuz no throttling Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	7 years ago
Jorge Hernández	6cd0f63eb1	Use testutil in rules subpackage (#3278 ) * Use testutil in rules subpackage * Fix manager test * Use testutil in rules subpackage * Fix manager test * Fix rebase * Change to testutil for applyConfig tests	7 years ago
Krasi Georgiev	e86d82ad2d	Fix regression of alert rules state loss on config reload. (#3382 ) * incorrect map name for the group prevented copying state from existing alert rules on config reload * applyConfig test * few nits * nits 2	7 years ago
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333 ) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	7 years ago
Brian Brazil	cc5499fcad	Only close after checking for err.	7 years ago
Brian Brazil	ee88f0d222	Ensure all values are used or _	7 years ago
Fabian Reinartz	2d0b8e8b94	Merge branch 'master' into dev-2.0	7 years ago
Julius Volz	f7e8348a88	Re-add contexts to storage.Storage.Querier() (#3230 ) * Re-add contexts to storage.Storage.Querier() These are needed when replacing the storage by a multi-tenant implementation where the tenant is stored in the context. The 1.x query interfaces already had contexts, but they got lost in 2.x. * Convert promql.Engine to use native contexts	7 years ago
beorn7	c2e9a151ab	Make all rule links link to the "Console" tab rather than "Graph" Clicking on a rule, either the name or the expression, opens the rule result (or the corresponding expression, repsectively) in the expression browser. This should by default happen in the console tab, as, more often than not, displaying it in the graph tab runs into a timeout.	7 years ago
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	7 years ago
Goutham Veeramachaneni	e1fc9dc78d	Move /rules to new format (#2901 ) Fixes #2891 Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>	7 years ago
Goutham Veeramachaneni	37e7b69f56	Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	c472316fb3	Check done before every rule evaluation. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	6b70a4d850	Incorporate PR feedback * Move fingerprint to Hash() * Move away from tsdb.MultiError * 0777 -> 0666 for files * checkOverflow of extra fields Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	507790a357	Rework logging to use explicitly passed logger Mostly cleaned up the global logger use. Still some uses in discovery package. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	dc69645e92	Move back to go-yaml Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	5ff283a7b7	Reflect the grouping in the UI Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	8cca666cf2	Add file name to group. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	e893c89333	Validate labels and annotations Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	a48a018368	Make sure groups are unique in a single file Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	cea1e99f78	Add update-rules command to promtool Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	e8f55669ea	Move rules to new format Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>	8 years ago
Brian Brazil	dcea3e4773	Don't append a 0 when alert is no longer pending/firing With staleness we no longer need this behaviour.	8 years ago
Brian Brazil	cc867dae60	Copy previous series and alert state more intelligently. Usually rules don't more around, and if they do it's likely that rules/alerts with the same name stay in the same order. If rules/alerts with the same name are added/removed this could cause a blip for one cycle, but this is unavoidable without requiring rule and alert names to be unique - which we don't want to do.	8 years ago
Brian Brazil	9bc68db7e6	Track staleness per rule rather than per group.	8 years ago
Brian Brazil	0451d6d31b	Add unittest for rule staleness, and rules generally.	8 years ago
Brian Brazil	0400f3cfd2	Very basic staleness handling for rules.	8 years ago
Fabian Reinartz	06c2b76cd4	Merge branch 'master' into uptsdb	8 years ago
Alexey Palazhchenko	b0e1ea7c6c	Simplify code, fix typos. (#2719 )	8 years ago
Julius Volz	ac203ef0ee	Add externalURL template function (#2716 ) This allows users to e.g. add links back to the generating Prometheus right in their alert templates.	8 years ago
Julius Volz	fe11c5933a	Fix mutation of active alert elements by notifier (#2656 ) This caused the external label application in the notifier to bleed back into the rule manager's active alerting elements.	8 years ago
Fabian Reinartz	8ffc851147	Merge branch 'master' into dev-2.0	8 years ago
Tobias Schmidt	eaf33759fb	Register forgotten prometheus_evaluator_iterations_total metric	8 years ago
Tobias Schmidt	aaaba57184	Export number of missed rule evaluations In case the execution of all rules takes longer than the configured rule evaluation interval, one or more iterations will be skipped. This needs to be visible to the operator.	8 years ago
Fabian Reinartz	5772f1a7ba	retrieval/storage: adapt to new interface This simplifies the interface to two add methods for appends with labels or faster reference numbers.	8 years ago
Fabian Reinartz	ad9bc62e4c	storage: extend appender and adapt it	8 years ago
Fabian Reinartz	e94b0899ee	rules: fix tests, remove model types	8 years ago
Fabian Reinartz	f8fc1f5bb2	*: migrate ingestion to new batch Appender	8 years ago
Fabian Reinartz	fecf9532b9	*: fix misc compile errors	8 years ago
Fabian Reinartz	622ece6273	*: fix recording tests, migrate matcher types	8 years ago
Fabian Reinartz	5817cb5bde	*: migrate from model.* to promql.* types	8 years ago
Fabian Reinartz	e68a3cf21f	rules: update annotations on each iteration	8 years ago
Jonathan Lange	d78dd3593d	Set evaluation interval on Group construction Prevents having object in invalid state, and allows users of public API to construct valid Groups.	8 years ago
Jonathan Lange	31fc357cd8	Make NewGroup and Group.Eval public Allows callers to evaluate lists of rules without first writing them to disk.	8 years ago
Jonathan Lange	2a2da40223	Make rule evaluation publicly available Means that a third-party can parse rules and run them with their own execution model.	8 years ago
Matt Bostock	926a5ab3dd	rules/manager.go: Fix race between reload and stop On one relatively large Prometheus instance (1.7M series), I noticed that upgrades were frequently resulting in Prometheus undergoing crash recovery on start-up. On closer examination, I found that Prometheus was panicking on shutdown. It seems that our configuration management (or misconfiguration thereof) is reloading Prometheus then immediately restarting it, which I suspect is causing this race: Sep 21 15:12:42 host systemd[1]: Reloading prometheus monitoring system. Sep 21 15:12:42 host prometheus[18734]: time="2016-09-21T15:12:42Z" level=info msg="Loading configuration file /etc/prometheus/config.yaml" source="main.go:221" Sep 21 15:12:42 host systemd[1]: Reloaded prometheus monitoring system. Sep 21 15:12:44 host systemd[1]: Stopping prometheus monitoring system... Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:203" Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="See you next time!" source="main.go:210" Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="Stopping target manager..." source="targetmanager.go:90" Sep 21 15:12:52 host prometheus[18734]: time="2016-09-21T15:12:52Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:548" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=1 source="scrape.go:467" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping rule manager..." source="manager.go:366" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Rule manager stopped." source="manager.go:372" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping notification handler..." source="notifier.go:325" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping local storage..." source="storage.go:381" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping maintenance loop..." source="storage.go:383" Sep 21 15:13:01 host prometheus[18734]: panic: close of closed channel Sep 21 15:13:01 host prometheus[18734]: goroutine 7686074 [running]: Sep 21 15:13:01 host prometheus[18734]: panic(0xba57a0, 0xc60c42b500) Sep 21 15:13:01 host prometheus[18734]: /usr/local/go/src/runtime/panic.go:500 +0x1a1 Sep 21 15:13:01 host prometheus[18734]: github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1(0xc6645a9901, 0xc420271ef0, 0xc420338ed0, 0xc60c42b4f0, 0xc6645a9900) Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:412 +0x3c Sep 21 15:13:01 host prometheus[18734]: created by github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:423 +0x56b Sep 21 15:13:03 host systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT	8 years ago
Julius Volz	c187308366	storage: Contextify storage interfaces. This is based on https://github.com/prometheus/prometheus/pull/1997. This adds contexts to the relevant Storage methods and already passes PromQL's new per-query context into the storage's query methods. The immediate motivation is supporting multi-tenancy in Frankenstein, but this could also be used by Prometheus's normal local storage to support cancellations and timeouts at some point.	8 years ago
Julius Volz	ed5a0f0abe	promql: Allow per-query contexts. For Weaveworks' Frankenstein, we need to support multitenancy. In Frankenstein, we initially solved this without modifying the promql package at all: we constructed a new promql.Engine for every query and injected a storage implementation into that engine which would be primed to only collect data for a given user. This is problematic to upstream, however. Prometheus assumes that there is only one engine: the query concurrency gate is part of the engine, and the engine contains one central cancellable context to shut down all queries. Also, creating a new engine for every query seems like overkill. Thus, we want to be able to pass per-query contexts into a single engine. This change gets rid of the promql.Engine's built-in base context and allows passing in a per-query context instead. Central cancellation of all queries is still possible by deriving all passed-in contexts from one central one, but this is now the responsibility of the caller. The central query context is now created in main() and passed into the relevant components (web handler / API, rule manager). In a next step, the per-query context would have to be passed to the storage implementation, so that the storage can implement multi-tenancy or other features based on the contextual information.	8 years ago

522 Commits (3711339a7d4f787b136a2ef1b282dd9b9ea161b9)