prometheus

Commit Graph

Author	SHA1	Message	Date
Bryan Boreham	2bd510a63e	Make TestUpdate() do some work (#4306) Previously it would set no preconditions and check no postconditions, as the `groups` member was empty. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2018-06-22 15:21:04 +01:00
Alin Sinpalean	9dc763cc03	Run rule evaluation with timestamps precisely evaluation_interval apart (#4201) * Run rule evaluation with timestamps precisely evaluation_interval apart from one another. Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-06-01 15:23:07 +01:00
Mario Trangoni	464e747f1e	fix some comments typos (#4059)	2018-04-08 10:51:54 +01:00
Bryan Boreham	93494d8b7e	Add an OpenTracing span for each rule (#4027) * Add an OpenTracing span for each rule So that tags and child spans can be traced back to the rule that they refer to.	2018-03-30 21:29:19 +01:00
ferhat elmas	ec8e4d8a7c	all: remove unnecessary type conversions (#3992) except promql, to avoid creating a conflict with #3966.	2018-03-21 09:25:22 +00:00
Warren Fernandes	58e2a31db8	Cleans up test by removing unused function (#3969)	2018-03-15 08:59:19 +00:00
ferhat elmas	ffa673f7d8	General simplifications (#3887) Another try as in #1516	2018-02-26 07:58:10 +00:00
Fabian Reinartz	7ccd4b39b8	*: implement query params This adds a parameter to the storage selection interface which allows query engine(s) to pass information about the operations surrounding a data selection. This can for example be used by remote storage backends to infer the correct downsampling aggregates that need to be provided.	2018-02-13 12:17:22 +01:00
Simon Pasquier	81c0ab69e0	Don't reset FiredAt for inactive alerts Otherwise AlertManager receives resolved alerts where StartsAt is zero which fails the validation.	2018-01-22 17:17:33 +01:00
Brian Brazil	30b4439bbd	Remove rule_type label from rule metrics. This is not really needed now that we have rule groups to distinguish rules.	2017-12-04 11:44:38 +00:00
Brian Brazil	b97f4cf48c	Add metrics for rule group interval and last duration.	2017-12-04 11:44:38 +00:00
Brian Brazil	0a42a9fc8f	Copy over rule group duration on reload. This is currently getting lost, this will soon be in a metric and we don't want it dropping to 0 on every reload.	2017-12-04 11:44:38 +00:00
Brian Brazil	aa370fa568	Clarify metric names around rule groups. Make it clear they're about overall rule groups.	2017-12-04 11:44:38 +00:00
Fabian Reinartz	62461379b7	rules: decouple notifier packages The dependency on the notifier packages caused a transitive dependency on discovery and with that all client libraries our service discovery uses.	2017-11-27 16:38:14 +01:00
Fabian Reinartz	4d964a0a0d	rules: make glob expansion a concern of main	2017-11-24 08:22:57 +01:00
Fabian Reinartz	bd9f7460eb	rules: remove config package dependency	2017-11-24 07:57:54 +01:00
Fabian Reinartz	2d0e3746ac	rules: remove dependency on promql.Engine	2017-11-24 07:57:54 +01:00
Fabian Reinartz	2ec5965b75	Merge pull request #3508 from prometheus/uptsdb update TSDB	2017-11-23 19:11:54 +01:00
Fabian Reinartz	83cd270ea4	*: adapt to storage interface changes	2017-11-23 19:05:04 +01:00
Goutham Veeramachaneni	a880c86375	Fix unexported method on exported interface. Also move to model.Duration Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-11-23 19:13:57 +05:30
conorbroderick	55aaece116	Add rule evaluation time	2017-11-22 15:22:02 +00:00
Goutham Veeramachaneni	e1117715fe	rules: remove skipped iterations cuz no throttling Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-11-14 17:33:00 +05:30
Jorge Hernández	6cd0f63eb1	Use testutil in rules subpackage (#3278) * Use testutil in rules subpackage * Fix manager test * Use testutil in rules subpackage * Fix manager test * Fix rebase * Change to testutil for applyConfig tests	2017-11-11 11:29:47 +01:00
Krasi Georgiev	e86d82ad2d	Fix regression of alert rules state loss on config reload. (#3382) * incorrect map name for the group prevented copying state from existing alert rules on config reload * applyConfig test * few nits * nits 2	2017-11-01 12:58:00 +01:00
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	2017-10-24 21:21:42 -07:00
Brian Brazil	cc5499fcad	Only close after checking for err.	2017-10-09 19:44:03 +01:00
Brian Brazil	ee88f0d222	Ensure all values are used or _	2017-10-09 19:44:03 +01:00
Fabian Reinartz	2d0b8e8b94	Merge branch 'master' into dev-2.0	2017-10-05 13:09:18 +02:00
Julius Volz	f7e8348a88	Re-add contexts to storage.Storage.Querier() (#3230) * Re-add contexts to storage.Storage.Querier() These are needed when replacing the storage by a multi-tenant implementation where the tenant is stored in the context. The 1.x query interfaces already had contexts, but they got lost in 2.x. * Convert promql.Engine to use native contexts	2017-10-04 21:04:15 +02:00
beorn7	c2e9a151ab	Make all rule links link to the "Console" tab rather than "Graph" Clicking on a rule, either the name or the expression, opens the rule result (or the corresponding expression, respectively) in the expression browser. This should by default happen in the console tab, as, more often than not, displaying it in the graph tab runs into a timeout.	2017-09-21 18:28:00 +02:00
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	2017-09-08 22:01:51 +05:30
Goutham Veeramachaneni	e1fc9dc78d	Move /rules to new format (#2901) Fixes #2891 Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>	2017-07-08 11:38:02 +02:00
Goutham Veeramachaneni	37e7b69f56	Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-19 16:34:55 +05:30
Goutham Veeramachaneni	c472316fb3	Check done before every rule evaluation. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 16:57:22 +05:30
Goutham Veeramachaneni	6b70a4d850	Incorporate PR feedback * Move fingerprint to Hash() * Move away from tsdb.MultiError * 0777 -> 0666 for files * checkOverflow of extra fields Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 16:44:33 +05:30
Goutham Veeramachaneni	507790a357	Rework logging to use explicitly passed logger Mostly cleaned up the global logger use. Still some uses in discovery package. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 15:52:44 +05:30
Goutham Veeramachaneni	dc69645e92	Move back to go-yaml Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 10:46:21 +05:30
Goutham Veeramachaneni	5ff283a7b7	Reflect the grouping in the UI Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-14 16:09:14 +05:30
Goutham Veeramachaneni	8cca666cf2	Add file name to group. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-14 15:18:39 +05:30
Goutham Veeramachaneni	e893c89333	Validate labels and annotations Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-14 15:07:54 +05:30
Goutham Veeramachaneni	a48a018368	Make sure groups are unique in a single file Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-14 12:19:21 +05:30
Goutham Veeramachaneni	cea1e99f78	Add update-rules command to promtool Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-14 11:38:54 +05:30
Goutham Veeramachaneni	e8f55669ea	Move rules to new format Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>	2017-06-12 18:14:39 +05:30
Brian Brazil	dcea3e4773	Don't append a 0 when alert is no longer pending/firing With staleness we no longer need this behaviour.	2017-05-24 13:52:45 +01:00
Brian Brazil	cc867dae60	Copy previous series and alert state more intelligently. Usually rules don't move around, and if they do it's likely that rules/alerts with the same name stay in the same order. If rules/alerts with the same name are added/removed this could cause a blip for one cycle, but this is unavoidable without requiring rule and alert names to be unique - which we don't want to do.	2017-05-24 13:52:45 +01:00
Brian Brazil	9bc68db7e6	Track staleness per rule rather than per group.	2017-05-24 13:52:45 +01:00
Brian Brazil	0451d6d31b	Add unittest for rule staleness, and rules generally.	2017-05-24 13:52:45 +01:00
Brian Brazil	0400f3cfd2	Very basic staleness handling for rules.	2017-05-24 13:52:45 +01:00
Fabian Reinartz	06c2b76cd4	Merge branch 'master' into uptsdb	2017-05-16 16:48:37 +02:00
Alexey Palazhchenko	b0e1ea7c6c	Simplify code, fix typos. (#2719)	2017-05-15 09:56:09 +01:00
Julius Volz	ac203ef0ee	Add externalURL template function (#2716) This allows users to e.g. add links back to the generating Prometheus right in their alert templates.	2017-05-13 15:47:04 +02:00
Julius Volz	fe11c5933a	Fix mutation of active alert elements by notifier (#2656) This caused the external label application in the notifier to bleed back into the rule manager's active alerting elements.	2017-04-26 10:29:42 -05:00
Fabian Reinartz	8ffc851147	Merge branch 'master' into dev-2.0	2017-04-04 15:17:56 +02:00
Tobias Schmidt	eaf33759fb	Register forgotten prometheus_evaluator_iterations_total metric	2017-04-02 20:32:56 -03:00
Tobias Schmidt	aaaba57184	Export number of missed rule evaluations In case the execution of all rules takes longer than the configured rule evaluation interval, one or more iterations will be skipped. This needs to be visible to the operator.	2017-04-02 20:03:28 -03:00
Fabian Reinartz	5772f1a7ba	retrieval/storage: adapt to new interface This simplifies the interface to two add methods for appends with labels or faster reference numbers.	2017-02-02 13:05:46 +01:00
Fabian Reinartz	ad9bc62e4c	storage: extend appender and adapt it	2017-01-13 14:48:01 +01:00
Fabian Reinartz	e94b0899ee	rules: fix tests, remove model types	2016-12-29 17:31:14 +01:00
Fabian Reinartz	f8fc1f5bb2	*: migrate ingestion to new batch Appender	2016-12-29 11:03:56 +01:00
Fabian Reinartz	fecf9532b9	*: fix misc compile errors	2016-12-25 11:42:57 +01:00
Fabian Reinartz	622ece6273	*: fix recording tests, migrate matcher types	2016-12-25 11:12:57 +01:00
Fabian Reinartz	5817cb5bde	*: migrate from model.* to promql.* types	2016-12-25 00:37:46 +01:00
Fabian Reinartz	e68a3cf21f	rules: update annotations on each iteration	2016-11-22 15:43:07 +01:00
Jonathan Lange	d78dd3593d	Set evaluation interval on Group construction Prevents having object in invalid state, and allows users of public API to construct valid Groups.	2016-11-18 16:32:30 +00:00
Jonathan Lange	31fc357cd8	Make NewGroup and Group.Eval public Allows callers to execute evaluate lists of rules without first writing them to disk.	2016-11-18 16:25:58 +00:00
Jonathan Lange	2a2da40223	Make rule evaluation publicly available Means that a third-party can parse rules and run them with their own execution model.	2016-11-18 16:12:50 +00:00
Matt Bostock	926a5ab3dd	rules/manager.go: Fix race between reload and stop On one relatively large Prometheus instance (1.7M series), I noticed that upgrades were frequently resulting in Prometheus undergoing crash recovery on start-up. On closer examination, I found that Prometheus was panicking on shutdown. It seems that our configuration management (or misconfiguration thereof) is reloading Prometheus then immediately restarting it, which I suspect is causing this race: Sep 21 15:12:42 host systemd[1]: Reloading prometheus monitoring system. Sep 21 15:12:42 host prometheus[18734]: time="2016-09-21T15:12:42Z" level=info msg="Loading configuration file /etc/prometheus/config.yaml" source="main.go:221" Sep 21 15:12:42 host systemd[1]: Reloaded prometheus monitoring system. Sep 21 15:12:44 host systemd[1]: Stopping prometheus monitoring system... Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:203" Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="See you next time!" source="main.go:210" Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="Stopping target manager..." source="targetmanager.go:90" Sep 21 15:12:52 host prometheus[18734]: time="2016-09-21T15:12:52Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:548" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=1 source="scrape.go:467" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping rule manager..." source="manager.go:366" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Rule manager stopped." source="manager.go:372" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping notification handler..." source="notifier.go:325" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping local storage..." source="storage.go:381" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping maintenance loop..." source="storage.go:383" Sep 21 15:13:01 host prometheus[18734]: panic: close of closed channel Sep 21 15:13:01 host prometheus[18734]: goroutine 7686074 [running]: Sep 21 15:13:01 host prometheus[18734]: panic(0xba57a0, 0xc60c42b500) Sep 21 15:13:01 host prometheus[18734]: /usr/local/go/src/runtime/panic.go:500 +0x1a1 Sep 21 15:13:01 host prometheus[18734]: github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1(0xc6645a9901, 0xc420271ef0, 0xc420338ed0, 0xc60c42b4f0, 0xc6645a9900) Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:412 +0x3c Sep 21 15:13:01 host prometheus[18734]: created by github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:423 +0x56b Sep 21 15:13:03 host systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT	2016-09-21 22:03:02 +01:00
Julius Volz	c187308366	storage: Contextify storage interfaces. This is based on https://github.com/prometheus/prometheus/pull/1997. This adds contexts to the relevant Storage methods and already passes PromQL's new per-query context into the storage's query methods. The immediate motivation is supporting multi-tenancy in Frankenstein, but this could also be used by Prometheus's normal local storage to support cancellations and timeouts at some point.	2016-09-19 16:29:07 +02:00
Julius Volz	ed5a0f0abe	promql: Allow per-query contexts. For Weaveworks' Frankenstein, we need to support multitenancy. In Frankenstein, we initially solved this without modifying the promql package at all: we constructed a new promql.Engine for every query and injected a storage implementation into that engine which would be primed to only collect data for a given user. This is problematic to upstream, however. Prometheus assumes that there is only one engine: the query concurrency gate is part of the engine, and the engine contains one central cancellable context to shut down all queries. Also, creating a new engine for every query seems like overkill. Thus, we want to be able to pass per-query contexts into a single engine. This change gets rid of the promql.Engine's built-in base context and allows passing in a per-query context instead. Central cancellation of all queries is still possible by deriving all passed-in contexts from one central one, but this is now the responsibility of the caller. The central query context is now created in main() and passed into the relevant components (web handler / API, rule manager). In a next step, the per-query context would have to be passed to the storage implementation, so that the storage can implement multi-tenancy or other features based on the contextual information.	2016-09-19 15:38:17 +02:00
beorn7	75bae065fd	Revert "Modify tests to adjust to reverting the /graph changes" This reverts commit `f1ea5bf232`. Part two necessary for reverting the /graph revert.	2016-09-03 21:08:33 +02:00
beorn7	f1ea5bf232	Modify tests to adjust to reverting the /graph changes These tests have been added after the /graph changes and therefore already test the new syntax. This commit has to be reverted together with the previous one to get back to the old new state. sigh	2016-09-02 14:12:31 +02:00
Julius Volz	fe7b8b7fd1	Add missing license header to alerting_test.go	2016-08-13 00:11:52 +02:00
Julius Volz	da7206ec29	Fix rule HTML escaping issues This was mentioned as part of https://github.com/prometheus/alertmanager/issues/452	2016-08-12 02:59:41 +02:00
Brian Brazil	6fc88d4b4d	Remove __name__ from alerts sent to AM. Fixes #1861	2016-08-01 23:32:41 +01:00
Dmitry Vorobev	273e457da4	web: return status code and error message for config resource	2016-07-15 10:15:24 +02:00
Brian Brazil	0509b0f2db	Expand alert templates at eval time. Fixes #1678 #1677	2016-07-12 17:13:55 +01:00
beorn7	064b57858e	Consistently use the `Seconds()` method for conversion of durations This also fixes one remaining case of recording integral numbers of seconds only for a metric, i.e. this will probably fix #1796.	2016-07-07 15:24:35 +02:00
Fabian Reinartz	f7ed2ff706	Merge pull request #1644 from prometheus/beorn7/logging Add missing logging of out-of-order samples	2016-05-20 05:52:00 -07:00
beorn7	b95c096a45	Fix style issues in rules/...	2016-05-19 16:59:53 +02:00
beorn7	45e5775f9b	Add missing logging of out-of-order samples So far, out-of-order samples during rule evaluation were not logged, and neither scrape health samples. The latter are unlikely to cause any errors. That's why I'm logging them always now. (It's always highly irregular should it happen.) For rules, I have used the same plumbing as for samples, just with a different wording in the message to mark them as a result of rule evaluation.	2016-05-19 16:22:53 +02:00
beorn7	4b574e8a61	Switch chunk encoding to type 2 where it was hardcoded type 1 before The chunk encoding was hardcoded there because it mostly doesn't matter what encoding is chosen in that test. Since type 1 is battle-hardened enough, I'm switching to type 2 here so that we can catch unexpected problems as a byproduct. My expectation is that the chunk encoding doesn't matter anyway, as said, but then "unexpected problems" contains the word "unexpected".	2016-03-20 23:32:20 +01:00
Fabian Reinartz	d89c254849	Make copying alerting state safer. This considers static labels in the equality of alerts to avoid falsely copying state from a different alert definition with the same name across reloads. To be safe, it also copies the state map rather than just its pointer so that remaining collisions disappear after one evaluation interval.	2016-03-02 12:21:54 +01:00
Fabian Reinartz	bfa8aaa017	Rename notification to notifier	2016-03-01 12:39:08 +01:00
beorn7	663a1550d0	Fix the instrumentation fixes	2016-02-17 15:50:55 +01:00
Tobias Schmidt	f1f8317fa5	Fix detection of flapping alerts Alerts in the resolve retention period must be transitioned to the active state again when their condition is met.	2016-02-04 23:55:12 -05:00
Björn Rabenstein	9ea3897ea7	Merge pull request #1354 from prometheus/beorn7/storage Rework the way to communicate backpressure (AKA suspended ingestion)	2016-02-01 15:10:13 +01:00
beorn7	ec08c9a391	Rework the way to communicate backpressure (AKA suspended ingestion) This gives up on the idea to communicate through the Append() call (by either not returning as it is now or returning an error as suggested/explored elsewhere). Here I have added a Throttled() call, which has the advantage that it can be called before a whole _batch_ of Append()'s. Scrapes will happen completely or not at all. Same for rule group evaluations. That's a highly desired behavior (as discussed elsewhere). The code is even simpler now as the whole ingestion buffer could be removed. Logging of throttled mode has been streamlined and will create at most one message per minute.	2016-02-01 14:45:44 +01:00
beorn7	a7408bfb47	Unify duration parsing It's actually happening in several places (and for flags, we use the standard Go time.Duration...). This at least reduces all our home-grown parsing to one place (in model).	2016-01-29 15:41:50 +01:00
Fabian Reinartz	a6935024e1	Remove old WITH clause in alert printing	2016-01-26 15:45:27 +01:00
Fabian Reinartz	b0adfea8d5	Fix swapped constants, improve instrumentation	2016-01-21 12:15:29 +01:00
Fabian Reinartz	a8c38c3ac5	Don't log rule evaluation failure on shutdown	2016-01-18 17:34:25 +01:00
Fabian Reinartz	6eee86dce8	Terminate rule groups during initial sleep When an evaluation group runs initially, it waits a deterministic amount of time. During that time it also has to accept a termination signal so shutdown doesn't hang during the first evaluation iteration after a configuration reload. Fixes #1307	2016-01-12 10:54:09 +01:00
Fabian Reinartz	26eb3ac2f8	Don't skip recording rule errors	2016-01-12 10:26:06 +01:00
Fabian Reinartz	37d80c4b25	Fix premature rule evaluation This commit prevents rule evaluation from starting until after the storage is ready.	2016-01-08 17:51:22 +01:00
Fabian Reinartz	0cf3c6a9ef	Add comments, rename a method	2015-12-23 12:29:28 +01:00
Fabian Reinartz	bf6abac8f4	Send resolved notifications	2015-12-17 15:42:26 +01:00
Fabian Reinartz	f69e668fc4	Improve rules/ instrumentation This commit adds a counter for the total number of rule evaluations and standardizes the units to seconds.	2015-12-17 15:42:26 +01:00
Fabian Reinartz	62075aa037	Reduce noisy no-alertmanager warning	2015-12-17 15:42:26 +01:00
Fabian Reinartz	52e5224f5a	Refactor rules/ package	2015-12-17 15:42:25 +01:00
Fabian Reinartz	e4fabe135a	Set StartsAt to time of first firing state	2015-12-17 11:36:58 +01:00


391 Commits (f174ae1f0913dd6f5183cbfb2a1fff3c1f294ddc)
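The stack trace in commit 926a5ab3dd shows Go's "close of closed channel" panic raised from a goroutine started by Manager.ApplyConfig, apparently because a reload and a shutdown both tried to stop the same rule groups. The sketch below does not reproduce that commit's fix; it shows one common Go guard, sync.Once, on a hypothetical group type so that stopping it twice is harmless:

```go
package main

import (
	"fmt"
	"sync"
)

// group is a hypothetical stand-in for a rule group whose evaluation loop
// exits when its done channel is closed.
type group struct {
	done     chan struct{}
	stopOnce sync.Once
}

func newGroup() *group {
	return &group{done: make(chan struct{})}
}

// stop closes done at most once, so concurrent or repeated callers
// (for example a config reload racing with SIGTERM) cannot trigger
// "panic: close of closed channel".
func (g *group) stop() {
	g.stopOnce.Do(func() { close(g.done) })
}

func main() {
	g := newGroup()

	// Simulate a reload and a shutdown stopping the same group concurrently.
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.stop()
		}()
	}
	wg.Wait()

	<-g.done // returns immediately because done has been closed
	fmt.Println("stopped without panic")
}
```

An alternative design is to serialize reload and stop behind a mutex in the manager; sync.Once is shown only because it makes the idempotent-close property explicit in a few lines.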