mirror of https://github.com/prometheus/prometheus
Consolidate configuration and rules docs in docs/configuration/
parent 4d30a11ab6
commit f432b8176d
@ -0,0 +1,98 @@
---
title: Alerting rules
sort_rank: 3
---

# Alerting rules

Alerting rules allow you to define alert conditions based on Prometheus
expression language expressions and to send notifications about firing alerts
to an external service. Whenever the alert expression results in one or more
vector elements at a given point in time, the alert counts as active for these
elements' label sets.

Alerting rules are configured in Prometheus in the same way as [recording
rules](recording_rules.md).

### Defining alerting rules

Alerting rules are defined in the following syntax:

    ALERT <alert name>
      IF <expression>
      [ FOR <duration> ]
      [ LABELS <label set> ]
      [ ANNOTATIONS <label set> ]

The alert name must be a valid metric name.

The optional `FOR` clause causes Prometheus to wait for a certain duration
between first encountering a new expression output vector element (like an
instance with a high HTTP error rate) and counting an alert as firing for this
element. Elements that are active, but not firing yet, are in pending state.

The `LABELS` clause allows specifying a set of additional labels to be attached
to the alert. Any existing conflicting labels will be overwritten. The label
values can be templated.

The `ANNOTATIONS` clause specifies another set of labels that are not
identifying for an alert instance. They are used to store longer additional
information such as alert descriptions or runbook links. The annotation values
can be templated.
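
As a rough illustration of how the optional clauses combine, the hypothetical rule below attaches one static and one templated label and stores a runbook link as an annotation. The recording rule name `job:request_errors:rate5m`, the `team` label and the runbook URL are assumptions made up for this sketch; the `{{ $labels.job }}` placeholder uses the templating described in the next section.

```
# Hypothetical rule: alert when the recorded error rate of a job
# stays above 0.05 for 10 minutes.
ALERT HighErrorRate
  IF job:request_errors:rate5m > 0.05
  FOR 10m
  LABELS { severity = "page", team = "{{ $labels.job }}-oncall" }
  ANNOTATIONS {
    summary = "High error rate for job {{ $labels.job }}",
    runbook = "https://example.com/runbooks/high-error-rate",
  }
```

Keeping the runbook link in `ANNOTATIONS` rather than `LABELS` means it does not become part of the alert's identity.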

#### Templating

Label and annotation values can be templated using [console templates](https://prometheus.io/docs/visualization/consoles).
The `$labels` variable holds the label key/value pairs of an alert instance
and `$value` holds the evaluated value of an alert instance.

    # To insert a firing element's label values:
    {{ $labels.<labelname> }}
    # To insert the numeric expression value of the firing element:
    {{ $value }}

Examples:

    # Alert for any instance that is unreachable for >5 minutes.
    ALERT InstanceDown
      IF up == 0
      FOR 5m
      LABELS { severity = "page" }
      ANNOTATIONS {
        summary = "Instance {{ $labels.instance }} down",
        description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.",
      }

    # Alert for any instance that has a median request latency >1s.
    ALERT APIHighRequestLatency
      IF api_http_request_latencies_second{quantile="0.5"} > 1
      FOR 1m
      ANNOTATIONS {
        summary = "High request latency on {{ $labels.instance }}",
        description = "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)",
      }

### Inspecting alerts during runtime

To manually inspect which alerts are active (pending or firing), navigate to
the "Alerts" tab of your Prometheus instance. This will show you the exact
label sets for which each defined alert is currently active.

For pending and firing alerts, Prometheus also stores synthetic time series of
the form `ALERTS{alertname="<alert name>", alertstate="pending|firing", <additional alert labels>}`.
The sample value is set to `1` as long as the alert is in the indicated active
(pending or firing) state, and a single `0` value gets written out when an alert
transitions from active to inactive state. Once inactive, the time series does
not get further updates.
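
Because these are ordinary time series, they can be queried like any other metric. The expressions below are a minimal sketch, assuming the `InstanceDown` rule from the examples above is loaded. The `== 1` filter is a choice made for this sketch, so that the single `0` sample written on the transition to inactive is not counted.

```
# All alert instances that are currently firing:
ALERTS{alertstate="firing"} == 1

# Pending and firing instances of the InstanceDown alert:
ALERTS{alertname="InstanceDown"} == 1

# Number of firing alert instances per alert name:
count(ALERTS{alertstate="firing"} == 1) by (alertname)
```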

### Sending alert notifications

Prometheus's alerting rules are good at figuring out what is broken *right now*, but
they are not a fully-fledged notification solution. Another layer is needed to
add summarization, notification rate limiting, silencing and alert dependencies
on top of the simple alert definitions. In Prometheus's ecosystem, the
[Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) takes on this
role. Thus, Prometheus may be configured to periodically send information about
alert states to an Alertmanager instance, which then takes care of dispatching
the right notifications. The Alertmanager instance may be configured via the
`-alertmanager.url` command line flag.

@ -1,6 +1,6 @@
 ---
 title: Configuration
-sort_rank: 3
+sort_rank: 1
 ---

 # Configuration

@ -10,7 +10,7 @@ the command-line flags configure immutable system parameters (such as storage
 locations, amount of data to keep on disk and in memory, etc.), the
 configuration file defines everything related to scraping [jobs and their
 instances](https://prometheus.io/docs/concepts/jobs_instances/), as well as
-which [rule files to load](querying/rules.md#configuring-rules).
+which [rule files to load](recording_rules.md#configuring-rules).

 To view all available command-line flags, run `prometheus -h`.

@ -0,0 +1,4 @@
---
title: Configuration
sort_rank: 3
---

@ -1,6 +1,6 @@
 ---
 title: Recording rules
-sort_rank: 6
+sort_rank: 2
 ---

 # Defining recording rules

@ -9,10 +9,9 @@ sort_rank: 6

 Prometheus supports two types of rules which may be configured and then
 evaluated at regular intervals: recording rules and [alerting
-rules](https://prometheus.io/docs/alerting/rules/). To include rules in
-Prometheus, create a file containing the necessary rule statements and have
-Prometheus load the file via the `rule_files` field in the [Prometheus
-configuration](../configuration.md).
+rules](alerting_rules.md). To include rules in Prometheus, create a file
+containing the necessary rule statements and have Prometheus load the file via
+the `rule_files` field in the [Prometheus configuration](configuration.md).

 The rule files can be reloaded at runtime by sending `SIGHUP` to the Prometheus
 process. The changes are only applied if all rule files are well-formatted.

@ -56,7 +56,7 @@ scrape_configs:
```

 For a complete specification of configuration options, see the
-[configuration documentation](configuration.md).
+[configuration documentation](configuration/configuration.md).

 ## Starting Prometheus

@ -13,7 +13,7 @@ The documentation is available alongside all the project documentation at

 - [Installing](install.md)
 - [Getting started](getting_started.md)
-- [Configuration](configuration.md)
+- [Configuration](configuration/configuration.md)
 - [Querying](querying/basics.md)
 - [Storage](storage.md)
 - [Federation](federation.md)

@ -204,7 +204,7 @@ Prometheus's expression browser until the result set seems reasonable
 (hundreds, not thousands, of time series at most). Only when you have filtered
 or aggregated your data sufficiently, switch to graph mode. If the expression
 still takes too long to graph ad-hoc, pre-record it via a [recording
-rule](rules.md#recording-rules).
+rule](../configuration/recording_rules.md#recording-rules).

 This is especially relevant for Prometheus's query language, where a bare
 metric name selector like `api_http_requests_total` could expand to thousands

@ -160,7 +160,7 @@ in the next section.

 Case (3) depends on the targets you monitor. To mitigate an unplanned explosion
 of the number of series, you can limit the number of samples per individual
-scrape (see `sample_limit` in the [scrape config](configuration.md#scrape_config)).
+scrape (see `sample_limit` in the [scrape config](configuration/configuration.md#scrape_config)).
 If the number of active time series exceeds the number of memory chunks the
 Prometheus server can afford, the server will quickly throttle ingestion as
 described above. The only way out of this is to give Prometheus more RAM or