diff --git a/docs/configuration.md b/docs/configuration.md
index 0fcca8578..4efd392c7 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1,5 +1,6 @@
 ---
 title: Configuration
+sort_rank: 3
 ---
 
 # Configuration
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 112b4b1b7..a2518bd43 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -1,6 +1,6 @@
 ---
 title: Getting started
-sort_rank: 10
+sort_rank: 1
 ---
 
 # Getting started
diff --git a/docs/index.md b/docs/index.md
index 8f4e3aabc..8641cd1b0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -14,3 +14,4 @@ The documentation is available alongside all the project documentation at
 - [Installing](install.md)
 - [Getting started](getting_started.md)
 - [Configuration](configuration.md)
+- [Querying](querying/basics.md)
diff --git a/docs/installation.md b/docs/installation.md
index 1f7648cf9..4d00edea6 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -1,8 +1,9 @@
 ---
-title: Installing
+title: Installation
+sort_rank: 2
 ---
 
-# Installing
+# Installation
 
 ## Using pre-compiled binaries
 
diff --git a/docs/querying/api.md b/docs/querying/api.md
new file mode 100644
index 000000000..c23677a5a
--- /dev/null
+++ b/docs/querying/api.md
@@ -0,0 +1,417 @@
+---
+title: HTTP API
+sort_rank: 7
+---
+
+# HTTP API
+
+The current stable HTTP API is reachable under `/api/v1` on a Prometheus
+server. Any non-breaking additions will be added under that endpoint.
+
+## Format overview
+
+The API response format is JSON. Every successful API request returns a `2xx`
+status code.
+
+Invalid requests that reach the API handlers return a JSON error object
+and one of the following HTTP response codes:
+
+- `400 Bad Request` when parameters are missing or incorrect.
+- `422 Unprocessable Entity` when an expression can't be executed
+  ([RFC4918](http://tools.ietf.org/html/rfc4918#page-78)).
+- `503 Service Unavailable` when queries time out or abort.
+
+Other non-`2xx` codes may be returned for errors occurring before the API
+endpoint is reached.
+
+The JSON response envelope format is as follows:
+
+```
+{
+  "status": "success" | "error",
+  "data": <data>,
+
+  // Only set if status is "error". The data field may still hold
+  // additional data.
+  "errorType": "<string>",
+  "error": "<string>"
+}
+```
+
+Input timestamps may be provided either in
+[RFC3339](https://www.ietf.org/rfc/rfc3339.txt) format or as a Unix timestamp
+in seconds, with optional decimal places for sub-second precision. Output
+timestamps are always represented as Unix timestamps in seconds.
+
+Names of query parameters that may be repeated end with `[]`.
+
+`<series_selector>` placeholders refer to Prometheus [time series
+selectors](basics.md#time-series-selectors) like `http_requests_total` or
+`http_requests_total{method=~"^GET|POST$"}` and need to be URL-encoded.
+
+`<duration>` placeholders refer to Prometheus duration strings of the form
+`[0-9]+[smhdwy]`. For example, `5m` refers to a duration of 5 minutes.
+
+## Expression queries
+
+Query language expressions may be evaluated at a single instant or over a range
+of time. The sections below describe the API endpoints for each type of
+expression query.
+
+### Instant queries
+
+The following endpoint evaluates an instant query at a single point in time:
+
+```
+GET /api/v1/query
+```
+
+URL query parameters:
+
+- `query=<string>`: Prometheus expression query string.
+- `time=<rfc3339 | unix_timestamp>`: Evaluation timestamp. Optional.
+- `timeout=<duration>`: Evaluation timeout. Optional. Defaults to and
+  is capped by the value of the `-query.timeout` flag.
+
+The current server time is used if the `time` parameter is omitted.
+
+The `data` section of the query result has the following format:
+
+```
+{
+  "resultType": "matrix" | "vector" | "scalar" | "string",
+  "result": <value>
+}
+```
+
+`<value>` refers to the query result data, which has varying formats
+depending on the `resultType`. See the [expression query result
+formats](#expression-query-result-formats).
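As a minimal sketch of the parameter rules above, the following Python snippet only assembles an instant-query URL; it does not contact a server. The helper name `build_instant_query_url` and the base URL `http://localhost:9090` are illustrative assumptions, not part of the Prometheus API. Optional parameters are left out entirely so that the server applies its own defaults.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url, query, time=None, timeout=None):
    """Build a GET /api/v1/query URL (hypothetical helper, no network I/O).

    `time` may be an RFC3339 string or a Unix timestamp in seconds, and
    `timeout` a Prometheus duration string such as "30s". Parameters left
    as None are omitted so that the server falls back to its defaults.
    """
    params = {"query": query}
    if time is not None:
        params["time"] = str(time)
    if timeout is not None:
        params["timeout"] = timeout
    # urlencode percent-encodes characters such as {, }, =, " and :.
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode(params)
```

For example, `build_instant_query_url("http://localhost:9090", "up", time="2015-07-01T20:10:51.781Z")` yields a request equivalent to the `curl` example that follows, with the colons in the timestamp percent-encoded.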
+
+The following example evaluates the expression `up` at the time
+`2015-07-01T20:10:51.781Z`:
+
+```json
+$ curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
+{
+   "status" : "success",
+   "data" : {
+      "resultType" : "vector",
+      "result" : [
+         {
+            "metric" : {
+               "__name__" : "up",
+               "job" : "prometheus",
+               "instance" : "localhost:9090"
+            },
+            "value": [ 1435781451.781, "1" ]
+         },
+         {
+            "metric" : {
+               "__name__" : "up",
+               "job" : "node",
+               "instance" : "localhost:9100"
+            },
+            "value" : [ 1435781451.781, "0" ]
+         }
+      ]
+   }
+}
+```
+
+### Range queries
+
+The following endpoint evaluates an expression query over a range of time:
+
+```
+GET /api/v1/query_range
+```
+
+URL query parameters:
+
+- `query=<string>`: Prometheus expression query string.
+- `start=<rfc3339 | unix_timestamp>`: Start timestamp.
+- `end=<rfc3339 | unix_timestamp>`: End timestamp.
+- `step=<duration>`: Query resolution step width.
+- `timeout=<duration>`: Evaluation timeout. Optional. Defaults to and
+  is capped by the value of the `-query.timeout` flag.
+
+The `data` section of the query result has the following format:
+
+```
+{
+  "resultType": "matrix",
+  "result": <value>
+}
+```
+
+For the format of the `<value>` placeholder, see the [range-vector result
+format](#range-vectors).
+
+The following example evaluates the expression `up` over a 30-second range with
+a query resolution of 15 seconds:
+
+```json
+$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
+{
+   "status" : "success",
+   "data" : {
+      "resultType" : "matrix",
+      "result" : [
+         {
+            "metric" : {
+               "__name__" : "up",
+               "job" : "prometheus",
+               "instance" : "localhost:9090"
+            },
+            "values" : [
+               [ 1435781430.781, "1" ],
+               [ 1435781445.781, "1" ],
+               [ 1435781460.781, "1" ]
+            ]
+         },
+         {
+            "metric" : {
+               "__name__" : "up",
+               "job" : "node",
+               "instance" : "localhost:9091"
+            },
+            "values" : [
+               [ 1435781430.781, "0" ],
+               [ 1435781445.781, "0" ],
+               [ 1435781460.781, "1" ]
+            ]
+         }
+      ]
+   }
+}
+```
+
+## Querying metadata
+
+### Finding series by label matchers
+
+The following endpoint returns the list of time series that match a certain label set.
+
+```
+GET /api/v1/series
+```
+
+URL query parameters:
+
+- `match[]=<series_selector>`: Repeated series selector argument that selects the
+  series to return. At least one `match[]` argument must be provided.
+- `start=<rfc3339 | unix_timestamp>`: Start timestamp.
+- `end=<rfc3339 | unix_timestamp>`: End timestamp.
+
+The `data` section of the query result consists of a list of objects that
+contain the label name/value pairs which identify each series.
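Repeated parameters such as `match[]` are simply sent several times. The minimal sketch below, which assumes Python's standard `urllib.parse` module and uses a hypothetical helper `build_series_url`, emits one `match[]` pair per selector and percent-encodes the brackets, braces and quotes that the `curl` examples in this document instead protect from curl's URL globbing with `-g`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_series_url(base_url, selectors, start=None, end=None):
    """Build a GET /api/v1/series URL with one match[] pair per selector."""
    if not selectors:
        # The endpoint requires at least one match[] argument.
        raise ValueError("at least one series selector is required")
    params = [("match[]", selector) for selector in selectors]
    if start is not None:
        params.append(("start", str(start)))
    if end is not None:
        params.append(("end", str(end)))
    return base_url.rstrip("/") + "/api/v1/series?" + urlencode(params)


def decoded_params(url):
    """Decode a URL's query string; repeated keys are collected into lists."""
    return parse_qs(urlsplit(url).query)
```

Decoding the generated URL with `decoded_params` shows that both selectors survive the round trip under the single `match[]` key.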
+
+The following example returns all series that match either of the selectors
+`up` or `process_start_time_seconds{job="prometheus"}`:
+
+```json
+$ curl -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
+{
+   "status" : "success",
+   "data" : [
+      {
+         "__name__" : "up",
+         "job" : "prometheus",
+         "instance" : "localhost:9090"
+      },
+      {
+         "__name__" : "up",
+         "job" : "node",
+         "instance" : "localhost:9091"
+      },
+      {
+         "__name__" : "process_start_time_seconds",
+         "job" : "prometheus",
+         "instance" : "localhost:9090"
+      }
+   ]
+}
+```
+
+### Querying label values
+
+The following endpoint returns a list of label values for a provided label name:
+
+```
+GET /api/v1/label/<label_name>/values
+```
+
+The `data` section of the JSON response is a list of string label values.
+
+This example queries for all label values for the `job` label:
+
+```json
+$ curl http://localhost:9090/api/v1/label/job/values
+{
+   "status" : "success",
+   "data" : [
+      "node",
+      "prometheus"
+   ]
+}
+```
+
+## Deleting series
+
+The following endpoint deletes matched series entirely from a Prometheus server:
+
+```
+DELETE /api/v1/series
+```
+
+URL query parameters:
+
+- `match[]=<series_selector>`: Repeated label matcher argument that selects the
+  series to delete. At least one `match[]` argument must be provided.
+
+The `data` section of the JSON response has the following format:
+
+```
+{
+  "numDeleted": <number>
+}
+```
+
+The following example deletes all series that match either of the selectors
+`up` or `process_start_time_seconds{job="prometheus"}`:
+
+```json
+$ curl -XDELETE -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
+{
+   "status" : "success",
+   "data" : {
+      "numDeleted" : 3
+   }
+}
+```
+
+## Expression query result formats
+
+Expression queries may return the following response values in the `result`
+property of the `data` section. `<sample_value>` placeholders are numeric
+sample values.
+JSON does not support special float values such as `NaN`, `Inf`,
+and `-Inf`, so sample values are transferred as quoted JSON strings rather than
+raw numbers.
+
+### Range vectors
+
+Range vectors are returned as result type `matrix`. The corresponding
+`result` property has the following format:
+
+```
+[
+  {
+    "metric": { "<label_name>": "<label_value>", ... },
+    "values": [ [ <unix_time>, "<sample_value>" ], ... ]
+  },
+  ...
+]
+```
+
+### Instant vectors
+
+Instant vectors are returned as result type `vector`. The corresponding
+`result` property has the following format:
+
+```
+[
+  {
+    "metric": { "<label_name>": "<label_value>", ... },
+    "value": [ <unix_time>, "<sample_value>" ]
+  },
+  ...
+]
+```
+
+### Scalars
+
+Scalar results are returned as result type `scalar`. The corresponding
+`result` property has the following format:
+
+```
+[ <unix_time>, "<scalar_value>" ]
+```
+
+### Strings
+
+String results are returned as result type `string`. The corresponding
+`result` property has the following format:
+
+```
+[ <unix_time>, "<string_value>" ]
+```
+
+## Targets
+
+> This API is experimental as it is intended to be extended with targets
+> dropped due to relabelling in the future.
+
+The following endpoint returns an overview of the current state of the
+Prometheus target discovery:
+
+```
+GET /api/v1/targets
+```
+
+Currently only the active targets are part of the response.
+
+```json
+$ curl http://localhost:9090/api/v1/targets
+{
+  "status": "success",
+  "data": {
+    "activeTargets": [
+      {
+        "discoveredLabels": {
+          "__address__": "127.0.0.1:9090",
+          "__metrics_path__": "/metrics",
+          "__scheme__": "http",
+          "job": "prometheus"
+        },
+        "labels": {
+          "instance": "127.0.0.1:9090",
+          "job": "prometheus"
+        },
+        "scrapeUrl": "http://127.0.0.1:9090/metrics",
+        "lastError": "",
+        "lastScrape": "2017-01-17T15:07:44.723715405+01:00",
+        "health": "up"
+      }
+    ]
+  }
+}
+```
+
+## Alertmanagers
+
+> This API is experimental as it is intended to be extended with Alertmanagers
+> dropped due to relabelling in the future.
+
+The following endpoint returns an overview of the current state of the
+Prometheus Alertmanager discovery:
+
+```
+GET /api/v1/alertmanagers
+```
+
+Currently only the active Alertmanagers are part of the response.
+
+```json
+$ curl http://localhost:9090/api/v1/alertmanagers
+{
+  "status": "success",
+  "data": {
+    "activeAlertmanagers": [
+      {
+        "url": "http://127.0.0.1:9090/api/v1/alerts"
+      }
+    ]
+  }
+}
+```
diff --git a/docs/querying/basics.md b/docs/querying/basics.md
new file mode 100644
index 000000000..f001c6d0d
--- /dev/null
+++ b/docs/querying/basics.md
@@ -0,0 +1,215 @@
+---
+title: Querying basics
+nav_title: Basics
+sort_rank: 1
+---
+
+# Querying Prometheus
+
+Prometheus provides a functional expression language that lets the user select
+and aggregate time series data in real time. The result of an expression can
+either be shown as a graph, viewed as tabular data in Prometheus's expression
+browser, or consumed by external systems via the [HTTP API](api.md).
+
+## Examples
+
+This document is meant as a reference. For learning, it might be easier to
+start with a couple of [examples](examples.md).
+
+## Expression language data types
+
+In Prometheus's expression language, an expression or sub-expression can
+evaluate to one of four types:
+
+* **Instant vector** - a set of time series containing a single sample for each time series, all sharing the same timestamp
+* **Range vector** - a set of time series containing a range of data points over time for each time series
+* **Scalar** - a simple numeric floating point value
+* **String** - a simple string value; currently unused
+
+Depending on the use case (e.g. when graphing vs. displaying the output of an
+expression), only some of these types are legal as the result from a
+user-specified expression. For example, an expression that returns an instant
+vector is the only type that can be directly graphed.
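These four types correspond to the `resultType` values returned by the [HTTP API](api.md): `vector`, `matrix`, `scalar` and `string`. The sketch below, which assumes only Python's standard `json` module, decodes a query response into those types. It relies on the API transferring sample values as strings, which Python's `float()` accepts even for `NaN`, `+Inf` and `-Inf`. The function names are hypothetical.

```python
import json


def parse_sample(pair):
    """Convert a [<unix_time>, "<value>"] pair into a (float, float) tuple.

    Sample values arrive as strings because JSON cannot represent NaN or
    +/-Inf; Python's float() parses "NaN", "+Inf" and "-Inf" directly.
    """
    timestamp, value = pair
    return float(timestamp), float(value)


def parse_query_data(data):
    """Map the `data` section of a query response onto a Python structure."""
    result_type, result = data["resultType"], data["result"]
    if result_type == "vector":  # instant vector: one sample per series
        return [(s["metric"], parse_sample(s["value"])) for s in result]
    if result_type == "matrix":  # range vector: a list of samples per series
        return [(s["metric"], [parse_sample(p) for p in s["values"]]) for s in result]
    if result_type == "scalar":
        return parse_sample(result)
    if result_type == "string":
        return float(result[0]), result[1]
    raise ValueError(f"unknown resultType: {result_type!r}")


def parse_response(text):
    """Decode a full JSON response body, raising if the envelope reports an error."""
    body = json.loads(text)
    if body["status"] != "success":
        raise RuntimeError(f"{body.get('errorType')}: {body.get('error')}")
    return parse_query_data(body["data"])
```

Returning `(labels, samples)` pairs keeps the label set next to its data, which mirrors how each result element carries its own `metric` object.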
+
+## Literals
+
+### String literals
+
+Strings may be specified as literals in single quotes, double quotes or
+backticks.
+
+PromQL follows the same [escaping rules as
+Go](https://golang.org/ref/spec#String_literals). In single or double quotes a
+backslash begins an escape sequence, which may be followed by `a`, `b`, `f`,
+`n`, `r`, `t`, `v` or `\`. Specific characters can be provided using octal
+(`\nnn`) or hexadecimal (`\xnn`, `\unnnn` and `\Unnnnnnnn`).
+
+No escaping is processed inside backticks. Unlike Go, Prometheus does not discard newlines inside backticks.
+
+Example:
+
+    "this is a string"
+    'these are unescaped: \n \\ \t'
+    `these are not unescaped: \n ' " \t`
+
+### Float literals
+
+Scalar float values can be literally written as numbers of the form
+`[-](digits)[.(digits)]`.
+
+    -2.43
+
+## Time series selectors
+
+### Instant vector selectors
+
+Instant vector selectors allow the selection of a set of time series and a
+single sample value for each at a given timestamp (instant): in the simplest
+form, only a metric name is specified. This results in an instant vector
+containing elements for all time series that have this metric name.
+
+This example selects all time series that have the `http_requests_total` metric
+name:
+
+    http_requests_total
+
+It is possible to filter these time series further by appending a set of labels
+to match in curly braces (`{}`).
+
+This example selects only those time series with the `http_requests_total`
+metric name that also have the `job` label set to `prometheus` and their
+`group` label set to `canary`:
+
+    http_requests_total{job="prometheus",group="canary"}
+
+It is also possible to negatively match a label value, or to match label values
+against regular expressions. The following label matching operators exist:
+
+* `=`: Select labels that are exactly equal to the provided string.
+* `!=`: Select labels that are not equal to the provided string.
+* `=~`: Select labels that regex-match the provided string.
+* `!~`: Select labels that do not regex-match the provided string.
+
+For example, this selects all `http_requests_total` time series for `staging`,
+`testing`, and `development` environments and HTTP methods other than `GET`:
+
+    http_requests_total{environment=~"staging|testing|development",method!="GET"}
+
+Label matchers that match empty label values also select all time series that do
+not have the specific label set at all. Regex matches are fully anchored, so a
+pattern must match the entire label value, not just a substring of it.
+
+Vector selectors must either specify a name or at least one label matcher
+that does not match the empty string. The following expression is illegal:
+
+    {job=~".*"} # Bad!
+
+In contrast, these expressions are valid as they both have a selector that does not
+match empty label values.
+
+    {job=~".+"}              # Good!
+    {job=~".*",method="get"} # Good!
+
+Label matchers can also be applied to metric names by matching against the internal
+`__name__` label. For example, the expression `http_requests_total` is equivalent to
+`{__name__="http_requests_total"}`. Matchers other than `=` (`!=`, `=~`, `!~`) may also be used.
+The following expression selects all metrics that have a name starting with `job:`:
+
+    {__name__=~"job:.*"}
+
+### Range vector selectors
+
+Range vector literals work like instant vector literals, except that they
+select a range of samples back from the current instant. Syntactically, a range
+duration is appended in square brackets (`[]`) at the end of a vector selector
+to specify how far back in time values should be fetched for each resulting
+range vector element.
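To make the label matching rules concrete (the four operators, empty-value matching, and anchored regexes), here is a small Python model of series selection. It is a sketch, not Prometheus' implementation: a missing label is treated as the empty string, which is why matchers that accept `""` also select series without the label, and anchoring is emulated with `re.fullmatch`. Python's `re` syntax is close to, but not identical with, the RE2 syntax that Prometheus uses, so unusual patterns may behave differently. The `(name, op, value)` matcher tuples are an assumption of this sketch.

```python
import re


def matches(labels, name, op, value):
    """Evaluate one label matcher against a series' label set.

    A missing label is treated as the empty string. Regex operators use
    re.fullmatch, so the pattern must cover the whole label value.
    """
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator: {op!r}")


def select(series, matchers):
    """Return the series whose labels satisfy every (name, op, value) matcher."""
    return [s for s in series if all(matches(s, *m) for m in matchers)]
```

With this model, `{job=~"server"}` does not select a series labelled `job="apiserver"`, while `{job=~".*server"}` does.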
+
+Time durations are specified as a number, followed immediately by one of the
+following units:
+
+* `s` - seconds
+* `m` - minutes
+* `h` - hours
+* `d` - days
+* `w` - weeks
+* `y` - years
+
+In this example, we select all the values we have recorded within the last 5
+minutes for all time series that have the metric name `http_requests_total` and
+a `job` label set to `prometheus`:
+
+    http_requests_total{job="prometheus"}[5m]
+
+### Offset modifier
+
+The `offset` modifier allows changing the time offset for individual
+instant and range vectors in a query.
+
+For example, the following expression returns the value of
+`http_requests_total` 5 minutes in the past relative to the current
+query evaluation time:
+
+    http_requests_total offset 5m
+
+Note that the `offset` modifier always needs to follow the selector
+immediately, i.e. the following would be correct:
+
+    sum(http_requests_total{method="GET"} offset 5m) # GOOD.
+
+The following, however, would be *incorrect*:
+
+    sum(http_requests_total{method="GET"}) offset 5m # INVALID.
+
+The same works for range vectors. This returns the 5-minute rate that
+`http_requests_total` had a week ago:
+
+    rate(http_requests_total[5m] offset 1w)
+
+## Operators
+
+Prometheus supports many binary and aggregation operators. These are described
+in detail in the [expression language operators](operators.md) page.
+
+## Functions
+
+Prometheus supports several functions to operate on data. These are described
+in detail in the [expression language functions](functions.md) page.
+
+## Gotchas
+
+### Interpolation and staleness
+
+When queries are run, timestamps at which to sample data are selected
+independently of the actual present time series data. This is mainly to support
+cases like aggregation (`sum`, `avg`, and so on), where multiple aggregated
+time series do not exactly align in time. Because of their independence,
+Prometheus needs to assign a value at those timestamps for each relevant time
+series.
+It does so by simply taking the newest sample before this timestamp.
+
+If no stored sample is found (by default) 5 minutes before a sampling timestamp,
+no value is assigned for this time series at this point in time. This
+effectively means that time series "disappear" from graphs at times where their
+latest collected sample is older than 5 minutes.
+
+NOTE: Staleness and interpolation handling might change. See
+https://github.com/prometheus/prometheus/issues/398 and
+https://github.com/prometheus/prometheus/issues/581.
+
+### Avoiding slow queries and overloads
+
+If a query needs to operate on a very large amount of data, graphing it might
+time out or overload the server or browser. Thus, when constructing queries
+over unknown data, always start building the query in the tabular view of
+Prometheus's expression browser until the result set seems reasonable
+(hundreds, not thousands, of time series at most). Switch to graph mode only
+once you have filtered or aggregated your data sufficiently. If the expression
+still takes too long to graph ad hoc, pre-record it via a [recording
+rule](rules.md#recording-rules).
+
+This is especially relevant for Prometheus's query language, where a bare
+metric name selector like `api_http_requests_total` could expand to thousands
+of time series with different labels. Also keep in mind that expressions which
+aggregate over many time series will generate load on the server even if the
+output is only a small number of time series. This is similar to how it would
+be slow to sum all values of a column in a relational database, even if the
+output value is only a single number.
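The lookup rule described under *Interpolation and staleness* can be expressed as a short Python function. This is a simplified sketch that assumes samples are held in two parallel lists sorted by timestamp. It treats a sample exactly at the evaluation timestamp as eligible and keeps a sample that is exactly 5 minutes old; both are assumptions about boundary behavior, not documented guarantees. The names `value_at` and `LOOKBACK_SECONDS` are hypothetical.

```python
from bisect import bisect_right

LOOKBACK_SECONDS = 5 * 60  # the 5-minute default described above


def value_at(timestamps, values, t, lookback=LOOKBACK_SECONDS):
    """Return the newest sample value at or before t, or None if none is fresh.

    `timestamps` must be sorted ascending and aligned with `values`.
    """
    i = bisect_right(timestamps, t)  # number of samples with timestamp <= t
    if i == 0:
        return None  # no sample at or before t
    if t - timestamps[i - 1] > lookback:
        return None  # latest sample is too old: the series "disappears"
    return values[i - 1]
```

Evaluating `value_at` on a grid of query timestamps reproduces the effect described above: between scrapes the last value is carried forward, and after 5 minutes without a new sample no value is returned.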
+
diff --git a/docs/querying/examples.md b/docs/querying/examples.md
new file mode 100644
index 000000000..4e522ab85
--- /dev/null
+++ b/docs/querying/examples.md
@@ -0,0 +1,83 @@
+---
+title: Querying examples
+nav_title: Examples
+sort_rank: 4
+---
+
+# Query examples
+
+## Simple time series selection
+
+Return all time series with the metric `http_requests_total`:
+
+    http_requests_total
+
+Return all time series with the metric `http_requests_total` and the given
+`job` and `handler` labels:
+
+    http_requests_total{job="apiserver", handler="/api/comments"}
+
+Return a whole range of time (in this case 5 minutes) for the same vector,
+making it a range vector:
+
+    http_requests_total{job="apiserver", handler="/api/comments"}[5m]
+
+Note that an expression resulting in a range vector cannot be graphed directly,
+but can be viewed in the tabular ("Console") view of the expression browser.
+
+Using regular expressions, you could select time series only for jobs whose
+names match a certain pattern, in this case, all jobs that end with `server`.
+Note that regex matches are fully anchored, so the pattern has to cover the
+whole label value:
+
+    http_requests_total{job=~".*server"}
+
+To select all HTTP status codes except 4xx ones, you could run:
+
+    http_requests_total{status!~"4.."}
+
+## Using functions, operators, etc.
+
+Return the per-second rate for all time series with the `http_requests_total`
+metric name, as measured over the last 5 minutes:
+
+    rate(http_requests_total[5m])
+
+Assuming that the `http_requests_total` time series all have the labels `job`
+(fanout by job name) and `instance` (fanout by instance of the job), we might
+want to sum over the rate of all instances, so we get fewer output time series,
+but still preserve the `job` dimension:
+
+    sum(rate(http_requests_total[5m])) by (job)
+
+If we have two different metrics with the same dimensional labels, we can apply
+binary operators to them and elements on both sides with the same label set
+will get matched and propagated to the output. For example, this expression
+returns the unused memory in MiB for every instance (on a fictional cluster
+scheduler exposing these metrics about the instances it runs):
+
+    (instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024
+
+The same expression, but summed by application and process type, could be
+written like this:
+
+    sum(
+      instance_memory_limit_bytes - instance_memory_usage_bytes
+    ) by (app, proc) / 1024 / 1024
+
+If the same fictional cluster scheduler exposed CPU usage metrics like the
+following for every instance:
+
+    instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
+    instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
+    instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
+    instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
+    ...
+
+...we could get the top 3 CPU users grouped by application (`app`) and process
+type (`proc`) like this:
+
+    topk(3, sum(rate(instance_cpu_time_ns[5m])) by (app, proc))
+
+Assuming this metric contains one time series per running instance, you could
+count the number of running instances per application like this:
+
+    count(instance_cpu_time_ns) by (app)
diff --git a/docs/querying/functions.md b/docs/querying/functions.md
new file mode 100644
index 000000000..74e674028
--- /dev/null
+++ b/docs/querying/functions.md
@@ -0,0 +1,408 @@
+---
+title: Query functions
+nav_title: Functions
+sort_rank: 3
+---
+
+# Functions
+
+Some functions have default arguments, e.g. `year(v=vector(time())
+instant-vector)`. This means that there is one argument `v` which is an instant
+vector, which, if not provided, defaults to the value of the expression
+`vector(time())`.
+
+## `abs()`
+
+`abs(v instant-vector)` returns the input vector with all sample values converted to
+their absolute value.
+
+## `absent()`
+
+`absent(v instant-vector)` returns an empty vector if the vector passed to it
+has any elements and a 1-element vector with the value 1 if the vector passed to
+it has no elements.
+
+This is useful for alerting when no time series exist for a given metric name
+and label combination.
+
+```
+absent(nonexistent{job="myjob"})
+# => {job="myjob"}
+
+absent(nonexistent{job="myjob",instance=~".*"})
+# => {job="myjob"}
+
+absent(sum(nonexistent{job="myjob"}))
+# => {}
+```
+
+In the first two examples, `absent()` tries to be smart about deriving labels of the
+1-element output vector from the input vector.
+
+## `ceil()`
+
+`ceil(v instant-vector)` rounds the sample values of all elements in `v` up to
+the nearest integer.
+
+## `changes()`
+
+For each input time series, `changes(v range-vector)` returns the number of
+times its value has changed within the provided time range as an instant
+vector.
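Two of the behaviors above can be modeled in a few lines of Python. `absent_labels` reproduces the label derivation shown in the `absent()` examples for a bare vector selector, assuming matchers are given as hypothetical `(name, op, value)` tuples; it does not cover aggregated input such as `absent(sum(...))`, whose output carries no labels. `changes` counts value changes between consecutive samples of one series and does not attempt any special handling of `NaN` values. Both function names are illustrative.

```python
def absent_labels(matchers):
    """Labels of absent()'s 1-element output for a bare vector selector.

    Mirrors the documented examples: only equality matchers contribute
    labels, the metric name is dropped, and regex matchers are ignored.
    """
    return {name: value for name, op, value in matchers
            if op == "=" and name != "__name__"}


def changes(values):
    """Number of times consecutive sample values differ within a range."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)
```

For the samples `[1, 1, 2, 2, 3, 1]`, `changes` counts three transitions (1 to 2, 2 to 3 and 3 to 1).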
+
+## `clamp_max()`
+
+`clamp_max(v instant-vector, max scalar)` clamps the sample values of all
+elements in `v` to have an upper limit of `max`.
+
+## `clamp_min()`
+
+`clamp_min(v instant-vector, min scalar)` clamps the sample values of all
+elements in `v` to have a lower limit of `min`.
+
+## `count_scalar()`
+
+`count_scalar(v instant-vector)` returns the number of elements in a time series
+vector as a scalar. This is in contrast to the `count()`
+[aggregation operator](operators.md#aggregation-operators), which
+always returns a vector (an empty one if the input vector is empty) and allows
+grouping by labels via a `by` clause.
+
+## `day_of_month()`
+
+`day_of_month(v=vector(time()) instant-vector)` returns the day of the month
+for each of the given times in UTC. Returned values are from 1 to 31.
+
+## `day_of_week()`
+
+`day_of_week(v=vector(time()) instant-vector)` returns the day of the week for
+each of the given times in UTC. Returned values are from 0 to 6, where 0 means
+Sunday etc.
+
+## `days_in_month()`
+
+`days_in_month(v=vector(time()) instant-vector)` returns the number of days in the
+month for each of the given times in UTC. Returned values are from 28 to 31.
+
+## `delta()`
+
+`delta(v range-vector)` calculates the difference between the
+first and last value of each time series element in a range vector `v`,
+returning an instant vector with the given deltas and equivalent labels.
+The delta is extrapolated to cover the full time range as specified in
+the range vector selector, so that it is possible to get a non-integer
+result even if the sample values are all integers.
+
+The following example expression returns the difference in CPU temperature
+between now and 2 hours ago:
+
+```
+delta(cpu_temp_celsius{host="zeus"}[2h])
+```
+
+`delta` should only be used with gauges.
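The effect of extrapolation on `delta()` can be illustrated with a deliberately simplified Python sketch that scales the first-to-last difference linearly from the interval actually covered by samples to the full selector range. Prometheus applies additional rules near the range boundaries, so treat this as an illustration of the idea rather than a reproduction of its exact results. The function name and the `(timestamp, value)` input format are assumptions.

```python
def extrapolated_delta(samples, range_start, range_end):
    """Simplified delta(): scale the first-to-last difference to the full range.

    `samples` holds (timestamp_seconds, value) pairs sorted by time and lying
    inside [range_start, range_end]. Prometheus adds boundary rules that this
    linear scaling ignores, so results are approximations.
    """
    if len(samples) < 2:
        return None  # a difference needs at least two samples
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    covered = t_last - t_first
    if covered <= 0:
        return None
    return (v_last - v_first) * (range_end - range_start) / covered
```

For example, samples `(10, 20.0)` and `(100, 29.0)` in a 120-second range give `12.0` rather than the raw difference `9.0`, and integer inputs such as `(10, 1)` and `(100, 2)` produce the non-integer result `4/3`.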
+
+## `deriv()`
+
+`deriv(v range-vector)` calculates the per-second derivative of the time series in a range
+vector `v`, using [simple linear regression](http://en.wikipedia.org/wiki/Simple_linear_regression).
+
+`deriv` should only be used with gauges.
+
+## `drop_common_labels()`
+
+`drop_common_labels(instant-vector)` drops all labels that have the same name
+and value across all series in the input vector.
+
+## `exp()`
+
+`exp(v instant-vector)` calculates the exponential function for all elements in `v`.
+Special cases are:
+
+* `exp(+Inf) = +Inf`
+* `exp(NaN) = NaN`
+
+## `floor()`
+
+`floor(v instant-vector)` rounds the sample values of all elements in `v` down
+to the nearest integer.
+
+## `histogram_quantile()`
+
+`histogram_quantile(φ float, b instant-vector)` calculates the φ-quantile (0 ≤ φ
+≤ 1) from the buckets `b` of a
+[histogram](https://prometheus.io/docs/concepts/metric_types/#histogram). (See
+[histograms and summaries](https://prometheus.io/docs/practices/histograms) for
+a detailed explanation of φ-quantiles and the usage of the histogram metric type
+in general.) The samples in `b` are the counts of observations in each bucket.
+Each sample must have a label `le` where the label value denotes the inclusive
+upper bound of the bucket. (Samples without such a label are silently ignored.)
+The [histogram metric type](https://prometheus.io/docs/concepts/metric_types/#histogram)
+automatically provides time series with the `_bucket` suffix and the appropriate
+labels.
+
+Use the `rate()` function to specify the time window for the quantile
+calculation.
+
+Example: A histogram metric is called `http_request_duration_seconds`. To
+calculate the 90th percentile of request durations over the last 10m, use the
+following expression:
+
+    histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
+
+The quantile is calculated for each label combination in
+`http_request_duration_seconds`.
+To aggregate, use the `sum()` aggregator
+around the `rate()` function. Since the `le` label is required by
+`histogram_quantile()`, it has to be included in the `by` clause. The following
+expression aggregates the 90th percentile by `job`:
+
+    histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (job, le))
+
+To aggregate everything, specify only the `le` label:
+
+    histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (le))
+
+The `histogram_quantile()` function interpolates quantile values by
+assuming a linear distribution within a bucket. The highest bucket
+must have an upper bound of `+Inf`. (Otherwise, `NaN` is returned.) If
+a quantile is located in the highest bucket, the upper bound of the
+second highest bucket is returned. A lower limit of the lowest bucket
+is assumed to be 0 if the upper bound of that bucket is greater than
+0. In that case, the usual linear interpolation is applied within that
+bucket. Otherwise, the upper bound of the lowest bucket is returned
+for quantiles located in the lowest bucket.
+
+If `b` contains fewer than two buckets, `NaN` is returned. For φ < 0, `-Inf` is
+returned. For φ > 1, `+Inf` is returned.
+
+## `holt_winters()`
+
+`holt_winters(v range-vector, sf scalar, tf scalar)` produces a smoothed value
+for time series based on the range in `v`. The lower the smoothing factor `sf`,
+the more importance is given to old data. The higher the trend factor `tf`, the
+more trends in the data are considered. Both `sf` and `tf` must be between 0 and
+1.
+
+`holt_winters` should only be used with gauges.
+
+## `hour()`
+
+`hour(v=vector(time()) instant-vector)` returns the hour of the day
+for each of the given times in UTC. Returned values are from 0 to 23.
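`hour()` and the other date functions in this document interpret each sample value as a Unix timestamp in UTC. The Python sketch below, built only on the standard `datetime` and `calendar` modules, derives the same fields and makes the documented conventions explicit, in particular that `day_of_week()` counts from Sunday = 0 while Python's `weekday()` counts from Monday = 0. The helper name `time_parts` is hypothetical.

```python
import calendar
from datetime import datetime, timezone


def time_parts(unix_seconds):
    """UTC calendar fields following the conventions of the date functions."""
    dt = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return {
        "year": dt.year,
        "month": dt.month,                       # 1 = January
        "day_of_month": dt.day,                  # 1..31
        "day_of_week": (dt.weekday() + 1) % 7,   # shift Monday=0 to Sunday=0
        "days_in_month": calendar.monthrange(dt.year, dt.month)[1],
        "hour": dt.hour,                         # 0..23
        "minute": dt.minute,                     # 0..59
    }
```

For the timestamp `1435781451.781` used in the HTTP API examples (2015-07-01T20:10:51.781Z), this yields hour 20, day of week 3 (a Wednesday) and 31 days in the month.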
+
+## `idelta()`
+
+`idelta(v range-vector)` calculates the difference between the last two samples
+in the range vector `v`, returning an instant vector with the given deltas and
+equivalent labels.
+
+`idelta` should only be used with gauges.
+
+## `increase()`
+
+`increase(v range-vector)` calculates the increase in the
+time series in the range vector. Breaks in monotonicity (such as counter
+resets due to target restarts) are automatically adjusted for. The
+increase is extrapolated to cover the full time range as specified
+in the range vector selector, so that it is possible to get a
+non-integer result even if a counter increases only by integer
+increments.
+
+The following example expression returns the number of HTTP requests as measured
+over the last 5 minutes, per time series in the range vector:
+
+```
+increase(http_requests_total{job="api-server"}[5m])
+```
+
+`increase` should only be used with counters. It is syntactic sugar
+for `rate(v)` multiplied by the number of seconds under the specified
+time range window, and should be used primarily for human readability.
+Use `rate` in recording rules so that increases are tracked consistently
+on a per-second basis.
+
+## `irate()`
+
+`irate(v range-vector)` calculates the per-second instant rate of increase of
+the time series in the range vector. This is based on the last two data points.
+Breaks in monotonicity (such as counter resets due to target restarts) are
+automatically adjusted for.
+
+The following example expression returns the per-second rate of HTTP requests
+looking up to 5 minutes back for the two most recent data points, per time
+series in the range vector:
+
+```
+irate(http_requests_total{job="api-server"}[5m])
+```
+
+`irate` should only be used when graphing volatile, fast-moving counters.
+Use `rate` for alerts and slow-moving counters, as brief changes
+in the rate can reset the `FOR` clause and graphs consisting entirely of rare
+spikes are hard to read.
+
+Note that when combining `irate()` with an
+[aggregation operator](operators.md#aggregation-operators) (e.g. `sum()`)
+or a function aggregating over time (any function ending in `_over_time`),
+always take an `irate()` first, then aggregate. Otherwise `irate()` cannot detect
+counter resets when your target restarts.
+
+## `label_join()`
+
+For each time series in `v`, `label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...)` joins all the values of all the `src_labels`
+using `separator` and returns the time series with the label `dst_label` containing the joined value.
+There can be any number of `src_labels` in this function.
+
+This example will return a vector with each time series having a `foo` label with the value `a,b,c` added to it:
+
+```
+label_join(up{job="api-server",src1="a",src2="b",src3="c"}, "foo", ",", "src1", "src2", "src3")
+```
+
+## `label_replace()`
+
+For each time series in `v`, `label_replace(v instant-vector, dst_label string,
+replacement string, src_label string, regex string)` matches the regular
+expression `regex` against the label `src_label`. If it matches, then the
+time series is returned with the label `dst_label` replaced by the expansion of
+`replacement`. `$1` is replaced with the first matching subgroup, `$2` with the
+second etc. If the regular expression doesn't match then the time series is
+returned unchanged.
+
+This example will return a vector with each time series having a `foo`
+label with the value `a` added to it:
+
+```
+label_replace(up{job="api-server",service="a:c"}, "foo", "$1", "service", "(.*):.*")
+```
+
+## `ln()`
+
+`ln(v instant-vector)` calculates the natural logarithm for all elements in `v`.
+Special cases are:

* `ln(+Inf) = +Inf`
* `ln(0) = -Inf`
* `ln(x < 0) = NaN`
* `ln(NaN) = NaN`

## `log2()`

`log2(v instant-vector)` calculates the binary logarithm for all elements in `v`.
The special cases are equivalent to those in `ln`.

## `log10()`

`log10(v instant-vector)` calculates the decimal logarithm for all elements in `v`.
The special cases are equivalent to those in `ln`.

## `minute()`

`minute(v=vector(time()) instant-vector)` returns the minute of the hour for each
of the given times in UTC. Returned values are from 0 to 59.

## `month()`

`month(v=vector(time()) instant-vector)` returns the month of the year for each
of the given times in UTC. Returned values are from 1 to 12, where 1 means
January, 2 means February, and so on.

## `predict_linear()`

`predict_linear(v range-vector, t scalar)` predicts the value of each time series
`t` seconds from now, based on the range vector `v`, using [simple linear
regression](http://en.wikipedia.org/wiki/Simple_linear_regression).

`predict_linear` should only be used with gauges.

## `rate()`

`rate(v range-vector)` calculates the per-second average rate of increase of the
time series in the range vector. Breaks in monotonicity (such as counter
resets due to target restarts) are automatically adjusted for. Also, the
calculation extrapolates to the ends of the time range, allowing for missed
scrapes or imperfect alignment of scrape cycles with the range's time period.

The following example expression returns the per-second rate of HTTP requests as measured
over the last 5 minutes, per time series in the range vector:

```
rate(http_requests_total{job="api-server"}[5m])
```

`rate` should only be used with counters. It is best suited for alerting
and for graphing slow-moving counters.

Note that when combining `rate()` with an
[aggregation operator](operators.md#aggregation-operators) (e.g.
`sum()`) +or a function aggregating over time (any function ending in `_over_time`), +always take a `rate()` first, then aggregate. Otherwise `rate()` cannot detect +counter resets when your target restarts. + +## `resets()` + +For each input time series, `resets(v range-vector)` returns the number of +counter resets within the provided time range as an instant vector. Any +decrease in the value between two consecutive samples is interpreted as a +counter reset. + +`resets` should only be used with counters. + +## `round()` + +`round(v instant-vector, to_nearest=1 scalar)` rounds the sample values of all +elements in `v` to the nearest integer. Ties are resolved by rounding up. The +optional `to_nearest` argument allows specifying the nearest multiple to which +the sample values should be rounded. This multiple may also be a fraction. + +## `scalar()` + +Given a single-element input vector, `scalar(v instant-vector)` returns the +sample value of that single element as a scalar. If the input vector does not +have exactly one element, `scalar` will return `NaN`. + +## `sort()` + +`sort(v instant-vector)` returns vector elements sorted by their sample values, +in ascending order. + +## `sort_desc()` + +Same as `sort`, but sorts in descending order. + +## `sqrt()` + +`sqrt(v instant-vector)` calculates the square root of all elements in `v`. + +## `time()` + +`time()` returns the number of seconds since January 1, 1970 UTC. Note that +this does not actually return the current time, but the time at which the +expression is to be evaluated. + +## `vector()` + +`vector(s scalar)` returns the scalar `s` as a vector with no labels. + +## `year()` + +`year(v=vector(time()) instant-vector)` returns the year +for each of the given times in UTC. 
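As a rough illustration of the calendar functions above (`minute()`, `month()`, and `year()`), the following Python sketch treats each sample value as a Unix timestamp in seconds and extracts the corresponding UTC field. The helper names `minute_of`, `month_of`, and `year_of` are hypothetical; this is a sketch of the described behavior, not how Prometheus implements these functions.

```python
from datetime import datetime, timezone


def _to_utc(ts: float) -> datetime:
    # Interpret a sample value as seconds since the Unix epoch, in UTC.
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def minute_of(ts: float) -> int:
    # Minute of the hour, 0 to 59 (what minute() reports).
    return _to_utc(ts).minute


def month_of(ts: float) -> int:
    # Month of the year, 1 (January) to 12 (what month() reports).
    return _to_utc(ts).month


def year_of(ts: float) -> int:
    # Calendar year (what year() reports).
    return _to_utc(ts).year


if __name__ == "__main__":
    ts = 1451606490.0  # 2016-01-01T00:01:30Z
    print(year_of(ts), month_of(ts), minute_of(ts))  # prints: 2016 1 1
```

Like the PromQL functions, the sketch always works in UTC, independent of the local time zone of the machine running it.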
+

## `_over_time()`

The following functions allow aggregating each series of a given range vector
over time and return an instant vector with per-series aggregation results:

* `avg_over_time(range-vector)`: the average value of all points in the specified interval.
* `min_over_time(range-vector)`: the minimum value of all points in the specified interval.
* `max_over_time(range-vector)`: the maximum value of all points in the specified interval.
* `sum_over_time(range-vector)`: the sum of all values in the specified interval.
* `count_over_time(range-vector)`: the count of all values in the specified interval.
* `quantile_over_time(scalar, range-vector)`: the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval, where φ is given by the scalar argument.
* `stddev_over_time(range-vector)`: the population standard deviation of the values in the specified interval.
* `stdvar_over_time(range-vector)`: the population variance of the values in the specified interval.

Note that all values in the specified interval have the same weight in the
aggregation even if the values are not equally spaced throughout the interval.
diff --git a/docs/querying/index.md b/docs/querying/index.md
new file mode 100644
index 000000000..1566750e8
--- /dev/null
+++ b/docs/querying/index.md
@@ -0,0 +1,4 @@
+---
+title: Querying
+sort_rank: 4
+---
diff --git a/docs/querying/operators.md b/docs/querying/operators.md
new file mode 100644
index 000000000..7aa7a6b79
--- /dev/null
+++ b/docs/querying/operators.md
@@ -0,0 +1,250 @@
+---
+title: Operators
+sort_rank: 2
+---
+
+# Operators
+
+## Binary operators
+
+Prometheus's query language supports basic logical and arithmetic operators.
+For operations between two instant vectors, the [matching behavior](#vector-matching)
+can be modified.
+

### Arithmetic binary operators

The following binary arithmetic operators exist in Prometheus:

* `+` (addition)
* `-` (subtraction)
* `*` (multiplication)
* `/` (division)
* `%` (modulo)
* `^` (power/exponentiation)

Binary arithmetic operators are defined between scalar/scalar, vector/scalar,
and vector/vector value pairs.

**Between two scalars**, the behavior is obvious: they evaluate to another
scalar that is the result of the operator applied to both scalar operands.

**Between an instant vector and a scalar**, the operator is applied to the
value of every data sample in the vector. For example, if an instant vector
is multiplied by 2, the result is another vector in which every sample value of
the original vector is multiplied by 2.

**Between two instant vectors**, a binary arithmetic operator is applied to
each entry in the left-hand-side vector and its [matching element](#vector-matching)
in the right-hand-side vector. The result is propagated into the result vector and the metric
name is dropped. Entries for which no matching entry in the right-hand-side vector can be
found are not part of the result.

### Comparison binary operators

The following binary comparison operators exist in Prometheus:

* `==` (equal)
* `!=` (not-equal)
* `>` (greater-than)
* `<` (less-than)
* `>=` (greater-or-equal)
* `<=` (less-or-equal)

Comparison operators are defined between scalar/scalar, vector/scalar,
and vector/vector value pairs. By default, they act as filters. Their behavior can be
modified by providing `bool` after the operator, which will return `0` or `1`
for the value rather than filtering.

**Between two scalars**, the `bool` modifier must be provided and these
operators result in another scalar that is either `0` (`false`) or `1`
(`true`), depending on the comparison result.
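The arithmetic rule for two instant vectors and the scalar `bool` comparison described above can be sketched in Python. This is a simplified model, not Prometheus's implementation: an instant vector is a `dict` from a label set (with the metric name already removed, since arithmetic drops it) to a sample value, only default one-to-one matching on identical label sets is modelled, and the names `labels`, `vector_binop`, and `scalar_compare_bool` are hypothetical.

```python
import operator
from typing import Callable, Dict, FrozenSet, Tuple

# Simplified model: each element is identified by its label set (metric name
# excluded) and maps to its sample value.
LabelSet = FrozenSet[Tuple[str, str]]
Vector = Dict[LabelSet, float]


def labels(**pairs: str) -> LabelSet:
    """Build an immutable, hashable label set from keyword arguments."""
    return frozenset(pairs.items())


def vector_binop(lhs: Vector, rhs: Vector,
                 op: Callable[[float, float], float]) -> Vector:
    """Apply op to each left-hand element and the right-hand element with the
    same label set; left-hand elements without a match are dropped."""
    return {ls: op(value, rhs[ls]) for ls, value in lhs.items() if ls in rhs}


def scalar_compare_bool(a: float, b: float,
                        cmp: Callable[[float, float], bool]) -> float:
    """Scalar/scalar comparison with the bool modifier: 1 if true, else 0."""
    return 1.0 if cmp(a, b) else 0.0


if __name__ == "__main__":
    errors = {labels(job="api", instance="a"): 30.0,
              labels(job="api", instance="b"): 5.0}
    requests = {labels(job="api", instance="a"): 60.0}
    # Only instance "a" has a match on the right-hand side, so only it survives.
    print(vector_binop(errors, requests, operator.truediv))
    print(scalar_compare_bool(2, 1, operator.gt))  # prints: 1.0
```

The sketch deliberately leaves out `on`/`ignoring` and the handling of label sets that are not unique on one side.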
+

**Between an instant vector and a scalar**, these operators are applied to the
value of every data sample in the vector, and vector elements for which the
comparison result is `false` are dropped from the result vector. If the `bool`
modifier is provided, vector elements that would be dropped instead have the value
`0` and vector elements that would be kept have the value `1`.

**Between two instant vectors**, these operators behave as a filter by default,
applied to matching entries. Vector elements for which the expression is not
true or which do not find a match on the other side of the expression are
dropped from the result, while the others are propagated into a result vector
with their original (left-hand-side) metric names and label values.
If the `bool` modifier is provided, vector elements that would have been
dropped instead have the value `0` and vector elements that would be kept have
the value `1` with the left-hand-side metric names and label values.

### Logical/set binary operators

These logical/set binary operators are only defined between instant vectors:

* `and` (intersection)
* `or` (union)
* `unless` (complement)

`vector1 and vector2` results in a vector consisting of the elements of
`vector1` for which there are elements in `vector2` with exactly matching
label sets. Other elements are dropped. The metric name and values are carried
over from the left-hand-side vector.

`vector1 or vector2` results in a vector that contains all original elements
(label sets + values) of `vector1` and additionally all elements of `vector2`
which do not have matching label sets in `vector1`.

`vector1 unless vector2` results in a vector consisting of the elements of
`vector1` for which there are no elements in `vector2` with exactly matching
label sets. All matching elements in both vectors are dropped.
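The filtering and `bool` behavior of a vector/scalar comparison, and the three set operators, can be sketched in Python with a simplified model: a `dict` from label set to sample value. Metric names are left out of the model, so it does not show which metric name an element keeps. All function names are hypothetical, and `and_`/`or_` carry a trailing underscore only because `and` and `or` are Python keywords.

```python
import operator
from typing import Callable, Dict, FrozenSet, Tuple

LabelSet = FrozenSet[Tuple[str, str]]
Vector = Dict[LabelSet, float]


def labels(**pairs: str) -> LabelSet:
    """Build an immutable, hashable label set from keyword arguments."""
    return frozenset(pairs.items())


def compare_scalar(v: Vector, s: float, cmp: Callable[[float, float], bool],
                   use_bool: bool = False) -> Vector:
    if use_bool:
        # With bool: keep every element, replacing its value with 0 or 1.
        return {ls: 1.0 if cmp(x, s) else 0.0 for ls, x in v.items()}
    # Without bool: drop elements for which the comparison is false.
    return {ls: x for ls, x in v.items() if cmp(x, s)}


def and_(lhs: Vector, rhs: Vector) -> Vector:
    # Left-hand elements whose label set also appears on the right.
    return {ls: x for ls, x in lhs.items() if ls in rhs}


def or_(lhs: Vector, rhs: Vector) -> Vector:
    # All left-hand elements, plus right-hand elements with no left-hand match.
    result = dict(lhs)
    for ls, x in rhs.items():
        result.setdefault(ls, x)
    return result


def unless(lhs: Vector, rhs: Vector) -> Vector:
    # Left-hand elements whose label set does not appear on the right.
    return {ls: x for ls, x in lhs.items() if ls not in rhs}


if __name__ == "__main__":
    a = {labels(instance="a"): 1.0, labels(instance="b"): 5.0}
    b = {labels(instance="b"): 7.0, labels(instance="c"): 9.0}
    print(compare_scalar(a, 2, operator.gt))
    print(compare_scalar(a, 2, operator.gt, use_bool=True))
    print(and_(a, b), or_(a, b), unless(a, b), sep="\n")
```

With these inputs, `and_` keeps only instance `b` with its left-hand value `5.0`, `or_` additionally picks up instance `c` from the right-hand side, and `unless` keeps only instance `a`.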
+

## Vector matching

Operations between vectors attempt to find a matching element in the right-hand-side
vector for each entry in the left-hand side. There are two basic types of
matching behavior:

**One-to-one** finds a unique pair of entries from each side of the operation.
In the default case, that is, an operation of the form `vector1 <operator> vector2`,
two entries match if they have exactly the same set of labels and corresponding values.
The `ignoring` keyword allows ignoring certain labels when matching, while the
`on` keyword allows reducing the set of considered labels to a provided list:

    ignoring(