diff --git a/docs/configuration.md b/docs/configuration.md
new file mode 100644
index 000000000..67f214a02
--- /dev/null
+++ b/docs/configuration.md
@@ -0,0 +1,1141 @@
---
title: Configuration
sort_rank: 20
---

# Configuration

Prometheus is configured via command-line flags and a configuration file. While
the command-line flags configure immutable system parameters (such as storage
locations, amount of data to keep on disk and in memory, etc.), the
configuration file defines everything related to scraping [jobs and their
instances](https://prometheus.io/docs/concepts/jobs_instances/), as well as
which [rule files to load](querying/rules.md#configuring-rules).

To view all available command-line flags, run `prometheus -h`.

Prometheus can reload its configuration at runtime. If the new configuration
is not well-formed, the changes will not be applied.
A configuration reload is triggered by sending a `SIGHUP` to the Prometheus process or
by sending an HTTP POST request to the `/-/reload` endpoint.
This will also reload any configured rule files.

## Configuration file

To specify which configuration file to load, use the `-config.file` flag.

The file is written in [YAML format](http://en.wikipedia.org/wiki/YAML),
defined by the scheme described below.
Brackets indicate that a parameter is optional. For non-list parameters the
value is set to the specified default.

Generic placeholders are defined as follows:

* `<boolean>`: a boolean that can take the values `true` or `false`
* `<duration>`: a duration matching the regular expression `[0-9]+(ms|[smhdwy])`
* `<labelname>`: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*`
* `<labelvalue>`: a string of unicode characters
* `<filename>`: a valid path in the current working directory
* `<host>`: a valid string consisting of a hostname or IP followed by an optional port number
* `<path>`: a valid URL path
* `<scheme>`: a string that can take the values `http` or `https`
* `<string>`: a regular string
* `<secret>`: a regular string that is a secret, such as a password

The other placeholders are specified separately.

A valid example file can be found [here](/config/testdata/conf.good.yml).

The global configuration specifies parameters that are valid in all other configuration
contexts. They also serve as defaults for other configuration sections.

```yaml
global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

# Rule files specify a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the experimental remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the experimental remote read feature.
remote_read:
  [ - <remote_read> ... ]
```

### `<scrape_config>`

A `scrape_config` section specifies a set of targets and parameters describing how
to scrape them. In the general case, one scrape configuration specifies a single
job. In advanced configurations, this may change.
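
Before the full parameter reference, here is a minimal sketch of a scrape configuration
with a single statically configured job. The job name and target address below are
illustrative placeholders, not required values:

```yaml
scrape_configs:
  - job_name: 'example-app'          # illustrative job name
    scrape_interval: 15s             # overrides the global default for this job
    metrics_path: /metrics           # shown explicitly; this is already the default
    static_configs:
      - targets: ['localhost:9100']  # hypothetical host:port exposing metrics
```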

Targets may be statically configured via the `static_configs` parameter or
dynamically discovered using one of the supported service-discovery mechanisms.

Additionally, `relabel_configs` allow advanced modifications to any
target and its labels before scraping.

```yaml
# The job name assigned to scraped metrics by default.
job_name: <job_name>

# How frequently to scrape targets from this job.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

# Per-scrape timeout when scraping this job.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]

# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels. This is useful for use cases such as federation, where all labels
# specified in the target should be preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]

# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]

# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]

# Sets the `Authorization` header on every scrape request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]

# Sets the `Authorization` header on every scrape request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every scrape request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the scrape request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# List of labeled statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]

# List of target relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

# List of metric relabel configurations.
metric_relabel_configs:
  [ - <relabel_config> ... ]

# Per-scrape limit on number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabeling,
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]
```

Where `<job_name>` must be unique across all scrape configurations.

### `<tls_config>`

A `tls_config` allows configuring TLS connections.

```yaml
# CA certificate to validate API server certificate with.
[ ca_file: <filename> ]

# Certificate and key files for client cert authentication to the server.
[ cert_file: <filename> ]
[ key_file: <filename> ]

# ServerName extension to indicate the name of the server.
# http://tools.ietf.org/html/rfc4366#section-3.1
[ server_name: <string> ]

# Disable validation of the server certificate.
[ insecure_skip_verify: <boolean> ]
```

### `<azure_sd_config>`

CAUTION: Azure SD is in beta: breaking changes to configuration are still
likely in future releases.

Azure SD configurations allow retrieving scrape targets from Azure VMs.

The following meta labels are available on targets during relabeling:

* `__meta_azure_machine_id`: the machine ID
* `__meta_azure_machine_location`: the location the machine runs in
* `__meta_azure_machine_name`: the machine name
* `__meta_azure_machine_private_ip`: the machine's private IP
* `__meta_azure_machine_resource_group`: the machine's resource group
* `__meta_azure_machine_tag_<tagname>`: each tag value of the machine

See below for the configuration options for Azure discovery:

```yaml
# The information to access the Azure API.
# The subscription ID.
subscription_id: <string>
# The tenant ID.
tenant_id: <string>
# The client ID.
client_id: <string>
# The client secret.
client_secret: <string>

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 300s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]
```

### `<consul_sd_config>`

Consul SD configurations allow retrieving scrape targets from [Consul's](https://www.consul.io)
Catalog API.

The following meta labels are available on targets during [relabeling](#relabel_config):

* `__meta_consul_address`: the address of the target
* `__meta_consul_dc`: the datacenter name for the target
* `__meta_consul_node`: the node name defined for the target
* `__meta_consul_service_address`: the service address of the target
* `__meta_consul_service_id`: the service ID of the target
* `__meta_consul_service_port`: the service port of the target
* `__meta_consul_service`: the name of the service the target belongs to
* `__meta_consul_tags`: the list of tags of the target joined by the tag separator

```yaml
# The information to access the Consul API. It is to be defined
# as the Consul documentation requires.
server: <host>
[ token: <string> ]
[ datacenter: <string> ]
[ scheme: <string> ]
[ username: <string> ]
[ password: <string> ]

# A list of services for which targets are retrieved. If omitted, all services
# are scraped.
services:
  [ - <string> ]

# The string by which Consul tags are joined into the tag label.
[ tag_separator: <string> | default = , ]
```

Note that the IP address and port used to scrape the targets are assembled as
`<__meta_consul_address>:<__meta_consul_service_port>`.
However, in some
Consul setups, the relevant address is in `__meta_consul_service_address`.
In those cases, you can use the [relabel](#relabel_config)
feature to replace the special `__address__` label.

### `<dns_sd_config>`

A DNS-based service discovery configuration allows specifying a set of DNS
domain names which are periodically queried to discover a list of targets. The
DNS servers to be contacted are read from `/etc/resolv.conf`.

This service discovery method only supports basic DNS A, AAAA and SRV record
queries, but not the advanced DNS-SD approach specified in
[RFC6763](https://tools.ietf.org/html/rfc6763).

During the [relabeling phase](#relabel_config), the meta label
`__meta_dns_name` is available on each target and is set to the
record name that produced the discovered target.

```yaml
# A list of DNS domain names to be queried.
names:
  [ - <domain_name> ]

# The type of DNS query to perform.
[ type: <query_type> | default = 'SRV' ]

# The port number used if the query type is not SRV.
[ port: <number> ]

# The time after which the provided names are refreshed.
[ refresh_interval: <duration> | default = 30s ]
```

Where `<domain_name>` is a valid DNS domain name.
Where `<query_type>` is `SRV`, `A`, or `AAAA`.

### `<ec2_sd_config>`

EC2 SD configurations allow retrieving scrape targets from AWS EC2
instances. The private IP address is used by default, but may be changed to
the public IP address with relabeling.

The following meta labels are available on targets during [relabeling](#relabel_config):

* `__meta_ec2_availability_zone`: the availability zone in which the instance is running
* `__meta_ec2_instance_id`: the EC2 instance ID
* `__meta_ec2_instance_state`: the state of the EC2 instance
* `__meta_ec2_instance_type`: the type of the EC2 instance
* `__meta_ec2_private_ip`: the private IP address of the instance, if present
* `__meta_ec2_public_dns_name`: the public DNS name of the instance, if available
* `__meta_ec2_public_ip`: the public IP address of the instance, if available
* `__meta_ec2_subnet_id`: comma-separated list of subnet IDs in which the instance is running, if available
* `__meta_ec2_tag_<tagkey>`: each tag value of the instance
* `__meta_ec2_vpc_id`: the ID of the VPC in which the instance is running, if available

See below for the configuration options for EC2 discovery:

```yaml
# The information to access the EC2 API.

# The AWS Region.
region: <string>

# The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
# and `AWS_SECRET_ACCESS_KEY` are used.
[ access_key: <string> ]
[ secret_key: <string> ]
# Named AWS profile used to connect to the API.
[ profile: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]
```

### `<openstack_sd_config>`

CAUTION: OpenStack SD is in beta: breaking changes to configuration are still
likely in future releases.

OpenStack SD configurations allow retrieving scrape targets from OpenStack Nova
instances.

The following meta labels are available on targets during [relabeling](#relabel_config):

* `__meta_openstack_instance_id`: the OpenStack instance ID
* `__meta_openstack_instance_name`: the OpenStack instance name
* `__meta_openstack_instance_status`: the status of the OpenStack instance
* `__meta_openstack_instance_flavor`: the flavor of the OpenStack instance
* `__meta_openstack_public_ip`: the public IP of the OpenStack instance
* `__meta_openstack_private_ip`: the private IP of the OpenStack instance
* `__meta_openstack_tag_<tagkey>`: each tag value of the instance

See below for the configuration options for OpenStack discovery:

```yaml
# The information to access the OpenStack API.

# The OpenStack Region.
region: <string>

# identity_endpoint specifies the HTTP endpoint that is required to work with
# the Identity API of the appropriate version. While it's ultimately needed by
# all of the identity services, it will often be populated by a provider-level
# function.
[ identity_endpoint: <string> ]

# username is required if using Identity V2 API. Consult with your provider's
# control panel to discover your account's username. In Identity V3, either
# userid or a combination of username and domain_id or domain_name are needed.
[ username: <string> ]
[ userid: <string> ]

# password for the Identity V2 and V3 APIs. Consult with your provider's
# control panel to discover your account's preferred method of authentication.
[ password: <string> ]

# At most one of domain_id and domain_name must be provided if using username
# with Identity V3. Otherwise, either are optional.
[ domain_name: <string> ]
[ domain_id: <string> ]

# The project_id and project_name fields are optional for the Identity V2 API.
# Some providers allow you to specify a project_name instead of the project_id.
# Some require both. Your provider's authentication policies will determine
# how these fields influence authentication.
[ project_name: <string> ]
[ project_id: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]
```

### `<file_sd_config>`

File-based service discovery provides a more generic way to configure static targets
and serves as an interface to plug in custom service discovery mechanisms.

It reads a set of files containing a list of zero or more
`<static_config>`s. Changes to all defined files are detected via disk watches
and applied immediately. Files may be provided in YAML or JSON format. Only
changes resulting in well-formed target groups are applied.

The JSON file must contain a list of static configs, using this format:

```yaml
[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
```

As a fallback, the file contents are also re-read periodically at the specified
refresh interval.

Each target has a meta label `__meta_filepath` during the
[relabeling phase](#relabel_config). Its value is set to the
filepath from which the target was extracted.

There is a list of
[integrations](https://prometheus.io/docs/operating/integrations/#file-service-discovery) with this
discovery mechanism.

```yaml
# Patterns for files from which target groups are extracted.
files:
  [ - <filename_pattern> ... ]

# Refresh interval to re-read the files.
[ refresh_interval: <duration> | default = 5m ]
```

Where `<filename_pattern>` may be a path ending in `.json`, `.yml` or `.yaml`. The last path segment
may contain a single `*` that matches any character sequence, e.g. `my/path/tg_*.json`.
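
For illustration, a target file in the YAML form could look like the following
sketch; the file path, target addresses and label are examples only:

```yaml
# Hypothetical target file, e.g. /etc/prometheus/targets/example-targets.yml
- targets:
    - 'node1.example.com:9100'
    - 'node2.example.com:9100'
  labels:
    env: 'production'   # attached to every target in this group
```

A `file_sd_configs` entry whose `files` pattern matches this path would pick up
the targets via the disk watch or at the next refresh interval.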

### `<gce_sd_config>`

CAUTION: GCE SD is in beta: breaking changes to configuration are still
likely in future releases.

[GCE](https://cloud.google.com/compute/) SD configurations allow retrieving scrape targets from GCP GCE instances.
The private IP address is used by default, but may be changed to the public IP
address with relabeling.

The following meta labels are available on targets during [relabeling](#relabel_config):

* `__meta_gce_instance_name`: the name of the instance
* `__meta_gce_metadata_<name>`: each metadata item of the instance
* `__meta_gce_network`: the network URL of the instance
* `__meta_gce_private_ip`: the private IP address of the instance
* `__meta_gce_project`: the GCP project in which the instance is running
* `__meta_gce_public_ip`: the public IP address of the instance, if present
* `__meta_gce_subnetwork`: the subnetwork URL of the instance
* `__meta_gce_tags`: comma separated list of instance tags
* `__meta_gce_zone`: the GCE zone URL in which the instance is running

See below for the configuration options for GCE discovery:

```yaml
# The information to access the GCE API.

# The GCP project.
project: <string>

# The zone of the scrape targets. If you need multiple zones use multiple
# gce_sd_configs.
zone: <string>

# Filter can be used optionally to filter the instance list by other criteria.
[ filter: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# The tag separator is used to separate the tags on concatenation.
[ tag_separator: <string> | default = , ]
```

Credentials are discovered by the Google Cloud SDK default client by looking
in the following places, preferring the first location found:

1. a JSON file specified by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable
2. a JSON file in the well-known path `$HOME/.config/gcloud/application_default_credentials.json`
3. fetched from the GCE metadata server

If Prometheus is running within GCE, the service account associated with the
instance it is running on should have at least read-only permissions to the
compute resources. If running outside of GCE, make sure to create an appropriate
service account and place the credential file in one of the expected locations.

### `<kubernetes_sd_config>`

CAUTION: Kubernetes SD is in beta: breaking changes to configuration are still
likely in future releases.

Kubernetes SD configurations allow retrieving scrape targets from
[Kubernetes'](http://kubernetes.io/) REST API and always staying synchronized with
the cluster state.

One of the following `role` types can be configured to discover targets:

#### `node`

The `node` role discovers one target per cluster node with the address defaulting
to the Kubelet's HTTP port.
The target address defaults to the first existing address of the Kubernetes
node object in the address type order of `NodeInternalIP`, `NodeExternalIP`,
`NodeLegacyHostIP`, and `NodeHostName`.

Available meta labels:

* `__meta_kubernetes_node_name`: The name of the node object.
* `__meta_kubernetes_node_label_<labelname>`: Each label from the node object.
* `__meta_kubernetes_node_annotation_<annotationname>`: Each annotation from the node object.
* `__meta_kubernetes_node_address_<address_type>`: The first address for each node address type, if it exists.

In addition, the `instance` label for the node will be set to the node name
as retrieved from the API server.
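
As a sketch of how these meta labels are commonly used, the following hypothetical
scrape configuration discovers all nodes and copies every Kubernetes node label onto
the target via the `labelmap` relabel action (the job name is illustrative):

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'   # illustrative job name
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Copy each __meta_kubernetes_node_label_<labelname> onto the target as <labelname>.
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
```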

#### `service`

The `service` role discovers a target for each service port for each service.
This is generally useful for blackbox monitoring of a service.
The address will be set to the Kubernetes DNS name of the service and respective
service port.

Available meta labels:

* `__meta_kubernetes_namespace`: The namespace of the service object.
* `__meta_kubernetes_service_name`: The name of the service object.
* `__meta_kubernetes_service_label_<labelname>`: The label of the service object.
* `__meta_kubernetes_service_annotation_<annotationname>`: The annotation of the service object.
* `__meta_kubernetes_service_port_name`: Name of the service port for the target.
* `__meta_kubernetes_service_port_number`: Number of the service port for the target.
* `__meta_kubernetes_service_port_protocol`: Protocol of the service port for the target.

#### `pod`

The `pod` role discovers all pods and exposes their containers as targets. For each declared
port of a container, a single target is generated. If a container has no specified ports,
a port-free target per container is created for manually adding a port via relabeling.

Available meta labels:

* `__meta_kubernetes_namespace`: The namespace of the pod object.
* `__meta_kubernetes_pod_name`: The name of the pod object.
* `__meta_kubernetes_pod_ip`: The pod IP of the pod object.
* `__meta_kubernetes_pod_label_<labelname>`: The label of the pod object.
* `__meta_kubernetes_pod_annotation_<annotationname>`: The annotation of the pod object.
* `__meta_kubernetes_pod_container_name`: Name of the container the target address points to.
* `__meta_kubernetes_pod_container_port_name`: Name of the container port.
* `__meta_kubernetes_pod_container_port_number`: Number of the container port.
* `__meta_kubernetes_pod_container_port_protocol`: Protocol of the container port.
* `__meta_kubernetes_pod_ready`: Set to `true` or `false` for the pod's ready state.
* `__meta_kubernetes_pod_node_name`: The name of the node the pod is scheduled onto.
* `__meta_kubernetes_pod_host_ip`: The current host IP of the pod object.

#### `endpoints`

The `endpoints` role discovers targets from listed endpoints of a service. For each endpoint
address one target is discovered per port. If the endpoint is backed by a pod, all
additional container ports of the pod, not bound to an endpoint port, are discovered as targets as well.

Available meta labels:

* `__meta_kubernetes_namespace`: The namespace of the endpoints object.
* `__meta_kubernetes_endpoints_name`: The name of the endpoints object.
* For all targets discovered directly from the endpoints list (those not additionally inferred
  from underlying pods), the following labels are attached:
  * `__meta_kubernetes_endpoint_ready`: Set to `true` or `false` for the endpoint's ready state.
  * `__meta_kubernetes_endpoint_port_name`: Name of the endpoint port.
  * `__meta_kubernetes_endpoint_port_protocol`: Protocol of the endpoint port.
* If the endpoints belong to a service, all labels of the `role: service` discovery are attached.
* For all targets backed by a pod, all labels of the `role: pod` discovery are attached.

See below for the configuration options for Kubernetes discovery:

```yaml
# The information to access the Kubernetes API.

# The API server addresses. If left empty, Prometheus is assumed to run inside
# of the cluster and will discover API servers automatically and use the pod's
# CA certificate and bearer token file at /var/run/secrets/kubernetes.io/serviceaccount/.
[ api_server: <host> ]

# The Kubernetes role of entities that should be discovered.
role: <role>

# Optional authentication information used to authenticate to the API server.
# Note that `basic_auth`, `bearer_token` and `bearer_token_file` options are
# mutually exclusive.

# Optional HTTP basic authentication information.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]

# Optional bearer token authentication information.
[ bearer_token: <string> ]

# Optional bearer token file authentication information.
[ bearer_token_file: <filename> ]

# TLS configuration.
tls_config:
  [ <tls_config> ]

# Optional namespace discovery. If omitted, all namespaces are used.
namespaces:
  names:
    [ - <string> ]
```

Where `<role>` must be `endpoints`, `service`, `pod`, or `node`.

See [this example Prometheus configuration file](/documentation/examples/prometheus-kubernetes.yml)
for a detailed example of configuring Prometheus for Kubernetes.

You may wish to check out the third-party [Prometheus Operator](https://github.com/coreos/prometheus-operator),
which automates the Prometheus setup on top of Kubernetes.

### `<marathon_sd_config>`

CAUTION: Marathon SD is in beta: breaking changes to configuration are still
likely in future releases.

Marathon SD configurations allow retrieving scrape targets using the
[Marathon](https://mesosphere.github.io/marathon/) REST API. Prometheus
will periodically check the REST endpoint for currently running tasks and
create a target group for every app that has at least one healthy task.

The following meta labels are available on targets during [relabeling](#relabel_config):

* `__meta_marathon_app`: the name of the app (with slashes replaced by dashes)
* `__meta_marathon_image`: the name of the Docker image used (if available)
* `__meta_marathon_task`: the ID of the Mesos task
* `__meta_marathon_app_label_<labelname>`: any Marathon labels attached to the app
* `__meta_marathon_port_definition_label_<labelname>`: the port definition labels
* `__meta_marathon_port_mapping_label_<labelname>`: the port mapping labels

See below for the configuration options for Marathon discovery:

```yaml
# List of URLs to be used to contact Marathon servers.
# You need to provide at least one server URL, but should provide URLs for
# all masters you have running.
servers:
  - <string>

# Optional bearer token authentication information.
# It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Optional bearer token file authentication information.
# It is mutually exclusive with `bearer_token`.
[ bearer_token_file: <filename> ]

# Polling interval.
[ refresh_interval: <duration> | default = 30s ]
```

By default every app listed in Marathon will be scraped by Prometheus. If not all
of your services provide Prometheus metrics, you can use a Marathon label and
Prometheus relabeling to control which instances will actually be scraped. Also,
by default all apps will show up as a single job in Prometheus (the one specified
in the configuration file), which can also be changed using relabeling.

### `<nerve_sd_config>`

Nerve SD configurations allow retrieving scrape targets from [AirBnB's
Nerve](https://github.com/airbnb/nerve) which are stored in
[Zookeeper](https://zookeeper.apache.org/).

The following meta labels are available on targets during [relabeling](#relabel_config):

* `__meta_nerve_path`: the full path to the endpoint node in Zookeeper
* `__meta_nerve_endpoint_host`: the host of the endpoint
* `__meta_nerve_endpoint_port`: the port of the endpoint
* `__meta_nerve_endpoint_name`: the name of the endpoint

```yaml
# The Zookeeper servers.
servers:
  - <host>
# Paths can point to a single service, or the root of a tree of services.
paths:
  - <string>
[ timeout: <duration> | default = 10s ]
```

### `<serverset_sd_config>`

Serverset SD configurations allow retrieving scrape targets from
[Serversets](https://github.com/twitter/finagle/tree/master/finagle-serversets)
which are stored in [Zookeeper](https://zookeeper.apache.org/). Serversets are
commonly used by [Finagle](https://twitter.github.io/finagle/) and
[Aurora](http://aurora.apache.org/).

The following meta labels are available on targets during relabeling:

* `__meta_serverset_path`: the full path to the serverset member node in Zookeeper
* `__meta_serverset_endpoint_host`: the host of the default endpoint
* `__meta_serverset_endpoint_port`: the port of the default endpoint
* `__meta_serverset_endpoint_host_<endpoint>`: the host of the given endpoint
* `__meta_serverset_endpoint_port_<endpoint>`: the port of the given endpoint
* `__meta_serverset_shard`: the shard number of the member
* `__meta_serverset_status`: the status of the member

```yaml
# The Zookeeper servers.
servers:
  - <host>
# Paths can point to a single serverset, or the root of a tree of serversets.
paths:
  - <string>
[ timeout: <duration> | default = 10s ]
```

Serverset data must be in the JSON format; the Thrift format is not currently supported.

### `<triton_sd_config>`

CAUTION: Triton SD is in beta: breaking changes to configuration are still
likely in future releases.

[Triton](https://github.com/joyent/triton) SD configurations allow retrieving
scrape targets from [Container Monitor](https://github.com/joyent/rfd/blob/master/rfd/0027/README.md)
discovery endpoints.

The following meta labels are available on targets during relabeling:

* `__meta_triton_machine_id`: the UUID of the target container
* `__meta_triton_machine_alias`: the alias of the target container
* `__meta_triton_machine_image`: the target container's image type
* `__meta_triton_machine_server_id`: the server UUID for the target container

```yaml
# The information to access the Triton discovery API.

# The account to use for discovering new target containers.
account: <string>

# The DNS suffix which should be applied to target containers.
dns_suffix: <string>

# The Triton discovery endpoint (e.g. 'cmon.us-east-3b.triton.zone'). This is
# often the same value as dns_suffix.
endpoint: <string>

# The port to use for discovery and metric scraping.
[ port: <int> | default = 9163 ]

# The interval which should be used for refreshing target containers.
[ refresh_interval: <duration> | default = 60s ]

# The Triton discovery API version.
[ version: <int> | default = 1 ]

# TLS configuration.
tls_config:
  [ <tls_config> ]
```

### `<static_config>`

A `static_config` allows specifying a list of targets and a common label set
for them. It is the canonical way to specify static targets in a scrape
configuration.

```yaml
# The targets specified by the static config.
targets:
  [ - '<host>' ]

# Labels assigned to all metrics scraped from the targets.
labels:
  [ <labelname>: <labelvalue> ... ]
```

### `<relabel_config>`

Relabeling is a powerful tool to dynamically rewrite the label set of a target before
it gets scraped. Multiple relabeling steps can be configured per scrape configuration.
They are applied to the label set of each target in order of their appearance
in the configuration file.

Initially, aside from the configured per-target labels, a target's `job`
label is set to the `job_name` value of the respective scrape configuration.
The `__address__` label is set to the `<host>:<port>` address of the target.
After relabeling, the `instance` label is set to the value of `__address__` by default if
it was not set during relabeling. The `__scheme__` and `__metrics_path__` labels
are set to the scheme and metrics path of the target respectively. The `__param_<name>`
label is set to the value of the first passed URL parameter called `<name>`.

Additional labels prefixed with `__meta_` may be available during the
relabeling phase. They are set by the service discovery mechanism that provided
the target and vary between mechanisms.

Labels starting with `__` will be removed from the label set after relabeling is completed.

If a relabeling step needs to store a label value only temporarily (as the
input to a subsequent relabeling step), use the `__tmp` label name prefix. This
prefix is guaranteed to never be used by Prometheus itself.

```yaml
# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]

# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]

# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]

# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]

# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]

# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]

# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]
```

`<regex>` is any valid
[RE2 regular expression](https://github.com/google/re2/wiki/Syntax). It is
required for the `replace`, `keep`, `drop`, `labelmap`, `labeldrop` and `labelkeep` actions. The regex is
anchored on both ends. To un-anchor the regex, use `.*<regex>.*`.

`<relabel_action>` determines the relabeling action to take:

* `replace`: Match `regex` against the concatenated `source_labels`. Then, set
  `target_label` to `replacement`, with match group references
  (`${1}`, `${2}`, ...) in `replacement` substituted by their value. If `regex`
  does not match, no replacement takes place.
* `keep`: Drop targets for which `regex` does not match the concatenated `source_labels`.
* `drop`: Drop targets for which `regex` matches the concatenated `source_labels`.
* `hashmod`: Set `target_label` to the `modulus` of a hash of the concatenated `source_labels`.
* `labelmap`: Match `regex` against all label names. Then copy the values of the matching labels
  to label names given by `replacement` with match group references
  (`${1}`, `${2}`, ...) in `replacement` substituted by their value.
* `labeldrop`: Match `regex` against all label names. Any label that matches will be
  removed from the set of labels.
* `labelkeep`: Match `regex` against all label names. Any label that does not match will be
  removed from the set of labels.
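
To make the actions above concrete, here is a short sketch of two common relabeling
steps. The label names and the regular expression are illustrative examples only,
not required values:

```yaml
relabel_configs:
  # Keep only targets whose 'env' label has the value 'production'.
  - source_labels: [env]
    regex: production
    action: keep
  # Copy the value of a hypothetical __meta_example_datacenter label into a 'dc' label.
  - source_labels: [__meta_example_datacenter]
    target_label: dc
    action: replace
```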

Care must be taken with `labeldrop` and `labelkeep` to ensure that metrics are still uniquely labeled
once the labels are removed.

### `<metric_relabel_configs>`

Metric relabeling is applied to samples as the last step before ingestion. It
has the same configuration format and actions as target relabeling. Metric
relabeling does not apply to automatically generated timeseries such as `up`.

One use for this is to blacklist time series that are too expensive to ingest.

### `<alert_relabel_configs>`

Alert relabeling is applied to alerts before they are sent to the Alertmanager.
It has the same configuration format and actions as target relabeling. Alert
relabeling is applied after external labels.

One use for this is ensuring that an HA pair of Prometheus servers with different
external labels send identical alerts.

### `<alertmanager_config>`

CAUTION: Dynamic discovery of Alertmanager instances is in alpha state. Breaking configuration
changes may happen in future releases. Use static configuration via the `-alertmanager.url` flag
as a stable alternative.

An `alertmanager_config` section specifies Alertmanager instances the Prometheus server sends
alerts to. It also provides parameters to configure how to communicate with these Alertmanagers.

Alertmanagers may be statically configured via the `static_configs` parameter or
dynamically discovered using one of the supported service-discovery mechanisms.

Additionally, `relabel_configs` allow selecting Alertmanagers from discovered
entities and provide advanced modifications to the used API path, which is exposed
through the `__alerts_path__` label.

```yaml
# Per-target Alertmanager timeout when pushing alerts.
[ timeout: <duration> | default = 10s ]

# Prefix for the HTTP path alerts are pushed to.
[ path_prefix: <path> | default = / ]

# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]

# Sets the `Authorization` header on every request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]

# Sets the `Authorization` header on every request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the scrape request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# List of labeled statically configured Alertmanagers.
static_configs:
  [ - <static_config> ... ]

# List of Alertmanager relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]
```

### `<remote_write>`

CAUTION: Remote write is experimental: breaking changes to configuration are
likely in future releases.

`write_relabel_configs` is relabeling applied to samples before sending them
to the remote endpoint. Write relabeling is applied after external labels. This
could be used to limit which samples are sent.

There is a [small demo](/documentation/examples/remote_storage) of how to use
this functionality.

```yaml
# The URL of the endpoint to send samples to.
url: <string>

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]

# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
```

There is a list of
[integrations](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage)
with this feature.

### `<remote_read>`

CAUTION: Remote read is experimental: breaking changes to configuration are
likely in future releases.

```yaml
# The URL of the endpoint to query from.
url: <string>

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 30s ]

# Sets the `Authorization` header on every remote read request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]

# Sets the `Authorization` header on every remote read request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote read request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote read request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
```

There is a list of
[integrations](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage)
with this feature.
diff --git a/docs/getting_started.md b/docs/getting_started.md
new file mode 100644
index 000000000..8585e2566
--- /dev/null
+++ b/docs/getting_started.md
@@ -0,0 +1,275 @@
---
title: Getting started
sort_rank: 10
---

# Getting started

This guide is a "Hello World"-style tutorial which shows how to install,
configure, and use Prometheus in a simple example setup. You will download and run
Prometheus locally, configure it to scrape itself and an example application,
and then work with queries, rules, and graphs to make use of the collected time
series data.

## Downloading and running Prometheus

[Download the latest release](https://prometheus.io/download) of Prometheus for
your platform, then extract and run it:

```bash
tar xvfz prometheus-*.tar.gz
cd prometheus-*
```

Before starting Prometheus, let's configure it.

## Configuring Prometheus to monitor itself

Prometheus collects metrics from monitored targets by scraping metrics HTTP
endpoints on these targets. Since Prometheus also exposes data in the same
manner about itself, it can also scrape and monitor its own health.

While a Prometheus server that collects only data about itself is not very
useful in practice, it is a good starting example. Save the following basic
Prometheus configuration as a file named `prometheus.yml`:

```yaml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']
```

For a complete specification of configuration options, see the
[configuration documentation](configuration.md).

## Starting Prometheus

To start Prometheus with your newly created configuration file, change to the
directory containing the Prometheus binary and run:

```bash
# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag -storage.local.path).
./prometheus -config.file=prometheus.yml
```

Prometheus should start up and show a status page about itself at
[localhost:9090](http://localhost:9090). Give it a couple of seconds to collect
data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by
navigating to its metrics endpoint:
[localhost:9090/metrics](http://localhost:9090/metrics)

The number of OS threads executed by Prometheus is controlled by the
`GOMAXPROCS` environment variable. As of Go 1.5 the default value is
the number of cores available.

Blindly setting `GOMAXPROCS` to a high value can be
counterproductive. See the relevant [Go
FAQs](http://golang.org/doc/faq#Why_no_multi_CPU).

Note that Prometheus by default uses around 3GB of memory. If you have a
smaller machine, you can tune Prometheus to use less memory. For details,
see the [memory usage documentation](storage.md#memory-usage).

## Using the expression browser

Let us try looking at some data that Prometheus has collected about itself. To
use Prometheus's built-in expression browser, navigate to
http://localhost:9090/graph and choose the "Console" view within the "Graph"
tab.

As you can gather from http://localhost:9090/metrics, one metric that
Prometheus exports about itself is called
`prometheus_target_interval_length_seconds` (the actual amount of time between
target scrapes). Go ahead and enter this into the expression console:

```
prometheus_target_interval_length_seconds
```

This should return a number of different time series (along with the latest value
recorded for each), all with the metric name
`prometheus_target_interval_length_seconds`, but with different labels. These
labels designate different latency percentiles and target group intervals.

If we were only interested in the 99th percentile latencies, we could use this
query to retrieve that information:

```
prometheus_target_interval_length_seconds{quantile="0.99"}
```

To count the number of returned time series, you could write:

```
count(prometheus_target_interval_length_seconds)
```

For more about the expression language, see the
[expression language documentation](querying/basics.md).

## Using the graphing interface

To graph expressions, navigate to http://localhost:9090/graph and use the "Graph"
tab.

For example, enter the following expression to graph the per-second rate of all
storage chunk operations happening in the self-scraped Prometheus:

```
rate(prometheus_local_storage_chunk_ops_total[1m])
```

Experiment with the graph range parameters and other settings.

## Starting up some sample targets

Let us make this more interesting and start some example targets for Prometheus
to scrape.

The Go client library includes an example which exports fictional RPC latencies
for three services with different latency distributions.

Ensure you have the [Go compiler installed](https://golang.org/doc/install) and
have a [working Go build environment](https://golang.org/doc/code.html) (with
correct `GOPATH`) set up.

Download the Go client library for Prometheus and run three of these example
processes:

```bash
# Fetch the client library code and compile example.
git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
go get -d
go build

# Start 3 example targets in separate terminals:
./random -listen-address=:8080
./random -listen-address=:8081
./random -listen-address=:8082
```

You should now have example targets listening on http://localhost:8080/metrics,
http://localhost:8081/metrics, and http://localhost:8082/metrics.

## Configuring Prometheus to monitor the sample targets

Now we will configure Prometheus to scrape these new targets. Let's group all
three endpoints into one job called `example-random`. However, imagine that the
first two endpoints are production targets, while the third one represents a
canary instance. To model this in Prometheus, we can add several groups of
endpoints to a single job, adding extra labels to each group of targets. In
this example, we will add the `group="production"` label to the first group of
targets, while adding `group="canary"` to the second.

To achieve this, add the following job definition to the `scrape_configs`
section in your `prometheus.yml` and restart your Prometheus instance:

```yaml
scrape_configs:
  - job_name: 'example-random'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
```

Go to the expression browser and verify that Prometheus now has information
about time series that these example endpoints expose, such as the
`rpc_durations_seconds` metric.

## Configure rules for aggregating scraped data into new time series

Though not a problem in our example, queries that aggregate over thousands of
time series can get slow when computed ad-hoc. To make this more efficient,
Prometheus allows you to prerecord expressions into completely new persisted
time series via configured recording rules.
Let's say we are interested in
recording the per-second rate of example RPCs
(`rpc_durations_seconds_count`) averaged over all instances (but
preserving the `job` and `service` dimensions) as measured over a window of 5
minutes. We could write this as:

```
avg(rate(rpc_durations_seconds_count[5m])) by (job, service)
```

Try graphing this expression.

To record the time series resulting from this expression into a new metric
called `job_service:rpc_durations_seconds_count:avg_rate5m`, create a file
with the following recording rule and save it as `prometheus.rules`:

```
job_service:rpc_durations_seconds_count:avg_rate5m = avg(rate(rpc_durations_seconds_count[5m])) by (job, service)
```

To make Prometheus pick up this new rule, add a `rule_files` statement to your
`prometheus.yml`. The config should now look like this:

```yaml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
    monitor: 'codelab-monitor'

rule_files:
  - 'prometheus.rules'

scrape_configs:
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'example-random'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
```

Restart Prometheus with the new configuration and verify that a new time series
with the metric name `job_service:rpc_durations_seconds_count:avg_rate5m`
is now available by querying it through the expression browser or graphing it.
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 000000000..8f4e3aabc
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,16 @@
---
# todo: internal
---

# Prometheus 1.8

Welcome to the documentation of the Prometheus server.

The documentation is available alongside all the project documentation at
[prometheus.io](https://prometheus.io/docs/prometheus/1.8/).

## Content

- [Installing](installation.md)
- [Getting started](getting_started.md)
- [Configuration](configuration.md)
diff --git a/docs/installation.md b/docs/installation.md
new file mode 100644
index 000000000..186de8aaf
--- /dev/null
+++ b/docs/installation.md
@@ -0,0 +1,96 @@
---
title: Installing
---

# Installing

## Using pre-compiled binaries

We provide precompiled binaries for most official Prometheus components. Check
out the [download section](https://prometheus.io/download) for a list of all
available versions.

## From source

For building Prometheus components from source, see the `Makefile` targets in
the respective repository.

## Using Docker

All Prometheus services are available as Docker images under the
[prom](https://hub.docker.com/u/prom/) organization.

Running Prometheus on Docker is as simple as `docker run -p 9090:9090
prom/prometheus`. This starts Prometheus with a sample configuration and
exposes it on port 9090.

The Prometheus image uses a volume to store the actual metrics.
For production deployments it is highly recommended to use the
[Data Volume Container](https://docs.docker.com/engine/userguide/containers/dockervolumes/#creating-and-mounting-a-data-volume-container)
pattern to ease managing the data on Prometheus upgrades.

To provide your own configuration, there are several options. Here are
two examples.

### Volumes & bind-mount

Bind-mount your `prometheus.yml` from the host by running:

```
docker run -p 9090:9090 -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
       prom/prometheus
```

Or use an additional volume for the config:

```
docker run -p 9090:9090 -v /prometheus-data \
       prom/prometheus -config.file=/prometheus-data/prometheus.yml
```

### Custom image

To avoid managing a file on the host and bind-mounting it, the
configuration can be baked into the image. This works well if the
configuration itself is rather static and the same across all
environments.

For this, create a new directory with a Prometheus configuration and a
Dockerfile like this:

```
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/
```

Now build and run it:

```
docker build -t my-prometheus .
docker run -p 9090:9090 my-prometheus
```

A more advanced option is to render the config dynamically on start
with some tooling or even have a daemon update it periodically.

## Using configuration management systems

If you prefer using configuration management systems, you might be interested in
the following third-party contributions:

Ansible:

* [griggheo/ansible-prometheus](https://github.com/griggheo/ansible-prometheus)
* [William-Yeh/ansible-prometheus](https://github.com/William-Yeh/ansible-prometheus)

Chef:

* [rayrod2030/chef-prometheus](https://github.com/rayrod2030/chef-prometheus)

Puppet:

* [puppet/prometheus](https://forge.puppet.com/puppet/prometheus)

SaltStack:

* [bechtoldt/saltstack-prometheus-formula](https://github.com/bechtoldt/saltstack-prometheus-formula)