consul/website/content/docs/connect/observability/grafanadashboards/consulserverdashboard.mdx

165 lines
4.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

---
layout: docs
page_title: Dashboard for Consul server metrics
description: >-
This documentation provides an overview of the Consul Server Dashboard. Learn about the metrics it displays and the queries that produce the metrics.
---
# Consul server monitoring dashboard
This page provides reference information about the [Grafana dashboard configuration included in the `hashicorp/consul` GitHub repository](https://github.com/hashicorp/consul/blob/main/grafana/consul-server-monitoring.json).
## Grafana queries overview
This dashboard provides the following information about service mesh operations.
### Raft commit time
**Description:** This metric measures the time it takes to commit Raft log entries. Stable values are expected for a healthy cluster. High values can indicate issues with resources such as memory, CPU, or disk space.
```promql
consul_raft_commitTime
```
### Raft commits per 5 minutes
**Description:** This metric tracks the rate of Raft log commits emitted by the leader, showing how quickly changes are being applied across the cluster.
```promql
rate(consul_raft_apply[5m])
```
### Last contacted leader
**Description:** Measures the duration since the last contact with the Raft leader. Spikes in this metric can indicate network issues or an unavailable leader, which may affect cluster stability.
```promql
consul_raft_leader_lastContact != 0
```
### Election events
**Description:** Tracks Raft state transitions, which indicate leadership elections. Frequent transitions might suggest cluster instability and require investigation.
```promql
rate(consul_raft_state_candidate[1m])
```
```promql
rate(consul_raft_state_leader[1m])
```
### Autopilot health
**Description:** A boolean metric that shows a value of 1 when Autopilot is healthy and 0 when issues are detected. Ensures that the cluster has sufficient resources and an operational leader.
```promql
consul_autopilot_healthy
```
### DNS queries per 5 minutes
**Description:** This metric tracks the rate of DNS queries per node, bucketed into 5 minute intervals. It helps monitor the query load on Consuls DNS service.
```promql
rate(consul_dns_domain_query_count[5m])
```
### DNS domain query time
**Description:** Measures the time spent handling DNS domain queries. Spikes in this metric may indicate high contention in the catalog or too many concurrent queries.
```promql
consul_dns_domain_query
```
### DNS reverse query time
**Description:** Tracks the time spent processing reverse DNS queries. Spikes in query time may indicate performance bottlenecks or increased workload.
```promql
consul_dns_ptr_query
```
### KV applies per 5 minutes
**Description:** This metric tracks the rate of key-value store applies over 5 minute intervals, indicating the operational load on Consuls KV store.
```promql
rate(consul_kvs_apply_count[5m])
```
### KV apply time
**Description:** Measures the time taken to apply updates to the key-value store. Spikes in this metric might suggest resource contention or client overload.
```promql
consul_kvs_apply
```
### Transaction apply time
**Description:** Tracks the time spent applying transaction operations in Consul, providing insights into potential bottlenecks in transaction operations.
```promql
consul_txn_apply
```
### ACL resolves per 5 minutes
**Description:** This metric tracks the rate of ACL token resolutions over 5 minute intervals. It provides insights into the activity related to ACL tokens within the cluster.
```promql
rate(consul_acl_ResolveToken_count[5m])
```
### ACL resolve token time
**Description:** Measures the time taken to resolve ACL tokens into their associated policies.
```promql
consul_acl_ResolveToken
```
### ACL updates per 5 minutes
**Description:** Tracks the rate of ACL updates over 5 minute intervals. This metric helps monitor changes in ACL configurations over time.
```promql
rate(consul_acl_apply_count[5m])
```
### ACL apply time
**Description:** Measures the time spent applying ACL changes. Spikes in apply time might suggest resource constraints or high operational load.
```promql
consul_acl_apply
```
### Catalog operations per 5 minutes
**Description:** Tracks the rate of register and deregister operations in the Consul catalog, providing insights into the churn of services within the cluster.
```promql
rate(consul_catalog_register_count[5m])
```
```promql
rate(consul_catalog_deregister_count[5m])
```
### Catalog operation time
**Description:** Measures the time taken to complete catalog register or deregister operations. Spikes in this metric may indicate performance issues within the catalog.
```promql
consul_catalog_register
```
```promql
consul_catalog_deregister
```