consul/website/content/docs/k8s/deployment-configurations/datadog.mdx

469 lines
20 KiB
Markdown

---
layout: docs
page_title: Configure Datadog metrics collection for Consul on Kubernetes
description: >-
Consul can integrate with external platforms such as Datadog to stream metrics about its operations. Learn how to enable Consul monitoring with Datadog by configuring the `metrics.datadog` Helm value override options.
---
# Configure Datadog metrics collection for Consul on Kubernetes
This page describes the processes for integrating Datadog metrics collection in your Consul on Kubernetes deployment. The Helm chart includes automated configuration options to simplify the integration process.
## Datadog Metrics Integration Methods
- [DogstatsD](#dogstatsd)
- [Datadog Checks: Official Consul Integration](#datadog-checks-official-consul-integration)
- [Openmetrics Prometheus](#openmetrics-prometheus)
Users should choose **_one_** integration method from the three described below that best suites the intent for metrics collection. **[DogStatsD](https://docs.datadoghq.com/developers/dogstatsd/?tab=kubernetes)**, **[Consul Integration](https://docs.datadoghq.com/integrations/consul/?tab=containerized)**, and **[Openmetrics Prometheus](https://docs.datadoghq.com/containers/kubernetes/prometheus/?tab=kubernetesadv2)** methods of integration are **_mutually exclusive_**.
**Reasoning:** _The consul-k8s helm chart automated configuration implements Datadog's [Consul Integration](https://docs.datadoghq.com/integrations/consul/?tab=containerized) method using the [`use_prometheus_endpoint`](https://github.com/DataDog/integrations-core/blob/07c04c5e9465ba1f3e0198830896d05923e81283/consul/datadog_checks/consul/data/conf.yaml.example#L59) configuration parameter. **DogstatsD**, **Consul Integration**, and **Openmetrics Prometheus** Metrics **by design** share the same [metric name](https://docs.datadoghq.com/integrations/consul/?tab=host#data-collected) syntax for collection, and would therefore cause a conflict. The [consul.py](https://github.com/DataDog/integrations-core/blob/07c04c5e9465ba1f3e0198830896d05923e81283/consul/datadog_checks/consul/consul.py#L55-L61) integration source code, as well as the [consul-k8s helm chart](https://github.com/hashicorp/consul-k8s/blob/4cac70496788f50354f96e9331003fcf338f419c/charts/consul/templates/_helpers.tpl#L595-L598) prohibit the enablement of more that one integration at a time._
## DogstatsD
This method of implementation leverages the [hashicorp/go-metrics DogstatsD client library](https://github.com/hashicorp/go-metrics/tree/master/datadog) to manage metrics collection.
Metrics are aggregated and sent via UDP or UDS transports to a Datadog Agent that runs on the same Kube Node as the Consul servers.
Enabling this method of metrics collection allows Consul to control the delivery of metrics traffic directly to a Datadog agent rather
than a Datadog agent attempting to reach Consul and scrape the `/v1/agent/metrics` API endpoint.
This is accomplished by updating each server agent's configuration telemetry stanza.
### Helm Chart Configuration
<Tabs>
<CodeBlockConfig heading={"DogstatsD (UDS)"}>
Consul Helm Chart Overrides
```yaml
metrics:
enabled: true
enableAgentMetrics: true
datadog:
enabled: true
namespace: "datadog"
dogstatsd:
enabled: true
socketTransportType: "UDS"
dogstatsdAddr: "/var/run/datadog/dsd.socket"
```
Resulting server agent telemetry configuration
```json
{
"telemetry": {
"dogstatsd_addr": "unix:///var/run/datadog/dsd.socket"
}
}
```
</CodeBlockConfig>
<CodeBlockConfig heading={"DogstatsD (UDP:KubeService)"}>
Consul Helm Chart Overrides
```yaml
metrics:
enabled: true
enableAgentMetrics: true
datadog:
enabled: true
namespace: "datadog"
dogstatsd:
enabled: true
socketTransportType: "UDP"
# Set `dogstatsdPort` to `0` (default) to omit port number append to address.
dogstatsdPort: 0
dogstatsdAddr: "datadog-agent.datadog.svc.cluster.local"
```
Resulting server agent telemetry configuration
```json
{
"telemetry": {
"dogstatsd_addr": "datadog-agent.datadog.svc.cluster.local"
}
}
```
</CodeBlockConfig>
<CodeBlockConfig heading={"DogstatsD (UDP: hostPort)"}>
Consul Helm Chart Overrides
```yaml
metrics:
enabled: true
enableAgentMetrics: true
datadog:
enabled: true
namespace: "datadog"
dogstatsd:
enabled: true
socketTransportType: "UDP"
dogstatsdPort: 8125
dogstatsdAddr: "172.20.180.10"
```
Resulting server agent telemetry configuration
```json
{
"telemetry": {
"dogstatsd_addr": "172.20.180.10:8125",
}
}
```
</CodeBlockConfig>
</Tabs>
### UDS/UDP Advantages and Disadvantages
This integration method accomplishes metrics collection by leveraging either [Unix Domain Sockets](https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=kubernetes) (**UDS**) or [User Datagram Protocol](https://docs.datadoghq.com/developers/dogstatsd/?tab=kubernetes#agent) (**UDP**) transport.
Practitioners who manage their Kubernetes infrastructure and/or service-mesh should take into account the implications outlined in the tables below.
#### UDS
**Packet Transport**: Unix Domain Socket File
| Advantages | Disadvantages |
|-------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| No IP or DNS resolution requirement for Datadog Agent | Requires [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) Volume attachment |
| Improved network performance | Datadog Agent must run on every host you send metrics from |
| Higher throughput capacity | |
| Packet error handling | |
| Automatic container ID tagging | |
#### UDP
**Packet Transport**:
* Kubernetes Service `IP:Port`
* Container Host Port
| Advantages | Disadvantages |
|------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| Does **not** require [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) Volume attachment | **No** packet error handling |
| (**_KubeDNS_**) Does **not** require Hostport exposure if accessible from cluster | (**_Hostport_**) Requires a networking provider that adheres to the CNI specification, such as Calico, Canal, or Flannel. |
| Similar `IP:Port` configuration as Virtual Machine hosts | (**_Hostport_**) Requires port to be exposed on host using `hostNetwork` |
| | (**_Hostport_**) Requires firewall access controls to permit access |
| | (**_Hostport_**) Network Namespace sharing is required |
#### Verifying DogstatsD Metric Collection
To confirm you're Datadog agent is receiving traffic, the `status` subcommand can be ran from the Datadog Agent expecting to receive DogstatsD traffic from Consul.
There should be an increase in either UDP or UDS traffic packet counts from the resultant output after the configuration has been properly established.
| Transport | Command | Pod | Container |
|:---------------|----------------------------------------------------------------------------|---------------|-----------|
| `UDP`\|\|`UDS` | `agent status` | datadog-agent | agent |
<Tabs>
<CodeBlockConfig heading={"UDP"}>
```shell
# Example: UDP Packet and Metric Packet Traffic Increase
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 5,908
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 636,872
Udp Packet Reading Errors: 0
Udp Packets: 3,300
Uds Bytes: 0
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 0
Unterminated Metric Errors: 0
```
</CodeBlockConfig>
<CodeBlockConfig heading={"UDS"}>
```shell
# Example: UDS Packet and Metric Packet Traffic Increase
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 30,523
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 124,635
Udp Packet Reading Errors: 0
Udp Packets: 731
Uds Bytes: 2,957,433
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 11,563
Unterminated Metric Errors: 0
```
</CodeBlockConfig>
</Tabs>
Traffic verification can also be accomplished using the `netstat` command line utility from a consul-server expected to be submitting
metrics data to Datadog.
<Note>
Using <code>netstat</code> requires privileged container permissions to install <code>open-bsd</code> networking tools on the consul-server for testing.
</Note>
| Transport | Command | Pod | Container |
|:-----------------|-----------|---------------|-----------|
| `UDP` \|\| `UDS` | `netstat` | consul-server | consul |
<Tabs>
<CodeBlockConfig heading={"UDP"}>
```shell
$ netstat -nup | grep "172.28.13.12:8125.*ESTABLISHED
udp 0 0 127.0.0.1:53874 127.0.0.1:8125 ESTABLISHED 23176/consul
```
</CodeBlockConfig>
<CodeBlockConfig heading={"UDS"}>
```shell
$ netstat -x
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ] DGRAM CONNECTED 15952473
unix 2 [ ] DGRAM 15652537 @9d10c
```
</CodeBlockConfig>
</Tabs>
UDS provides the additional capability for verification by sending a test metrics packet to the Unix Socket configured.
<Note>
Using <code>netcat</code> (nc) requires privileged container permissions to install <code>open-bsd</code> networking tools on the consul-server for testing.
</Note>
| Transport | Command | Pod | Container |
|:----------|---------|---------------|-----------|
| `UDS` | `nc` | consul-server | consul |
<CodeBlockConfig heading={"UDS"}>
```shell
$ echo -n "custom.metric.name:1|c" | nc -U -u -w1 /var/run/datadog/dsd.socket
Bound on /tmp/nc-IjJkoG/recv.sock
```
</CodeBlockConfig>
#### Use Case
DogstatsD integration provides full-scope metrics collection from Consul, and minimizes access control configuration requirements as traffic
flow is outbound (toward the Datadog Agent) as opposed to inbound (toward the `/v1/agent/metrics/` API endpoint).
#### Metrics Data Collected
- Full list of metrics sent via DogstatsD consists of those listed in the [Agent Telemetry](https://developer.hashicorp.com/consul/docs/agent/telemetry) documentation.
## Datadog Checks: Official Consul Integration
The Datadog Agent package includes official third-party integrations for built-in availability upon agent deployment.
The Consul Integration Datadog checks provided some additional metric verification checks that leverage Consul's built-in feature-set, and help monitor Consul
during normal operation beyond that of Consul's available metrics.
See the below [table](#additional-integration-checks-performed) for an outline of the features added by the official integration.
<Note>
Currently, the annotations configured by the Helm overrides with Consul RPC TLS enabled
assume server and ca certificate secrets are shared with the Datadog agent release namespace and mount the valid <code>tls.crt</code>, <code>tls.key</code>,
and <code>ca.crt</code> secret volumes at the <code>/etc/datadog-agent/conf.d/consul.d/certs</code> path on the Datadog Agent, agent container.
</Note>
### Helm Chart Configuration
<Tabs>
<CodeBlockConfig heading={"Datadog Consul Checks"}>
Consul Helm Chart Overrides
```yaml
global:
tls:
enabled: true
enableAutoEncrypt: true
acls:
manageSystemACLs: true
metrics:
enabled: true
enableAgentMetrics: true
datadog:
enabled: true
namespace: "datadog"
```
Consul `server-statefulset.yaml` annotations
```yaml
"ad.datadoghq.com/consul.checks": |
{
"consul": {
"init_config": {},
"instances": [
{
"url": "https://consul-server.consul.svc:8501",
"tls_cert": "/etc/datadog-agent/conf.d/consul.d/certs/tls.crt",
"tls_private_key": "/etc/datadog-agent/conf.d/consul.d/certs/tls.key",
"tls_ca_cert": "/etc/datadog-agent/conf.d/consul.d/ca/tls.crt",
"use_prometheus_endpoint": true,
"acl_token": "ENC[k8s_secret@consul/consul-datadog-agent-metrics-acl-token/token]",
"new_leader_checks": true,
"network_latency_checks": true,
"catalog_checks": true,
"auth_type": "basic"
}
]
}
}
```
</CodeBlockConfig>
</Tabs>
### Additional Integration Checks Performed
| Consul Component | Description | API Endpoint(s) |
|------------------|-----------------------------------------------------|----------------------------------------------------------------------|
| Agent | Agent Metadata (i.e., version) | `/v1/agent/self` |
| Metrics | Prometheus formatted metrics | `/v1/agent/metrics` |
| Serf | Events and Membership Flaps | `/v1/health/service/consul` `/v1/agent/self` |
| Raft | Monitors Raft peer information and leader elections | `/v1/status/leader` `/v1/status/peers` |
| Catalog Services | Service Health Status and Node Count | `/v1/catalog/services` `/v1/health/state/any` |
| Catalog Nodes | Node Service Count and Health Status | `/v1/health/state/any` `/v1/health/service/<service>` |
| Consul Latency | Consul LAN + WAN Coordinate Latency Calculations | `/v1/agent/self` `/v1/coordinate/nodes` `/v1/coordinate/datacenters` |
#### Use Case
This integration is primarily for basic Consul monitoring with focus on the service discovery.
#### Metrics Data Collected
The list of Consul's Prometheus metrics scraped and mapped by this method are listed in the latest [metrics.py](https://github.com/DataDog/integrations-core/blob/master/consul/datadog_checks/consul/metrics.py) of the integration source code.
To understand how Consul Latency metrics are calculated, review the [Consul Network Coordinates](https://developer.hashicorp.com/consul/docs/architecture/coordinates) documentation.
Review the [Datadog Documentation](https://docs.datadoghq.com/integrations/consul/?tab=containerized#data-collected) for the full description of Metrics data collected.
## Openmetrics Prometheus
For Datadog agents at or above v6.5.0, OpenMetrics and Prometheus checks are available to scrape Kubernetes application Prometheus endpoints.
This method implements the collection via Openmetrics as that is fully supported for Prometheus text format and is accomplished using pod annotations as demonstrated below.
<Note>
Enabling OpenMetrics collection via Datadog <strong><em>by design</em></strong> removes the <code>prometheus.io/path</code> and <code>prometheus.io/port</code> annotations from the consul-server statefulset deployment to allow Datadog
to scrape the agent's metrics API endpoint using either RPC TLS and Consul ACLs as necessary.
</Note>
<Note>
Currently, the annotations configured by the Helm overrides with Consul RPC TLS enabled
assume server and ca certificate secrets are shared with the Datadog agent release namespace and mount the valid <code>tls.crt</code>, <code>tls.key</code>,
and <code>ca.crt</code> secret volumes at the <code>/etc/datadog-agent/conf.d/consul.d/certs</code> path on the Datadog Agent, agent container.
</Note>
### Helm Chart Configuration
<Tabs>
<CodeBlockConfig heading={"OpenMetrics Prometheus"}>
Consul Helm Chart Overrides
```yaml
global:
tls:
enabled: true
enableAutoEncrypt: true
acls:
manageSystemACLs: true
metrics:
enabled: true
enableAgentMetrics: true
datadog:
enabled: true
namespace: "datadog"
openMetricsPrometheus:
enabled: true
```
Consul `server-statefulset.yaml` annotations
```yaml
ad.datadoghq.com/consul.checks: |
{
"openmetrics": {
"init_config": {},
"instances": [
{
"openmetrics_endpoint": "https://consul-server.consul.svc:8501/v1/agent/metrics?format=prometheus",
"tls_cert": "/etc/datadog-agent/conf.d/consul.d/certs/tls.crt",
"tls_private_key": "/etc/datadog-agent/conf.d/consul.d/certs/tls.key",
"tls_ca_cert": "/etc/datadog-agent/conf.d/consul.d/ca/tls.crt",
"headers": {
"X-Consul-Token": "ENC[k8s_secret@consul/consul-datadog-agent-metrics-acl-token/token]"
},
"namespace": "consul",
"metrics": [ ".*" ]
}
]
}
}
```
</CodeBlockConfig>
</Tabs>
#### Use Case
This method of integration is useful for Prometheus-enabled scrapes with further customization of the collected data.
By default, all metrics pulled using this method scrape Consul metrics using the `/v1/agent/metrics?format=prometheus` API query, and are considered to be custom metrics.
Use of this method maps to Datadog as described in [Mapping Prometheus Metrics to Datadog Metrics](https://docs.datadoghq.com/integrations/guide/prometheus-metrics/?tab=latestversion). The following table summarizing how these metrics map to each other.
| OpenMetrics metric type | Datadog metric type |
|:------------------------|:-----------------------------------|
| `Gauge` | `gauge` |
| `Counter` | `count` |
| Histogram: `_count ` | `count.count` |
| Histogram: `_sum` | `count.sum` |
| Histogram: `_bucket` | `count.bucket` \|\| `distribution` |
| Summary: `_count` | `count.count` |
| Summary: `_sum` | `count.sum` |
| Summary: `sample` | `gauge.quantile` |
#### Metrics Data Collected
The integration, by default, uses a wildcard (`".*"`) to collect **_all_** metrics emitted from the `/v1/agent/metrics` endpoint.
Please refer to the [Agent Telemetry](https://developer.hashicorp.com/consul/docs/agent/telemetry) documentation for a full list and description of the metrics data collected.