From e16af25e7c58e256da79019095640cce0653413e Mon Sep 17 00:00:00 2001 From: hc-github-team-consul-core Date: Tue, 20 Feb 2024 10:06:25 -0500 Subject: [PATCH] Backport of Consul Kubernetes Datadog Integration Docs Update into release/1.17.x (#20677) --- .../k8s/deployment-configurations/datadog.mdx | 468 ++++++++++++++++++ website/data/docs-nav-data.json | 4 + 2 files changed, 472 insertions(+) create mode 100644 website/content/docs/k8s/deployment-configurations/datadog.mdx diff --git a/website/content/docs/k8s/deployment-configurations/datadog.mdx b/website/content/docs/k8s/deployment-configurations/datadog.mdx new file mode 100644 index 0000000000..cc627cb59b --- /dev/null +++ b/website/content/docs/k8s/deployment-configurations/datadog.mdx @@ -0,0 +1,468 @@ +--- +layout: docs +page_title: Configure Datadog metrics collection for Consul on Kubernetes +description: >- + Consul can integrate with external platforms such as Datadog to stream metrics about its operations. Learn how to enable Consul monitoring with Datadog by configuring the `metrics.datadog` Helm value override options. +--- + +# Configure Datadog metrics collection for Consul on Kubernetes + +This page describes the processes for integrating Datadog metrics collection in your Consul on Kubernetes deployment. The Helm chart includes automated configuration options to simplify the integration process. + +## Datadog Metrics Integration Methods + +- [DogstatsD](#dogstatsd) +- [Datadog Checks: Official Consul Integration](#datadog-checks-official-consul-integration) +- [Openmetrics Prometheus](#openmetrics-prometheus) + +Users should choose **_one_** integration method from the three described below that best suits the intent for metrics collection. 
The **[DogStatsD](https://docs.datadoghq.com/developers/dogstatsd/?tab=kubernetes)**, **[Consul Integration](https://docs.datadoghq.com/integrations/consul/?tab=containerized)**, and **[Openmetrics Prometheus](https://docs.datadoghq.com/containers/kubernetes/prometheus/?tab=kubernetesadv2)** methods of integration are **_mutually exclusive_**. + +**Reasoning:** _The consul-k8s helm chart automated configuration implements Datadog's [Consul Integration](https://docs.datadoghq.com/integrations/consul/?tab=containerized) method using the [`use_prometheus_endpoint`](https://github.com/DataDog/integrations-core/blob/07c04c5e9465ba1f3e0198830896d05923e81283/consul/datadog_checks/consul/data/conf.yaml.example#L59) configuration parameter. **DogstatsD**, **Consul Integration**, and **Openmetrics Prometheus** Metrics **by design** share the same [metric name](https://docs.datadoghq.com/integrations/consul/?tab=host#data-collected) syntax for collection, and would therefore cause a conflict. The [consul.py](https://github.com/DataDog/integrations-core/blob/07c04c5e9465ba1f3e0198830896d05923e81283/consul/datadog_checks/consul/consul.py#L55-L61) integration source code, as well as the [consul-k8s helm chart](https://github.com/hashicorp/consul-k8s/blob/4cac70496788f50354f96e9331003fcf338f419c/charts/consul/templates/_helpers.tpl#L595-L598), prohibit enabling more than one integration at a time._ + + +## DogstatsD + +This method of implementation leverages the [hashicorp/go-metrics DogstatsD client library](https://github.com/hashicorp/go-metrics/tree/master/datadog) to manage metrics collection. +Metrics are aggregated and sent via UDP or UDS transports to a Datadog Agent that runs on the same Kubernetes node as the Consul servers. + +Enabling this method of metrics collection allows Consul to control the delivery of metrics traffic directly to a Datadog agent rather +than a Datadog agent attempting to reach Consul and scrape the `/v1/agent/metrics` API endpoint. 
+ +This is accomplished by updating each server agent's configuration telemetry stanza. + +### Helm Chart Configuration + + + + + Consul Helm Chart Overrides + + ```yaml + metrics: + enabled: true + enableAgentMetrics: true + datadog: + enabled: true + namespace: "datadog" + dogstatsd: + enabled: true + socketTransportType: "UDS" + dogstatsdAddr: "/var/run/datadog/dsd.socket" + ``` + + Resulting server agent telemetry configuration + + ```json + { + "telemetry": { + "dogstatsd_addr": "unix:///var/run/datadog/dsd.socket" + } + } + ``` + + + + + + Consul Helm Chart Overrides + + ```yaml + metrics: + enabled: true + enableAgentMetrics: true + datadog: + enabled: true + namespace: "datadog" + dogstatsd: + enabled: true + socketTransportType: "UDP" + # Set `dogstatsdPort` to `0` (default) to omit appending a port number to the address. + dogstatsdPort: 0 + dogstatsdAddr: "datadog-agent.datadog.svc.cluster.local" + ``` + + Resulting server agent telemetry configuration + + ```json + { + "telemetry": { + "dogstatsd_addr": "datadog-agent.datadog.svc.cluster.local" + } + } + ``` + + + + + + Consul Helm Chart Overrides + + ```yaml + metrics: + enabled: true + enableAgentMetrics: true + datadog: + enabled: true + namespace: "datadog" + dogstatsd: + enabled: true + socketTransportType: "UDP" + dogstatsdPort: 8125 + dogstatsdAddr: "172.20.180.10" + ``` + + Resulting server agent telemetry configuration + + ```json + { + "telemetry": { + "dogstatsd_addr": "172.20.180.10:8125" + } + } + ``` + + + + +### UDS/UDP Advantages and Disadvantages + +This integration method accomplishes metrics collection by leveraging either [Unix Domain Sockets](https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=kubernetes) (**UDS**) or [User Datagram Protocol](https://docs.datadoghq.com/developers/dogstatsd/?tab=kubernetes#agent) (**UDP**) transport. 
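To make the difference between the two transports concrete, here is a minimal Python sketch that builds a DogStatsD datagram and sends it over either UDP or a Unix domain socket. It is an illustration only: Consul itself emits metrics through the go-metrics library, not this code. The helper names `format_metric`, `send_udp`, and `send_uds` are hypothetical, and the metric name and socket path are taken from the examples on this page.

```python
import socket


def format_metric(name: str, value: int, metric_type: str = "c") -> bytes:
    """Build a DogStatsD datagram such as b'custom.metric.name:1|c'."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_udp(payload: bytes, host: str, port: int) -> None:
    """Send one datagram to a host:port pair (UDP transport)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def send_uds(payload: bytes, path: str) -> None:
    """Send one datagram to a Unix domain socket file (UDS transport).

    Assumes the socket file (for example /var/run/datadog/dsd.socket)
    is mounted into the container that runs this code.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.send(payload)


if __name__ == "__main__":
    # Only print the payload here; sending needs a reachable Datadog Agent.
    print(format_metric("custom.metric.name", 1))
```

With UDP the sender needs an address and port that resolve from the Consul server pod, while with UDS it only needs the socket file to be present on the local filesystem, which is why the trade-offs below center on networking versus volume mounts.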
+Practitioners who manage their Kubernetes infrastructure and/or service-mesh should take into account the implications outlined in the tables below. + +#### UDS + +**Packet Transport**: Unix Domain Socket File + +| Advantages | Disadvantages | +|-------------------------------------------------------|------------------------------------------------------------------------------------------------------| +| No IP or DNS resolution requirement for Datadog Agent | Requires [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) Volume attachment | +| Improved network performance | Datadog Agent must run on every host you send metrics from | +| Higher throughput capacity | | +| Packet error handling | | +| Automatic container ID tagging | | + +#### UDP + +**Packet Transport**: + * Kubernetes Service `IP:Port` + * Container Host Port + +| Advantages | Disadvantages | +|------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------| +| Does **not** require [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) Volume attachment | **No** packet error handling | +| (**_KubeDNS_**) Does **not** require Hostport exposure if accessible from cluster | (**_Hostport_**) Requires a networking provider that adheres to the CNI specification, such as Calico, Canal, or Flannel. | +| Similar `IP:Port` configuration as Virtual Machine hosts | (**_Hostport_**) Requires port to be exposed on host using `hostNetwork` | +| | (**_Hostport_**) Requires firewall access controls to permit access | +| | (**_Hostport_**) Network Namespace sharing is required | + + +#### Verifying DogstatsD Metric Collection + +To confirm that your Datadog Agent is receiving traffic, run the `status` subcommand from the Datadog Agent that is expected to receive DogstatsD traffic from Consul. 
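If you capture the `agent status` output before and after enabling the integration, the comparison can be automated. The following Python sketch parses the DogStatsD section and checks whether a transport's packet counter grew. It assumes the section layout matches the sample output shown on this page (a `DogStatsD` title framed by `=` lines, followed by `Name: value` counters); the function names are hypothetical.

```python
import re
from typing import Dict

# Matches counter lines such as "Udp Packets: 3,300".
COUNTER_LINE = re.compile(r"([A-Za-z ]+):\s*([\d,]+)")


def parse_dogstatsd_counters(status_text: str) -> Dict[str, int]:
    """Extract the counters listed under the DogStatsD section header."""
    counters: Dict[str, int] = {}
    in_section = False
    underline_seen = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if not in_section:
            in_section = stripped == "DogStatsD"
            continue
        if stripped and set(stripped) == {"="}:
            if underline_seen:
                break  # A second "=" rule starts the next section.
            underline_seen = True
            continue
        match = COUNTER_LINE.fullmatch(stripped)
        if match:
            counters[match.group(1)] = int(match.group(2).replace(",", ""))
    return counters


def traffic_increased(before: Dict[str, int], after: Dict[str, int],
                      transport: str) -> bool:
    """Return True if the packet counter for 'udp' or 'uds' grew."""
    key = f"{transport.capitalize()} Packets"
    return after.get(key, 0) > before.get(key, 0)
```

A growing `Udp Packets` or `Uds Packets` value between two snapshots suggests that the corresponding transport is delivering Consul metrics to the agent.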
+ +There should be an increase in either UDP or UDS traffic packet counts from the resultant output after the configuration has been properly established. + + + | Transport | Command | Pod | Container | + |:---------------|----------------------------------------------------------------------------|---------------|-----------| + | `UDP`\|\|`UDS` | `agent status` | datadog-agent | agent | + + + + + + ```shell + # Example: UDP Packet and Metric Packet Traffic Increase + ========= + DogStatsD + ========= + Event Packets: 0 + Event Parse Errors: 0 + Metric Packets: 5,908 + Metric Parse Errors: 0 + Service Check Packets: 0 + Service Check Parse Errors: 0 + Udp Bytes: 636,872 + Udp Packet Reading Errors: 0 + Udp Packets: 3,300 + Uds Bytes: 0 + Uds Origin Detection Errors: 0 + Uds Packet Reading Errors: 0 + Uds Packets: 0 + Unterminated Metric Errors: 0 + ``` + + + + ```shell + # Example: UDS Packet and Metric Packet Traffic Increase + ========= + DogStatsD + ========= + Event Packets: 0 + Event Parse Errors: 0 + Metric Packets: 30,523 + Metric Parse Errors: 0 + Service Check Packets: 0 + Service Check Parse Errors: 0 + Udp Bytes: 124,635 + Udp Packet Reading Errors: 0 + Udp Packets: 731 + Uds Bytes: 2,957,433 + Uds Origin Detection Errors: 0 + Uds Packet Reading Errors: 0 + Uds Packets: 11,563 + Unterminated Metric Errors: 0 + ``` + + + + +Traffic verification can also be accomplished using the `netstat` command line utility from a consul-server expected to be submitting +metrics data to Datadog. + + + Using netstat requires privileged container permissions to install open-bsd networking tools on the consul-server for testing. 
+ + +| Transport | Command | Pod | Container | +|:-----------------|-----------|---------------|-----------| +| `UDP` \|\| `UDS` | `netstat` | consul-server | consul | + + + + + ```shell + $ netstat -nup | grep "172.28.13.12:8125.*ESTABLISHED" + udp 0 0 127.0.0.1:53874 127.0.0.1:8125 ESTABLISHED 23176/consul + ``` + + + + + + ```shell + $ netstat -x + Active UNIX domain sockets (w/o servers) + Proto RefCnt Flags Type State I-Node Path + unix 2 [ ] DGRAM CONNECTED 15952473 + unix 2 [ ] DGRAM 15652537 @9d10c + ``` + + + + + +UDS also allows verification by sending a test metric packet to the configured Unix socket. + + + Using netcat (nc) requires privileged container permissions to install open-bsd networking tools on the consul-server for testing. + + +| Transport | Command | Pod | Container | +|:----------|---------|---------------|-----------| +| `UDS` | `nc` | consul-server | consul | + + + + ```shell + $ echo -n "custom.metric.name:1|c" | nc -U -u -w1 /var/run/datadog/dsd.socket + Bound on /tmp/nc-IjJkoG/recv.sock + ``` + + + +#### Use Case + +DogstatsD integration provides full-scope metrics collection from Consul, and minimizes access control configuration requirements as traffic +flow is outbound (toward the Datadog Agent) as opposed to inbound (toward the `/v1/agent/metrics/` API endpoint). + + +#### Metrics Data Collected + + - Full list of metrics sent via DogstatsD consists of those listed in the [Agent Telemetry](https://developer.hashicorp.com/consul/docs/agent/telemetry) documentation. + + +## Datadog Checks: Official Consul Integration + +The Datadog Agent package includes official third-party integrations for built-in availability upon agent deployment. + +The Consul Integration Datadog checks provide additional verification checks that leverage Consul's built-in feature-set, and help monitor Consul +during normal operation beyond what Consul's available metrics cover. 
+ +See the below [table](#additional-integration-checks-performed) for an outline of the features added by the official integration. + + + Currently, the annotations configured by the Helm overrides with Consul RPC TLS enabled + assume server and ca certificate secrets are shared with the Datadog agent release namespace and mount the valid tls.crt, tls.key, + and ca.crt secret volumes at the /etc/datadog-agent/conf.d/consul.d/certs path on the Datadog Agent, agent container. + + +### Helm Chart Configuration + + + + + Consul Helm Chart Overrides + + ```yaml + global: + tls: + enabled: true + enableAutoEncrypt: true + acls: + manageSystemACLs: true + metrics: + enabled: true + enableAgentMetrics: true + datadog: + enabled: true + namespace: "datadog" + ``` + + + Consul `server-statefulset.yaml` annotations + + ```yaml + "ad.datadoghq.com/consul.checks": | + { + "consul": { + "init_config": {}, + "instances": [ + { + "url": "https://consul-server.consul.svc:8501", + "tls_cert": "/etc/datadog-agent/conf.d/consul.d/certs/tls.crt", + "tls_private_key": "/etc/datadog-agent/conf.d/consul.d/certs/tls.key", + "tls_ca_cert": "/etc/datadog-agent/conf.d/consul.d/ca/tls.crt", + "use_prometheus_endpoint": true, + "acl_token": "ENC[k8s_secret@consul/consul-datadog-agent-metrics-acl-token/token]", + "new_leader_checks": true, + "network_latency_checks": true, + "catalog_checks": true, + "auth_type": "basic" + } + ] + } + } + ``` + + + + +### Additional Integration Checks Performed + +| Consul Component | Description | API Endpoint(s) | +|------------------|-----------------------------------------------------|----------------------------------------------------------------------| +| Agent | Agent Metadata (i.e., version) | `/v1/agent/self` | +| Metrics | Prometheus formatted metrics | `/v1/agent/metrics` | +| Serf | Events and Membership Flaps | `/v1/health/service/consul` `/v1/agent/self` | +| Raft | Monitors Raft peer information and leader elections | `/v1/status/leader` 
`/v1/status/peers` | +| Catalog Services | Service Health Status and Node Count | `/v1/catalog/services` `/v1/health/state/any` | +| Catalog Nodes | Node Service Count and Health Status | `/v1/health/state/any` `/v1/health/service/` | +| Consul Latency | Consul LAN + WAN Coordinate Latency Calculations | `/v1/agent/self` `/v1/coordinate/nodes` `/v1/coordinate/datacenters` | + +#### Use Case + +This integration is primarily for basic Consul monitoring with a focus on service discovery. + +#### Metrics Data Collected + +The Consul Prometheus metrics scraped and mapped by this method are listed in the latest [metrics.py](https://github.com/DataDog/integrations-core/blob/master/consul/datadog_checks/consul/metrics.py) of the integration source code. + +To understand how Consul Latency metrics are calculated, review the [Consul Network Coordinates](https://developer.hashicorp.com/consul/docs/architecture/coordinates) documentation. + +Review the [Datadog Documentation](https://docs.datadoghq.com/integrations/consul/?tab=containerized#data-collected) for the full description of Metrics data collected. + +## Openmetrics Prometheus + +For Datadog agents at or above v6.5.0, OpenMetrics and Prometheus checks are available to scrape Kubernetes application Prometheus endpoints. + +This method uses the OpenMetrics check, which fully supports the Prometheus text format, and is configured using pod annotations as demonstrated below. + + + Enabling OpenMetrics collection via Datadog by design removes the prometheus.io/path and prometheus.io/port annotations from the consul-server statefulset deployment to allow Datadog + to scrape the agent's metrics API endpoint using RPC TLS and Consul ACLs as necessary. 
+ + + Currently, the annotations configured by the Helm overrides with Consul RPC TLS enabled + assume server and ca certificate secrets are shared with the Datadog agent release namespace and mount the valid tls.crt, tls.key, + and ca.crt secret volumes at the /etc/datadog-agent/conf.d/consul.d/certs path on the Datadog Agent, agent container. + + +### Helm Chart Configuration + + + + + Consul Helm Chart Overrides + + ```yaml + global: + tls: + enabled: true + enableAutoEncrypt: true + acls: + manageSystemACLs: true + metrics: + enabled: true + enableAgentMetrics: true + datadog: + enabled: true + namespace: "datadog" + openMetricsPrometheus: + enabled: true + ``` + + Consul `server-statefulset.yaml` annotations + + ```yaml + ad.datadoghq.com/consul.checks: | + { + "openmetrics": { + "init_config": {}, + "instances": [ + { + "openmetrics_endpoint": "https://consul-server.consul.svc:8501/v1/agent/metrics?format=prometheus", + "tls_cert": "/etc/datadog-agent/conf.d/consul.d/certs/tls.crt", + "tls_private_key": "/etc/datadog-agent/conf.d/consul.d/certs/tls.key", + "tls_ca_cert": "/etc/datadog-agent/conf.d/consul.d/ca/tls.crt", + "headers": { + "X-Consul-Token": "ENC[k8s_secret@consul/consul-datadog-agent-metrics-acl-token/token]" + }, + "namespace": "consul", + "metrics": [ ".*" ] + } + ] + } + } + ``` + + + + +#### Use Case + +This method of integration is useful for Prometheus-enabled scrapes with further customization of the collected data. + +By default, all metrics pulled using this method scrape Consul metrics using the `/v1/agent/metrics?format=prometheus` API query, and are considered to be custom metrics. + +Use of this method maps to Datadog as described in [Mapping Prometheus Metrics to Datadog Metrics](https://docs.datadoghq.com/integrations/guide/prometheus-metrics/?tab=latestversion). The following table summarizes how these metrics map to each other. 
+ +| OpenMetrics metric type | Datadog metric type | +|:------------------------|:-----------------------------------| +| `Gauge` | `gauge` | +| `Counter` | `count` | +| Histogram: `_count` | `count.count` | +| Histogram: `_sum` | `count.sum` | +| Histogram: `_bucket` | `count.bucket` \|\| `distribution` | +| Summary: `_count` | `count.count` | +| Summary: `_sum` | `count.sum` | +| Summary: `sample` | `gauge.quantile` | + + +#### Metrics Data Collected + +The integration, by default, uses a wildcard (`".*"`) to collect **_all_** metrics emitted from the `/v1/agent/metrics` endpoint. + +Please refer to the [Agent Telemetry](https://developer.hashicorp.com/consul/docs/agent/telemetry) documentation for a full list and description of the metrics data collected. diff --git a/website/data/docs-nav-data.json b/website/data/docs-nav-data.json index 84c08a5c39..fd53b8e732 100644 --- a/website/data/docs-nav-data.json +++ b/website/data/docs-nav-data.json @@ -1406,6 +1406,10 @@ "path": "k8s/deployment-configurations/vault/wan-federation" } ] + }, + { + "title": "Datadog metrics", + "path": "k8s/deployment-configurations/datadog" } ] },