consul/website/content/docs/architecture/capacity-planning.mdx

189 lines
15 KiB
Markdown

---
layout: docs
page_title: Consul capacity planning
description: >-
Learn how to maintain your Consul cluster in a healthy state by provisioning the correct resources.
---
# Consul capacity planning
This page describes our capacity planning recommendations when deploying and maintaining a Consul cluster in production. When your organization designs a production environment, you should consider your available resources and their impact on network capacity.
## Introduction
It is important to select the correct size for your server instances. Consul server environments have a standard set of minimum requirements. However, these requirements may vary depending on what you are using Consul for.
Insufficient resource allocations may cause network issues or degraded performance in general. When a slowdown in performance results in a Consul leader node that is unable to respond to requests in sufficient time, the Consul cluster triggers a new leader election. Consul pauses all network requests and Raft updates until the election ends.
## Hardware requirements
The minimum hardware requirements for Consul servers in production clusters as recommended by the [reference architecture](/consul/tutorials/production-deploy/reference-architecture#hardware-sizing-for-consul-servers) are:
| CPU | Memory | Disk Capacity | Disk IO | Disk Throughput | Avg Round-Trip-Time | 99% Round-Trip-Time |
| --------- | ------------ | ------------- | ----------- | --------------- | ------------------- | ------------------- |
| 8-16 core | 32-64 GB RAM | 200+ GB | 7500+ IOPS | 250+ MB/s | Lower than 50ms | Lower than 100ms |
For the major cloud providers, we recommend starting with one of the following instances that meet the minimum requirements. Then scale up as needed. We also recommend avoiding "burstable" CPU and storage options where performance may drop after a consistent load.
| Provider | Size | Instance/VM Types | Disk Volume Specs |
| --------- | ----- | ------------------------------------- | --------------------------------- |
| **AWS** | Large | `m5.2xlarge`, `m5.4xlarge` | 200+GB `gp3`, 10000 IOPS, 250MB/s |
| **Azure** | Large | `Standard_D8s_v3`, `Standard_D16s_v3` | 2048GB `Premium SSD`, 7500 IOPS, 200MB/s |
| **GCP** | Large | `n2-standard-8`, `n2-standard-16` | 1000GB `pd-ssd`, 30000 IOPS, 480MB/s |
For HCP Consul Dedicated, cluster size is measured in the number of service instances supported. Find out more information in the [HCP Consul Dedicated pricing page](https://cloud.hashicorp.com/products/consul/pricing).
## Workload input and output requirements
Workloads are any actions that interact with the Consul cluster. These actions consist of key/value reads and writes, service registrations and deregistrations, adding or removing Consul client agents, and more.
Input/output operations per second (IOPS) is a unit of measurement for the amount of reads and writes to non-adjacent storage locations.
For high workloads, ensure that the Consul server disks support a [high number of IOPS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-io-characteristics.html#ebs-io-iops) to keep up with the rapid Raft log update rate.
Unlike bare-metal environments, IOPS for virtual instances in cloud environments is often tied to storage sizing. More storage GBs typically grants you more IOPS. Therefore, we recommend deploying on [IOPS-optimized instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/provisioned-iops.html).
Consul server agents are generally I/O bound for writes and CPU bound for reads. For additional tuning recommendations, refer to [raft tuning](#raft-tuning).
## Memory requirements
You should allocate RAM for server agents so that they contain 2 to 4 times the working set size. You can determine the working set size of a running cluster by noting the value of `consul.runtime.alloc_bytes` in the leader node's telemetry data. Inspect your monitoring solution for the telemetry value, or run the following commands with the [jq](https://stedolan.github.io/jq/download/) tool installed on your Consul leader instance.
<Tip>
For Kubernetes, execute the command from the leader pod. `jq` is available in the Consul server containers.
</Tip>
Set `$CONSUL_HTTP_TOKEN` to an ACL token with valid permissions, then retrieve the working set size.
```shell-session
$ curl --silent --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" http://127.0.0.1:8500/v1/agent/metrics | jq '.Gauges[] | select(.Name=="consul.runtime.alloc_bytes") | .Value'`
616017920
```
## Kubernetes storage requirements
When you set up persistent volumes (PV) resources, you should define the correct server storage class parameter because the defaults are likely insufficient in performance. To set the [storageClass Helm chart parameter](/consul/docs/k8s/helm#v-server-storageclass), refer to the [Kubernetes documentation on storageClasses](https://kubernetes.io/docs/concepts/storage/storage-classes/) for more information about your specific cloud provider.
## Read and write heavy workload recommendations
In production, your use case may lead to Consul performing read-heavy workloads, write-heavy workloads, or both. Refer to the following table for specific resource recommendations for these types of workloads.
| Workload type | Instance Recommendations | Workload element examples | Enterprise Feature Recommendations |
| ------------- | ------------------------- | ------------------------ | ------------------------ |
| Read-heavy | Instances of type `m5.4xlarge (AWS)`, `Standard_D16s_v3 (Azure)`, `n2-standard-16 (GCP)` | Raft RPCs calls, DNS queries, key/value retrieval | [Read replicas](/consul/docs/enterprise/read-scale) |
| Write-heavy | IOPS performance of `10 000+` | Consul agent joins and leaves, services registration and deregistration, key/value writes | [Network segments](/consul/docs/enterprise/network-segments/network-segments-overview) |
For recommendations on troubleshooting issues with read-heavy or write-heavy workloads, refer to [Consul at Scale](/consul/docs/architecture/scale#resource-usage-and-metrics-recommendations).
## Monitor performance
Monitoring is critical to ensure that your Consul datacenter has sufficient resources to continue operations. A proactive monitoring strategy helps you find problems in your network before they impact your deployments.
We recommend completing the [Monitor Consul server health and performance with metrics and logs](/consul/tutorials/observe-your-network/server-metrics-and-logs) tutorial as a starting point for Consul metrics and telemetry. The following tutorials guide you through specific monitoring solutions for your Consul cluster.
- [Monitor Consul server health and performance with metrics and logs](/consul/tutorials/observe-your-network/server-metrics-and-logs)
- [Observe Consul service mesh traffic](/consul/tutorials/get-started-kubernetes/kubernetes-gs-observability)
### Important metrics
In production environments, create baselines for your Consul cluster's metrics. After you discover the baselines, you will be able to define alerts and receive notifications when there are unexpected values. For a detailed explanation on the metrics and their values, refer to [Consul Agent telemetry](/consul/docs/agent/telemetry).
### Transaction metrics
These metrics indicate how long it takes to complete write operations in various parts of the Consul cluster.
- [`consul.kvs.apply`](/consul/docs/agent/monitor/telemetry#transaction-timing) measures the time it takes to complete an update to the KV store.
- [`consul.txn.apply`](/consul/docs/agent/monitor/telemetry#transaction-timing) measures the time spent applying a transaction operation.
- [`consul.raft.apply`](/consul/docs/agent/monitor/telemetry#transaction-timing) counts the number of Raft transactions applied during the measurement interval. This metric is only reported on the leader.
- [`consul.raft.commitTime`](/consul/docs/agent/monitor/telemetry#transaction-timing) measures the time it takes to commit a new entry to the Raft log on disk on the leader.
### Memory metrics
These performance indicators can help you diagnose if the current instance sizing is unable to handle the workload.
- [`consul.runtime.alloc_bytes`](/consul/docs/agent/monitor/telemetry#memory-usage) measures the number of bytes allocated by the Consul process.
- [`consul.runtime.sys_bytes`](/consul/docs/agent/monitor/telemetry#memory-usage) measures the total number of bytes of memory obtained from the OS.
- [`consul.runtime.heap_objects`](/consul/docs/agent/monitor/telemetry#metrics-reference) measures the number of objects allocated on the heap and is a general memory pressure indicator.
### Leadership metrics
Leadership changes are not a cause for concern but frequent changes may be a symptom of a deeper problem. Frequent elections or leadership changes may indicate network issues between the Consul servers, or the Consul servers are unable to keep up with the load.
- [`consul.raft.leader.lastContact`](/consul/docs/agent/monitor/telemetry#leadership-changes) measures the time since the leader was last able to contact the follower nodes when checking its leader lease.
- [`consul.raft.state.candidate`](/consul/docs/agent/monitor/telemetry#leadership-changes) increments whenever a Consul server starts an election.
- [`consul.raft.state.leader`](/consul/docs/agent/monitor/telemetry#leadership-changes) increments whenever a Consul server becomes a leader.
- [`consul.server.isLeader`](/consul/docs/agent/monitor/telemetry#leadership-changes) tracks whether a server is a leader.
### Network metrics
Network activity and RPC count measurements indicate the current load created from a Consul agent, including when the load becomes high enough to be rate limited. If an unusually high RPC count occurs, you should investigate before it overloads the cluster.
- [`consul.client.rpc`](/consul/docs/agent/monitor/telemetry#network-activity-rpc-count) increments whenever a Consul agent in client mode makes an RPC request to a Consul server.
- [`consul.client.rpc.exceeded`](/consul/docs/agent/monitor/telemetry#network-activity-rpc-count) increments whenever a Consul agent in client mode makes an RPC request to a Consul server gets rate limited by that agent's limits configuration.
- [`consul.client.rpc.failed`](/consul/docs/agent/monitor/telemetry#network-activity-rpc-count) increments whenever a Consul agent in client mode makes an RPC request to a Consul server and fails.
## Network constraints and alternate approaches
If it is impossible for you to allocate the required resources, you can make changes to Consul's performance so that it operates with lower speed or resilience. These changes ensure that your cluster remains within its resource capacity.
- Soft limits prevent your cluster from degrading due to overload.
- Raft tuning lets you compensate for unfavorable environments.
### Soft limits
The recommended maximum size for a single datacenter is 5,000 Consul client agents. This recommendation is based on a standard, non-tuned environment and considers a blast radius's risk management factor. The maximum number of agents may be lower, depending on how you use Consul.
If you require more than 5,000 client agents, you should break up the single Consul datacenter into multiple smaller datacenters.
- When the nodes are spread across separate physical locations such as different regions, you can model multiple datacenter structures based on physical locations.
- Use [network segments](/consul/docs/enterprise/network-segments/network-segments-overview) in a single available zone or region to lower overall resource usage in a single datacenter.
When deploying [Consul in Kubernetes](/consul/docs/k8s), we recommend you set both _requests_ and _limits_ in the Helm chart. Refer to the [Helm chart documentation](/consul/docs/k8s/helm#v-server-resources) for more information.
- Requests allocate the required resources for your Consul workloads.
- Limits prevent your pods from being terminated and restarted if they consume more resources than requested and Kubernetes needs to reclaim these resources. Limits can prevent outage situations where the Consul leader's container gets terminated and redeployed due to resource constraints.
The following is an example Helm configuration that allocates 16 CPU cores and 64 gigabytes of memory:
<CodeBlockConfig hideClipboard>
```yaml
global:
image: "hashicorp/consul"
## ...
resources:
requests:
memory: '64G'
cpu: '16000m'
limits:
memory: '64G'
cpu: '16000m'
```
</CodeBlockConfig>
### Raft tuning
Consul uses the [Raft consensus algorithm](/consul/docs/architecture/consensus) to provide consistency.
You may need to adjust Raft to suit your specific environment. Adjust the [`raft_multiplier` configuration](/consul/docs/agent/config/config-files#raft_multiplier) to define the trade-off between leader stability and time to recover from a leader failure.
- A lower multiplier minimizes failure detection and election time, but it may trigger frequently in high latency situations.
- A higher multiplier reduces the chances that failures cause leadership churn, but your cluster takes longer to detect real failures and restore availability.
The value of `raft_multiplier` has a default value of 5. It is a scaling factor setting that directly affects the following parameters:
| Parameter name | Default value | Derived from |
| --- | --- | --- |
| HeartbeatTimeout | 5000ms | 5 x 1000ms |
| ElectionTimeout | 5000ms | 5 x 1000ms |
| LeaderLeaseTimeout | 2500ms | 5 x 500ms |
You can use the telemetry from [`consul.raft.leader.lastContact`](/consul/docs/agent/telemetry#leadership-changes) to observe Raft timing performance.
Wide networks with more latency perform better with larger values of `raft_multiplier`, but cluster failure detection will take longer. If your network operates with low latency, we recommend that you do not set the Raft multiplier higher than 5. Instead, you should either replace the servers with more powerful ones or minimize the network latency between nodes.
We recommend you start from a baseline and perform [chaos engineering testing](/consul/tutorials/resiliency/introduction-chaos-engineering?in=consul%2Fresiliency) with different values for the Raft multiplier to find the acceptable time for problem detection and recovery for the cluster. Then scale the cluster and its dedicated resources with the number of workloads handled. This approach gives you the best balance between pure resource growth and pure Raft tuning strategies because it lets you use Raft tuning as a backup plan if you cannot scale your resources.
The types of workloads the Consul cluster handles also play an important role in Raft tuning. For example, if your Consul clusters are mostly static and do not handle many events, you should increase your Raft multiplier instead of scaling your resources because the risk of an important event happening while the cluster is converging or re-electing a leader is lower.