---
layout: "docs"
page_title: "Server Performance"
sidebar_current: "docs-guides-performance"
description: |-
  Consul requires different amounts of compute resources, depending on cluster size and expected workload. This guide provides guidance on choosing compute resources.
---
# Server Performance
Since Consul servers run a [consensus protocol](/docs/internals/consensus.html) to
process all write operations and are contacted on nearly all read operations, server
performance is critical for overall throughput and health of a Consul cluster. Servers
are generally I/O bound for writes because the underlying Raft log store performs a sync
to disk every time an entry is appended. Servers are generally CPU bound for reads since
reads work from a fully in-memory data store that is optimized for concurrent access.
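
To make the write-path cost concrete, here is a minimal Node.js sketch of an append-only log that forces each entry to disk before returning. It only illustrates the "sync on every append" pattern described above and is not Consul's actual Raft log store; the helper names `appendEntry` and `readEntries` and the newline-delimited JSON format are hypothetical choices for this example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: append one entry and flush it to the storage device
// before returning, mirroring the "sync on every append" pattern.
function appendEntry(logPath, entry) {
  const fd = fs.openSync(logPath, "a");
  try {
    fs.writeSync(fd, JSON.stringify(entry) + "\n");
    // fsyncSync blocks until the OS has flushed the file data to the device;
    // this wait is why write throughput tracks disk latency, not CPU speed.
    fs.fsyncSync(fd);
  } finally {
    fs.closeSync(fd);
  }
}

// Read the log back as an array of entries (for demonstration only).
function readEntries(logPath) {
  return fs
    .readFileSync(logPath, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Write three entries into a fresh temporary directory.
const logDir = fs.mkdtempSync(path.join(os.tmpdir(), "raft-demo-"));
const logPath = path.join(logDir, "log.jsonl");
for (let index = 1; index <= 3; index++) {
  appendEntry(logPath, { index, op: "set", key: `k${index}` });
}
console.log(readEntries(logPath).length); // 3
```

On a slow disk, each `appendEntry` call is dominated by the `fsyncSync` wait, which is why faster storage helps write-heavy clusters more than extra CPU does.
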
<a name="minimum"></a>
## Minimum Server Requirements
In Consul 0.7, the default server [performance parameters](/docs/agent/options.html#performance)
were tuned to allow Consul to run reliably (but relatively slowly) on a server cluster of three
[AWS t2.micro](https://aws.amazon.com/ec2/instance-types/) instances. These thresholds
were determined empirically using a leader instance that was under sufficient read, write,
and network load to keep it permanently at zero CPU credits, forcing it into the baseline
performance mode for that instance type. Real-world workloads typically have more bursts of
activity, so this is a conservative and pessimistic tuning strategy.

This default was chosen based on feedback from users, many of whom wanted a low-cost way
to run small production or development clusters, at the expense of some performance in
leader failure detection and leader election times.

The default performance configuration is equivalent to this:
```javascript
{
  "performance": {
    "raft_multiplier": 5
  }
}
```
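
If you generate agent configuration files programmatically, a small helper can emit the `performance` stanza shown above and reject values outside the allowed range. This is a hedged sketch: the helper name `performanceConfig` is hypothetical, and the accepted range of 1 through 10 is our reading of the `raft_multiplier` documentation (where omitting the field or setting it to 0 selects the default), so check the options reference for your Consul version.

```javascript
// Hypothetical helper that builds the agent "performance" stanza.
// Assumption: raft_multiplier must be an integer from 1 to 10; leaving it out
// (or setting it to 0) falls back to the default, which is 5 in Consul 0.7.
function performanceConfig(raftMultiplier) {
  if (!Number.isInteger(raftMultiplier) || raftMultiplier < 1 || raftMultiplier > 10) {
    throw new RangeError(`raft_multiplier must be an integer in [1, 10], got ${raftMultiplier}`);
  }
  return { performance: { raft_multiplier: raftMultiplier } };
}

// Serialize the Consul 0.7 default explicitly, e.g. to pin it in a config file.
console.log(JSON.stringify(performanceConfig(5), null, 2));
```

Pinning the value explicitly, rather than relying on the default, keeps the timing behavior stable if a later Consul release changes its defaults.
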
<a name="production"></a>
## Production Server Requirements
When running Consul 0.7 and later in production, it is recommended to configure the server
[performance parameters](/docs/agent/options.html#performance) back to Consul's original
high-performance settings. This lets Consul servers detect a failed leader and complete
leader elections much more quickly than the default configuration, which extends key Raft
timeouts by a factor of 5 and is therefore quite slow during these events.

The high-performance configuration is simple and looks like this:
```javascript
{
  "performance": {
    "raft_multiplier": 1
  }
}
```
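
Because the multiplier scales the key Raft timeouts linearly, the difference between the two settings is easy to work out. The sketch below assumes base values of 1000 ms for the heartbeat timeout, 1000 ms for the election timeout, and 500 ms for the leader lease timeout (the defaults of the `hashicorp/raft` library that Consul builds on); treat those numbers as an assumption and confirm them for your version. The helper name `scaledTimings` is hypothetical.

```javascript
// Assumed base Raft timings in milliseconds (hashicorp/raft defaults);
// verify these against your Consul version before relying on them.
const BASE_TIMINGS_MS = {
  heartbeatTimeout: 1000,
  electionTimeout: 1000,
  leaderLeaseTimeout: 500,
};

// Scale every base timing linearly by the raft_multiplier value.
function scaledTimings(raftMultiplier) {
  const scaled = {};
  for (const [name, ms] of Object.entries(BASE_TIMINGS_MS)) {
    scaled[name] = ms * raftMultiplier;
  }
  return scaled;
}

console.log(scaledTimings(5)); // Consul 0.7 default
console.log(scaledTimings(1)); // high-performance production setting
```

Under these assumptions, the heartbeat and election timeouts shrink from about 5 seconds at the default multiplier of 5 to about 1 second at a multiplier of 1, which is where the faster failure detection and leader elections come from.
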
It's best to benchmark with a realistic workload when choosing a production server for Consul.
Here are some general recommendations:
* For write-heavy workloads, disk speed on the servers is key for performance. Use SSDs or
another fast disk technology for the best write throughput.
* <a name="last-contact"></a>Spurious leader elections can be caused by networking issues between
the servers or insufficient CPU. Users in cloud environments often bump their servers up to the next
instance class with improved networking and CPU until leader elections stabilize. In Consul 0.7
and later, the [performance parameters](/docs/agent/options.html#performance) configuration
also gives you the option to trade off performance instead of upsizing servers. You can use the
[`consul.raft.leader.lastContact` telemetry](/docs/agent/telemetry.html#last-contact) to help
observe how the Raft timing is performing and decide whether de-tuning Raft performance or adding
more powerful servers is needed.
* For DNS-heavy workloads, configuring all Consul agents in a cluster with the
[`allow_stale`](/docs/agent/options.html#allow_stale) configuration option will allow reads to
scale across all Consul servers, not just the leader. See [Stale Reads](/docs/guides/dns-cache.html#stale)
in the [DNS Caching](/docs/guides/dns-cache.html) guide for more details. It's also good to set
reasonable, non-zero [DNS TTL values](/docs/guides/dns-cache.html#ttl) if your clients will
respect them.
* In other applications that perform high volumes of reads against Consul, consider using the
[stale consistency mode](/docs/agent/http.html#consistency) so that reads can scale
across all the servers instead of being forwarded to the leader.
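
As a rough client-side sketch of stale reads, the helpers below build a key/value read URL with the `stale` query parameter and then use the `X-Consul-KnownLeader` and `X-Consul-LastContact` headers that Consul returns on reads to decide whether a stale answer is recent enough. The agent address, the `maxLagMs` threshold, and the helper names `kvReadUrl`, `staleReadAcceptable`, and `readStale` are hypothetical; choose a threshold that matches how much staleness your application can tolerate.

```javascript
// Hypothetical agent address; in practice this is usually the local agent.
const AGENT = "http://127.0.0.1:8500";

// Build a KV read URL. Adding "stale" lets any server answer the read
// instead of forwarding it to the leader. Each path segment is encoded
// separately so that "/" separators in the key are preserved.
function kvReadUrl(key, { stale = false } = {}) {
  const keyPath = key.split("/").map(encodeURIComponent).join("/");
  return `${AGENT}/v1/kv/${keyPath}${stale ? "?stale" : ""}`;
}

// Decide whether a stale response is fresh enough. `headers` is any object
// with a get(name) method, such as a fetch Headers instance.
function staleReadAcceptable(headers, maxLagMs) {
  const knownLeader = headers.get("X-Consul-KnownLeader") === "true";
  const raw = headers.get("X-Consul-LastContact");
  if (raw == null) return false; // header missing (null or undefined)
  const lastContactMs = Number(raw);
  return knownLeader && Number.isFinite(lastContactMs) && lastContactMs <= maxLagMs;
}

// Example usage against a live agent (requires Node 18+ for global fetch).
async function readStale(key, maxLagMs = 1000) {
  const res = await fetch(kvReadUrl(key, { stale: true }));
  if (!staleReadAcceptable(res.headers, maxLagMs)) {
    // Fall back to a default-consistency read, which goes through the leader.
    return fetch(kvReadUrl(key));
  }
  return res;
}
```

Falling back to a default-consistency read preserves correctness when the answering server has lost touch with the leader, at the cost of routing that particular request through the leader.
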