Add Deployment Guide and update links (#4487)

* Adds Deployment Guide and update links
* Fixes releases link
* Re-organisation of content
* Cuts down "deployment" doc (which should focus on Reference Architecture) by moving raft and performance tuning to the Server Performance page which already covers some of this.
* Moves backups from "deployment" doc (which should focus on Reference Architecture) to "deployment-guide"
* Cleans up some notes and add single DC diagram
* Removes old link to deployment guide from nav
* Corrects minor styling, formatting, and grammar
pull/4750/head
Dan Brown 2018-10-03 22:37:36 +01:00 committed by Geoffrey Grosenbach
parent 9da8ad8f3d
commit d3b6750c3b
7 changed files with 401 additions and 144 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 199 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 118 KiB

View File

@ -0,0 +1,279 @@
---
layout: "docs"
page_title: "Consul Deployment Guide"
sidebar_current: "docs-guides-deployment-guide"
description: |-
This deployment guide covers the steps required to install and
configure a single HashiCorp Consul cluster as defined in the
Consul Reference Architecture
product_version: 1.2
---
# Consul Deployment Guide
This deployment guide covers the steps required to install and configure a single HashiCorp Consul cluster as defined in the [Consul Reference Architecture](/docs/guides/deployment.html).
These instructions are for installing and configuring Consul on Linux hosts running the systemd system and service manager.
## Reference Material
This deployment guide is designed to work in combination with the [Consul Reference Architecture](/docs/guides/deployment.html). Although not a strict requirement to follow the Consul Reference Architecture, please ensure you are familiar with the overall architecture design; for example installing Consul server agents on multiple physical or virtual (with correct anti-affinity) hosts for high-availability.
## Overview
To provide a highly-available single cluster architecture, we recommend Consul server agents be deployed to more than one host, as shown in the [Consul Reference Architecture](/docs/guides/deployment.html).
![Reference Diagram](/assets/images/consul-arch-single.png "Reference Diagram")
These setup steps should be completed on all Consul hosts.
- [Download Consul](#download-consul)
- [Install Consul](#install-consul)
- [Configure systemd](#configure-systemd)
- Configure Consul [(server)](#configure-consul-server-) or [(client)](#configure-consul-client-)
- [Start Consul](#start-consul)
## Download Consul
Precompiled Consul binaries are available for download at [https://releases.hashicorp.com/consul/](https://releases.hashicorp.com/consul/) and Consul Enterprise binaries are available for download by following the instructions made available to HashiCorp Consul customers.
You should perform checksum verification of the zip packages using the SHA256SUMS and SHA256SUMS.sig files available for the specific release version. HashiCorp provides [a guide on checksum verification](https://www.hashicorp.com/security.html) for precompiled binaries.
```text
CONSUL_VERSION="1.2.0"
curl --silent --remote-name https://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_linux_amd64.zip
curl --silent --remote-name https://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_SHA256SUMS
curl --silent --remote-name https://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_SHA256SUMS.sig
```
## Install Consul
Unzip the downloaded package and move the `consul` binary to `/usr/local/bin/`. Check `consul` is available on the system path.
```text
unzip consul_${CONSUL_VERSION}_linux_amd64.zip
sudo chown root:root consul
sudo mv consul /usr/local/bin/
consul --version
```
The `consul` command features opt-in autocompletion for flags, subcommands, and arguments (where supported). Enable autocompletion.
```text
consul -autocomplete-install
complete -C /usr/local/bin/consul consul
```
Create a unique, non-privileged system user to run Consul and create its data directory.
```text
sudo useradd --system --home /etc/consul.d --shell /bin/false consul
sudo mkdir --parents /opt/consul
sudo chown --recursive consul:consul /opt/consul
```
## Configure systemd
Systemd uses [documented sane defaults](https://www.freedesktop.org/software/systemd/man/systemd.directives.html) so only non-default values must be set in the configuration file.
Create a Consul service file at /etc/systemd/system/consul.service.
```text
sudo touch /etc/systemd/system/consul.service
```
Add this configuration to the Consul service file:
```text
[Unit]
Description="HashiCorp Consul - A service mesh solution"
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl
[Service]
User=consul
Group=consul
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/
ExecReload=/usr/local/bin/consul reload
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
```
The following parameters are set for the `[Unit]` stanza:
- [`Description`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Description=) - Free-form string describing the consul service
- [`Documentation`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Documentation=) - Link to the consul documentation
- [`Requires`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Requires=) - Configure a requirement dependency on the network service
- [`After`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Before=) - Configure an ordering dependency on the network service being started before the consul service
- [`ConditionFileNotEmpty`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#ConditionArchitecture=) - Check for a non-zero sized configuration file before consul is started
The following parameters are set for the `[Service]` stanza:
- [`User`, `Group`](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#User=) - Run consul as the consul user
- [`ExecStart`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStart=) - Start consul with the `agent` argument and path to the configuration file
- [`ExecReload`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecReload=) - Send consul a reload signal to trigger a configuration reload in consul
- [`KillMode`](https://www.freedesktop.org/software/systemd/man/systemd.kill.html#KillMode=) - Treat consul as a single process
- [`Restart`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#RestartSec=) - Restart consul unless it returned a clean exit code
- [`LimitNOFILE`](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Process%20Properties) - Set an increased Limit for File Descriptors
The following parameters are set for the `[Install]` stanza:
- [`WantedBy`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#WantedBy=) - Creates a weak dependency on consul being started by the multi-user run level
## Configure Consul (server)
Consul uses [documented sane defaults](/docs/agent/options.html) so only non-default values must be set in the configuration file. Configuration can be read from multiple files and is loaded in lexical order. See the [full description](/docs/agent/options.html) for more information about configuration loading and merge semantics.
Consul server agents typically require a superset of configuration required by Consul client agents. We will specify common configuration used by all Consul agents in `consul.hcl` and server specific configuration in `server.hcl`.
### General configuration
Create a configuration file at `/etc/consul.d/consul.hcl`:
```text
sudo mkdir --parents /etc/consul.d
sudo touch /etc/consul.d/consul.hcl
sudo chown --recursive consul:consul /etc/consul.d
sudo chmod 640 /etc/consul.d/consul.hcl
```
Add this configuration to the `consul.hcl` configuration file:
~> **NOTE** Replace the `datacenter` parameter value with the identifier you will use for the datacenter this Consul cluster is deployed in. Replace the `encrypt` parameter value with the output from running `consul keygen` on any host with the `consul` binary installed.
```hcl
datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "Luj2FZWwlt8475wD1WtwUQ=="
```
- [`datacenter`](/docs/agent/options.html#_datacenter) - The datacenter in which the agent is running.
- [`data_dir`](/docs/agent/options.html#_data_dir) - The data directory for the agent to store state.
- [`encrypt`](/docs/agent/options.html#_encrypt) - Specifies the secret key to use for encryption of Consul network traffic.
### ACL configuration
The [ACL](/docs/guides/acl.html) guide provides instructions on configuring and enabling ACLs.
### Cluster auto-join
The `retry_join` parameter allows you to configure all Consul agents to automatically form a cluster using a common Consul server accessed via DNS address, IP address or using Cloud Auto-join. This removes the need to manually join the Consul cluster nodes together.
Add the retry_join parameter to the `consul.hcl` configuration file:
~> **NOTE** Replace the `retry_join` parameter value with the correct DNS address, IP address or [cloud auto-join configuration](/docs/agent/cloud-auto-join.html) for your environment.
```hcl
retry_join = "172.16.0.11"
```
- [`retry_join`](/docs/agent/options.html#retry-join) - Address of another agent to join upon starting up.
### Performance stanza
The [`performance`](/docs/agent/options.html#performance) stanza allows tuning the performance of different subsystems in Consul.
Add the performance configuration to the `consul.hcl` configuration file:
```hcl
performance {
raft_multiplier = 1
}
```
- [`raft_multiplier`](/docs/agent/options.html#raft_multiplier) - An integer multiplier used by Consul servers to scale key Raft timing parameters. Setting this to a value of 1 will configure Raft to its highest-performance mode, equivalent to the default timing of Consul prior to 0.7, and is recommended for production Consul servers.
For more information on Raft tuning and the `raft_multiplier` setting, see the [server performance](/docs/guides/performance.html) documentation.
### Telemetry stanza
The [`telemetry`](/docs/agent/options.html#telemetry) stanza specifies various configurations for Consul to publish metrics to upstream systems.
If you decide to configure Consul to publish telemtery data, you should review the [telemetry configuration section](/docs/agent/options.html#telemetry) of our documentation.
### TLS configuration
The [Creating Certificates](/docs/guides/creating-certificates.html) guide provides instructions on configuring and enabling TLS.
### Server configuration
Create a configuration file at `/etc/consul.d/server.hcl`:
```text
sudo mkdir --parents /etc/consul.d
sudo touch /etc/consul.d/server.hcl
sudo chown --recursive consul:consul /etc/consul.d
sudo chmod 640 /etc/consul.d/server.hcl
```
Add this configuration to the `server.hcl` configuration file:
~> **NOTE** Replace the `bootstrap_expect` value with the number of Consul servers you will use; three or five [is recommended](/docs/internals/consensus.html#deployment-table).
```hcl
server = true
bootstrap_expect = 3
```
- [`server`](/docs/agent/options.html#_server) - This flag is used to control if an agent is in server or client mode.
- [`bootstrap-expect`](/docs/agent/options.html#_bootstrap_expect) - This flag provides the number of expected servers in the datacenter. Either this value should not be provided or the value must agree with other servers in the cluster.
### Consul UI
Consul features a web-based user interface, allowing you to easily view all services, nodes, intentions and more using a graphical user interface, rather than the CLI or API.
~> **NOTE** You should consider running the Consul UI on select Consul hosts rather than all hosts.
Optionally, add the UI configuration to the `server.hcl` configuration file to enable the Consul UI:
```hcl
ui = true
```
## Configure Consul (client)
Consul client agents typically require a subset of configuration required by Consul server agents. All Consul clients can use the `consul.hcl` file created when [configuring the Consul servers](#general-configuration). If you have added host-specific configuration such as identifiers, you will need to set these individually.
## Start Consul
Enable and start Consul using the systemctl command responsible for controlling systemd managed services. Check the status of the consul service using systemctl.
```text
sudo systemctl enable consul
sudo systemctl start consul
sudo systemctl status consul
```
## Backups
Creating server backups is an important step in production deployments. Backups provide a mechanism for the server to recover from an outage (network loss, operator error, or a corrupted data directory). All agents write to the `-data-dir` before commit. This directory persists the local agents state and — in the case of servers — it also holds the Raft information.
Consul provides the [snapshot](/docs/commands/snapshot.html) command which can be run using the CLI command or the API. The `snapshot` command saves the point-in-time snapshot of the state of the Consul servers which includes KV entries, the service catalog, prepared queries, sessions, and ACL.
With [Consul Enterprise](/docs/commands/snapshot/agent.html), the `snapshot agent` command runs periodically and writes to local or remote storage (such as Amazon S3).
By default, all snapshots are taken using `consistent` mode where requests are forwarded to the leader which verifies that it is still in power before taking the snapshot. Snapshots will not be saved if the clusted is degraded or if no leader is available. To reduce the burden on the leader, it is possible to [run the snapshot](/docs/commands/snapshot/save.html) on any non-leader server using `stale` consistency mode:
```text
consul snapshot save -stale backup.snap
```
This spreads the load across nodes at the possible expense of losing full consistency guarantees. Typically this means that a very small number of recent writes may not be included. The omitted writes are typically limited to data written in the last `100ms` or less from the recovery point. This is usually suitable for disaster recovery. However, the system cant guarantee how stale this may be if executed against a partitioned server.
## Next Steps
- Read [Monitoring Consul with Telegraf](/docs/guides/monitoring-telegraf.html)
for an example guide to monitoring Consul for improved operational visibility.
- Read [Outage Recovery](/docs/guides/outage.html) to learn the steps required
for recovery from a Consul outage due to a majority of server nodes in a
datacenter being lost.
- Read [Server Performance](/docs/guides/performance.html) to learn about
additional configuration that benefits production deployments.

View File

@ -1,62 +1,71 @@
---
layout: "docs"
page_title: "Production Deployment"
sidebar_current: "docs-guides-deployment"
page_title: "Consul Reference Architecture"
sidebar_current: "docs-guides-reference-architecture"
description: |-
Best practice approaches for Consul production architecture.
This document provides recommended practices and a reference
architecture for HashiCorp Consul production deployments.
product_version: 1.2
---
# Consul Production Deployment
# Consul Reference Architecture
The goal of this document is to recommend best practice approaches for Consul production deployments. The guide provides recommendations on system requirements, datacenter design, networking, and performance optimizations for Consul cluster based on the latest Consul release.
As applications are migrated to dynamically provisioned infrastructure, scaling services and managing the communications between them becomes challenging. Consuls service discovery capabilities provide the connectivity between dynamic applications. Consul also monitors the health of each node and its applications to ensure that only healthy service instances are discovered. Consuls distributed runtime configuration store allows updates across global infrastructure.
This document assumes a basic working knowledge of Consul.
This document provides recommended practices and a reference architecture, including system requirements, datacenter design, networking, and performance optimizations for Consul production deployments.
## System Requirements
## Infrastructure Requirements
### Consul Servers
Consul server agents are responsible for maintaining the cluster state, responding to RPC queries (read operations), and for processing all write operations. Given that Consul server agents do most of the heavy lifting, server sizing is critical for the overall performance efficiency and health of the Consul cluster.
The following instance configurations are recommended.
The following table provides high-level server guidelines. Of particular
note is the strong recommendation to avoid non-fixed performance CPUs,
or "Burstable CPU".
|Size|CPU|Memory|Disk|Typical Cloud Instance Types|
|----|---|------|----|----------------------------|
|Small|2 core|8-16 GB RAM|50 GB|**AWS**: m5.large, m5.xlarge|
|||||**Azure**: Standard\_A4\_v2, Standard\_A8\_v2|
|||||**GCE**: n1-standard-8, n1-standard-16|
|Large|4-8 core|32-64+ GB RAM|100 GB|**AWS**: m5.2xlarge, m5.4xlarge|
|||||**Azure**: Standard\_D4\_v3, Standard\_D5\_v3|
|||||**GCE**: n1-standard-32, n1-standard-64|
|XL|12-24 core|64+ GB RAM|SSD|**AWS**: m5d.4xlarge|
|||||**Azure**: Standard\_D16\_v3, Standard\_D32\_v3|
|||||**GCE**: n1-standard-32, n1-standard-64|
| Type | CPU | Memory | Disk | Typical Cloud Instance Types |
|-------------|----------|--------------|-------|-----------------------------------------------|
| Small | 2 core | 8-16 GB RAM | 50GB | **AWS**: m5.large, m5.xlarge |
| | | | | **Azure**: Standard\_A4\_v2, Standard\_A8\_v2 |
| | | | | **GCE**: n1-standard-8, n1-standard-16 |
| Large | 4-8 core | 32-64 GB RAM | 100GB | **AWS**: m5.2xlarge, m5.4xlarge |
| | | | | **Azure**: Standard\_D4\_v3, Standard\_D5\_v3 |
| | | | | **GCE**: n1-standard-32, n1-standard-64 |
The **small size** instance configuration is appropriate for most initial production deployments, or for development/testing environments. The large size is for production environments where there is a consistently high workload. Suggested instance types are provided for common platforms, but do refer to platform documentation for up-to-date instance types that align with the recommended resources.
#### Hardware Sizing Considerations
- The small size would be appropriate for most initial production
deployments, or for development/testing environments.
- The large size is for production environments where there is a
consistently high workload.
~> **NOTE** For large workloads, ensure that the disks support a high number of IOPS to keep up with the rapid Raft log update rate.
For a write-heavy and/or a read-heavy cluster, the number of clients may need to be reduced further with considerations for the impact of the number of services and/or watches registered and the number and size of KV pairs. Alternately, large scale read requests can be achieved by increasing the number of non-voting servers ([Enterprise feature](/docs/enterprise/read-scale/index.html)) while maintaining the recommended number of servers (3 or 5) in the quorum. See [Performance Tuning](#performance-tuning) for more recommendations for read-heavy clusters.
For more information on server requirements, review the [server performance](/docs/guides/performance.html) documentation.
## Infrastructure Diagram
![Reference Diagram](/assets/images/consul-arch.png "Reference Diagram")
## Datacenter Design
Consul may be deployed in a single physical datacenter or it may span multiple datacenters.
A single Consul datacenter includes a set of servers that maintain consensus. These are designed to be deployed in a single geographical location to ensure low latency for consensus operations otherwise performance and stability can be significantly compromised. In cloud environments, spreading the servers over multiple availability zones within the same region is recommended for fault tolerance while maintaining low-latency and high performance networking.
Consul supports multi-datacenter deployments via federation. The servers in each datacenter are joined by WAN gossip to enable forwarding service discovery and KV requests between datacenters. The separate datacenters do not replicate services or KV state between them to ensure they remain as isolated failure domains. Each set of servers must remain available (with at least a quorum online) for cross-datacenter requests to be made. If one datacenter has an outage of the servers, services within that datacenter will not be discoverable outside it until the servers recover.
A Consul cluster (typically three or five servers plus client agents) may be deployed in a single physical datacenter or it may span multiple datacenters. For a large cluster with high runtime reads and writes, deploying servers in the same physical location improves performance. In cloud environments, a single datacenter may be deployed across multiple availability zones i.e. each server in a separate availability zone on a single host. Consul also supports multi-datacenter deployments via separate clusters joined by WAN links. In some cases, one may also deploy two or more Consul clusters in the same LAN environment.
### Single Datacenter
A single Consul cluster is recommended for applications deployed in the same datacenter.
A single Consul cluster is recommended for applications deployed in the same datacenter. Consul supports traditional three-tier applications as well as microservices.
Typically, there must be three or five servers to balance between availability and performance. These servers together run the Raft-driven consistent state store for catalog, session, prepared query, ACL, and KV updates.
Consul is proven to work well with up to `5,000` nodes in a single datacenter gossip pool. Some deployments have stretched this number much further but they require care and testing to ensure they remain reliable and can converge their cluster state quickly. It's highly recommended that clusters are increased in size gradually when approaching or exceeding `5,000` nodes.
The recommended maximum cluster size for a single datacenter is 5,000 nodes. For a write-heavy and/or a read-heavy cluster, the maximum number of nodes may need to be reduced further, considering the impact of the number and the size of KV pairs and the number of watches. The time taken for gossip to converge increases as more client machines are added. Similarly, the time taken by the new server to join an existing multi-thousand node cluster with a large KV store and update rate may increase as they are replicated to the new servers log.
Consul can support larger single datacenter cluster sizes by tuning the [gossip parameters](/docs/agent/options.html#gossip_lan) and ensuring Consul agents -- particularly servers -- are running on sufficient hardware. There are real production users of Consul running with greater than 25,000 nodes in a single datacenter today by tuning these parameters. [XL server instances](#system-requirements) or better are required to achieve this scale.
-> **TIP** For write-heavy clusters, consider scaling vertically with larger machine instances and lower latency storage.
~> For write-heavy clusters, consider scaling vertically with larger machine instances and lower latency storage.
One must take care to use service tags in a way that assists with the kinds of queries that will be run against the cluster. If two services (e.g. blue and green) are running on the same cluster, appropriate service tags must be used to identify between them. If a query is made without tags, nodes running both blue and green services may show up in the results of the query.
In cases where a full mesh among all agents cannot be established due to network segmentation, Consuls own [network segments](/docs/enterprise/network-segments/index.html) can be used. Network segments is an Enterprise feature that allows the creation of multiple tenants which share Raft servers in the same cluster. Each tenant has its own gossip pool and doesnt communicate with the agents outside this pool. The KV store, however, is shared between all tenants. If Consul network segments cannot be used, isolation between agents can be accomplished by creating discrete [Consul datacenters](/docs/guides/datacenters.html).
In cases where a full mesh among all agents cannot be established due to network segmentation, Consuls own [network segments](/docs/enterprise/network-segments/index.html) can be used. Network segments is a Consul Enterprise feature that allows the creation of multiple tenants which share Raft servers in the same cluster. Each tenant has its own gossip pool and doesnt communicate with the agents outside this pool. The KV store, however, is shared between all tenants. If Consul network segments cannot be used, isolation between agents can be accomplished by creating discrete [Consul datacenters](/docs/guides/datacenters.html).
### Multiple Datacenters
@ -64,7 +73,7 @@ Consul clusters in different datacenters running the same service can be joined
-> A good practice is to enable TLS server name checking to avoid accidental cross-joining of agents.
Advanced federation can be achieved with [Network Areas](/api/operator/area.html) (Enterprise).
Advanced federation can be achieved with the [network areas](/api/operator/area.html) feature in Consul Enterprise.
A typical use case is where datacenter1 (dc1) hosts share services like LDAP (or ACL datacenter) which are leveraged by all other datacenters. However, due to compliance issues, servers in dc2 must not connect with servers in dc3. This cannot be accomplished with the basic WAN federation. Basic federation requires that all the servers in dc1, dc2 and dc3 are connected in a full mesh and opens both gossip (`8302 tcp/udp`) and RPC (`8300`) ports for communication.
@ -86,110 +95,22 @@ In addition, the agent also periodically performs a full state sync over TCP whi
In a larger network that spans L2 segments, traffic typically traverses through a firewall and/or a router. ACL or firewall rules must be updated to allow the following ports:
|Name|Port|Flag|Description|
|----|----|----|-----------|
|Server RPC|8300||Used by servers to handle incoming requests from other agents (clients and servers). TCP only.|
|Serf LAN|8301||Used to handle gossip in the LAN. Required by all agents. TCP and UDP.|
|Serf WAN|8302|`-1` to disable (available in Consul 1.0.7)|Used by servers to gossip over the LAN and WAN to other servers. TCP and UDP.|
|HTTP API|8500|`-1` to disable|Used by clients to talk to the HTTP API. TCP only.|
|DNS Interface|8600|`-1` to disable||
| Name | Port | Flag | Description |
|---------------|------|------|-------------|
| Server RPC | 8300 | | Used by servers to handle incoming requests from other agents. TCP only. |
| Serf LAN | 8301 | | Used to handle gossip in the LAN. Required by all agents. TCP and UDP. |
| Serf WAN | 8302 | `-1` to disable (available in Consul 1.0.7) | Used by servers to gossip over the LAN and WAN to other servers. TCP and UDP. |
| HTTP API | 8500 | `-1` to disable | Used by clients to talk to the HTTP API. TCP only. |
| DNS Interface | 8600 | `-1` to disable | |
-> As mentioned in the [datacenter design section](#single-datacenter), network areas and network segments can be used to prevent opening up firewall ports between different subnets.
-> As mentioned in the [datacenter design section](#datacenter-design), network areas and network segments can be used to prevent opening up firewall ports between different subnets.
By default agents will only listen for HTTP and DNS traffic on the local interface.
### Raft Tuning
## Next steps
Leader elections can be affected by network communication issues between servers. If the cluster spans multiple zones, the network latency between them must be taken into consideration and the [`raft_multiplier`](/docs/agent/options.html#raft_multiplier) must be adjusted accordingly.
- Read [Deployment Guide](/docs/guides/deployment-guide.html) to learn
the steps required to install and configure a single HashiCorp Consul cluster.
By default, the recommended value for production environments is `1`. This value must take into account the network latency between the servers and the read/write load on the servers.
The value of `raft_multiplier` is a scaling factor and directly affects the following parameters:
|Param|Value||
|-----|----:|-:|
|HeartbeatTimeout|1000ms|default|
|ElectionTimeout|1000ms|default|
|LeaderLeaseTimeout|500ms|default|
So a scaling factor of `5` (i.e. `raft_multiplier: 5`) updates the following values:
|Param|Value|Calculation|
|-----|----:|-:|
|HeartbeatTimeout|5000ms|5 x 1000ms|
|ElectionTimeout|5000ms|5 x 1000ms|
|LeaderLeaseTimeout|2500ms|5 x 500ms|
~> **NOTE** Wide networks with more latency will perform better with larger values of `raft_multiplier`.
The trade off is between leader stability and time to recover from an actual leader failure. A short multiplier minimizes failure detection and election time but may be triggered frequently in high latency situations. This can cause constant leadership churn and associated unavailability. A high multiplier reduces the chances that spurious failures will cause leadership churn but it does this at the expense of taking longer to detect real failures and thus takes longer to restore cluster availability.
Leadership instability can also be caused by under-provisioned CPU resources and is more likely in environments where CPU cycles are shared with other workloads. In order for a server to remain the leader, it must send frequent heartbeat messages to all other servers every few hundred milliseconds. If some number of these are missing or late due to the leader not having sufficient CPU to send them on time, the other servers will detect it as failed and hold a new election.
## Performance Tuning
Consul is write limited by disk I/O and read limited by CPU. Memory requirements will be dependent on the total size of KV pairs stored and should be sized according to that data (as should the hard drive storage). The limit on a keys value size is `512KB`.
-> Consul is write limited by disk I/O and read limited by CPU.
For **write-heavy** workloads, the total RAM available for overhead must approximately be equal to
RAM NEEDED = number of keys * (average key + value size) * 2-3x
Since writes must be synced to disk (persistent storage) on a quorum of servers before they are committed, deploying a disk with high write throughput (or an SSD) will enhance performance on the write side ([configuration reference](/docs/agent/options.html#\_data\_dir)).
For a **read-heavy** workload, configure all Consul server agents with the `allow_stale` DNS option, or query the API with the `stale` [consistency mode](/api/index.html#consistency-modes). By default, all queries made to the server are RPC forwarded to and serviced by the leader. By enabling stale reads, any server will respond to any query, thereby reducing overhead on the leader. Typically, the stale response is `100ms` or less from consistent mode but it drastically improves performance and reduces latency under high load.
If the leader server is out of memory or the disk is full, the server eventually stops responding, loses its election and cannot move past its last commit time. However, by configuring `max_stale` and setting it to a large value, Consul will continue to respond to queries during such outage scenarios ([max_stale documentation](/docs/agent/options.html#max_stale)).
It should be noted that `stale` is not appropriate for coordination where strong consistency is important (i.e. locking or application leader election). For critical cases, the optional `consistent` API query mode is required for true linearizability; the trade off is that this turns a read into a full quorum operation so requires more resources and takes longer.
**Read-heavy** clusters may take advantage of the [enhanced reading](/docs/enterprise/read-scale/index.html) feature (Enterprise) for better scalability. This feature allows additional servers to be introduced as non-voters. Being a non-voter, the server will still participate in data replication, but it will not block the leader from committing log entries.
Consuls agents use network sockets for communicating with the other nodes (gossip) and with the server agent. In addition, file descriptors are also opened for watch handlers, health checks, and log files. For a **write heavy** cluster, the `ulimit` size must be increased from the default value (`1024`) to prevent the leader from running out of file descriptors.
A safe limit to set for `ulimit` will depend heavily on cluster size and workload but there is usually no downside to being liberal and allowing tens of thousands of descriptors for large servers (or even more).
To prevent any CPU spikes from a misconfigured client, RPC requests to the server should be [rate limited](/docs/agent/options.html#limits)
~> **NOTE** Rate limiting is configured on the client agent only.
In addition, two [performance indicators](/docs/agent/telemetry.html) — `consul.runtime.alloc_bytes` and `consul.runtime.heap_objects` — can help diagnose if the current sizing is not adequately meeting the load.
## Backups
Creating server backups is an important step in production deployments. Backups provide a mechanism for the server to recover from an outage (network loss, operator error, or a corrupted data directory). All agents write to the `-data-dir` before commit. This directory persists the local agents state and — in the case of servers — it also holds the Raft information.
The local agent state on client agents is largely an optimization and does not typically need backing up. If this data is lost, any API registrations will need to be replayed and if Connect managed proxy instances were running, they will need to be manually stopped as they will no longer be managed by the agent, other than that a new agent can rejoin the cluster with no issues.
Consul provides the [snapshot](/docs/commands/snapshot.html) command which can be run using the CLI command or the API. The `snapshot` command saves the point-in-time snapshot of the state of the Consul servers which includes all cluster state including but not limited to KV entries, the service catalog, prepared queries, sessions, ACL, Connect CA state, and the Connect intention graph.
With [Consul Enterprise](/docs/commands/snapshot/agent.html), the `snapshot agent` command runs periodically and writes to local or remote storage (such as Amazon S3).
By default, all snapshots are taken using `consistent` mode where requests are forwarded to the leader which verifies that it is still in power before taking the snapshot. Snapshots will not be saved if the clusted is degraded or if no leader is available. To reduce the burden on the leader, it is possible to [run the snapshot](/docs/commands/snapshot/save.html) on any non-leader server using `stale` consistency mode:
$ consul snapshot save -stale backup.snap
This spreads the load across nodes at the possible expense of losing full consistency guarantees. Typically this means that a very small number of recent writes may not be included. The omitted writes are typically limited to data written in the last `100ms` or less from the recovery point. This is usually suitable for disaster recovery. However, the system cant guarantee how stale this may be if executed against a partitioned server.
## Node Removal
Failed nodes will be automatically removed after 72 hours. This can happen if a node does not shutdown cleanly or if the process supervisor does not give the agent long enough to gracefully leave the cluster. Until then, Consul periodically tries to reconnect to the failed node. After 72 hours, Consul will reap failed nodes and stop trying to reconnect.
This sequence can be accelerated with the [`force-leave`](/docs/commands/force-leave.html) command. Nodes running as servers will be removed from the Raft quorum. Force-leave may also be used to remove nodes that have accidentally joined the datacenter. Force-leave can only be applied to the nodes in its respective datacenter and cannot be executed on the nodes outside the datacenter.
Alternately, servers can be removed using [`remove-peer`](/docs/commands/operator/raft.html#remove-peer) if `force-leave` is not effective in removing the nodes.
$ consul operator raft remove-peer -address=x.x.x.x:8300
~> **NOTE** `remove-peer` only works on clusters that still have a leader.
If the leader is affected by an outage, then read the [outage recovery](/docs/guides/outage.html) guide. Depending on your scenario, several options for recovery may be possible.
To remove all agents that accidentally joined the wrong set of servers, clear out the contents of the data directory (`-data-dir`) on both client and server nodes.
!> **WARNING** Removing data on server nodes will destroy all state in the cluster and cant be undone.
The [Autopilot](/docs/guides/autopilot.html) (Enterprise) feature automatically cleans up dead servers instead of waiting 72 hours. Dead servers will periodically be cleaned up and removed from the Raft peer set, to prevent them from interfering with the quorum size and leader elections.
Removing any server must be done carefully. For a cluster of `N` servers to function properly, `(N/2) + 1` must be available. Before removing an old server from the cluster, the new server must be added in order to make the cluster failure tolerant. The old server can then be removed.
- Read [Server Performance](/docs/guides/performance.html) to learn about
additional configuration that benefits production deployments.

View File

@ -59,14 +59,35 @@ The high performance configuration is simple and looks like this:
}
```
This value must take into account the network latency between the servers and the read/write load on the servers.
The value of `raft_multiplier` is a scaling factor and directly affects the following parameters:
|Param|Value||
|-----|----:|-:|
|HeartbeatTimeout|1000ms|default|
|ElectionTimeout|1000ms|default|
|LeaderLeaseTimeout|500ms|default|
So a scaling factor of `5` (i.e. `raft_multiplier: 5`) updates the following values:
|Param|Value|Calculation|
|-----|----:|-:|
|HeartbeatTimeout|5000ms|5 x 1000ms|
|ElectionTimeout|5000ms|5 x 1000ms|
|LeaderLeaseTimeout|2500ms|5 x 500ms|
~> **NOTE** Wide networks with more latency will perform better with larger values of `raft_multiplier`.
The trade off is between leader stability and time to recover from an actual leader failure. A short multiplier minimizes failure detection and election time but may be triggered frequently in high latency situations. This can cause constant leadership churn and associated unavailability. A high multiplier reduces the chances that spurious failures will cause leadership churn but it does this at the expense of taking longer to detect real failures and thus takes longer to restore cluster availability.
Leadership instability can also be caused by under-provisioned CPU resources and is more likely in environments where CPU cycles are shared with other workloads. In order for a server to remain the leader, it must send frequent heartbeat messages to all other servers every few hundred milliseconds. If some number of these are missing or late due to the leader not having sufficient CPU to send them on time, the other servers will detect it as failed and hold a new election.
It's best to benchmark with a realistic workload when choosing a production server for Consul.
Here are some general recommendations:
* Consul will make use of multiple cores, and at least 2 cores are recommended.
* For write-heavy workloads, disk speed on the servers is key for performance. Use SSDs or
another fast disk technology for the best write throughput.
* <a name="last-contact"></a>Spurious leader elections can be caused by networking issues between
the servers or insufficient CPU resources. Users in cloud environments often bump their servers
up to the next instance class with improved networking and CPU until leader elections stabilize,
@ -110,3 +131,31 @@ set size. You can determine the working set size by noting the value of
> NOTE: Consul is not designed to serve as a general purpose database, and you
> should keep this in mind when choosing what data are populated to the
> key/value store.
## Read/Write Tuning
Consul is write limited by disk I/O and read limited by CPU. Memory requirements will be dependent on the total size of KV pairs stored and should be sized according to that data (as should the hard drive storage). The limit on a keys value size is `512KB`.
-> Consul is write limited by disk I/O and read limited by CPU.
For **write-heavy** workloads, the total RAM available for overhead must approximately be equal to
RAM NEEDED = number of keys * average key size * 2-3x
Since writes must be synced to disk (persistent storage) on a quorum of servers before they are committed, deploying a disk with high write throughput (or an SSD) will enhance performance on the write side. ([Documentation](/docs/agent/options.html#\_data\_dir))
For a **read-heavy** workload, configure all Consul server agents with the `allow_stale` DNS option, or query the API with the `stale` [consistency mode](/api/index.html#consistency-modes). By default, all queries made to the server are RPC forwarded to and serviced by the leader. By enabling stale reads, any server will respond to any query, thereby reducing overhead on the leader. Typically, the stale response is `100ms` or less from consistent mode but it drastically improves performance and reduces latency under high load.
If the leader server is out of memory or the disk is full, the server eventually stops responding, loses its election and cannot move past its last commit time. However, by configuring `max_stale` and setting it to a large value, Consul will continue to respond to queries during such outage scenarios. ([max_stale documentation](/docs/agent/options.html#max_stale)).
It should be noted that `stale` is not appropriate for coordination where strong consistency is important (i.e. locking or application leader election). For critical cases, the optional `consistent` API query mode is required for true linearizability; the trade off is that this turns a read into a full quorum write so requires more resources and takes longer.
**Read-heavy** clusters may take advantage of the [enhanced reading](/docs/enterprise/read-scale/index.html) feature (Enterprise) for better scalability. This feature allows additional servers to be introduced as non-voters. Being a non-voter, the server will still participate in data replication, but it will not block the leader from committing log entries.
Consuls agents use network sockets for communicating with the other nodes (gossip) and with the server agent. In addition, file descriptors are also opened for watch handlers, health checks, and log files. For a **write heavy** cluster, the `ulimit` size must be increased from the default value (`1024`) to prevent the leader from running out of file descriptors.
To prevent any CPU spikes from a misconfigured client, RPC requests to the server should be [rate limited](/docs/agent/options.html#limits)
~> **NOTE** Rate limiting is configured on the client agent only.
In addition, two [performance indicators](/docs/agent/telemetry.html) &mdash; `consul.runtime.alloc_bytes` and `consul.runtime.heap_objects` &mdash; can help diagnose if the current sizing is not adequately meeting the load.

View File

@ -114,6 +114,12 @@ The leader should also emit various logs including:
At this point the node has been gracefully removed from the cluster, and
will shut down.
If the leader is affected by an outage, then [manual recovery](/docs/guides/outage.html#manual-recovery-using-peers-json) needs to be done.
To remove all agents that accidentally joined the wrong set of servers, clear out the contents of the data directory (`-data-dir`) on both client and server nodes.
!> **WARNING** Removing data on server nodes will destroy all state in the cluster
## Forced Removal
In some cases, it may not be possible to gracefully remove a server. For example,

View File

@ -334,9 +334,6 @@
<li<%= sidebar_current("docs-guides") %>>
<a href="/docs/guides/index.html">Guides</a>
<ul class="nav">
<li<%= sidebar_current("docs-guides-deployment") %>>
<a href="/docs/guides/deployment.html">Production Deployment</a>
</li>
<li<%= sidebar_current("docs-guides-acl") %>>
<a href="/docs/guides/acl.html">ACLs</a>
</li>
@ -356,8 +353,10 @@
<a href="/docs/guides/consul-containers.html">Consul with Containers</a>
</li>
<li<%= sidebar_current("docs-guides-creating-certificates") %>>
<a href="/docs/guides/creating-certificates.html">Creating
Certificates</a>
<a href="/docs/guides/creating-certificates.html">Creating Certificates</a>
</li>
<li<%= sidebar_current("docs-guides-deployment-guide") %>>
<a href="/docs/guides/deployment-guide.html">Deployment Guide</a>
</li>
<li<%= sidebar_current("docs-guides-dns-cache") %>>
<a href="/docs/guides/dns-cache.html">DNS Caching</a>
@ -386,15 +385,18 @@ Certificates</a>
<li<%= sidebar_current("docs-guides-segments") %>>
<a href="/docs/guides/segments.html">Network Segments</a>
</li>
<li<%= sidebar_current("docs-guides-sentinel") %>>
<a href="/docs/guides/sentinel.html">Sentinel</a>
</li>
<li<%= sidebar_current("docs-guides-outage") %>>
<a href="/docs/guides/outage.html">Outage Recovery</a>
</li>
<li<%= sidebar_current("docs-guides-reference-architecture") %>>
<a href="/docs/guides/deployment.html">Reference Architecture</a>
</li>
<li<%= sidebar_current("docs-guides-semaphore") %>>
<a href="/docs/guides/semaphore.html">Semaphore</a>
</li>
<li<%= sidebar_current("docs-guides-sentinel") %>>
<a href="/docs/guides/sentinel.html">Sentinel</a>
</li>
<li<%= sidebar_current("docs-guides-performance") %>>
<a href="/docs/guides/performance.html">Server Performance</a>
</li>