---
layout: "docs"
page_title: "Autopilot"
sidebar_current: "docs-guides-autopilot"
description: |-
  This guide covers how to configure and use Autopilot features.
---

# Autopilot

Autopilot is a set of features added in Consul 0.8 that allow for automatic,
operator-friendly management of Consul servers. It includes cleanup of dead
servers, monitoring the state of the Raft cluster, and stable server introduction.

To enable Autopilot features (with the exception of dead server cleanup),
the [`raft_protocol`](/docs/agent/options.html#_raft_protocol) setting in
the Agent configuration must be set to 3 or higher on all servers. In Consul
0.8 this setting defaults to 2; in Consul 0.9 it will default to 3.

## Configuration

The configuration of Autopilot is loaded by the leader from the agent's
[`autopilot`](/docs/agent/options.html#autopilot) settings when initially
bootstrapping the cluster. After bootstrapping, the configuration can
be viewed or modified either via the
[`operator autopilot`](/docs/commands/operator/autopilot.html) subcommand or the
[`/v1/operator/autopilot/configuration`](/docs/agent/http/operator.html#autopilot-configuration)
HTTP endpoint:

```
$ consul operator autopilot get-config
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s

$ consul operator autopilot set-config -cleanup-dead-servers=false
Configuration updated!

$ consul operator autopilot get-config
CleanupDeadServers = false
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
```

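As a rough illustration of how the `get-config` output relates to the agent's
`autopilot` settings, the following Python sketch parses the `Key = Value` lines
shown above and renders them as a configuration fragment. The snake_case key
names in `KEY_MAP` are an assumption about the agent option names and should be
checked against the agent options documentation; the helper functions are
hypothetical and not part of Consul.

```python
import json

# Sample output copied from `consul operator autopilot get-config` above.
GET_CONFIG_OUTPUT = """\
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
"""

# Assumed mapping from the CLI field names to the agent's `autopilot`
# configuration keys; verify against the agent options documentation.
KEY_MAP = {
    "CleanupDeadServers": "cleanup_dead_servers",
    "LastContactThreshold": "last_contact_threshold",
    "MaxTrailingLogs": "max_trailing_logs",
    "ServerStabilizationTime": "server_stabilization_time",
}


def parse_get_config(text):
    """Parse `Key = Value` lines into a dict with typed values."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or unrelated lines
        key, value = (part.strip() for part in line.split("=", 1))
        if value in ("true", "false"):
            settings[key] = value == "true"
        elif value.isdigit():
            settings[key] = int(value)
        else:
            settings[key] = value  # durations such as "200ms" stay strings
    return settings


def to_agent_config(settings):
    """Build an agent configuration fragment with an `autopilot` block."""
    block = {KEY_MAP[k]: v for k, v in settings.items() if k in KEY_MAP}
    return {"autopilot": block}


settings = parse_get_config(GET_CONFIG_OUTPUT)
print(json.dumps(to_agent_config(settings), indent=2))
```

Keep in mind that, as described above, the agent's `autopilot` settings are only
read when the cluster is first bootstrapped; later changes go through
`set-config` or the HTTP endpoint.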
## Dead Server Cleanup

Dead servers are periodically cleaned up and removed from the Raft peer
set, to prevent them from interfering with the quorum size and leader elections.
This cleanup also happens whenever a new server is successfully added to the
cluster.

This behavior can be disabled by running `consul operator autopilot set-config`
with the `-cleanup-dead-servers=false` option.

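To see why a dead server left in the peer set matters, recall that Raft needs a
majority of the peer set to agree. The sketch below works through the
arithmetic for a five-server cluster that has lost two servers, before and
after the dead peers are removed. The function names are illustrative only.

```python
def quorum_size(peers):
    """Smallest majority of a Raft peer set with `peers` members."""
    return peers // 2 + 1


def failure_tolerance(peers, alive):
    """How many more servers can fail before quorum is lost (never below 0)."""
    return max(alive - quorum_size(peers), 0)


# Five peers, two of them dead and still counted in the peer set:
# quorum stays at 3, so losing one more server costs the cluster its quorum.
print(quorum_size(5), failure_tolerance(5, 3))  # 3 0

# After the two dead peers are removed, the peer set has three members:
# quorum drops to 2, so one further failure can be tolerated.
print(quorum_size(3), failure_tolerance(3, 3))  # 2 1
```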
## Server Health Checking

An internal health check runs on the leader to track the stability of servers.
A server is considered healthy if:

- It has a SerfHealth status of 'Alive'
- The time since its last contact with the current leader is below
  `LastContactThreshold`
- Its latest Raft term matches the leader's term
- The number of Raft log entries it trails the leader by does not exceed
  `MaxTrailingLogs`

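The four criteria above can be sketched as a single predicate. This is an
illustration of the rules as worded in this guide, not Consul's implementation:
the field names follow the health endpoint's response format, `parse_duration`
is a hypothetical helper for Go-style duration strings, and the handling of
values exactly at a threshold is an assumption.

```python
import re

# Seconds per unit for Go-style duration strings.
_UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
          "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_duration(text):
    """Convert a duration such as "53.279635ms" or "1m30s" to seconds."""
    if text == "0":
        return 0.0
    parts = _PART.findall(text)
    # Reject input that is not made up entirely of number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError("unrecognized duration: %r" % text)
    return sum(float(n) * _UNITS[u] for n, u in parts)


def is_healthy(server, leader_term, leader_index,
               last_contact_threshold="200ms", max_trailing_logs=250):
    """Apply the four health criteria to one server record."""
    return (
        # 1. Serf reports the server as alive.
        server["SerfStatus"] == "alive"
        # 2. Last contact with the leader is below the threshold.
        and parse_duration(server["LastContact"])
        < parse_duration(last_contact_threshold)
        # 3. The server's latest Raft term matches the leader's.
        and server["LastTerm"] == leader_term
        # 4. The server does not trail the leader by too many log entries.
        and leader_index - server["LastIndex"] <= max_trailing_logs
    )


node2 = {"SerfStatus": "alive", "LastContact": "53.279635ms",
         "LastTerm": 3, "LastIndex": 23}
print(is_healthy(node2, leader_term=3, leader_index=23))  # True
```

The default thresholds mirror the `LastContactThreshold = 200ms` and
`MaxTrailingLogs = 250` values shown in the Configuration section.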

The status of these health checks can be viewed through the
[`/v1/operator/autopilot/health`](/docs/agent/http/operator.html#autopilot-health)
HTTP endpoint, with a top-level `Healthy` field indicating the overall status
of the cluster:

```
$ curl localhost:8500/v1/operator/autopilot/health
{
    "Healthy": true,
    "FailureTolerance": 0,
    "Servers": [
        {
            "ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
            "Name": "node1",
            "SerfStatus": "alive",
            "LastContact": "0s",
            "LastTerm": 3,
            "LastIndex": 23,
            "Healthy": true,
            "StableSince": "2017-03-10T22:01:14Z"
        },
        {
            "ID": "099061c7-ea74-42d5-be04-a0ad74caaaf5",
            "Name": "node2",
            "SerfStatus": "alive",
            "LastContact": "53.279635ms",
            "LastTerm": 3,
            "LastIndex": 23,
            "Healthy": true,
            "StableSince": "2017-03-10T22:03:26Z"
        }
    ]
}
```

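For monitoring scripts, the response can be reduced to the overall status plus
the names of any unhealthy servers. The following is a minimal sketch using
only the Python standard library; `fetch_health` and `summarize` are
hypothetical helpers, and the agent address is assumed to be the
`localhost:8500` used in the `curl` example.

```python
import json
import urllib.request


def fetch_health(base_url="http://localhost:8500"):
    """Fetch the Autopilot health report from an agent (not called below).

    urlopen raises HTTPError for non-2xx responses, which this sketch
    does not handle.
    """
    url = base_url + "/v1/operator/autopilot/health"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize(health):
    """Reduce a health report to the overall status and unhealthy servers."""
    return {
        "healthy": health["Healthy"],
        "failure_tolerance": health["FailureTolerance"],
        "unhealthy": [s["Name"] for s in health["Servers"]
                      if not s["Healthy"]],
    }


# Abbreviated copy of the sample response above.
SAMPLE = json.loads("""
{
    "Healthy": true,
    "FailureTolerance": 0,
    "Servers": [
        {"Name": "node1", "SerfStatus": "alive", "Healthy": true},
        {"Name": "node2", "SerfStatus": "alive", "Healthy": true}
    ]
}
""")

print(summarize(SAMPLE))
```

Against a running agent, `summarize(fetch_health())` would produce the same
shape of result from live data.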
## Stable Server Introduction

When a new server is added to the cluster, there is a waiting period during
which it must remain healthy and stable before being promoted to a full,
voting member. The length of this period is configured via the
`ServerStabilizationTime` setting.
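The waiting period can be pictured as a comparison between a server's
`StableSince` timestamp from the health endpoint and the configured
`ServerStabilizationTime`. The sketch below illustrates that idea only; it is
not how Consul decides promotions internally, and `eligible_for_promotion` is a
hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def stable_for(stable_since, now):
    """Time elapsed since the server's `StableSince` timestamp (UTC)."""
    since = datetime.strptime(stable_since, "%Y-%m-%dT%H:%M:%SZ")
    return now - since.replace(tzinfo=timezone.utc)


def eligible_for_promotion(server, now, stabilization=timedelta(seconds=10)):
    """True once a healthy server has been stable for the full period.

    The 10s default mirrors the ServerStabilizationTime shown in the
    Configuration section.
    """
    return (server["Healthy"]
            and stable_for(server["StableSince"], now) >= stabilization)


node2 = {"Name": "node2", "Healthy": True,
         "StableSince": "2017-03-10T22:03:26Z"}

# Four seconds after node2 became stable: still waiting.
early = datetime(2017, 3, 10, 22, 3, 30, tzinfo=timezone.utc)
print(eligible_for_promotion(node2, early))  # False

# Ten seconds after: the stabilization period has elapsed.
later = datetime(2017, 3, 10, 22, 3, 36, tzinfo=timezone.utc)
print(eligible_for_promotion(node2, later))  # True
```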