From 8881d352aae4cbaf1507e42db836bca937cc1c95 Mon Sep 17 00:00:00 2001
From: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
Date: Tue, 27 Nov 2018 09:10:44 -0800
Subject: [PATCH] Updated autopilot guide (#4997)

* Updated the autopilot guide to prepare for the migration to the learn platform.

* Correcting typos
---
 website/source/docs/guides/autopilot.html.md | 283 ++++++++++++-------
 1 file changed, 182 insertions(+), 101 deletions(-)

diff --git a/website/source/docs/guides/autopilot.html.md b/website/source/docs/guides/autopilot.html.md
index 5f42d564cd..b8987f1591 100644
--- a/website/source/docs/guides/autopilot.html.md
+++ b/website/source/docs/guides/autopilot.html.md
@@ -8,7 +8,7 @@ description: |-
 
 # Autopilot
 
-Autopilot is a set of new features added in Consul 0.8 to allow for automatic
+Autopilot features allow for automatic,
 operator-friendly management of Consul servers. It includes cleanup of dead
 servers, monitoring the state of the Raft cluster, and stable server introduction.
 
@@ -19,11 +19,25 @@ the Agent configuration must be set to 3 or higher on all servers. In Consul
 information, see the [Version Upgrade section](/docs/upgrade-specific.html#raft_protocol)
 on Raft Protocol versions.
 
-## Configuration
+In this guide we will learn more about Autopilot's features.
+
+* Dead server cleanup
+* Server Stabilization
+* Redundancy zone tags
+* Upgrade migration
+
+Finally, we will review how to ensure Autopilot is healthy.
+
+Note, in this guide we are using  examples from a Consul 1.4 cluster, we
+are starting with Autopilot enabled by default.
+
+## Default Configuration
 
 The configuration of Autopilot is loaded by the leader from the agent's
 [Autopilot settings](/docs/agent/options.html#autopilot) when initially
-bootstrapping the cluster:
+bootstrapping the cluster. Since Autopilot and it's features are already
+enabled, we only need to update the configuration to disable them. The
+following are the defaults.
 
 ```
 {
@@ -31,16 +45,22 @@ bootstrapping the cluster:
     "last_contact_threshold": "200ms",
     "max_trailing_logs": 250,
     "server_stabilization_time": "10s",
-    "redundancy_zone_tag": "az",
+    "redundancy_zone_tag": "",
     "disable_upgrade_migration": false,
     "upgrade_version_tag": ""
 }
 ```
 
+All Consul servers should have Autopilot and its features either enabled
+or disabled to ensure consistency accross servers in case of a failure. Additionally,
+Autopilot must be enabled to use any of the features, but the features themselves
+can be configured independently. Meaning you can enable or disable any of the features
+separately, at any time.
+
 After bootstrapping, the configuration can be viewed or modified either via the
 [`operator autopilot`](/docs/commands/operator/autopilot.html) subcommand or the
 [`/v1/operator/autopilot/configuration`](/api/operator.html#autopilot-configuration)
-HTTP endpoint:
+HTTP endpoint.
 
 ```
 $ consul operator autopilot get-config
@@ -51,7 +71,29 @@ ServerStabilizationTime = 10s
 RedundancyZoneTag = ""
 DisableUpgradeMigration = false
 UpgradeVersionTag = ""
+```
 
+In the example above, we used the `operator autopilot get-config` subcommand to check
+the autopilot configuration. You can see we still have all the defaults.
+
+## Dead Server Cleanup
+
+If Autopilot is disabled, it will take 72 hours for dead servers to be automatically reaped
+or an operator had to script a `consul force-leave`. If another server failure occurred
+it could jeopardize the quorum, even if the failed Consul server had been automatically
+replaced. Autopilot helps prevent these kinds of outages by quickly removing failed
+servers as soon as a replacement Consul server comes online. When servers are removed
+by the cleanup process they will enter the "left" state.
+
+With Autopilot's dead server cleanup enabled, dead servers will periodically be
+cleaned up and removed from the Raft peer set to prevent them from interfering with
+the quorum size and leader elections. The cleanup process will also be automatically
+triggered whenever a new server is successfully added to the cluster.
+
+To update the dead server cleanup feature use `consul operator autopilot set-config`
+with the `-cleanup-dead-servers` flag.
+
+```sh
 $ consul operator autopilot set-config -cleanup-dead-servers=false
 Configuration updated!
 
@@ -65,34 +107,143 @@ DisableUpgradeMigration = false
 UpgradeVersionTag = ""
 ```
 
-## Dead Server Cleanup
+We have disabled dead server cleanup, but sill have all the other Autopilot defaults.
 
-Dead servers will periodically be cleaned up and removed from the Raft peer
-set, to prevent them from interfering with the quorum size and leader elections.
-This cleanup will also happen whenever a new server is successfully added to the
-cluster.
+## Server Stabalization
 
-Prior to Autopilot, it would take 72 hours for dead servers to be automatically reaped,
-or operators had to script a `consul force-leave`. If another server failure occurred,
-it could jeopardize the quorum, even if the failed Consul server had been automatically
-replaced. Autopilot helps prevent these kinds of outages by quickly removing failed
-servers as soon as a replacement Consul server comes online. When servers are removed
-by the cleanup process they will enter the "left" state.
+When a new server is added to the cluster, there is a waiting period where it
+must be healthy and stable for a certain amount of time before being promoted
+to a full, voting member. This can be configured via the `ServerStabilizationTime`
+setting.
 
-This option can be disabled by running `consul operator autopilot set-config`
-with the `-cleanup-dead-servers=false` option.
+```sh
+consul operator autopilot set-config -server-stabalization-time=5s
+Configuration updated!
+
+$ consul operator autopilot get-config
+CleanupDeadServers = false
+LastContactThreshold = 200ms
+MaxTrailingLogs = 250
+ServerStabilizationTime = 5s
+RedundancyZoneTag = ""
+DisableUpgradeMigration = false
+UpgradeVersionTag = ""
+```
+
+Now we have disabled dead server cleanup and set the server stabilization time to 5 seconds.
+When a new server is added to our cluster, it will only need to be healthy and stable for
+5 seconds.
+
+## Redundancy Zones
+
+Prior to Autopilot, it was difficult to deploy servers in a way that took advantage of
+isolated failure domains such as AWS Availability Zones; users would be forced to either
+have an overly-large quorum (2-3 nodes per AZ) or give up redundancy within an AZ by
+deploying just one server in each.
+
+If the `RedundancyZoneTag` setting is set, Consul will use its value to look for a
+zone in each server's specified [`-node-meta`](/docs/agent/options.html#_node_meta)
+tag. For example, if `RedundancyZoneTag` is set to `zone`, and `-node-meta zone:east1a`
+is used when starting a server, that server's redundancy zone will be `east1a`.
+
+```
+$ consul operator autopilot set-config -redundancy-zone-tag=uswest1
+Configuration updated!
+
+$ consul operator autopilot get-config
+CleanupDeadServers = false
+LastContactThreshold = 200ms
+MaxTrailingLogs = 250
+ServerStabilizationTime = 5s
+RedundancyZoneTag = "uswest1"
+DisableUpgradeMigration = false
+UpgradeVersionTag = ""
+```
+
+For our Autopilot features, we now have disabled dead server cleanup, server stabilization time to 5 seconds, and
+the redundancy zone tag is uswest1.
+
+Consul will then use these values to partition the servers by redundancy zone, and will
+aim to keep one voting server per zone. Extra servers in each zone will stay as non-voters
+on standby to be promoted if the active voter leaves or dies.
+
+## Upgrade Migrations
+
+Autopilot in Consul *Enterprise* supports upgrade migrations by default. To disable this
+functionality, set `DisableUpgradeMigration` to true.
+
+```sh
+$ consul operator autopilot set-config -disable-upgrade-migration=true
+Configuration updated!
+
+$ consul operator autopilot get-config
+CleanupDeadServers = false
+LastContactThreshold = 200ms
+MaxTrailingLogs = 250
+ServerStabilizationTime = 5s
+RedundancyZoneTag = "uswest1"
+DisableUpgradeMigration = true
+UpgradeVersionTag = ""
+```
+
+With upgrade migration enabled, when a new server is added and Autopilot detects that
+its Consul version is newer than that of the existing servers, Autopilot will avoid
+promoting the new server until enough newer-versioned servers have been added to the
+cluster. When the count of new servers equals or exceeds that of the old servers,
+Autopilot will begin promoting the new servers to voters and demoting the old servers.
+After this is finished, the old servers can be safely removed from the cluster.
+
+To check the consul version of the servers, you can either use the [autopilot health]
+(/api/operator.html#autopilot-health) endpoint or the `consul members`
+command.
+
+```
+$ consul members
+Node   Address         Status  Type    Build  Protocol  DC   Segment
+node1  127.0.0.1:8301  alive   server  1.4.0  2         dc1   <all>
+node2  127.0.0.1:8703  alive   server  1.4.0  2         dc1   <all>
+node3  127.0.0.1:8803  alive   server  1.4.0  2         dc1   <all>
+node4  127.0.0.1:8203  alive   server  1.3.0  2         dc1   <all>
+```
+
+### Migrations Without a Consul Version Change
+
+The `UpgradeVersionTag` can be used to override the version information used during
+a migration, so that the migration logic can be used for updating the cluster when
+changing configuration.
+
+If the `UpgradeVersionTag` setting is set, Consul will use its value to look for a
+version in each server's specified [`-node-meta`](/docs/agent/options.html#_node_meta)
+tag. For example, if `UpgradeVersionTag` is set to `build`, and `-node-meta build:0.0.2`
+is used when starting a server, that server's version will be `0.0.2` when considered in
+a migration. The upgrade logic will follow semantic versioning and the version string
+must be in the form of either `X`, `X.Y`, or `X.Y.Z`.
+
+```sh
+$ consul operator autopilot set-config -upgrade-version-tag=1.4.0
+Configuration updated!
+
+$ consul operator autopilot get-config
+CleanupDeadServers = false
+LastContactThreshold = 200ms
+MaxTrailingLogs = 250
+ServerStabilizationTime = 5s
+RedundancyZoneTag = "uswest1"
+DisableUpgradeMigration = true
+UpgradeVersionTag = "1.4.0"
+```
 
 ## Server Health Checking
 
 An internal health check runs on the leader to track the stability of servers.
-<br>A server is considered healthy if all of the following conditions are true:
+<br>A server is considered healthy if all of the following conditions are true.
 
-- It has a SerfHealth status of 'Alive'
+- It has a SerfHealth status of 'Alive'.
 - The time since its last contact with the current leader is below
-`LastContactThreshold`
-- Its latest Raft term matches the leader's term
+`LastContactThreshold`.
+- Its latest Raft term matches the leader's term.
 - The number of Raft log entries it trails the leader by does not exceed
-`MaxTrailingLogs`
+`MaxTrailingLogs`.
 
 The status of these health checks can be viewed through the [`/v1/operator/autopilot/health`]
 (/api/operator.html#autopilot-health) HTTP endpoint, with a top level
@@ -136,83 +287,13 @@ $ curl localhost:8500/v1/operator/autopilot/health
 }
 ```
 
-## Stable Server Introduction
+## Summary
 
-When a new server is added to the cluster, there is a waiting period where it
-must be healthy and stable for a certain amount of time before being promoted
-to a full, voting member. This can be configured via the `ServerStabilizationTime`
-setting.
+In this guide we configured most of the Autopilot features; dead server cleanup, server
+stabalization, redunancy zone tags, upgrade migration, and upgrade version tag.
 
----
-
-~> The following Autopilot features are available only in
-   [Consul Enterprise](https://www.hashicorp.com/products/consul/) version 0.8.0 and later.
-
-## Server Read Scaling
-
-With the [`-non-voting-server`](/docs/agent/options.html#_non_voting_server) option, a
-server can be explicitly marked as a non-voter and will never be promoted to a voting
-member. This can be useful when more read scaling is needed; being a non-voter means
-that the server will still have data replicated to it, but it will not be part of the
-quorum that the leader must wait for before committing log entries.
-
-## Redundancy Zones
-
-Prior to Autopilot, it was difficult to deploy servers in a way that took advantage of
-isolated failure domains such as AWS Availability Zones; users would be forced to either
-have an overly-large quorum (2-3 nodes per AZ) or give up redundancy within an AZ by
-deploying just one server in each.
-
-If the `RedundancyZoneTag` setting is set, Consul will use its value to look for a
-zone in each server's specified [`-node-meta`](/docs/agent/options.html#_node_meta)
-tag. For example, if `RedundancyZoneTag` is set to `zone`, and `-node-meta zone:east1a`
-is used when starting a server, that server's redundancy zone will be `east1a`.
-
-Here's an example showing how to configure this:
-
-```
-$ consul operator autopilot set-config -redundancy-zone-tag=zone
-Configuration updated!
-```
-
-Consul will then use these values to partition the servers by redundancy zone, and will
-aim to keep one voting server per zone. Extra servers in each zone will stay as non-voters
-on standby to be promoted if the active voter leaves or dies.
-
-## Upgrade Migrations
-
-Autopilot in Consul Enterprise supports upgrade migrations by default. To disable this
-functionality, set `DisableUpgradeMigration` to true.
-
-When a new server is added and Autopilot detects that its Consul version is newer than
-that of the existing servers, Autopilot will avoid promoting the new server until enough
-newer-versioned servers have been added to the cluster. When the count of new servers
-equals or exceeds that of the old servers, Autopilot will begin promoting the new servers
-to voters and demoting the old servers. After this is finished, the old servers can be
-safely removed from the cluster.
-
-To check the consul version of the servers, either the [autopilot health]
-(/api/operator.html#autopilot-health) endpoint or the `consul members`
-command can be used:
-
-```
-$ consul members
-Node   Address         Status  Type    Build  Protocol  DC
-node1  127.0.0.1:8301  alive   server  0.7.5  2         dc1
-node2  127.0.0.1:8703  alive   server  0.7.5  2         dc1
-node3  127.0.0.1:8803  alive   server  0.7.5  2         dc1
-node4  127.0.0.1:8203  alive   server  0.8.0  2         dc1
-```
-
-### Migrations Without a Consul Version Change
-
-The `UpgradeVersionTag` can be used to override the version information used during
-a migration, so that the migration logic can be used for updating the cluster when
-changing configuration.
-
-If the `UpgradeVersionTag` setting is set, Consul will use its value to look for a
-version in each server's specified [`-node-meta`](/docs/agent/options.html#_node_meta)
-tag. For example, if `UpgradeVersionTag` is set to `build`, and `-node-meta build:0.0.2`
-is used when starting a server, that server's version will be `0.0.2` when considered in
-a migration. The upgrade logic will follow semantic versioning and the version string
-must be in the form of either `X`, `X.Y`, or `X.Y.Z`.
+To learn more about the Autopilot settings we did not configure,
+[last_contact_threshold](https://www.consul.io/docs/agent/options.html#last_contact_threshold)
+and [max_trailing_logs](https://www.consul.io/docs/agent/options.html#max_trailing_logs),
+either read the agent configuration documenation or use the help flag with the
+opertor autopilot `consul operator autopilot set-config -h`.