From 1f20d5afbaea37f130b0f72751cf324f6c12fcc5 Mon Sep 17 00:00:00 2001 From: Joel Watson Date: Mon, 14 Sep 2020 12:18:08 -0500 Subject: [PATCH 1/3] Add documentation for large version jump upgrades. --- website/data/docs-navigation.js | 14 +- website/pages/docs/upgrading/index.mdx | 15 ++ .../instructions/general-process.mdx | 178 ++++++++++++++ .../docs/upgrading/instructions/index.mdx | 34 +++ .../instructions/upgrade-to-1-2-x.mdx | 132 ++++++++++ .../instructions/upgrade-to-1-6-x.mdx | 228 ++++++++++++++++++ .../instructions/upgrade-to-1-8-x.mdx | 119 +++++++++ 7 files changed, 719 insertions(+), 1 deletion(-) create mode 100644 website/pages/docs/upgrading/instructions/general-process.mdx create mode 100644 website/pages/docs/upgrading/instructions/index.mdx create mode 100644 website/pages/docs/upgrading/instructions/upgrade-to-1-2-x.mdx create mode 100644 website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx create mode 100644 website/pages/docs/upgrading/instructions/upgrade-to-1-8-x.mdx diff --git a/website/data/docs-navigation.js b/website/data/docs-navigation.js index 016df94c68..076f4be424 100644 --- a/website/data/docs-navigation.js +++ b/website/data/docs-navigation.js @@ -243,7 +243,19 @@ export default [ 'download-tools', { category: 'upgrading', - content: ['compatibility', 'upgrade-specific'], + content: [ + 'compatibility', + 'upgrade-specific', + { + category: 'instructions', + content: [ + 'general-process', + 'upgrade-to-1-2-x', + 'upgrade-to-1-6-x', + 'upgrade-to-1-8-x', + ], + }, + ], }, { category: 'troubleshoot', diff --git a/website/pages/docs/upgrading/index.mdx b/website/pages/docs/upgrading/index.mdx index 11f1bb62cc..04663bcadd 100644 --- a/website/pages/docs/upgrading/index.mdx +++ b/website/pages/docs/upgrading/index.mdx @@ -45,6 +45,21 @@ Consul is A, and version B is released. by running `consul members` to make sure all members have the latest build and highest protocol version. 
+## Large Version Jumps + +Consul is a service that frequently sits in the critical path for many +or even all other applications and services you might have. As such, +it can be easy to get into a "if it's not broken, don't fix it" mentality +to avoid the potential risk in upgrading. Unfortunately, this can lead to +being many major versions behind and ultimately just makes upgrading +riskier overall. + +We encourage our customers to remain no more than two major versions behind +(i.e., if 1.8.x is the current release, don't use versions older than 1.6.x), +but in some environments that's not possible. If you find yourself in this +situation and need to upgrade, please see our [Upgrade Instructions page](/docs/upgrading/instructions) for +additional help. + ## Backward Incompatible Upgrades In some cases, a backwards incompatible update may be released. This has not diff --git a/website/pages/docs/upgrading/instructions/general-process.mdx b/website/pages/docs/upgrading/instructions/general-process.mdx new file mode 100644 index 0000000000..86b487238b --- /dev/null +++ b/website/pages/docs/upgrading/instructions/general-process.mdx @@ -0,0 +1,178 @@ +--- +layout: docs +page_title: General Upgrade Process +sidebar_title: General Process +description: >- + Specific versions of Consul may have additional information about the upgrade + process beyond the standard flow. +--- + +# General Upgrade Process + +## Introduction + +Upgrading Consul is a relatively easy process, but there are some best +practices that you should follow when doing so. Some versions also have +steps that are specific to that version, so make sure you also look through +our [upgrade instructions](/docs/upgrading/instructions) for the version you're on. + +## Download the New Version + +The first thing you need to do is download the binary for the new version +you want. 
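If you fetch the binary by hand, the helpers below sketch how the download and checksum URLs are put together. They assume the `consul/<version>/consul_<version>_<os>_<arch>.zip` and `consul_<version>_SHA256SUMS` naming used for OSS builds on releases.hashicorp.com. The function names are hypothetical. Enterprise builds use a suffixed version string such as `1.8.4+ent`, so check the release page for the exact file name.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Consul) that build download URLs for an
# OSS Consul release, assuming the releases.hashicorp.com naming scheme.
RELEASES_BASE="https://releases.hashicorp.com/consul"

# consul_release_url VERSION [OS] [ARCH] -> URL of the zipped binary.
consul_release_url() {
  version=$1
  os=${2:-linux}     # default to Linux
  arch=${3:-amd64}   # default to 64-bit x86
  printf '%s/%s/consul_%s_%s_%s.zip\n' "$RELEASES_BASE" "$version" "$version" "$os" "$arch"
}

# consul_sums_url VERSION -> URL of the SHA256SUMS file for that release.
consul_sums_url() {
  printf '%s/%s/consul_%s_SHA256SUMS\n' "$RELEASES_BASE" "$1" "$1"
}
```

For example, `curl -fsSLO "$(consul_release_url 1.8.4)"` and `curl -fsSLO "$(consul_sums_url 1.8.4)"` fetch both files. With GNU coreutils, `sha256sum --ignore-missing -c consul_1.8.4_SHA256SUMS` then checks the archive before you unzip it.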
+ + + + +If you're after the Consul binary, you can find all current and past versions +of the OSS and Enterprise releases here: + +- https://releases.hashicorp.com/consul + + + + +If you're using Docker containers, then you can find those here: + +- **OSS:** https://hub.docker.com/_/consul +- **Enterprise:** https://hub.docker.com/r/hashicorp/consul-enterprise + + + + +If you're using Kubernetes, then please see our documentation for +[Upgrading Consul on Kubernetes](/docs/k8s/operations/upgrading). + + + + +## Prepare for the Upgrade + +**1.** Take a snapshot: + +``` +consul snapshot save backup.snap +``` + +You can inspect the snapshot to ensure it was successful with: + +``` +consul snapshot inspect backup.snap +``` + +You should see output similar to this: + +``` +ID 2-1182-1542056499724 +Size 4115 +Index 1182 +Term 2 +Version 1 +``` + +This will ensure you have a safe fallback option in case something goes wrong. Store +this snapshot somewhere safe. If you would like more information on snapshots, you +can find that here: + +- https://www.consul.io/docs/commands/snapshot +- https://learn.hashicorp.com/tutorials/consul/backup-and-restore + +**2.** Temporarily modify your Consul configuration so that its [log_level](/docs/agent/options.html#_log_level) +is set to `debug`. After doing this, run `consul reload` on your servers. This will +give you more information to work with in the event something goes wrong.
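The snapshot check in step 1 can be made scriptable so that a bad snapshot stops the upgrade early. This sketch assumes the `Key Value` layout of `consul snapshot inspect` shown above. Output formats can change between Consul releases, so check the parsing against your version. `snapshot_looks_valid` is a hypothetical name. A non-zero `Index` and `Size` is only a sanity check, not proof that the snapshot will restore cleanly.

```shell
#!/bin/sh
# Hypothetical helper (not part of Consul): reads `consul snapshot inspect`
# output on stdin and succeeds only if both Index and Size are greater than 0.
snapshot_looks_valid() {
  awk '
    $1 == "Index" { idx = $2 }
    $1 == "Size"  { size = $2 }
    END { exit !(idx > 0 && size > 0) }
  '
}
```

Used as `consul snapshot save backup.snap && consul snapshot inspect backup.snap | snapshot_looks_valid`, a failure surfaces before any server has been touched.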
+ +## Perform the Upgrade + +**1.** Run the following command to see which server is currently the leader: + +``` +consul operator raft list-peers +``` + +You should see output similar to this (exact formatting and may differ based on version): + +``` +Node ID Address State Voter RaftProtocol +dc1-node1 ae15858f-7f5f-4dcb-b7d5-710fdcdd2745 10.11.0.2:8300 leader true 3 +dc1-node2 20e6be1b-f1cb-4aab-929f-f7d2d43d9a96 10.11.0.3:8300 follower true 3 +dc1-node3 658c343b-8769-431f-a71a-236f9dbb17b3 10.11.0.4:8300 follower true 3 +``` + +Take note of which agent is the leader. + +**2.** Copy the new `consul` binary onto your servers and replace the existing +binary with the new one. + +**3.** Perform a rolling restart of Consul on your servers, leaving the leader agent +for last. Only restart one server at a time. After restarting each server, validate +that it has rejoined the cluster and is in sync with the leader by running `consul info` +and checking whether the `commit_index` and `last_log_index` fields have the same value. +If done properly, this should avoid an unexpected leadership election due to loss of quorum. + +~> It's important to run `consul leave` on each server node when shutting +Consul down. Make sure your service management system (e.g., systemd, upstart, etc.) is +performing that action. If not, make sure you do it manually or you _will_ end up in a +bad cluster state. 
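Steps 1 and 3 above lend themselves to small shell helpers. This sketch assumes the output layouts shown in this guide: State is the fourth column of `consul operator raft list-peers`, and `consul info` prints `key = value` lines under its `raft:` section. These layouts may vary between versions. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Consul) for the rolling restart.

# raft_leader: reads `consul operator raft list-peers` output on stdin and
# prints the node name whose State column (4th field) is "leader".
raft_leader() {
  awk 'NR > 1 && $4 == "leader" { print $1 }'
}

# raft_in_sync: reads `consul info` output on stdin and succeeds when the
# raft commit_index equals last_log_index, i.e. the restarted server has
# caught up with the leader.
raft_in_sync() {
  awk -F' = ' '
    $1 ~ /^[ \t]*commit_index$/   { commit = $2 }
    $1 ~ /^[ \t]*last_log_index$/ { last = $2 }
    END { exit !(commit != "" && commit == last) }
  '
}
```

Typical use is `consul operator raft list-peers | raft_leader` before you start. After each server restarts, run `consul info | raft_in_sync` on it and move on only when that succeeds. A real script would retry with a short sleep, because a freshly restarted server can take a moment to catch up.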
+ +**4.** Double-check that all servers are showing up in the cluster as expected and are on +the correct version by running: + +``` +consul members +``` + +You should see output similar to this: + +``` +Node Address Status Type Build Protocol DC +dc1-node1 10.11.0.2:8301 alive server 1.8.3 2 dc1 +dc1-node2 10.11.0.3:8301 alive server 1.8.3 2 dc1 +dc1-node3 10.11.0.4:8301 alive server 1.8.3 2 dc1 +``` + +Also double-check the raft state to make sure there's a leader and sufficient voters: + +``` +consul operator raft list-peers +``` + +Which should look similar to this: + +``` +Node ID Address State Voter RaftProtocol +dc1-node1 ae15858f-7f5f-4dcb-b7d5-710fdcdd2745 10.11.0.2:8300 leader true 3 +dc1-node2 20e6be1b-f1cb-4aab-929f-f7d2d43d9a96 10.11.0.3:8300 follower true 3 +dc1-node3 658c343b-8769-431f-a71a-236f9dbb17b3 10.11.0.4:8300 follower true 3 +``` + +**5.** Set your `log_level` back to what you had it at prior to the upgrade and run +`consul reload` again. + +## Troubleshooting + +Most issues with upgrading occur due to either failing to upgrade the leader agent last +or due to failing to wait for a follower agent to fully rejoin a cluster before moving +on to another server. This can cause a loss of quorum and occasionally can result in +all of your servers attempting to kick off leadership elections endlessly without ever +reaching a quorum and electing a leader. + +Most of these problems can be solved by following the steps outlined in our +[Outage Recovery](https://learn.hashicorp.com/tutorials/consul/recovery-outage) document. 
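Before you work through those recovery steps, it can help to confirm which servers are alive and on the expected build. This sketch assumes the `consul members` column layout shown in step 4, where Status, Type, and Build are the 3rd, 4th, and 5th columns. The helper name is hypothetical. Pass the expected build exactly as Consul prints it, because Enterprise builds carry a suffix such as `+ent`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Consul): reads `consul members` output on
# stdin and prints every server that is not alive or not on build $1.
# Exits 0 only if at least one server was seen and none were flagged.
servers_off_version() {
  awk -v want="$1" '
    NR > 1 && $4 == "server" {
      seen++
      if ($3 != "alive" || $5 != want) { print $1, $3, $5; bad++ }
    }
    END { exit !(seen > 0 && bad == 0) }
  '
}
```

For example, `consul members | servers_off_version 1.8.3` prints nothing and succeeds on a healthy, fully upgraded cluster. Any lines it prints are the servers to investigate first.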
+If you're still having trouble after trying the recovery steps outlined there, +then you can get further assistance by: + +- OSS users without paid support plans can request help in our [Community Forum](https://discuss.hashicorp.com/c/consul/29) +- Enterprise users with paid support plans can contact [HashiCorp Support](https://support.hashicorp.com/) + +If you end up contacting support, please make sure you include the following information +in your support ticket: + +- Consul version you were upgrading FROM and TO. +- [Debug level logs](/docs/agent/options.html#_log_level) from all servers in the cluster + that you're having trouble with. These should include logs from prior to the upgrade attempt + up through the current time. If your logs were not set at debug level prior to the + upgrade, please include those logs anyway, but also update your config to use debug logs + and include logs from after that was done as well. +- Your Consul config files (please redact any secrets). +- Output from `consul members -detailed` and `consul operator raft list-peers` from each + server in your cluster. diff --git a/website/pages/docs/upgrading/instructions/index.mdx b/website/pages/docs/upgrading/instructions/index.mdx new file mode 100644 index 0000000000..6ca888249e --- /dev/null +++ b/website/pages/docs/upgrading/instructions/index.mdx @@ -0,0 +1,34 @@ +--- +layout: docs +page_title: Upgrade Instructions +sidebar_title: Upgrade Instructions +description: >- + Specific versions of Consul may have additional information about the upgrade + process beyond the standard flow. +--- + +# Upgrade Instructions + +This document is intended to help customers who find themselves many versions behind to upgrade safely. +Our recommended upgrade path is moving from version 0.8.5 to 1.2.4 to 1.6.9 to the current version. To get +started, you'll want to choose the version you're currently on below and then follow the instructions +until you're on the latest version.
The upgrade guides will mention notable changes and link to relevant +changelogs – we recommend reading through the changelog for versions between the one you're on and the +one you're upgrading to at each step to familiarize yourself with changes. + +## Getting Started + +To get instructions for your upgrade, please choose the release series you're _currently using_: + +- [0.8.x](/docs/upgrading/instructions/upgrade-to-1-2-x) +- [0.9.x](/docs/upgrading/instructions/upgrade-to-1-2-x) +- [1.0.x](/docs/upgrading/instructions/upgrade-to-1-2-x) +- [1.1.x](/docs/upgrading/instructions/upgrade-to-1-2-x) +- [1.2.x](/docs/upgrading/instructions/upgrade-to-1-6-x) +- [1.3.x](/docs/upgrading/instructions/upgrade-to-1-6-x) +- [1.4.x](/docs/upgrading/instructions/upgrade-to-1-6-x) +- [1.5.x](/docs/upgrading/instructions/upgrade-to-1-6-x) +- [1.6.x](/docs/upgrading/instructions/upgrade-to-1-8-x) +- [1.7.x](/docs/upgrading/instructions/upgrade-to-1-8-x) + +If you're using <= 0.7.x, please [contact support](https://support.hashicorp.com) for assistance. diff --git a/website/pages/docs/upgrading/instructions/upgrade-to-1-2-x.mdx b/website/pages/docs/upgrading/instructions/upgrade-to-1-2-x.mdx new file mode 100644 index 0000000000..4d285370e8 --- /dev/null +++ b/website/pages/docs/upgrading/instructions/upgrade-to-1-2-x.mdx @@ -0,0 +1,132 @@ +--- +layout: docs +page_title: Upgrading to 1.2.4 +sidebar_title: Upgrading to 1.2.4 +description: >- + Specific versions of Consul may have additional information about the upgrade + process beyond the standard flow. +--- + +# Upgrading to 1.2.4 + +## Introduction + +This guide explains how to best upgrade a multi-datacenter Consul deployment that's using +a version of Consul >= 0.8.5 and < 1.2.4 while maintaining replication. If you're on a version +older than 0.8.5, but are in the 0.8.x series, please upgrade to 0.8.5 by following our +[General Upgrade Process](/docs/upgrading/instructions/general-process). 
If you're on a version +older than 0.8.0, please [contact support](https://support.hashicorp.com). As there weren't +any major breaking changes, this upgrade should be fairly simple. + +In this guide, we'll be using an example with two datacenters (DCs) and will be +referring to them as DC1 and DC2. DC1 will be the primary datacenter. + +## Requirements + +- All Consul servers should be on a version of Consul >= 0.8.5 and < 1.2.4. +- You need a Consul cluster with at least 3 nodes to perform this upgrade as documented. If + you either have a single node cluster or several single node clusters joined via WAN, the + servers will come up in a `No cluster leader` loop after upgrading. If that happens, you'll + need to recover the cluster using the method described [here](https://learn.hashicorp.com/tutorials/consul/recovery-outage#manual-recovery-using-peers-json). + You can avoid this issue entirely by growing your cluster to 3 nodes prior to upgrading. + +## Assumptions + +This guide makes the following assumptions: + +- You have at least two datacenters configured and have ACL replication enabled. If you're + not using multiple datacenters, you can follow along and simply skip the instructions related + to replication. + +## Considerations + +There aren't too many major changes that might cause issues upgrading from 1.0.8, but notable changes +are called out in our [Specific Version Details](/docs/upgrading/upgrade-specific#consul-1-1-0) +page. You can find more granular details in the full [changelog](https://github.com/hashicorp/consul/blob/master/CHANGELOG.md#124-november-27-2018). +Looking through these changes prior to upgrading is highly recommended.
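Each guide's Requirements section defines a version window, for example >= 0.8.5 and < 1.2.4 in this guide. The following sketch checks a server's build against such a window. It assumes plain `MAJOR.MINOR.PATCH` strings and ignores any suffix after `+` or `-` (such as `+ent` or `-beta1`). The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Consul) for checking version windows.

# version_cmp A B: prints -1, 0, or 1 as A is lower than, equal to, or
# higher than B, comparing dot-separated numeric components in order.
version_cmp() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    sub(/[+-].*/, "", a)          # drop suffixes such as +ent or -beta1
    sub(/[+-].*/, "", b)
    na = split(a, x, ".")
    nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {    # missing components count as 0
      if (x[i] + 0 < y[i] + 0) { print -1; exit }
      if (x[i] + 0 > y[i] + 0) { print 1; exit }
    }
    print 0
  }'
}

# version_in_window V LOW HIGH: succeeds when LOW <= V < HIGH.
version_in_window() {
  [ "$(version_cmp "$1" "$2")" -ge 0 ] && [ "$(version_cmp "$1" "$3")" -lt 0 ]
}
```

You could feed it the version from the first line of `consul version` output (for example `Consul v1.0.8`), with the leading `v` removed. Comparing components numerically matters here: a plain string comparison would wrongly rank `1.10.0` below `1.9.9`.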
+ +## Procedure + +**1.** Check replication status in DC1 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": false, + "Running": false, + "SourceDatacenter": "", + "ReplicatedIndex": 0, + "LastSuccess": "0001-01-01T00:00:00Z", + "LastError": "0001-01-01T00:00:00Z" +} +``` + +-> The primary datacenter (indicated by `acl_datacenter`) will always show as having replication +disabled, so this is normal even if replication is happening. + +**2.** Check replication status in DC2 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": true, + "Running": true, + "SourceDatacenter": "dc1", + "ReplicatedIndex": 9, + "LastSuccess": "2020-09-10T21:16:15Z", + "LastError": "0001-01-01T00:00:00Z" +} +``` + +**3.** Upgrade the Consul agents in all DCs to version 1.2.4 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process). +This should be done one DC at a time, leaving the primary DC for last. + +**4.** Confirm that replication is still working in DC2 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": true, + "Running": true, + "SourceDatacenter": "dc1", + "ReplicatedIndex": 9, + "LastSuccess": "2020-09-10T21:16:15Z", + "LastError": "0001-01-01T00:00:00Z" +} +``` + +## Post-Upgrade Configuration Changes + +If you moved from a pre-1.0.0 version of Consul, you'll find that _many_ of the configuration +options were renamed. 
Backwards compatibility has been maintained, so your old config options +will continue working after upgrading, but you'll want to update those now to avoid issues when +moving to newer versions. + +You can find the full list of changes here: + +- https://www.consul.io/docs/upgrading/upgrade-specific#deprecated-options-have-been-removed + +You can make sure your config changes are valid by copying your existing configuration files, +making the changes, and then verifying them by running `consul validate $CONFIG_FILE1_PATH $CONFIG_FILE2_PATH ...`. + +Once your config is passing the validation check, replace your old config files with the new ones +and slowly roll your cluster again one server at a time – leaving the leader agent for last in each +datacenter. diff --git a/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx b/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx new file mode 100644 index 0000000000..687220a65a --- /dev/null +++ b/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx @@ -0,0 +1,228 @@ +--- +layout: docs +page_title: Upgrading to 1.6.9 +sidebar_title: Upgrading to 1.6.9 +description: >- + Specific versions of Consul may have additional information about the upgrade + process beyond the standard flow. +--- + +# Upgrading to 1.6.9 + +## Introduction + +This guide explains how to best upgrade a multi-datacenter Consul deployment that's using +a version of Consul >= 1.2.4 and < 1.6.9 while maintaining replication. If you're on a version +older than 1.2.4, please take a look at our [Upgrading to 1.2.4](/docs/upgrading/instructions/upgrade-to-1-2-x) +guide. Due to changes to the ACL system, an ACL token migration will need to be performed +as part of this upgrade. The 1.6.x series is the last series that had support for legacy +ACL tokens, so this migration _must_ happen before upgrading past the 1.6.x release series.
+Here is some documentation that may prove useful for reference during this upgrade process: + +- [ACL System in Legacy Mode](https://www.consul.io/docs/acl/acl-legacy) - You can find + information about legacy configuration options and differences between modes here. +- [Configuration](https://www.consul.io/docs/agent/options) - You can find more details + around legacy ACL and new ACL configuration options here. Legacy ACL config options + will be listed as deprecated as of 1.4.0. + +In this guide, we'll be using an example with two datacenters (DCs) and will be +referring to them as DC1 and DC2. DC1 will be the primary datacenter. + +## Requirements + +- All Consul servers should be on a version of Consul >= 1.2.4 and < 1.6.9. + +## Assumptions + +This guide makes the following assumptions: + +- You have at least two datacenters configured and have ACL replication enabled. If you're + not using multiple datacenters, you can follow along and simply skip the instructions related + to replication. +- You have not already performed the ACL token migration. If you have, please skip all related + steps. + +## Considerations + +There are quite a number of changes between releases. Notable changes +are called out in our [Specific Version Details](/docs/upgrading/upgrade-specific#consul-1-6-3) +page. You can find more granular details in the full [changelog](https://github.com/hashicorp/consul/blob/master/CHANGELOG.md#124-november-27-2018). +Looking through these changes prior to upgrading is highly recommended. + +Two very notable items are: + +- 1.6.2 introduced more strict JSON decoding. Invalid JSON that was previously ignored might result in errors now (e.g., `Connect: null` in service definitions). See [[GH#6680](https://github.com/hashicorp/consul/pull/6680)]. +- 1.6.3 introduced the [http_max_conns_per_client](https://www.consul.io/docs/agent/options.html#http_max_conns_per_client) limit. This defaults to 200.
Prior to this, connections per client were unbounded. [[GH#7159](https://github.com/hashicorp/consul/issues/7159)] + +## Procedure + +**1.** Check replication status in DC1 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": false, + "Running": false, + "SourceDatacenter": "", + "ReplicatedIndex": 0, + "LastSuccess": "0001-01-01T00:00:00Z", + "LastError": "0001-01-01T00:00:00Z" +} +``` + +-> The primary datacenter (indicated by `acl_datacenter`) will always show as having replication +disabled, so this is normal even if replication is happening. + +**2.** Check replication status in DC2 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": true, + "Running": true, + "SourceDatacenter": "dc1", + "ReplicatedIndex": 9, + "LastSuccess": "2020-09-10T21:16:15Z", + "LastError": "0001-01-01T00:00:00Z" +} +``` + +**3.** Upgrade DC2 agents to version 1.6.9 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process). _**Leave all DC1 agents at 1.2.4.**_ You should start seeing log messages like this after that: + +```log +2020/09/08 15:51:29 [DEBUG] acl: Cannot upgrade to new ACLs, servers in acl datacenter have not upgraded - found servers: true, mode: 3 +2020/09/08 15:51:32 [ERR] consul: RPC failed to server 192.168.5.2:8300 in DC "dc1": rpc error making call: rpc: can't find service ConfigEntry.ListAll +``` + +!> **Warning:** _It's important to upgrade your primary datacenter **last**_ (the one +specified in `acl_datacenter`). If you upgrade the primary datacenter first, it will +break replication between your other datacenters. 
If you upgrade your other datacenters +first, they will run in legacy mode and replication from your primary datacenter will +continue working. + +**4.** Check to see if replication is still working in DC2. + +From a Consul server in DC2: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/list?pretty +``` + +Take note of the `ReplicatedIndex` value. + +Create a new file containing the payload for creating a new token named `test-ui-token.json` +with the following contents: + +```json +{ + "Name": "UI Token", + "Type": "client", + "Rules": "key \"\" { policy = \"write\" } node \"\" { policy = \"read\" } service \"\" { policy = \"read\" }" +} +``` + +From a Consul server in DC1, create a new token using that file: + +```shell +curl -X PUT -H "X-Consul-Token: $MASTER_TOKEN" -d @test-ui-token.json localhost:8500/v1/acl/create +``` + +From a Consul server in DC2: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/list?pretty +``` + +`ReplicatedIndex` should have incremented and you should see the new token listed. If you try using CLI ACL commands you'll see this error: + +```log +Failed to retrieve the token list: Unexpected response code: 500 (The ACL system is currently in legacy mode.) +``` + +This is because Consul is running in legacy mode. ACL CLI commands won't work and you have to hit the old ACL HTTP endpoints (which is why `curl` is being used above rather than the `consul` CLI client). + +**5.** Upgrade DC1 agents to version 1.6.9 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process). 
+ +Once this is complete, you should see a log entry like this from your server agents: + +```log +2020/09/10 22:11:49 [DEBUG] acl: transitioning out of legacy ACL mode +``` + +**6.** Confirm that replication is still working in DC2 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": true, + "Running": true, + "SourceDatacenter": "dc1", + "ReplicationType": "tokens", + "ReplicatedIndex": 259, + "ReplicatedRoleIndex": 1, + "ReplicatedTokenIndex": 260, + "LastSuccess": "2020-09-10T22:11:51Z", + "LastError": "2020-09-10T22:11:43Z" +} +``` + +**7.** Migrate your legacy ACL tokens to the new ACL system by following the instructions in our [ACL Token Migration guide](https://www.consul.io/docs/acl/acl-migrate-tokens). + +~> This step _must_ be completed before upgrading to a version higher than 1.6.x. + +## Post-Upgrade Configuration Changes + +When moving from a pre-1.4.0 version of Consul, you'll find that several of the ACL-related +configuration options were renamed. Backwards compatibility is maintained in the 1.6.x release +series, so your old config options will continue working after upgrading, but you'll want to +update those now to avoid issues when moving to newer versions. + +These are the changes you'll need to make: + +- `acl_datacenter` is now named `primary_datacenter` (see [docs](https://www.consul.io/docs/agent/options#primary_datacenter) for more info) +- `acl_default_policy`, `acl_down_policy`, `acl_ttl`, `acl_*_token` and `enable_acl_replication` options are now specified like this (see [docs](https://www.consul.io/docs/agent/options#acl) for more info): + ```hcl + acl { + enabled = true/false + default_policy = "..." + down_policy = "..." + policy_ttl = "..." + role_ttl = "..."
+ enable_token_replication = true/false + enable_token_persistence = true/false + tokens { + master = "..." + agent = "..." + agent_master = "..." + replication = "..." + default = "..." + } + } + ``` + +You can make sure your config changes are valid by copying your existing configuration files, +making the changes, and then verifying them by running `consul validate $CONFIG_FILE1_PATH $CONFIG_FILE2_PATH ...`. + +Once your config is passing the validation check, replace your old config files with the new ones +and slowly roll your cluster again one server at a time – leaving the leader agent for last in each +datacenter. diff --git a/website/pages/docs/upgrading/instructions/upgrade-to-1-8-x.mdx b/website/pages/docs/upgrading/instructions/upgrade-to-1-8-x.mdx new file mode 100644 index 0000000000..5a0a7ec3de --- /dev/null +++ b/website/pages/docs/upgrading/instructions/upgrade-to-1-8-x.mdx @@ -0,0 +1,119 @@ +--- +layout: docs +page_title: Upgrading to 1.8.4 +sidebar_title: Upgrading to 1.8.4 +description: >- + Specific versions of Consul may have additional information about the upgrade + process beyond the standard flow. +--- + +# Upgrading to 1.8.4 + +## Introduction + +This guide explains how to best upgrade a multi-datacenter Consul deployment that's using +a version of Consul >= 1.6.9 and < 1.8.4 while maintaining replication. If you're on a version +older than 1.6.9, please follow the link for the version you're on from [here](/docs/upgrading/instructions). +As there weren't any major breaking changes, this upgrade will be fairly simple. + +In this guide, we'll be using an example with two datacenters (DCs) and will be +referring to them as DC1 and DC2. DC1 will be the primary datacenter. + +## Requirements + +- All Consul servers should be on a version of Consul >= 1.6.9 and < 1.8.4. + +## Assumptions + +This guide makes the following assumptions: + +- You have at least two datacenters configured and have ACL replication enabled.
If you're + not using multiple datacenters, you can follow along and simply skip the instructions related + to replication. + +## Considerations + +There aren't too many major changes that might cause issues upgrading from 1.6.9, but notable changes +are called out in our [Specific Version Details](/docs/upgrading/upgrade-specific#consul-1-8-0) +page. You can find more granular details in the full [changelog](https://github.com/hashicorp/consul/blob/master/CHANGELOG.md#183-august-12-2020). +Looking through these changes prior to upgrading is highly recommended. + +## Procedure + +**1.** Check replication status in DC1 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": false, + "Running": false, + "SourceDatacenter": "", + "ReplicationType": "", + "ReplicatedIndex": 0, + "ReplicatedRoleIndex": 0, + "ReplicatedTokenIndex": 0, + "LastSuccess": "0001-01-01T00:00:00Z", + "LastError": "0001-01-01T00:00:00Z" +} +``` + +-> The primary datacenter (indicated by `primary_datacenter`) will always show as having replication +disabled, so this is normal even if replication is happening. + +**2.** Check replication status in DC2 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": true, + "Running": true, + "SourceDatacenter": "dc1", + "ReplicationType": "tokens", + "ReplicatedIndex": 672, + "ReplicatedRoleIndex": 1, + "ReplicatedTokenIndex": 677, + "LastSuccess": "2020-09-14T17:06:07Z", + "LastError": "2020-09-14T16:53:22Z" +} +``` + +**3.** Upgrade the Consul agents in all DCs to version 1.8.4 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process). 
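The replication checks in steps 1, 2, and 4 can be scripted. The helpers below are a minimal sketch with hypothetical names. They assume the pretty-printed, one-field-per-line JSON shown above and are not a general JSON parser, so `jq` is the more robust choice if it is installed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Consul) for checking the output of
#   curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
# They rely on the one-field-per-line layout that ?pretty produces.

# replication_field NAME: print the value of a top-level field, with quotes
# and the trailing comma stripped.
replication_field() {
  sed -n "s/^[[:space:]]*\"$1\":[[:space:]]*//p" | tr -d '",'
}

# replication_healthy: succeed when replication is both enabled and running.
# Expect this to fail in the primary datacenter, which always reports false.
replication_healthy() {
  body=$(cat)
  enabled=$(printf '%s\n' "$body" | replication_field Enabled)
  running=$(printf '%s\n' "$body" | replication_field Running)
  [ "$enabled" = "true" ] && [ "$running" = "true" ]
}
```

In DC2, piping the `curl` output into `replication_healthy` gives a pass/fail result. Recording `replication_field ReplicatedIndex` before and after a change shows whether new data is still arriving from DC1.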
+ +**4.** Confirm that replication is still working in DC2 by running the following curl command from a +consul server in that DC: + +```shell +curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty +``` + +You should see output that looks like this: + +```json +{ + "Enabled": true, + "Running": true, + "SourceDatacenter": "dc1", + "ReplicationType": "tokens", + "ReplicatedIndex": 672, + "ReplicatedRoleIndex": 1, + "ReplicatedTokenIndex": 677, + "LastSuccess": "2020-09-14T17:15:16Z", + "LastError": "0001-01-01T00:00:00Z" +} +``` + +## Post-Upgrade Configuration Changes + +No configuration changes are required for this upgrade. From 70ebf306a215a9e55e59ff3f2d4b2142a99cbadc Mon Sep 17 00:00:00 2001 From: Joel Watson Date: Tue, 15 Sep 2020 16:03:17 -0500 Subject: [PATCH 2/3] Language changes from review Co-Authored-By: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com> --- website/pages/docs/upgrading/index.mdx | 19 +++--- .../instructions/general-process.mdx | 58 +++++++++---------- .../docs/upgrading/instructions/index.mdx | 12 ++-- .../instructions/upgrade-to-1-2-x.mdx | 39 ++++++------- .../instructions/upgrade-to-1-6-x.mdx | 50 ++++++++-------- .../instructions/upgrade-to-1-8-x.mdx | 25 ++++---- 6 files changed, 97 insertions(+), 106 deletions(-) diff --git a/website/pages/docs/upgrading/index.mdx b/website/pages/docs/upgrading/index.mdx index 04663bcadd..cc84ac3711 100644 --- a/website/pages/docs/upgrading/index.mdx +++ b/website/pages/docs/upgrading/index.mdx @@ -47,18 +47,13 @@ Consul is A, and version B is released. ## Large Version Jumps -Consul is a service that frequently sits in the critical path for many -or even all other applications and services you might have. As such, -it can be easy to get into a "if it's not broken, don't fix it" mentality -to avoid the potential risk in upgrading. Unfortunately, this can lead to -being many major versions behind and ultimately just makes upgrading -riskier overall. 
- -We encourage our customers to remain no more than two major versions behind -(i.e., if 1.8.x is the current release, don't use versions older than 1.6.x), -but in some environments that's not possible. If you find yourself in this -situation and need to upgrade, please see our [Upgrade Instructions page](/docs/upgrading/instructions) for -additional help. +Operating a Consul datacenter that is multiple major versions behind the current major version can increase the risk incurred during +upgrades. We encourage our customers to remain no more than two major versions behind +(i.e., if 1.8.x is the current release, do not use versions older than 1.6.x). +If you find yourself in a situation +where you are many major versions behind and need to upgrade, please review our +[Upgrade Instructions page](/docs/upgrading/instructions) for information on +how to perform those upgrades. ## Backward Incompatible Upgrades diff --git a/website/pages/docs/upgrading/instructions/general-process.mdx b/website/pages/docs/upgrading/instructions/general-process.mdx index 86b487238b..9f7d12b2c8 100644 --- a/website/pages/docs/upgrading/instructions/general-process.mdx +++ b/website/pages/docs/upgrading/instructions/general-process.mdx @@ -11,28 +11,27 @@ description: >- ## Introduction -Upgrading Consul is a relatively easy process, but there are some best -practices that you should follow when doing so. Some versions also have -steps that are specific to that version, so make sure you also look through -our [upgrade instructions](/docs/upgrading/instructions) for the version you're on. +This document describes some best practices that you should follow when +upgrading Consul. Some versions also have steps that are specific to that +version, so make sure you also review the [upgrade instructions](/docs/upgrading/instructions) +for the version you are on. ## Download the New Version -The first thing you need to do is download the binary for the new version -you want.
+First, download the binary for the new version you want.
 
-If you're after the Consul binary, you can find all current and past versions
-of the OSS and Enterprise releases here:
+All current and past versions of the OSS and Enterprise releases are
+available here:
 
 - https://releases.hashicorp.com/consul
 
-If you're using Docker containers, then you can find those here:
+Docker containers are available at these locations:
 
 - **OSS:** https://hub.docker.com/_/consul
 - **Enterprise:** https://hub.docker.com/r/hashicorp/consul-enterprise
@@ -40,7 +39,7 @@ If you're using Docker containers, then you can find those here:
 
-If you're using Kubernetes, then please see our documentation for
+If you are using Kubernetes, then please review our documentation for
 [Upgrading Consul on Kubernetes](/docs/k8s/operations/upgrading).
 
@@ -60,7 +59,7 @@ You can inspect the snapshot to ensure if was successful with:
 consul snapshot inspect backup.snap
 ```
 
-You should see output similar to this:
+Example output:
 
 ```
 ID           2-1182-1542056499724
@@ -71,25 +70,24 @@ Version      1
 ```
 
 This will ensure you have a safe fallback option in case something goes wrong. Store
-this snapshot somewhere safe. If you would like more information on snapshots, you
-can find that here:
+this snapshot somewhere safe. More documentation on snapshot usage is available here:
 
 - https://www.consul.io/docs/commands/snapshot
 - https://learn.hashicorp.com/tutorials/consul/backup-and-restore
 
 **2.** Temporarily modify your Consul configuration so that its [log_level](/docs/agent/options.html#_log_level)
-is set to `debug`. After doing this, run `consul reload` on your servers. This will
+is set to `debug`. After doing this, issue the `consul reload` command on your servers. This will
 give you more information to work with in the event something goes wrong.
 
 ## Perform the Upgrade
 
-**1.** Run the following command to see which server is currently the leader:
+**1.** Issue the following command to discover which server is currently the leader:
 
 ```
 consul operator raft list-peers
 ```
 
-You should see output similar to this (exact formatting and may differ based on version):
+You should receive output similar to this (exact formatting and content may differ based on version):
 
 ```
 Node       ID                                    Address         State     Voter  RaftProtocol
@@ -105,23 +103,23 @@ binary with the new one.
 
 **3.** Perform a rolling restart of Consul on your servers, leaving the leader agent
 for last. Only restart one server at a time. After restarting each server, validate
-that it has rejoined the cluster and is in sync with the leader by running `consul info`
+that it has rejoined the cluster and is in sync with the leader by issuing the `consul info` command,
 and checking whether the `commit_index` and `last_log_index` fields have the same value.
 If done properly, this should avoid an unexpected leadership election due to loss of quorum.
 
-~> It's important to run `consul leave` on each server node when shutting
+~> It is important to issue a `consul leave` command on each server node when shutting
 Consul down. Make sure your service management system (e.g., systemd, upstart, etc.) is
 performing that action. If not, make sure you do it manually or you _will_ end up in a
 bad cluster state.
 
 **4.** Double-check that all servers are showing up in the cluster as expected and are on
-the correct version by running:
+the correct version by issuing:
 
 ```
 consul members
 ```
 
-You should see output similar to this:
+You should receive output similar to this:
 
 ```
 Node       Address         Status  Type    Build  Protocol  DC
@@ -130,13 +128,13 @@ dc1-node2  10.11.0.3:8301  alive   server  1.8.3  2         dc1
 dc1-node3  10.11.0.4:8301  alive   server  1.8.3  2         dc1
 ```
 
-Also double-check the raft state to make sure there's a leader and sufficient voters:
+Also double-check the raft state to make sure there is a leader and sufficient voters:
 
 ```
 consul operator raft list-peers
 ```
 
-Which should look similar to this:
+You should receive output similar to this:
 
 ```
 Node       ID                                    Address         State     Voter  RaftProtocol
@@ -145,12 +143,12 @@ dc1-node2  20e6be1b-f1cb-4aab-929f-f7d2d43d9a96  10.11.0.3:8300  follower  true
 dc1-node3  658c343b-8769-431f-a71a-236f9dbb17b3  10.11.0.4:8300  follower  true   3
 ```
 
-**5.** Set your `log_level` back to what you had it at prior to the upgrade and run
+**5.** Set your `log_level` back to what you had it at prior to the upgrade and issue
 `consul reload` again.
 
 ## Troubleshooting
 
-Most issues with upgrading occur due to either failing to upgrade the leader agent last
+Most problems with upgrading occur due to either failing to upgrade the leader agent last,
 or due to failing to wait for a follower agent to fully rejoin a cluster before moving
 on to another server. This can cause a loss of quorum and occasionally can result in
 all of your servers attempting to kick off leadership elections endlessly without ever
@@ -158,21 +156,21 @@ reaching a quorum and electing a leader.
 
 Most of these problems can be solved by following the steps outlined in our
 [Outage Recovery](https://learn.hashicorp.com/tutorials/consul/recovery-outage) document.
-If you're still having trouble after trying the recovery steps outlined there,
-then you can get further assistance by:
+If you are still having trouble after trying the recovery steps outlined there,
+then these options for further assistance are available:
 
 - OSS users without paid support plans can request help in our [Community Forum](https://discuss.hashicorp.com/c/consul/29)
-- Enterprise users with paid support plans can contact [HashiCorp Support](https://support.hashicorp.com/)
+- Enterprise and OSS users with paid support plans can contact [HashiCorp Support](https://support.hashicorp.com/)
 
 If you end up contacting support, please make sure you include the following information
 in your support ticket:
 
 - Consul version you were upgrading FROM and TO.
 - [Debug level logs](/docs/agent/options.html#_log_level) from all servers in the cluster
-  that you're having trouble with. These should include logs from prior to the upgrade attempt
+  that you are having trouble with. These should include logs from prior to the upgrade attempt
   up through the current time. If your logs were not set at debug level prior to the
-  upgrade, please include those logs anyways, but also update your config to use debug logs
-  and include logs from after that was done as well.
+  upgrade, please include those logs as well. Also, update your config to use debug logs,
+  and include logs from after that was done.
 - Your Consul config files (please redact any secrets).
 - Output from `consul members -detailed` and `consul operator raft list-peers` from each
   server in your cluster.
diff --git a/website/pages/docs/upgrading/instructions/index.mdx b/website/pages/docs/upgrading/instructions/index.mdx
index 6ca888249e..5180dbf71a 100644
--- a/website/pages/docs/upgrading/instructions/index.mdx
+++ b/website/pages/docs/upgrading/instructions/index.mdx
@@ -11,14 +11,14 @@ description: >-
 
 This document is intended to help customers who find themselves many versions behind to upgrade safely.
 Our recommended upgrade path is moving from version 0.8.5 to 1.2.4 to 1.6.9 to the current version. To get
-started, you'll want to choose the version you're currently on below and then follow the instructions
-until you're on the latest version. The upgrade guides will mention notable changes and link to relevant
-changelogs – we recommend reading through the changelog for versions between the one you're on and the
-one you're upgrading to at each step to familiarize yourself with changes.
+started, you will want to choose the version you are currently on below and then follow the instructions
+until you are on the latest version. The upgrade guides will mention notable changes and link to relevant
+changelogs – we recommend reviewing the changelog for versions between the one you are on and the
+one you are upgrading to at each step to familiarize yourself with changes.
 
 ## Getting Started
 
-To get instructions for your upgrade, please choose the release series you're _currently using_:
+To get instructions for your upgrade, please choose the release series you are _currently using_:
 
 - [0.8.x](/docs/upgrading/instructions/upgrade-to-1-2-x)
 - [0.9.x](/docs/upgrading/instructions/upgrade-to-1-2-x)
@@ -31,4 +31,4 @@ To get instructions for your upgrade, please choose the release series you're _c
 - [1.6.x](/docs/upgrading/instructions/upgrade-to-1-8-x)
 - [1.7.x](/docs/upgrading/instructions/upgrade-to-1-8-x)
 
-If you're using <= 0.7.x, please [contact support](https://support.hashicorp.com) for assistance.
+If you are using <= 0.7.x, please [contact support](https://support.hashicorp.com) for assistance.
diff --git a/website/pages/docs/upgrading/instructions/upgrade-to-1-2-x.mdx b/website/pages/docs/upgrading/instructions/upgrade-to-1-2-x.mdx
index 4d285370e8..658fcd6303 100644
--- a/website/pages/docs/upgrading/instructions/upgrade-to-1-2-x.mdx
+++ b/website/pages/docs/upgrading/instructions/upgrade-to-1-2-x.mdx
@@ -11,14 +11,13 @@ description: >-
 
 ## Introduction
 
-This guide explains how to best upgrade a multi-datacenter Consul deployment that's using
-a version of Consul >= 0.8.5 and < 1.2.4 while maintaining replication. If you're on a version
+This guide explains how to best upgrade a multi-datacenter Consul deployment that is using
+a version of Consul >= 0.8.5 and < 1.2.4 while maintaining replication. If you are on a version
 older than 0.8.5, but are in the 0.8.x series, please upgrade to 0.8.5 by following our
-[General Upgrade Process](/docs/upgrading/instructions/general-process). If you're on a version
-older than 0.8.0, please [contact support](https://support.hashicorp.com). As there weren't
-any major breaking changes, this upgrade shoul be fairly simple.
+[General Upgrade Process](/docs/upgrading/instructions/general-process). If you are on a version
+older than 0.8.0, please [contact support](https://support.hashicorp.com).
 
-In this guide, we'll be using an example with two datacenters (DCs) and will be
+In this guide, we will be using an example with two datacenters (DCs) and will be
 referring to them as DC1 and DC2. DC1 will be the primary datacenter.
 
 ## Requirements
@@ -26,35 +25,35 @@ referring to them as DC1 and DC2. DC1 will be the primary datacenter.
 - All Consul servers should be on a version of Consul >= 0.8.5 and < 1.2.4.
 - You need a Consul cluster with at least 3 nodes to perform this upgrade as documented. If
   you either have a single node cluster or several single node clusters joined via WAN, the
-  servers will come up in a `No cluster leader` loop after upgrading. If that happens, you'll
+  servers will come up in a `No cluster leader` loop after upgrading. If that happens, you will
   need to recover the cluster using the method described [here](https://learn.hashicorp.com/tutorials/consul/recovery-outage#manual-recovery-using-peers-json).
   You can avoid this issue entirely by growing your cluster to 3 nodes prior to upgrading.
 
 ## Assumptions
 
-This guides makes the following assumptions:
+This guide makes the following assumptions:
 
-- You have at least two datacenters configured and have ACL replication enabled. If you're
+- You have at least two datacenters configured and have ACL replication enabled. If you are
   not using multiple datacenters, you can follow along and simply skip the instructions related to
   replication.
 
 ## Considerations
 
-There aren't too many major changes that might cause issues upgrading from 1.0.8, but notable changes
+There are not too many major changes that might cause issues upgrading from 1.0.8, but notable changes
 are called out in our [Specific Version Details](/docs/upgrading/upgrade-specific#consul-1-1-0) page.
 You can find more granular details in the full [changelog](https://github.com/hashicorp/consul/blob/master/CHANGELOG.md#124-november-27-2018).
 Looking through these changes prior to upgrading is highly recommended.
 
 ## Procedure
 
-**1.** Check replication status in DC1 by running the following curl command from a
+**1.** Check replication status in DC1 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {
@@ -70,14 +69,14 @@ You should see output that looks like this:
 -> The primary datacenter (indicated by `acl_datacenter`) will always show as having replication
 disabled, so this is normal even if replication is happening.
 
-**2.** Check replication status in DC2 by running the following curl command from a
+**2.** Check replication status in DC2 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {
@@ -93,14 +92,14 @@ You should see output that looks like this:
 
 **3.** Upgrade the Consul agents in all DCs to version 1.2.4 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process). This should be done one DC at a time, leaving the primary DC for last.
 
-**4.** Confirm that replication is still working in DC2 by running the following curl command from a
+**4.** Confirm that replication is still working in DC2 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {
@@ -115,17 +114,17 @@ You should see output that looks like this:
 
 ## Post-Upgrade Configuration Changes
 
-If you moved from a pre-1.0.0 version of Consul, you'll find that _many_ of the configuration
+If you moved from a pre-1.0.0 version of Consul, you will find that _many_ of the configuration
 options were renamed. Backwards compatibility has been maintained, so your old config options
-will continue working after upgrading, but you'll want to update those now to avoid issues when
+will continue working after upgrading, but you will want to update those now to avoid issues when
 moving to newer versions.
 
-You can find the full list of changes here:
+The full list of changes is available here:
 
 - https://www.consul.io/docs/upgrading/upgrade-specific#deprecated-options-have-been-removed
 
 You can make sure your config changes are valid by copying your existing configuration files,
-making the changes, and then verifing them by running `consul validate $CONFIG_FILE1_PATH $CONFIG_FILE2_PATH ...`.
+making the changes, and then verifying them by using `consul validate $CONFIG_FILE1_PATH $CONFIG_FILE2_PATH ...`.
 
 Once your config is passing the validation check, replace your old config files with the new ones
 and slowly roll your cluster again one server at a time – leaving the leader agent for last in each
diff --git a/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx b/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx
index 687220a65a..97065259b4 100644
--- a/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx
+++ b/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx
@@ -11,9 +11,9 @@ description: >-
 
 ## Introduction
 
-This guide explains how to best upgrade a multi-datacenter Consul deployment that's using
-a version of Consul >= 1.2.4 and < 1.6.9 while maintaining replication. If you're on a version
-older than 1.2.4, please take a look at our [Upgrading to 1.2.4](/docs/upgrading/instructions/upgrade-to-1-2-x)
+This guide explains how to best upgrade a multi-datacenter Consul deployment that is using
+a version of Consul >= 1.2.4 and < 1.6.9 while maintaining replication. If you are on a version
+older than 1.2.4, please review our [Upgrading to 1.2.4](/docs/upgrading/instructions/upgrade-to-1-2-x)
 guide. Due to changes to the ACL system, an ACL token migration will need to be performed
 as part of this upgrade. The 1.6.x series is the last series that had support for legacy
 ACL tokens, so this migration _must_ happen before upgrading past the 1.6.x release series.
@@ -25,7 +25,7 @@ Here is some documentation that may prove useful for reference during this upgra
   around legacy ACL and new ACL configuration options here. Legacy ACL config options will be listed
   as deprecates as of 1.4.0.
 
-In this guide, we'll be using an example with two datacenters (DCs) and will be
+In this guide, we will be using an example with two datacenters (DCs) and will be
 referring to them as DC1 and DC2. DC1 will be the primary datacenter.
 
 ## Requirements
@@ -34,9 +34,9 @@ referring to them as DC1 and DC2. DC1 will be the primary datacenter.
 
 ## Assumptions
 
-This guides makes the following assumptions:
+This guide makes the following assumptions:
 
-- You have at least two datacenters configured and have ACL replication enabled. If you're
+- You have at least two datacenters configured and have ACL replication enabled. If you are
   not using multiple datacenters, you can follow along and simply skip the instructions related to
   replication.
 - You have not already performed the ACL token migration. If you have, please skip all related
@@ -56,14 +56,14 @@ Two very notable items are:
 
 ## Procedure
 
-**1.** Check replication status in DC1 by running the following curl command from a
+**1.** Check replication status in DC1 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {
@@ -79,14 +79,14 @@ You should see output that looks like this:
 -> The primary datacenter (indicated by `acl_datacenter`) will always show as having replication
 disabled, so this is normal even if replication is happening.
 
-**2.** Check replication status in DC2 by running the following curl command from a
+**2.** Check replication status in DC2 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {
@@ -99,20 +99,20 @@ You should see output that looks like this:
 }
 ```
 
-**3.** Upgrade DC2 agents to version 1.6.9 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process). _**Leave all DC1 agents at 1.2.4.**_ You should start seeing log messages like this after that:
+**3.** Upgrade DC2 agents to version 1.6.9 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process). _**Leave all DC1 agents at 1.2.4.**_ You should start observing log messages like this after that:
 
 ```log
 2020/09/08 15:51:29 [DEBUG] acl: Cannot upgrade to new ACLs, servers in acl datacenter have not upgraded - found servers: true, mode: 3
 2020/09/08 15:51:32 [ERR] consul: RPC failed to server 192.168.5.2:8300 in DC "dc1": rpc error making call: rpc: can't find service ConfigEntry.ListAll
 ```
 
-!> **Warning:** _It's important to upgrade your primary datacenter **last**_ (the one
+!> **Warning:** _It is important to upgrade your primary datacenter **last**_ (the one
 specified in `acl_datacenter`). If you upgrade the primary datacenter first, it will break
 replication between your other datacenters. If you upgrade your other datacenters
-first, they will run in legacy mode and replication from your primary datacenter will
+first, they will be in legacy mode and replication from your primary datacenter will
 continue working.
 
-**4.** Check to see if replication is still working in DC2.
+**4.** Check that replication is still working in DC2.
 
 From a Consul server in DC2:
 
@@ -147,30 +147,30 @@ curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pre
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/list?pretty
 ```
 
-`ReplicatedIndex` should have incremented and you should see the new token listed. If you try using CLI ACL commands you'll see this error:
+`ReplicatedIndex` should have incremented and you should find the new token listed. If you try using CLI ACL commands you will receive this error:
 
 ```log
 Failed to retrieve the token list: Unexpected response code: 500 (The ACL system is currently in legacy mode.)
 ```
 
-This is because Consul is running in legacy mode. ACL CLI commands won't work and you have to hit the old ACL HTTP endpoints (which is why `curl` is being used above rather than the `consul` CLI client).
+This is because Consul is in legacy mode. ACL CLI commands will not work and you have to hit the old ACL HTTP endpoints (which is why `curl` is being used above rather than the `consul` CLI client).
 
 **5.** Upgrade DC1 agents to version 1.6.9 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process).
 
-Once this is complete, you should see a log entry like this from your server agents:
+Once this is complete, you should observe a log entry like this from your server agents:
 
 ```log
 2020/09/10 22:11:49 [DEBUG] acl: transitioning out of legacy ACL mode
 ```
 
-**6.** Confirm that replication is still working in DC2 by running the following curl command from a
+**6.** Confirm that replication is still working in DC2 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {
@@ -192,15 +192,15 @@ You should see output that looks like this:
 
 ## Post-Upgrade Configuration Changes
 
-When moving from a pre-1.4.0 version of Consul, you'll find that several of the ACL-related
+When moving from a pre-1.4.0 version of Consul, you will find that several of the ACL-related
 configuration options were renamed. Backwards compatibility is maintained in the 1.6.x release
-series, so you're old config options will continue working after upgrading, but you'll want to
+series, so your old config options will continue working after upgrading, but you will want to
 update those now to avoid issues when moving to newer versions.
 
-These are the changes you'll need to make:
+These are the changes you will need to make:
 
-- `acl_datacenter` is now named `primary_datacenter` (see [docs](https://www.consul.io/docs/agent/options#primary_datacenter) for more info)
-- `acl_default_policy`, `acl_down_policy`, `acl_ttl`, `acl_*_token` and `enable_acl_replication` options are now specified like this (see [docs](https://www.consul.io/docs/agent/options#acl) for more info):
+- `acl_datacenter` is now named `primary_datacenter` (review our [docs](https://www.consul.io/docs/agent/options#primary_datacenter) for more info)
+- `acl_default_policy`, `acl_down_policy`, `acl_ttl`, `acl_*_token` and `enable_acl_replication` options are now specified like this (review our [docs](https://www.consul.io/docs/agent/options#acl) for more info):
   ```hcl
   acl {
     enabled = true/false
@@ -221,7 +221,7 @@ These are the changes you'll need to make:
   ```
 
 You can make sure your config changes are valid by copying your existing configuration files,
-making the changes, and then verifing them by running `consul validate $CONFIG_FILE1_PATH $CONFIG_FILE2_PATH ...`.
+making the changes, and then verifying them by using `consul validate $CONFIG_FILE1_PATH $CONFIG_FILE2_PATH ...`.
 
 Once your config is passing the validation check, replace your old config files with the new ones
 and slowly roll your cluster again one server at a time – leaving the leader agent for last in each
diff --git a/website/pages/docs/upgrading/instructions/upgrade-to-1-8-x.mdx b/website/pages/docs/upgrading/instructions/upgrade-to-1-8-x.mdx
index 5a0a7ec3de..63bc4c740c 100644
--- a/website/pages/docs/upgrading/instructions/upgrade-to-1-8-x.mdx
+++ b/website/pages/docs/upgrading/instructions/upgrade-to-1-8-x.mdx
@@ -11,12 +11,11 @@ description: >-
 
 ## Introduction
 
-This guide explains how to best upgrade a multi-datacenter Consul deployment that's using
-a version of Consul >= 1.6.9 and < 1.8.4 while maintaining replication. If you're on a version
-older than 1.6.9, please follow the link for the version you're on from [here](/docs/upgrading/instructions).
-As there weren't any major breaking changes, this upgrade will be fairly simple.
+This guide explains how to best upgrade a multi-datacenter Consul deployment that is using
+a version of Consul >= 1.6.9 and < 1.8.4 while maintaining replication. If you are on a version
+older than 1.6.9, please follow the link for the version you are on from [here](/docs/upgrading/instructions).
 
-In this guide, we'll be using an example with two datacenters (DCs) and will be
+In this guide, we will be using an example with two datacenters (DCs) and will be
 referring to them as DC1 and DC2. DC1 will be the primary datacenter.
 
 ## Requirements
@@ -27,27 +26,27 @@ referring to them as DC1 and DC2. DC1 will be the primary datacenter.
 
 This guides makes the following assumptions:
 
-- You have at least two datacenters configured and have ACL replication enabled. If you're
+- You have at least two datacenters configured and have ACL replication enabled. If you are
   not using multiple datacenters, you can follow along and simply skip the instructions related to
   replication.
 
 ## Considerations
 
-There aren't too many major changes that might cause issues upgrading from 1.6.9, but notable changes
+There are not too many major changes that might cause issues upgrading from 1.6.9, but notable changes
 are called out in our [Specific Version Details](/docs/upgrading/upgrade-specific#consul-1-8-0) page.
 You can find more granular details in the full [changelog](https://github.com/hashicorp/consul/blob/master/CHANGELOG.md#183-august-12-2020).
 Looking through these changes prior to upgrading is highly recommended.
 
 ## Procedure
 
-**1.** Check replication status in DC1 by running the following curl command from a
+**1.** Check replication status in DC1 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {
@@ -66,14 +65,14 @@ You should see output that looks like this:
 -> The primary datacenter (indicated by `primary_datacenter`) will always show as having replication
 disabled, so this is normal even if replication is happening.
 
-**2.** Check replication status in DC2 by running the following curl command from a
+**2.** Check replication status in DC2 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {
@@ -91,14 +90,14 @@ You should see output that looks like this:
 
 **3.** Upgrade the Consul agents in all DCs to version 1.8.4 by following our [General Upgrade Process](/docs/upgrading/instructions/general-process).
 
-**4.** Confirm that replication is still working in DC2 by running the following curl command from a
+**4.** Confirm that replication is still working in DC2 by issuing the following curl command from a
 consul server in that DC:
 
 ```shell
 curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty
 ```
 
-You should see output that looks like this:
+You should receive output similar to this:
 
 ```json
 {

From 2265ad41be0f67d4870e6a3e4f879b6de8e559ab Mon Sep 17 00:00:00 2001
From: Joel Watson
Date: Fri, 18 Sep 2020 15:18:46 -0500
Subject: [PATCH 3/3] Incorporate changes from engineering review

---
 website/pages/docs/upgrading/index.mdx        | 10 ++--
 .../instructions/general-process.mdx          | 52 +++++++++++++------
 .../docs/upgrading/instructions/index.mdx     |  7 ++-
 .../instructions/upgrade-to-1-6-x.mdx         |  2 +-
 4 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/website/pages/docs/upgrading/index.mdx b/website/pages/docs/upgrading/index.mdx
index cc84ac3711..a9875bdf1e 100644
--- a/website/pages/docs/upgrading/index.mdx
+++ b/website/pages/docs/upgrading/index.mdx
@@ -47,11 +47,11 @@ Consul is A, and version B is released.
 
 ## Large Version Jumps
 
-Operating a Consul datacenter that is multiple major versions behind the current major version can increase the risk incured during
-upgrades. We encourage our customers to remain no more than two major versions behind
-(i.e., if 1.8.x is the current release, do not use versions older than 1.6.x).
-If you find yourself in a situation
-where you are many major versions behind, and need to upgrade, please review our
+Operating a Consul datacenter that is multiple major versions behind the current major
+version can increase the risk incurred during upgrades. We encourage our users to
+remain no more than two major versions behind (i.e., if 1.8.x is the current release,
+do not use versions older than 1.6.x). If you find yourself in a situation where you
+are many major versions behind, and need to upgrade, please review our
 [Upgrade Instructions page](/docs/upgrading/instructions) for information on
 how to perform those upgrades.
 
diff --git a/website/pages/docs/upgrading/instructions/general-process.mdx b/website/pages/docs/upgrading/instructions/general-process.mdx
index 9f7d12b2c8..bcebb94dc9 100644
--- a/website/pages/docs/upgrading/instructions/general-process.mdx
+++ b/website/pages/docs/upgrading/instructions/general-process.mdx
@@ -76,8 +76,14 @@ this snapshot somewhere safe. More documentation on snapshot usage is available
 - https://learn.hashicorp.com/tutorials/consul/backup-and-restore
 
 **2.** Temporarily modify your Consul configuration so that its [log_level](/docs/agent/options.html#_log_level)
-is set to `debug`. After doing this, issue the `consul reload` command on your servers. This will
-give you more information to work with in the event something goes wrong.
+is set to `debug`. After doing this, issue the following command on your servers to
+reload the configuration:
+
+```
+consul reload
+```
+
+This change will give you more information to work with in the event something goes wrong.
 
 ## Perform the Upgrade
 
@@ -101,16 +107,25 @@ Take note of which agent is the leader.
 **2.** Copy the new `consul` binary onto your servers and replace the existing
 binary with the new one.
 
-**3.** Perform a rolling restart of Consul on your servers, leaving the leader agent
-for last. Only restart one server at a time. After restarting each server, validate
-that it has rejoined the cluster and is in sync with the leader by issuing the `consul info` command,
-and checking whether the `commit_index` and `last_log_index` fields have the same value.
-If done properly, this should avoid an unexpected leadership election due to loss of quorum.
+**3.** The following steps must be done in order on the server agents, leaving the leader
+agent for last. First force the server agent to leave the cluster with the following command:
 
-~> It is important to issue a `consul leave` command on each server node when shutting
-Consul down. Make sure your service management system (e.g., systemd, upstart, etc.) is
-performing that action. If not, make sure you do it manually or you _will_ end up in a
-bad cluster state.
+```
+consul leave
+```
+
+Then, use a service management system (e.g., systemd, upstart, etc.) to restart the Consul service. If
+you are not using a service management system, you must restart the agent manually.
+
+To validate that the agent has rejoined the cluster and is in sync with the leader, issue the
+following command:
+
+```
+consul info
+```
+
+Check whether the `commit_index` and `last_log_index` fields have the same value. If done properly,
+this should avoid an unexpected leadership election due to loss of quorum.
 
 **4.** Double-check that all servers are showing up in the cluster as expected and are on
 the correct version by issuing:
@@ -143,12 +158,17 @@ dc1-node2  20e6be1b-f1cb-4aab-929f-f7d2d43d9a96  10.11.0.3:8300  follower  true
 dc1-node3  658c343b-8769-431f-a71a-236f9dbb17b3  10.11.0.4:8300  follower  true   3
 ```
 
-**5.** Set your `log_level` back to what you had it at prior to the upgrade and issue
-`consul reload` again.
+**5.** Set your `log_level` back to its original value and issue the following command
+on your servers to reload the configuration:
+
+```
+consul reload
+```
 
 ## Troubleshooting
 
 Most problems with upgrading occur due to either failing to upgrade the leader agent last,
-or due to failing to wait for a follower agent to fully rejoin a cluster before moving
+or failing to wait for a follower agent to fully rejoin a cluster before moving
 on to another server. This can cause a loss of quorum and occasionally can result in
 all of your servers attempting to kick off leadership elections endlessly without ever
 reaching a quorum and electing a leader.
@@ -157,13 +176,12 @@ reaching a quorum and electing a leader. Most of these problems can be solved by following the steps outlined in our [Outage Recovery](https://learn.hashicorp.com/tutorials/consul/recovery-outage) document. If you are still having trouble after trying the recovery steps outlined there, -then these options for further assistance are available: +then the following options for further assistance are available: - OSS users without paid support plans can request help in our [Community Forum](https://discuss.hashicorp.com/c/consul/29) - Enterprise and OSS users with paid support plans can contact [HashiCorp Support](https://support.hashicorp.com/) -If you end up contacting support, please make sure you include the following information -in your support ticket: +When contacting HashiCorp Support, please include the following information in your ticket: - Consul version you were upgrading FROM and TO. - [Debug level logs](/docs/agent/options.html#_log_level) from all servers in the cluster diff --git a/website/pages/docs/upgrading/instructions/index.mdx b/website/pages/docs/upgrading/instructions/index.mdx index 5180dbf71a..ece299c7b3 100644 --- a/website/pages/docs/upgrading/instructions/index.mdx +++ b/website/pages/docs/upgrading/instructions/index.mdx @@ -9,7 +9,7 @@ description: >- # Upgrade Instructions -This document is intended to help customers who find themselves many versions behind to upgrade safely. +This document is intended to help users who find themselves many versions behind to upgrade safely. Our recommended upgrade path is moving from version 0.8.5 to 1.2.4 to 1.6.9 to the current version. To get started, you will want to choose the version you are currently on below and then follow the instructions until you are on the latest version.
The upgrade guides will mention notable changes and link to relevant @@ -31,4 +31,7 @@ To get instructions for your upgrade, please choose the release series you are _ - [1.6.x](/docs/upgrading/instructions/upgrade-to-1-8-x) - [1.7.x](/docs/upgrading/instructions/upgrade-to-1-8-x) -If you are using <= 0.7.x, please [contact support](https://support.hashicorp.com) for assistance. +If you are using <= 0.7.x, please contact support for assistance: + +- OSS users without paid support plans can request help in our [Community Forum](https://discuss.hashicorp.com/c/consul/29) +- Enterprise and OSS users with paid support plans can contact [HashiCorp Support](https://support.hashicorp.com/) diff --git a/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx b/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx index 97065259b4..a441e87a41 100644 --- a/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx +++ b/website/pages/docs/upgrading/instructions/upgrade-to-1-6-x.mdx @@ -56,7 +56,7 @@ Two very notable items are: ## Procedure -**1.** Check replication status in DC1 by issuing the following curl command from a +**1.** Check the replication status of the primary datacenter (DC1) by issuing the following curl command from a consul server in that DC: ```shell