---
layout: docs
page_title: General Upgrade Process
sidebar_title: General Process
description: >-
  Specific versions of Consul may have additional information about the upgrade
  process beyond the standard flow.
---

# General Upgrade Process

## Introduction

This document describes best practices that you should follow when
upgrading Consul. Some versions also require steps that are specific to that
version, so make sure you also review the [upgrade instructions](/docs/upgrading/instructions)
for the version you are on.

## Download the New Version

First, download the binary for the new version you want.

<Tabs>
<Tab heading="Binary">

All current and past versions of the OSS and Enterprise releases are
available here:

- https://releases.hashicorp.com/consul

</Tab>
<Tab heading="Docker">

Docker containers are available at these locations:

- **OSS:** https://hub.docker.com/_/consul
- **Enterprise:** https://hub.docker.com/r/hashicorp/consul-enterprise

</Tab>
<Tab heading="Kubernetes">

If you are using Kubernetes, then please review our documentation for
[Upgrading Consul on Kubernetes](/docs/k8s/operations/upgrading).

</Tab>
</Tabs>

## Prepare for the Upgrade

**1.** Take a snapshot:

```
consul snapshot save backup.snap
```

You can inspect the snapshot to ensure it was successful with:

```
consul snapshot inspect backup.snap
```

Example output:

```
ID           2-1182-1542056499724
Size         4115
Index        1182
Term         2
Version      1
```

This will ensure you have a safe fallback option in case something goes wrong. Store
this snapshot somewhere safe. More documentation on snapshot usage is available here:

- https://www.consul.io/commands/snapshot
- https://learn.hashicorp.com/tutorials/consul/backup-and-restore

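If you script this step, the save and inspect commands can be combined. The following is a minimal sketch, not an official tool: the `backup_snapshot` function name, the timestamped file-name pattern, and the backup directory argument are all assumptions made for illustration.

```shell
# Hypothetical helper: save a timestamped snapshot and confirm it can be read back.
# The file-name pattern and the backup directory argument are assumptions.
backup_snapshot() {
  backup_dir="${1:-.}"
  snapshot_file="${backup_dir}/consul-$(date +%Y%m%d-%H%M%S).snap"

  # Save the snapshot; send the command's own message to stderr so that
  # only the file path is printed on stdout.
  consul snapshot save "$snapshot_file" >&2 || return 1

  # Inspect the snapshot; a failure here means the file is not usable.
  consul snapshot inspect "$snapshot_file" > /dev/null || return 1

  # Print the path so the caller can copy the file somewhere safe.
  printf '%s\n' "$snapshot_file"
}

# Usage (the directory is an example path):
#   snapshot_path="$(backup_snapshot /var/backups/consul)"
```

The function returns a non-zero status if either command fails, so a wrapper script can stop before any server is touched.
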
**2.** Temporarily modify your Consul configuration so that its [log_level](/docs/agent/options.html#_log_level)
is set to `debug`. After doing this, issue the following command on your servers to
reload the configuration:

```
consul reload
```

This change will give you more information to work with in the event something goes wrong.

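One optional safeguard, which is not part of the documented flow, is to run `consul validate` on the edited configuration before reloading it. The sketch below assumes a configuration directory of `/etc/consul.d`; the `reload_config` function name and that path are illustrative assumptions.

```shell
# Hypothetical helper: validate the edited configuration, then reload it.
# The default directory is an assumption; pass your own path.
reload_config() {
  config_dir="${1:-/etc/consul.d}"

  # Stop here if the configuration does not parse, so a typo made while
  # changing log_level is caught before the reload.
  consul validate "$config_dir" || return 1

  consul reload
}

# Usage: reload_config /etc/consul.d
```
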
## Perform the Upgrade

**1.** Issue the following command to discover which server is currently the leader:

```
consul operator raft list-peers
```

You should receive output similar to this (exact formatting and content may differ based on version):

```
Node       ID                                    Address         State     Voter  RaftProtocol
dc1-node1  ae15858f-7f5f-4dcb-b7d5-710fdcdd2745  10.11.0.2:8300  leader    true   3
dc1-node2  20e6be1b-f1cb-4aab-929f-f7d2d43d9a96  10.11.0.3:8300  follower  true   3
dc1-node3  658c343b-8769-431f-a71a-236f9dbb17b3  10.11.0.4:8300  follower  true   3
```

Take note of which agent is the leader.

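If you want to capture the leader's name in a script, the table can be filtered with `awk`. This is a rough sketch that assumes the column layout shown in the example output, which may differ between versions; `raft_leader` is a hypothetical helper name. It finds the `State` column from the header row rather than hard-coding its position.

```shell
# Hypothetical helper: read `consul operator raft list-peers` output on stdin
# and print the node name of the leader.
raft_leader() {
  awk '
    # Header row: remember which column is labelled "State".
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "State") col = i; next }
    # Data rows: print the first column (Node) of the leader row.
    col && $col == "leader" { print $1 }
  '
}

# Usage: leader="$(consul operator raft list-peers | raft_leader)"
```
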
**2.** Copy the new `consul` binary onto your servers and replace the existing
binary with the new one.

**3.** The following steps must be done in order on the server agents, one server at a time,
leaving the leader agent for last. First, have the server agent gracefully leave the cluster
with the following command:

```
consul leave
```

Then, use a service management system (e.g., systemd, upstart, etc.) to restart the Consul service. If
you are not using a service management system, you must restart the agent manually.

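On a systemd host, the leave and restart for a single server might be scripted as follows. This is only a sketch: the `restart_upgraded_agent` function name and the `consul` unit name are assumptions, and the version check relies on the first line of `consul version` looking like `Consul v1.8.3`.

```shell
# Hypothetical helper for one server; run it on the server being upgraded.
# Assumes systemd and a unit named "consul" unless another name is given.
restart_upgraded_agent() {
  expected_version="$1"          # e.g. 1.8.3 (Enterprise builds carry a suffix such as +ent)
  service_name="${2:-consul}"

  # Refuse to continue if the binary on disk is not the version we expect.
  installed="$(consul version | awk 'NR == 1 { print $2 }')"
  if [ "$installed" != "v${expected_version}" ]; then
    echo "expected v${expected_version}, found ${installed}" >&2
    return 1
  fi

  # Leave the cluster gracefully, then restart the service on the new binary.
  consul leave || return 1
  systemctl restart "$service_name"
}

# Usage: restart_upgraded_agent 1.8.3
```

Checking the binary first avoids taking a server out of the cluster only to restart it on the old version.
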
To validate that the agent has rejoined the cluster and is in sync with the leader, issue the
following command:

```
consul info
```

Check that the `commit_index` and `last_log_index` fields have the same value before you move
on to the next server. If done properly, this should avoid an unexpected leadership election
due to loss of quorum.

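The comparison of `commit_index` and `last_log_index` can be automated. The sketch below assumes that `consul info` prints those fields as `key = value` lines; the function names, retry count, and sleep interval are illustrative assumptions.

```shell
# Hypothetical helper: read `consul info` output on stdin and succeed only if
# commit_index and last_log_index are both present and equal.
raft_in_sync() {
  awk '
    $1 == "commit_index"   { commit = $3 }
    $1 == "last_log_index" { last = $3 }
    END { exit !(commit != "" && commit == last) }
  '
}

# Hypothetical helper: poll until the local agent is in sync, or give up.
wait_for_sync() {
  attempts="${1:-60}"   # number of polls before giving up
  interval="${2:-5}"    # seconds between polls
  while [ "$attempts" -gt 0 ]; do
    consul info 2>/dev/null | raft_in_sync && return 0
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  return 1
}

# Usage: wait_for_sync || echo "agent did not catch up; do not continue" >&2
```

A non-zero return from `wait_for_sync` is a signal to stop and investigate rather than move on to the next server.
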
**4.** Double-check that all servers are showing up in the cluster as expected and are on
the correct version by issuing:

```
consul members
```

You should receive output similar to this:

```
Node       Address         Status  Type    Build  Protocol  DC
dc1-node1  10.11.0.2:8301  alive   server  1.8.3  2         dc1
dc1-node2  10.11.0.3:8301  alive   server  1.8.3  2         dc1
dc1-node3  10.11.0.4:8301  alive   server  1.8.3  2         dc1
```

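To check the `Status` and `Build` columns without reading the table by eye, a filter along these lines could be used. It is a sketch that assumes the header names shown in the example output; `servers_on_build` is a hypothetical helper name.

```shell
# Hypothetical helper: read `consul members` output on stdin and succeed only
# if every server row is alive and reports the expected build.
servers_on_build() {
  awk -v want="$1" '
    NR == 1 {
      # Locate the columns by header name instead of by fixed position.
      for (i = 1; i <= NF; i++) {
        if ($i == "Status") c_status = i
        if ($i == "Type")   c_type = i
        if ($i == "Build")  c_build = i
      }
      if (!(c_status && c_type && c_build)) { bad = 2; exit }
      next
    }
    $c_type == "server" {
      servers++
      # Compare builds as strings so that versions are never treated as numbers.
      if ($c_status != "alive" || ($c_build "") != (want "")) {
        print "not ready: " $0 > "/dev/stderr"
        bad = 1
      }
    }
    END {
      if (!bad && servers == 0) bad = 1   # no server rows at all is also a failure
      exit bad
    }
  '
}

# Usage: consul members | servers_on_build 1.8.3
```
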
Also double-check the raft state to make sure there is a leader and sufficient voters:

```
consul operator raft list-peers
```

You should receive output similar to this:

```
Node       ID                                    Address         State     Voter  RaftProtocol
dc1-node1  ae15858f-7f5f-4dcb-b7d5-710fdcdd2745  10.11.0.2:8300  leader    true   3
dc1-node2  20e6be1b-f1cb-4aab-929f-f7d2d43d9a96  10.11.0.3:8300  follower  true   3
dc1-node3  658c343b-8769-431f-a71a-236f9dbb17b3  10.11.0.4:8300  follower  true   3
```

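The leader and voter check can be sketched in the same way. The example below interprets "sufficient voters" as at least a Raft quorum, that is `floor(N/2) + 1` for `N` servers, which is an assumption about what you want to enforce; `raft_summary` is a hypothetical helper name and the column names are taken from the example output.

```shell
# Hypothetical helper: read `consul operator raft list-peers` output on stdin,
# print a one-line summary, and succeed only if there is exactly one leader
# and at least a quorum of voters for the given number of servers.
raft_summary() {
  awk -v servers="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "State") c_state = i
        if ($i == "Voter") c_voter = i
      }
      next
    }
    c_state && $c_state == "leader" { leaders++ }
    c_voter && $c_voter == "true"   { voters++ }
    END {
      quorum = int(servers / 2) + 1
      printf "leaders=%d voters=%d quorum=%d\n", leaders, voters, quorum
      exit !(leaders == 1 && voters >= quorum)
    }
  '
}

# Usage: consul operator raft list-peers | raft_summary 3
```

For the three-server example output, this prints `leaders=1 voters=3 quorum=2` and returns success.
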
**5.** Set your `log_level` back to its original value and issue the following command
on your servers to reload the configuration:

```
consul reload
```

## Troubleshooting

Most problems with upgrading occur due to either failing to upgrade the leader agent last,
or failing to wait for a follower agent to fully rejoin a cluster before moving
on to another server. This can cause a loss of quorum and occasionally can result in
all of your servers attempting to kick off leadership elections endlessly without ever
reaching a quorum and electing a leader.

Most of these problems can be solved by following the steps outlined in our
[Outage Recovery](https://learn.hashicorp.com/tutorials/consul/recovery-outage) document.
If you are still having trouble after trying the recovery steps outlined there,
then the following options for further assistance are available:

- OSS users without paid support plans can request help in our [Community Forum](https://discuss.hashicorp.com/c/consul/29)
- Enterprise and OSS users with paid support plans can contact [HashiCorp Support](https://support.hashicorp.com/)

When contacting HashiCorp Support, please include the following information in your ticket:

- Consul version you were upgrading FROM and TO.
- [Debug level logs](/docs/agent/options.html#_log_level) from all servers in the cluster
  that you are having trouble with. These should include logs from prior to the upgrade attempt
  up through the current time. If your logs were not set at debug level prior to the
  upgrade, please include those logs as well. Also, update your config to use debug logs,
  and include logs from after that was done.
- Your Consul config files (please redact any secrets).
- Output from `consul members -detailed` and `consul operator raft list-peers` from each
  server in your cluster.

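The command output in the last item can be gathered with a small script run on each server. This is a sketch only: the `collect_support_info` function name, the output directory, and the file names are assumptions. Logs and config files still need to be collected and redacted separately.

```shell
# Hypothetical helper: write the command output requested for a support
# ticket into a directory. Run it once on each server in the cluster.
collect_support_info() {
  out_dir="${1:-consul-support}"
  mkdir -p "$out_dir" || return 1

  # Capture stderr too, so that errors from a struggling agent are preserved.
  consul members -detailed > "$out_dir/members.txt" 2>&1
  consul operator raft list-peers > "$out_dir/raft-peers.txt" 2>&1

  # Print the directory so the caller knows where the files are.
  printf '%s\n' "$out_dir"
}

# Usage: collect_support_info "consul-support-$(hostname)"
```
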