remove guides that were moved to learn

pull/7721/head
Jeff Escalante 2020-04-13 16:08:13 -04:00
parent 4a5d67a24e
commit af07d9f006
No known key found for this signature in database
GPG Key ID: 32D23C61AB5450DB
27 changed files with 0 additions and 5861 deletions


@ -1,472 +0,0 @@
---
layout: docs
page_title: Bootstrapping ACLs
description: >-
  Consul provides an optional Access Control List (ACL) system which can be used
  to control access to data and APIs. The ACL system is a Capability-based
  system that relies on tokens which can have fine grained rules applied to
  them. It is very similar to AWS IAM in many ways.
---
# Bootstrapping the ACL System
Consul uses Access Control Lists (ACLs) to secure the UI, API, CLI, service communications, and agent communications. For securing gossip and RPC communication please review [this guide](/docs/guides/agent-encryption). When securing your cluster you should configure the ACLs first.
At the core, ACLs operate by grouping rules into policies, then associating one or more policies with a token.
To complete this guide, you should have an operational Consul 1.4+ cluster. We also recommend reading the [ACL System documentation](/docs/agent/acl-system). For securing Consul version 1.3 and older, please read the [legacy ACL documentation](/docs/guides/acl-legacy).
Bootstrapping the ACL system is a multi-step process. This guide covers all the necessary steps:
- [Enable ACLs on all the servers](/docs/guides/acl#step-1-enable-acls-on-all-the-consul-servers).
- [Create the initial bootstrap token](/docs/guides/acl#step-2-create-the-bootstrap-token).
- [Create the agent policy](/docs/guides/acl#step-3-create-an-agent-token-policy).
- [Create the agent token](/docs/guides/acl#step-4-create-an-agent-token).
- [Apply the new token to the servers](/docs/guides/acl#step-5-add-the-agent-token-to-all-the-servers).
- [Enable ACLs on the clients and apply the agent token](/docs/guides/acl#step-6-enable-acls-on-the-consul-clients).
At the end of this guide, there are also several additional and optional steps.
## Step 1: Enable ACLs on all the Consul Servers
The first step for bootstrapping the ACL system is to enable ACLs on the Consul servers in the agent configuration file. In this example, we are configuring the default policy of "deny", which means we are in whitelist mode, and a down policy of "extend-cache", which means that we will ignore token TTLs during an outage.
```json
{
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "down_policy": "extend-cache"
  }
}
```
The servers will need to be restarted to load the new configuration. Please take care
to restart the servers one at a time and ensure each server has joined and is operating
correctly before restarting another.
If ACLs are enabled correctly, we will now see the following warnings and info in the leader's logs.
```shell
2018/12/12 01:36:40 [INFO] acl: Created the anonymous token
2018/12/12 01:36:40 [INFO] consul: ACL bootstrap enabled
2018/12/12 01:36:41 [INFO] agent: Synced node info
2018/12/12 01:36:58 [WARN] agent: Coordinate update blocked by ACLs
2018/12/12 01:37:40 [INFO] acl: initializing acls
2018/12/12 01:37:40 [INFO] consul: Created ACL 'global-management' policy
```
If you do not see ACL bootstrap enabled, the anonymous token creation, and the `global-management` policy creation message in the logs, ACLs have not been properly enabled.
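The log check described above can be scripted. The following is a minimal sketch, assuming the leader's log has been written to a file whose path you pass in; the helper name `acl_enabled_in_log` is hypothetical, and the message text is taken from the sample log above, so it may differ between Consul versions.

```shell
# Hypothetical helper: succeeds only if all three ACL start-up messages
# from the sample leader log are present in the given log file.
acl_enabled_in_log() {
  log_file="$1"
  grep -q 'Created the anonymous token' "$log_file" &&
    grep -q 'ACL bootstrap enabled' "$log_file" &&
    grep -q "Created ACL 'global-management' policy" "$log_file"
}
```

A non-zero exit status suggests that at least one of the three messages is missing and the server configuration should be reviewed.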
Now that ACLs are enabled, every operation requires a token, so we cannot do anything else to the cluster until we bootstrap the ACL system and generate the first management token. For simplicity, we will use the token created during the bootstrap for the remainder of the guide.
## Step 2: Create the Bootstrap Token
Once ACLs have been enabled we can bootstrap our first token, the bootstrap token.
The bootstrap token is a management token with unrestricted privileges. It will
be shared with all the servers in the quorum, since it will be added to the
state store.
```bash
$ consul acl bootstrap
AccessorID: edcaacda-b6d0-1954-5939-b5aceaca7c9a
SecretID: 4411f091-a4c9-48e6-0884-1fcb092da1c8
Description: Bootstrap Token (Global Management)
Local: false
Create Time: 2018-12-06 18:03:23.742699239 +0000 UTC
Policies:
00000000-0000-0000-0000-000000000001 - global-management
```
On the server where the `bootstrap` command was issued we should see the following log message.
```shell
2018/12/11 15:30:23 [INFO] consul.acl: ACL bootstrap completed
2018/12/11 15:30:23 [DEBUG] http: Request PUT /v1/acl/bootstrap (2.347965ms) from=127.0.0.1:40566
```
Since ACLs have been enabled, we will need to use the bootstrap token to complete any additional operations.
For example, even checking the member list will require a token.
```shell
$ consul members -token "4411f091-a4c9-48e6-0884-1fcb092da1c8"
Node Address Status Type Build Protocol DC Segment
fox 172.20.20.10:8301 alive server 1.4.0 2 kc <all>
bear 172.20.20.11:8301 alive server 1.4.0 2 kc <all>
wolf 172.20.20.12:8301 alive server 1.4.0 2 kc <all>
```
Note that passing the token on the command line with the `-token` flag is not
recommended. Instead, we will set it once as an environment variable.
```shell
$ export CONSUL_HTTP_TOKEN=4411f091-a4c9-48e6-0884-1fcb092da1c8
```
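To avoid copying the token by hand, the `SecretID` can be pulled out of the bootstrap output. This is a sketch only: the helper name `extract_secret_id` is hypothetical, and it assumes the output keeps the `SecretID:` label shown in the example above.

```shell
# Hypothetical helper: print the SecretID field from `consul acl bootstrap`
# (or `consul acl token create`) output supplied on standard input.
extract_secret_id() {
  awk '$1 == "SecretID:" { print $2; exit }'
}
```

Because bootstrapping only succeeds once, it is safer to save the command output to a protected file first (the name `bootstrap.txt` is an assumption) and then run `export CONSUL_HTTP_TOKEN="$(extract_secret_id < bootstrap.txt)"`.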
The bootstrap token can also be used in the server configuration file as
the [`master`](/docs/agent/options#acl_tokens_master) token.
Note that the bootstrap token can only be created once; bootstrapping is disabled after the token has been created. Once the ACL system is bootstrapped, ACL tokens can be managed through the
[ACL API](/api/acl/acl).
## Step 3: Create an Agent Token Policy
Before we can create a token, we will need to create its associated policy. A policy is a set of rules that can be used to specify granular permissions. To learn more about rules, read the ACL rule specification [documentation](/docs/agent/acl-rules).
```bash
# agent-policy.hcl contains the following:
node_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}
```
This policy will allow all nodes to be registered and accessed and any service to be read.
Note, this simple policy is not recommended in production.
It is best practice to create separate node policies and tokens for each node in the cluster
with an exact-match node rule.
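As an illustration of the per-node approach, the sketch below prints a policy that grants write access to a single node by name. The helper name `node_policy_rules` is hypothetical, and keeping `service_prefix ""` readable is carried over from the example policy above rather than being a recommendation.

```shell
# Hypothetical helper: print an agent policy scoped to a single node name.
# `node "<name>"` is an exact-match rule, unlike `node_prefix ""`, which
# matches every node.
node_policy_rules() {
  node_name="$1"
  cat <<EOF
node "${node_name}" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}
EOF
}
```

For example, `node_policy_rules fox > fox-policy.hcl` writes a rules file (the file name is an assumption) that can then be passed to `consul acl policy create` with `-rules @fox-policy.hcl`.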
We only need to create one policy and can do this on any of the servers. If you have not set the
`CONSUL_HTTP_TOKEN` environment variable to the bootstrap token, please refer to the previous step.
```
$ consul acl policy create -name "agent-token" -description "Agent Token Policy" -rules @agent-policy.hcl
ID: 5102b76c-6058-9fe7-82a4-315c353eb7f7
Name: agent-token
Description: Agent Token Policy
Datacenters:
Rules:
node_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}
```
The returned value is the newly-created policy that we can now use when creating our agent token.
## Step 4: Create an Agent Token
Using the newly created policy, we can create an agent token. Again we can complete this process on any of the servers. For this guide, all agents will share the same token. Note, the `SecretID` is the token used to authenticate API and CLI commands.
```shell
$ consul acl token create -description "Agent Token" -policy-name "agent-token"
AccessorID: 499ab022-27f2-acb8-4e05-5a01fff3b1d1
SecretID: da666809-98ca-0e94-a99c-893c4bf5f9eb
Description: Agent Token
Local: false
Create Time: 2018-10-19 14:23:40.816899 -0400 EDT
Policies:
fcd68580-c566-2bd2-891f-336eadc02357 - agent-token
```
## Step 5: Add the Agent Token to all the Servers
Our final step for configuring the servers is to assign the token to all of our
Consul servers via the configuration file and reload the Consul service
on all of the servers, one last time.
```json
{
  "primary_datacenter": "dc1",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "down_policy": "extend-cache",
    "tokens": {
      "agent": "da666809-98ca-0e94-a99c-893c4bf5f9eb"
    }
  }
}
```
~> Note: In Consul version 1.4.2 and older any ACL updates
in the agent configuration file will require a full restart of the
Consul service.
At this point we should no longer see the coordinate warning in the servers' logs. However, we should continue to see that the node information is in sync.
```shell
2018/12/11 15:34:20 [DEBUG] agent: Node info in sync
```
It is important to ensure the servers are configured properly before enabling ACLs
on the clients. This reduces duplicate work and troubleshooting if there
is a misconfiguration.
#### Ensure the ACL System is Configured Properly
Before configuring the clients, we should check that the servers are healthy. To do this, let's view the catalog.
```shell
$ curl http://127.0.0.1:8500/v1/catalog/nodes -H 'x-consul-token: 4411f091-a4c9-48e6-0884-1fcb092da1c8'
[
  {
    "Address": "172.20.20.10",
    "CreateIndex": 7,
    "Datacenter": "kc",
    "ID": "881cfb69-2bcd-c2a9-d87c-cb79fc454df9",
    "Meta": {
      "consul-network-segment": ""
    },
    "ModifyIndex": 10,
    "Node": "fox",
    "TaggedAddresses": {
      "lan": "172.20.20.10",
      "wan": "172.20.20.10"
    }
  }
]
```
```
All the values should be as expected. Particularly, if `TaggedAddresses` is `null` it is likely we have not configured ACLs correctly. A good place to start debugging is reviewing the Consul logs on all the servers.
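The `TaggedAddresses` check can be automated with a plain text match. This is a rough sketch, not a JSON parser: the helper name `catalog_looks_healthy` is hypothetical, and it only looks for the `null` case described above, so an empty response would also pass.

```shell
# Hypothetical helper: read /v1/catalog/nodes JSON on standard input and fail
# if any node reports "TaggedAddresses": null. This is a text match only.
catalog_looks_healthy() {
  ! grep -Eq '"TaggedAddresses"[[:space:]]*:[[:space:]]*null'
}
```

The response from the `curl` command shown earlier can be piped straight into the function.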
If you encounter issues that are unresolvable, or misplace the bootstrap token, you can reset the ACL system by updating the index. First re-run the bootstrap command to get the index number.
```
$ consul acl bootstrap
Failed ACL bootstrapping: Unexpected response code: 403 (Permission denied: ACL bootstrap no longer allowed (reset index: 13))
```
Then write the reset index into the bootstrap reset file (here the reset index is 13):
```
$ echo 13 >> <data-directory>/acl-bootstrap-reset
```
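If the reset is scripted, the index can be read from the error text instead of being typed by hand. The sketch below assumes the error message keeps the `reset index: N` wording shown above; the helper name `extract_reset_index` is hypothetical.

```shell
# Hypothetical helper: print the reset index from the "no longer allowed"
# error that `consul acl bootstrap` returns once bootstrapping is done.
extract_reset_index() {
  sed -n 's/.*reset index: \([0-9][0-9]*\).*/\1/p'
}
```

Since the error may be printed on standard error, a call such as `consul acl bootstrap 2>&1 | extract_reset_index` captures both streams.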
After resetting the ACL system you can start again at Step 2.
## Step 6: Enable ACLs on the Consul Clients
Since ACL enforcement also occurs on the Consul clients, we also need to restart them
with a configuration file that enables ACLs. We can use the same ACL agent token that we created for the servers. The same token works on every agent because its policy uses empty node and service prefixes, which match all nodes and services.
```json
{
  "acl": {
    "enabled": true,
    "down_policy": "extend-cache",
    "tokens": {
      "agent": "da666809-98ca-0e94-a99c-893c4bf5f9eb"
    }
  }
}
```
To ensure the agents are configured correctly, we can again use the `/catalog` endpoint.
## Additional ACL Configuration
Now that the nodes have been configured to use ACLs, we can configure the CLI, UI, and nodes to use specific tokens. All of the following steps are optional examples. In your own environment you will likely need to create more fine-grained policies.
#### Configure the Anonymous Token (Optional)
The anonymous token is created during the bootstrap process, `consul acl bootstrap`. It is implicitly used if no token is supplied. In this section we will update the existing token with a newly created policy.
At this point ACLs are bootstrapped with ACL agent tokens configured, but there are no
other policies set up. Even basic operations like `consul members` will be restricted
by the ACL default policy of "deny":
```
$ consul members
```
The command does not return an error. Instead, the ACL system filters the results, and by
default we are not allowed to see any nodes, so the output is empty.
If we supply the token we created above we will be able to see a listing of nodes because
it has write privileges to an empty `node` prefix, meaning it has access to all nodes:
```bash
$ CONSUL_HTTP_TOKEN=da666809-98ca-0e94-a99c-893c4bf5f9eb consul members
Node Address Status Type Build Protocol DC Segment
fox 172.20.20.10:8301 alive server 1.4.0 2 kc <all>
bear 172.20.20.11:8301 alive server 1.4.0 2 kc <all>
wolf 172.20.20.12:8301 alive server 1.4.0 2 kc <all>
```
It is common in many environments to allow listing of all nodes, even without a
token. The policies associated with the special anonymous token can be updated to
configure Consul's behavior when no token is supplied. The anonymous token is managed
like any other ACL token, except that it has the fixed accessor ID
`00000000-0000-0000-0000-000000000002` and the secret ID `anonymous`. In this example
we will give the anonymous token read privileges for all nodes:
```bash
$ consul acl policy create -name 'list-all-nodes' -rules 'node_prefix "" { policy = "read" }'
ID: e96d0a33-28b4-d0dd-9b3f-08301700ac72
Name: list-all-nodes
Description:
Datacenters:
Rules:
node_prefix "" { policy = "read" }
$ consul acl token update -id 00000000-0000-0000-0000-000000000002 -policy-name list-all-nodes -description "Anonymous Token - Can List Nodes"
Token updated successfully.
AccessorID: 00000000-0000-0000-0000-000000000002
SecretID: anonymous
Description: Anonymous Token - Can List Nodes
Local: false
Create Time: 0001-01-01 00:00:00 +0000 UTC
Hash: ee4638968d9061647ac8c3c99e9d37bfdd2af4d1eaa07a7b5f80af0389460948
Create Index: 5
Modify Index: 38
Policies:
e96d0a33-28b4-d0dd-9b3f-08301700ac72 - list-all-nodes
```
The anonymous token is implicitly used if no token is supplied, so now we can run
`consul members` without supplying a token and we will be able to see the nodes:
```bash
$ consul members
Node Address Status Type Build Protocol DC Segment
fox 172.20.20.10:8301 alive server 1.4.0 2 kc <all>
bear 172.20.20.11:8301 alive server 1.4.0 2 kc <all>
wolf 172.20.20.12:8301 alive server 1.4.0 2 kc <all>
```
The anonymous token is also used for DNS lookups since there is no way to pass a
token as part of a DNS request. Here's an example lookup for the "consul" service:
```
$ dig @127.0.0.1 -p 8600 consul.service.consul
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 consul.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 9648
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;consul.service.consul. IN A
;; AUTHORITY SECTION:
consul. 0 IN SOA ns.consul. postmaster.consul. 1499584110 3600 600 86400 0
```
We get an `NXDOMAIN` response because the anonymous token doesn't have access to the
"consul" service. Let's update the anonymous token's policy to allow for service reads of the "consul" service.
```bash
$ consul acl policy create -name 'service-consul-read' -rules 'service "consul" { policy = "read" }'
ID: 3c93f536-5748-2163-bb66-088d517273ba
Name: service-consul-read
Description:
Datacenters:
Rules:
service "consul" { policy = "read" }
$ consul acl token update -id 00000000-0000-0000-0000-000000000002 -merge-policies -description "Anonymous Token - Can List Nodes" -policy-name service-consul-read
Token updated successfully.
AccessorID: 00000000-0000-0000-0000-000000000002
SecretID: anonymous
Description: Anonymous Token - Can List Nodes
Local: false
Create Time: 0001-01-01 00:00:00 +0000 UTC
Hash: 2c641c4f73158ef6d62f6467c68d751fccd4db9df99b235373e25934f9bbd939
Create Index: 5
Modify Index: 43
Policies:
e96d0a33-28b4-d0dd-9b3f-08301700ac72 - list-all-nodes
3c93f536-5748-2163-bb66-088d517273ba - service-consul-read
```
With that new policy in place, the DNS lookup will succeed:
```
$ dig @127.0.0.1 -p 8600 consul.service.consul
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 consul.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46006
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;consul.service.consul. IN A
;; ANSWER SECTION:
consul.service.consul. 0 IN A 127.0.0.1
```
The next section shows an alternative to the anonymous token.
#### Set Agent-Specific Default Tokens (Optional)
An alternative to the anonymous token is the [`acl.tokens.default`](/docs/agent/options#acl_tokens_default)
configuration item. When a request is made to a particular Consul agent and no token is
supplied, the [`acl.tokens.default`](/docs/agent/options#acl_tokens_default) will be used for the token, instead of being left empty which would normally invoke the anonymous token.
This behaves very similarly to the anonymous token, but can be configured differently on each
agent, if desired. For example, this allows more fine grained control of what DNS requests a
given agent can service or can give the agent read access to some key-value store prefixes by
default.
If using [`acl.tokens.default`](/docs/agent/options#acl_tokens_default), then it's likely the anonymous token will have a more restrictive policy than shown in these examples.
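Since the agent can merge multiple configuration files, one way to set a per-agent default token is to keep it in its own small file. The sketch below only prints such a fragment; the helper name `default_token_config` is hypothetical, and the token value is whatever you pass in.

```shell
# Hypothetical helper: print a configuration fragment that sets
# acl.tokens.default to the token given as the first argument.
default_token_config() {
  token="$1"
  cat <<EOF
{
  "acl": {
    "tokens": {
      "default": "${token}"
    }
  }
}
EOF
}
```

The output can be redirected into a file in the agent's configuration directory, for example a file named `default-token.json` (an assumed name), and picked up on the next restart.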
#### Create Tokens for UI Use (Optional)
If you utilize the Consul UI with a restrictive ACL policy, as above, the UI will not function fully using the anonymous ACL token. It is recommended that a UI-specific ACL token is used, which can be set in the UI during the web browser session to authenticate the interface.
First create the new policy.
```bash
$ consul acl policy create -name "ui-policy" \
-description "Necessary permissions for UI functionality" \
-rules 'key_prefix "" { policy = "write" } node_prefix "" { policy = "read" } service_prefix "" { policy = "read" }'
ID: 9cb99b2b-3c20-81d4-a7c0-9ffdc2fbf08a
Name: ui-policy
Description: Necessary permissions for UI functionality
Datacenters:
Rules:
key_prefix "" { policy = "write" } node_prefix "" { policy = "read" } service_prefix "" { policy = "read" }
```
With the new policy, create a token.
```shell
$ consul acl token create -description "UI Token" -policy-name "ui-policy"
AccessorID: 56e605cf-a6f9-5f9d-5c08-a0e1323cf016
SecretID: 117842b6-6208-446a-0d1e-daf93854857d
Description: UI Token
Local: false
Create Time: 2018-10-19 14:55:44.254063 -0400 EDT
Policies:
9cb99b2b-3c20-81d4-a7c0-9ffdc2fbf08a - ui-policy
```
The token can then be set on the "settings" page of the UI.
Note, in this example, we have also given full write access to the KV through the UI.
## Summary
The [ACL API](/api/acl/acl) can be used to create tokens for applications specific to their intended use and to create more specific ACL agent tokens for each agent's expected role.
Now that you have bootstrapped ACLs, learn more about [ACL rules](/docs/agent/acl-rules).
### Notes on Security
In this guide we configured a basic ACL environment with the ability to see all nodes
by default, but with limited access to discover only the "consul" service. If your environment has stricter security requirements we would like to note the following and make some additional recommendations.
1. In this guide we added the agent token to the configuration file. This means the tokens are now saved on disk. If this is a security concern, tokens can be added to agents using the [Consul CLI](/docs/commands/acl/acl-set-agent-token). However, this process is more complicated and takes additional care.
2. It is recommended that each client get an ACL agent token with `node` write privileges for just its own node name, and `service` read privileges for just the service prefixes expected to be registered on that client.
3. [Anti-entropy](/docs/internals/anti-entropy) syncing requires the ACL agent token
to have `service:write` privileges for all services that may be registered with the agent.
We recommend providing `service:write` for each separate service via a separate token that
is used when registering via the API, or provided along with the [registration in the
configuration file](/docs/agent/services). Note that `service:write`
is the privilege required to assume the identity of a service and so Consul Connect's
intentions are only enforceable to the extent that each service instance is unable to gain
`service:write` on any other service name. For more details see the Connect security
[documentation](/docs/connect/security).


@ -1,179 +0,0 @@
---
layout: docs
page_title: Multiple Datacenters - Advanced Federation with Network Areas
description: >-
  One of the key features of Consul is its support for multiple datacenters. The
  architecture of Consul is designed to promote low coupling of datacenters so
  that connectivity issues or failure of any datacenter does not impact the
  availability of Consul in other datacenters. This means each datacenter runs
  independently, each having a dedicated group of servers and a private LAN
  gossip pool.
---
# [Enterprise] Multiple Datacenters: Advanced Federation with Network Areas
~> The network area functionality described here is available only in [Consul Enterprise](https://www.hashicorp.com/products/consul/) version 0.8.0 and later.
One of the key features of Consul is its support for multiple datacenters.
The [architecture](/docs/internals/architecture) of Consul is designed to
promote a low coupling of datacenters so that connectivity issues or
failure of any datacenter does not impact the availability of Consul in other
datacenters. This means each datacenter runs independently, each having a dedicated
group of servers and a private LAN [gossip pool](/docs/internals/gossip).
This guide covers the advanced form of federating Consul clusters using the new
network areas capability added in [Consul Enterprise](https://www.hashicorp.com/products/consul/)
version 0.8.0. For the basic form of federation available in the open source version
of Consul, please see the [Basic Federation Guide](/docs/guides/datacenters)
for more details.
## Network Area Overview
Consul's [Basic Federation](/docs/guides/datacenters) support relies on all
Consul servers in all datacenters having full mesh connectivity via server RPC
(8300/tcp) and Serf WAN (8302/tcp and 8302/udp). Securing this setup requires TLS
in combination with managing a gossip keyring. With massive Consul deployments, it
becomes tricky to support a full mesh with all Consul servers, and to manage the
keyring.
Consul Enterprise version 0.8.0 added support for a new federation model based on
operator-created network areas. Network areas specify a relationship between a
pair of Consul datacenters. Operators create reciprocal areas on each side of the
relationship and then join them together, so a given Consul datacenter can participate
in many areas, even when some of the peer areas cannot contact each other. This
allows for more flexible relationships between Consul datacenters, such as hub/spoke
or more general tree structures. Traffic between areas is all performed via server
RPC (8300/tcp) so it can be secured with just TLS.
Currently, Consul will only route RPC requests to datacenters it is immediately adjacent
to via an area (or via the WAN), but future versions of Consul may add routing support.
The following can be used to manage network areas:
- [Network Areas HTTP Endpoint](/api/operator/area)
- [Network Areas CLI](/docs/commands/operator/area)
### Network Areas and the WAN Gossip Pool
Network areas can be used alongside Consul's [Basic Federation](/docs/guides/datacenters)
model and the WAN gossip pool. This helps ease migration, and clusters like the
[primary datacenter](/docs/agent/options#primary_datacenter) are more easily managed via
the WAN because they need to be available to all Consul datacenters.
A peer datacenter can be connected via the WAN gossip pool and a network area at the
same time, and RPCs will be forwarded as long as servers are available in either.
## Configure Advanced Federation
To get started, follow the [Deployment guide](https://learn.hashicorp.com/consul/advanced/day-1-operations/deployment-guide/) to
start each datacenter. After bootstrapping, we should have two datacenters now which
we can refer to as `dc1` and `dc2`. Note that datacenter names are opaque to Consul;
they are simply labels that help human operators reason about the Consul clusters.
### Create Areas in both Datacenters
A compatible pair of areas must be created, one in each datacenter:
```shell
(dc1) $ consul operator area create -peer-datacenter=dc2
Created area "cbd364ae-3710-1770-911b-7214e98016c0" with peer datacenter "dc2"!
```
```shell
(dc2) $ consul operator area create -peer-datacenter=dc1
Created area "2aea3145-f1e3-cb1d-a775-67d15ddd89bf" with peer datacenter "dc1"!
```
Now you can query for the members of the area:
```shell
(dc1) $ consul operator area members
Area Node Address Status Build Protocol DC RTT
cbd364ae-3710-1770-911b-7214e98016c0 node-1.dc1 127.0.0.1:8300 alive 0.8.0_entrc1 2 dc1 0s
```
### Join Servers
Consul will automatically make sure that all servers within the datacenter where
the area was created are joined to the area using the LAN information. We need to
join with at least one Consul server in the other datacenter to complete the area:
```shell
(dc1) $ consul operator area join -peer-datacenter=dc2 127.0.0.2
Address Joined Error
127.0.0.2 true (none)
```
With a successful join, we should now see the remote Consul servers as part of the
area's members:
```shell
(dc1) $ consul operator area members
Area Node Address Status Build Protocol DC RTT
cbd364ae-3710-1770-911b-7214e98016c0 node-1.dc1 127.0.0.1:8300 alive 0.8.0_entrc1 2 dc1 0s
cbd364ae-3710-1770-911b-7214e98016c0 node-2.dc2 127.0.0.2:8300 alive 0.8.0_entrc1 2 dc2 581.649µs
```
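The join can also be confirmed from a script by checking the members listing for an alive server in the peer datacenter. This sketch assumes the column layout shown above, where the fourth column is the status and the seventh is the datacenter; the helper name `area_has_alive_peer` is hypothetical.

```shell
# Hypothetical helper: read `consul operator area members` output on standard
# input and succeed if at least one alive member belongs to the given
# datacenter. Relies on the column order shown in the sample output.
area_has_alive_peer() {
  peer_dc="$1"
  awk -v dc="$peer_dc" '
    $4 == "alive" && $7 == dc { found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

For instance, `consul operator area members | area_has_alive_peer dc2` returns a zero exit status once the join has succeeded.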
### Route RPCs
Now we can route RPC commands in both directions. Here's a sample command to set a KV
entry in dc2 from dc1:
```shell
(dc1) $ consul kv put -datacenter=dc2 hello world
Success! Data written to: hello
```
### DNS Lookups
The DNS interface supports federation as well:
```shell
(dc1) $ dig @127.0.0.1 -p 8600 consul.service.dc2.consul
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 consul.service.dc2.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49069
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;consul.service.dc2.consul. IN A
;; ANSWER SECTION:
consul.service.dc2.consul. 0 IN A 127.0.0.2
;; Query time: 3 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Wed Mar 29 11:27:35 2017
;; MSG SIZE rcvd: 59
```
There are a few networking requirements that must be satisfied for this to
work. Of course, all server nodes must be able to talk to each other via their server
RPC ports (8300/tcp). If service discovery is to be used across datacenters, the
network must be able to route traffic between IP addresses across regions as well.
Usually, this means that all datacenters must be connected using a VPN or other
tunneling mechanism. Consul does not handle VPN or NAT traversal for you.
The [`translate_wan_addrs`](/docs/agent/options#translate_wan_addrs) configuration
provides a basic address rewriting capability.
## Data Replication
In general, data is not replicated between different Consul datacenters. When a
request is made for a resource in another datacenter, the local Consul servers forward
an RPC request to the remote Consul servers for that resource and return the results.
If the remote datacenter is not available, then those resources will also not be
available, but that won't otherwise affect the local datacenter. There are some special
situations where a limited subset of data can be replicated, such as with Consul's built-in
[ACL replication](/docs/guides/acl#outages-and-acl-replication/) capability, or
external tools like [consul-replicate](https://github.com/hashicorp/consul-replicate/).
## Summary
In this guide, you set up advanced federation using
network areas. You then learned how to route RPC commands and use
the DNS interface with multiple datacenters.


@ -1,204 +0,0 @@
---
layout: docs
page_title: Agent Communication Encryption
description: This guide covers how to encrypt both gossip and RPC communication.
---
# Agent Communication Encryption
There are two different systems that need to be configured separately to encrypt communication within the cluster: gossip encryption and TLS. TLS is used to secure the RPC calls between agents. Gossip encryption is secured with a symmetric key, since gossip between nodes is done over UDP. In this guide we will configure both.
To complete the RPC encryption section, you must have [configured agent certificates](/docs/guides/creating-certificates).
## Gossip Encryption
To enable gossip encryption, you need to use an encryption key when starting the Consul agent. The key can simply be set with the `encrypt` parameter in the agent configuration file. Alternatively, the encryption key can be placed in a separate configuration file with only the `encrypt` field, since the agent can merge multiple configuration files. The key must be 32 bytes, Base64-encoded.
You can use the Consul CLI command, [`consul keygen`](/docs/commands/keygen), to generate a cryptographically suitable key.
```shell
$ consul keygen
pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s=
```
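Before distributing a key, a quick shape check can catch copy-and-paste damage. The sketch below assumes the 32-byte key size stated above: such a value always encodes to 44 Base64 characters ending in exactly one `=`. The helper name `looks_like_gossip_key` is hypothetical, and the check does not decode the key, so it cannot catch every malformed value.

```shell
# Hypothetical helper: rough shape check for a Base64-encoded 32-byte key.
# It verifies length, alphabet, and padding only; it does not decode the key.
looks_like_gossip_key() {
  key="$1"
  [ "${#key}" -eq 44 ] || return 1
  case "$key" in
    *==) return 1 ;;                    # two padding characters: wrong size
    *[!A-Za-z0-9+/=]*) return 1 ;;      # character outside the Base64 alphabet
    *=) return 0 ;;                     # exactly one padding character
    *) return 1 ;;
  esac
}
```

A key produced by `consul keygen`, such as the one shown above, is expected to pass this check.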
### Enable Gossip Encryption: New Cluster
To enable gossip encryption on a new cluster, we will add the encryption key parameter to the
agent configuration file and then pass the file at startup with the [`-config-dir`](/docs/agent/options#_config_dir) flag.
```json
{
  "data_dir": "/opt/consul",
  "log_level": "INFO",
  "node_name": "bulldog",
  "server": true,
  "encrypt": "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s="
}
```
```shell
$ consul agent -config-dir=/etc/consul.d/
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'Armons-MacBook-Air.local'
Datacenter: 'dc1'
Server: false (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 10.1.10.12 (LAN: 8301, WAN: 8302)
Gossip encrypt: true, RPC-TLS: false, TLS-Incoming: false
...
```
`Gossip encrypt: true` will be included in the output if encryption is properly configured.
Note: all nodes within a cluster must share the same encryption key in order to send and receive cluster information, including clients and servers. Additionally, if you're using multiple WAN joined datacenters, be sure to use _the same encryption key_ in all datacenters.
### Enable Gossip Encryption: Existing Cluster
Gossip encryption can also be enabled on an existing cluster, but doing so requires several extra steps that use the agent configuration parameters [`encrypt_verify_incoming`](/docs/agent/options#encrypt_verify_incoming) and [`encrypt_verify_outgoing`](/docs/agent/options#encrypt_verify_outgoing).
**Step 1**: Generate an encryption key using `consul keygen`.
```shell
$ consul keygen
pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s=
```
**Step 2**: Set the [`encrypt`](/docs/agent/options#_encrypt) key, and set `encrypt_verify_incoming` and `encrypt_verify_outgoing` to `false` in the agent configuration file. Then initiate a rolling update of the cluster with these new values. After this step, the agents will be able to decrypt gossip but will not yet be sending encrypted traffic.
```json
{
  "data_dir": "/opt/consul",
  "log_level": "INFO",
  "node_name": "bulldog",
  "server": true,
  "encrypt": "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s=",
  "encrypt_verify_incoming": false,
  "encrypt_verify_outgoing": false
}
```
A rolling update can be made by restarting the Consul agents (clients and servers) in turn. `consul reload` or `kill -HUP <process_id>` is _not_ sufficient to change the gossip configuration.
**Step 3**: Update the `encrypt_verify_outgoing` setting to `true` and perform another rolling update of the cluster by restarting Consul on each agent. The agents will now be sending encrypted gossip but will still allow incoming unencrypted traffic.
```json
{
  "data_dir": "/opt/consul",
  "log_level": "INFO",
  "node_name": "bulldog",
  "server": true,
  "encrypt": "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s=",
  "encrypt_verify_incoming": false,
  "encrypt_verify_outgoing": true
}
```
**Step 4**: The previous step, enabling verify outgoing, must be completed on all agents before continuing. Update the `encrypt_verify_incoming` setting to `true` and perform a final rolling update of the cluster.
```json
{
  "data_dir": "/opt/consul",
  "log_level": "INFO",
  "node_name": "bulldog",
  "server": true,
  "encrypt": "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s=",
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true
}
```
All the agents will now be strictly enforcing encrypted gossip. Note, the default
behavior of both `encrypt_verify_incoming` and `encrypt_verify_outgoing` is `true`.
We have set them in the configuration file as an explicit example.
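The three rolling updates above differ only in the two `encrypt_verify_*` values, so the per-step configuration files can be generated rather than edited by hand. The following is a minimal sketch, not an official tool; the function name `gossip_rollout_stages` is hypothetical, and the base configuration simply mirrors the example agent used in this guide.

```python
import json


def gossip_rollout_stages(base_config, key):
    """Return the agent configs for Step 2, Step 3, and Step 4, in rollout order."""
    stages = []
    # (encrypt_verify_incoming, encrypt_verify_outgoing) for each rolling update.
    for incoming, outgoing in [(False, False), (False, True), (True, True)]:
        config = dict(base_config)  # copy so the caller's dict is not modified
        config["encrypt"] = key
        config["encrypt_verify_incoming"] = incoming
        config["encrypt_verify_outgoing"] = outgoing
        stages.append(config)
    return stages


if __name__ == "__main__":
    base = {
        "data_dir": "/opt/consul",
        "log_level": "INFO",
        "node_name": "bulldog",
        "server": True,
    }
    key = "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s="
    for step, config in enumerate(gossip_rollout_stages(base, key), start=2):
        print(f"Step {step}:")
        print(json.dumps(config, indent=2))
```

Each generated file still has to be rolled out with a full restart of every agent before moving on to the next one.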
## TLS Encryption for RPC
Consul supports using TLS to verify the authenticity of servers and clients. To enable TLS,
Consul requires that all servers have certificates that are signed by a single
Certificate Authority. Clients may optionally authenticate with a client certificate generated by the same CA. Please see
[this tutorial on creating a CA and signing certificates](/docs/guides/creating-certificates).
TLS can be used to verify the authenticity of the servers with [`verify_outgoing`](/docs/agent/options#verify_outgoing) and [`verify_server_hostname`](/docs/agent/options#verify_server_hostname). It can also optionally verify client certificates when using [`verify_incoming`](/docs/agent/options#verify_incoming).
Review the [docs for specifics](/docs/agent/encryption).
In Consul version 0.8.4 and newer, migrating to TLS encrypted traffic on a running cluster
is supported.
### Enable TLS: New Cluster
After TLS has been configured on all the agents, you can start the agents and RPC communication will be encrypted.
```javascript
{
"data_dir": "/opt/consul",
"log_level": "INFO",
"node_name": "bulldog",
"server": true,
"encrypt": "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s=",
"verify_incoming": true,
"verify_outgoing": true,
"verify_server_hostname": true,
"ca_file": "consul-agent-ca.pem",
"cert_file": "dc1-server-consul-0.pem",
"key_file": "dc1-server-consul-0-key.pem"
}
```
Note, for clients, the default `cert_file` and `key_file` are named according to their datacenter and role, for example `dc1-client-consul-0.pem`.
The `verify_outgoing` parameter enables agents to verify the authenticity of Consul servers for outgoing connections. The `verify_server_hostname` parameter requires outgoing connections to perform hostname verification and is critically important to prevent compromised client agents from becoming servers and revealing all state to the attacker. Finally, the `verify_incoming` parameter enables the servers to verify the authenticity of all incoming connections.
### Enable TLS: Existing Cluster
Enabling TLS on an existing cluster is supported. This process assumes a starting point of a running cluster with no TLS settings configured, and involves an intermediate step in order to get to full TLS encryption.
**Step 1**: [Generate the necessary keys and certificates](/docs/guides/creating-certificates), then set the `ca_file` or `ca_path`, `cert_file`, and `key_file` settings in the configuration for each agent. Make sure the `verify_outgoing` and `verify_incoming` options are set to `false`. HTTPS for the API can be enabled at this point by setting the [`https`](/docs/agent/options#http_port) port.
```javascript
{
"data_dir": "/opt/consul",
"log_level": "INFO",
"node_name": "bulldog",
"server": true,
"encrypt": "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s=",
"verify_incoming": false,
"verify_outgoing": false,
"ca_file": "consul-agent-ca.pem",
"cert_file": "dc1-server-consul-0.pem",
"key_file": "dc1-server-consul-0-key.pem"
}
```
Next, perform a rolling restart of each agent in the cluster. After this step, TLS should be enabled everywhere but the agents will not yet be enforcing TLS. Again, `consul reload` or `kill -HUP <process_id>` is _not_ sufficient to update the configuration.
**Step 2**: (Optional, Enterprise-only) If applicable, set the `Use TLS` setting in any network areas to `true`. This can be done either through the [`consul operator area update`](/docs/commands/operator/area) command or the [Operator API](/api/operator/area).
**Step 3**: Set `verify_incoming`, `verify_outgoing`, and `verify_server_hostname` to `true`, then perform another rolling restart of each agent in the cluster.
```javascript
{
"data_dir": "/opt/consul",
"log_level": "INFO",
"node_name": "bulldog",
"server": true,
"encrypt": "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s=",
"verify_incoming": true,
"verify_outgoing": true,
"verify_server_hostname": true,
"ca_file": "consul-agent-ca.pem",
"cert_file": "dc1-server-consul-0.pem",
"key_file": "dc1-server-consul-0-key.pem"
}
```
At this point, full TLS encryption for RPC communication is enabled. To disable `HTTP`
connections, which may still be in use by clients for API and CLI communications, update
the [agent configuration](/docs/agent/options#ports).
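During the rolling restarts it can help to check which stage each agent's configuration file is in. The sketch below classifies a parsed configuration as not using TLS, permissive (Step 1), or enforcing (Step 3). The function name and the stage labels are hypothetical, and treating unset `verify_*` options as `false` is an assumption made for this example.

```python
def tls_rollout_stage(config):
    """Classify an agent configuration by how far along the TLS rollout it is."""
    has_ca = bool(config.get("ca_file") or config.get("ca_path"))
    has_cert = bool(config.get("cert_file")) and bool(config.get("key_file"))
    if not (has_ca and has_cert):
        return "no-tls"
    # Assumption: an option that is not set behaves as if it were false.
    flags = [
        bool(config.get(name, False))
        for name in ("verify_incoming", "verify_outgoing", "verify_server_hostname")
    ]
    if all(flags):
        return "enforcing"   # Step 3: TLS is required in both directions
    if not any(flags):
        return "permissive"  # Step 1: certificates loaded, nothing enforced yet
    return "mixed"           # some, but not all, verify options are enabled


if __name__ == "__main__":
    step1 = {
        "ca_file": "consul-agent-ca.pem",
        "cert_file": "dc1-server-consul-0.pem",
        "key_file": "dc1-server-consul-0-key.pem",
        "verify_incoming": False,
        "verify_outgoing": False,
    }
    print(tls_rollout_stage(step1))
```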
## Summary
In this guide we configured both gossip encryption and TLS for RPC. Securing agent communication is a recommended step in setting up a production-ready cluster.


@ -1,297 +0,0 @@
---
layout: docs
page_title: Autopilot
description: This guide covers how to configure and use Autopilot features.
---
# Autopilot
Autopilot features allow for automatic,
operator-friendly management of Consul servers. They include cleanup of dead
servers, monitoring the state of the Raft cluster, and stable server introduction.
To enable Autopilot features (with the exception of dead server cleanup),
the [`raft_protocol`](/docs/agent/options#_raft_protocol) setting in
the agent configuration must be set to 3 or higher on all servers. In Consul
0.8 this setting defaults to 2; in Consul 1.0 and later it defaults to 3. For more
information, see the [Version Upgrade section](/docs/upgrade-specific#raft_protocol)
on Raft Protocol versions.
In this guide we will learn more about Autopilot's features.
- Dead server cleanup
- Server Stabilization
- Redundancy zone tags
- Upgrade migration
Finally, we will review how to ensure Autopilot is healthy.
Note, the examples in this guide come from a Consul 1.4 cluster, where
Autopilot is enabled by default.
## Default Configuration
The configuration of Autopilot is loaded by the leader from the agent's
[Autopilot settings](/docs/agent/options#autopilot) when initially
bootstrapping the cluster. Since Autopilot and its features are already
enabled, we only need to update the configuration to disable them. The
following are the defaults.
```
{
"cleanup_dead_servers": true,
"last_contact_threshold": "200ms",
"max_trailing_logs": 250,
"server_stabilization_time": "10s",
"redundancy_zone_tag": "",
"disable_upgrade_migration": false,
"upgrade_version_tag": ""
}
```
All Consul servers should have Autopilot and its features either enabled
or disabled to ensure consistency across servers in case of a failure. Additionally,
Autopilot must be enabled to use any of the features, but the features themselves
can be configured independently, meaning you can enable or disable any of them
separately, at any time.
After bootstrapping, the configuration can be viewed or modified either via the
[`operator autopilot`](/docs/commands/operator/autopilot) subcommand or the
[`/v1/operator/autopilot/configuration`](/api/operator#autopilot-configuration)
HTTP endpoint.
```
$ consul operator autopilot get-config
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
RedundancyZoneTag = ""
DisableUpgradeMigration = false
UpgradeVersionTag = ""
```
In the example above, we used the `operator autopilot get-config` subcommand to check
the autopilot configuration. You can see we still have all the defaults.
## Dead Server Cleanup
If Autopilot is disabled, it will take 72 hours for dead servers to be automatically reaped,
or an operator must script a `consul force-leave`. If another server failure occurs
in the meantime, it could jeopardize the quorum, even if the failed Consul server had been
automatically replaced. Autopilot helps prevent these kinds of outages by quickly removing failed
servers as soon as a replacement Consul server comes online. When servers are removed
by the cleanup process they will enter the "left" state.
With Autopilot's dead server cleanup enabled, dead servers will periodically be
cleaned up and removed from the Raft peer set to prevent them from interfering with
the quorum size and leader elections. The cleanup process will also be automatically
triggered whenever a new server is successfully added to the cluster.
To update the dead server cleanup feature use `consul operator autopilot set-config`
with the `-cleanup-dead-servers` flag.
```shell
$ consul operator autopilot set-config -cleanup-dead-servers=false
Configuration updated!
$ consul operator autopilot get-config
CleanupDeadServers = false
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
RedundancyZoneTag = ""
DisableUpgradeMigration = false
UpgradeVersionTag = ""
```
We have disabled dead server cleanup, but still have all the other Autopilot defaults.
## Server Stabilization
When a new server is added to the cluster, there is a waiting period where it
must be healthy and stable for a certain amount of time before being promoted
to a full, voting member. This can be configured via the `ServerStabilizationTime`
setting.
```shell
$ consul operator autopilot set-config -server-stabilization-time=5s
Configuration updated!
$ consul operator autopilot get-config
CleanupDeadServers = false
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 5s
RedundancyZoneTag = ""
DisableUpgradeMigration = false
UpgradeVersionTag = ""
```
Now we have disabled dead server cleanup and set the server stabilization time to 5 seconds.
When a new server is added to our cluster, it will only need to be healthy and stable for
5 seconds.
## Redundancy Zones
Prior to Autopilot, it was difficult to deploy servers in a way that took advantage of
isolated failure domains such as AWS Availability Zones; users would be forced to either
have an overly-large quorum (2-3 nodes per AZ) or give up redundancy within an AZ by
deploying just one server in each.
If the `RedundancyZoneTag` setting is set, Consul will use its value to look for a
zone in each server's specified [`-node-meta`](/docs/agent/options#_node_meta)
tag. For example, if `RedundancyZoneTag` is set to `zone`, and `-node-meta zone:east1a`
is used when starting a server, that server's redundancy zone will be `east1a`.
```
$ consul operator autopilot set-config -redundancy-zone-tag=zone
Configuration updated!
$ consul operator autopilot get-config
CleanupDeadServers = false
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 5s
RedundancyZoneTag = "zone"
DisableUpgradeMigration = false
UpgradeVersionTag = ""
```
For our Autopilot features, we have now disabled dead server cleanup, set the server stabilization time to 5 seconds, and
set the redundancy zone tag to `zone`.
Consul will then use these values to partition the servers by redundancy zone, and will
aim to keep one voting server per zone. Extra servers in each zone will stay as non-voters
on standby to be promoted if the active voter leaves or dies.
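The partitioning described above can be illustrated with a short sketch. This is not Consul's implementation; the function name `pick_voters` and the server dictionaries are hypothetical, and choosing the alphabetically first server in each zone as the voter is an arbitrary choice made only to keep the example deterministic.

```python
from collections import defaultdict


def pick_voters(servers, zone_tag):
    """Group servers by their zone metadata and keep one voter per zone.

    `servers` is a list of dicts with a "name" and a "meta" mapping, where
    "meta" stands in for the values passed with -node-meta.
    Returns (voters, standbys) as two lists of server names.
    """
    by_zone = defaultdict(list)
    for server in servers:
        # Servers without the tag are grouped under the empty zone name.
        by_zone[server["meta"].get(zone_tag, "")].append(server["name"])

    voters, standbys = [], []
    for zone in sorted(by_zone):
        names = sorted(by_zone[zone])
        voters.append(names[0])      # one voting server per zone
        standbys.extend(names[1:])   # the rest wait as non-voters
    return voters, standbys


if __name__ == "__main__":
    cluster = [
        {"name": "node1", "meta": {"zone": "east1a"}},
        {"name": "node2", "meta": {"zone": "east1a"}},
        {"name": "node3", "meta": {"zone": "east1b"}},
    ]
    print(pick_voters(cluster, "zone"))
```

With two servers in `east1a` and one in `east1b`, the sketch keeps one voter in each zone and leaves the second `east1a` server on standby.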
## Upgrade Migrations
Autopilot in Consul _Enterprise_ supports upgrade migrations by default. To disable this
functionality, set `DisableUpgradeMigration` to true.
```shell
$ consul operator autopilot set-config -disable-upgrade-migration=true
Configuration updated!
$ consul operator autopilot get-config
CleanupDeadServers = false
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 5s
RedundancyZoneTag = "zone"
DisableUpgradeMigration = true
UpgradeVersionTag = ""
```
With upgrade migration enabled, when a new server is added and Autopilot detects that
its Consul version is newer than that of the existing servers, Autopilot will avoid
promoting the new server until enough newer-versioned servers have been added to the
cluster. When the count of new servers equals or exceeds that of the old servers,
Autopilot will begin promoting the new servers to voters and demoting the old servers.
After this is finished, the old servers can be safely removed from the cluster.
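The promotion rule in the paragraph above comes down to comparing two counts. The sketch below is a simplified illustration under stated assumptions, not Autopilot's actual logic: the function names are hypothetical, versions are expected in `X.Y.Z` form, and only servers on the newest version count as "new".

```python
def parse_version(text):
    """Turn "1.4.0" into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def ready_to_promote(versions):
    """Return True once the newest-versioned servers are at least as
    numerous as the servers still running an older version."""
    parsed = [parse_version(v) for v in versions]
    newest = max(parsed)
    new_count = parsed.count(newest)
    old_count = len(parsed) - new_count
    return new_count >= old_count


if __name__ == "__main__":
    # Three old servers and one new server: keep waiting.
    print(ready_to_promote(["1.3.0", "1.3.0", "1.3.0", "1.4.0"]))
    # Three new servers have joined: promotion can begin.
    print(ready_to_promote(["1.3.0", "1.3.0", "1.3.0", "1.4.0", "1.4.0", "1.4.0"]))
```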
To check the Consul version of the servers, you can either use the [autopilot health](/api/operator#autopilot-health) endpoint or the `consul members`
command.
```
$ consul members
Node   Address         Status  Type    Build  Protocol  DC   Segment
node1  127.0.0.1:8301  alive   server  1.4.0  2         dc1  <all>
node2  127.0.0.1:8703  alive   server  1.4.0  2         dc1  <all>
node3  127.0.0.1:8803  alive   server  1.4.0  2         dc1  <all>
node4  127.0.0.1:8203  alive   server  1.3.0  2         dc1  <all>
```
### Migrations Without a Consul Version Change
The `UpgradeVersionTag` can be used to override the version information used during
a migration, so that the migration logic can be used for updating the cluster when
changing configuration.
If the `UpgradeVersionTag` setting is set, Consul will use its value to look for a
version in each server's specified [`-node-meta`](/docs/agent/options#_node_meta)
tag. For example, if `UpgradeVersionTag` is set to `build`, and `-node-meta build:0.0.2`
is used when starting a server, that server's version will be `0.0.2` when considered in
a migration. The upgrade logic will follow semantic versioning and the version string
must be in the form of either `X`, `X.Y`, or `X.Y.Z`.
```shell
$ consul operator autopilot set-config -upgrade-version-tag=build
Configuration updated!
$ consul operator autopilot get-config
CleanupDeadServers = false
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 5s
RedundancyZoneTag = "zone"
DisableUpgradeMigration = true
UpgradeVersionTag = "build"
```
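Because the version string taken from the node metadata must be in the form `X`, `X.Y`, or `X.Y.Z`, it can be useful to validate the values before starting servers with them. The following is a minimal sketch; the function name is hypothetical, and padding missing components with zeros is an assumption made here so that the results compare cleanly, not documented Consul behavior.

```python
import re

# One to three dot-separated groups of digits: X, X.Y, or X.Y.Z.
_VERSION_RE = re.compile(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse_upgrade_version(value):
    """Validate a node-meta version string and return it as a 3-tuple of ints."""
    match = _VERSION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not in X, X.Y, or X.Y.Z form: {value!r}")
    # Assumption: missing components are treated as 0.
    return tuple(int(group) if group is not None else 0 for group in match.groups())


if __name__ == "__main__":
    for candidate in ("0.0.2", "1.4", "2"):
        print(candidate, "->", parse_upgrade_version(candidate))
```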
## Server Health Checking
An internal health check runs on the leader to track the stability of servers.
A server is considered healthy if all of the following conditions are
true.
- It has a SerfHealth status of 'Alive'.
- The time since its last contact with the current leader is below
`LastContactThreshold`.
- Its latest Raft term matches the leader's term.
- The number of Raft log entries it trails the leader by does not exceed
`MaxTrailingLogs`.
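The four conditions above can be expressed as a single predicate. This sketch is only an illustration of the rules as listed, not the code Consul runs; the `ServerStats` type and its field names are hypothetical, and the default thresholds are the Autopilot defaults shown earlier in this guide.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    serf_status: str        # for example "alive"
    last_contact_ms: float  # time since last contact with the leader
    last_term: int          # latest Raft term seen by this server
    last_index: int         # latest Raft log index on this server


def is_healthy(server, leader_term, leader_index,
               last_contact_threshold_ms=200.0, max_trailing_logs=250):
    """Apply the four health conditions to one server."""
    return (
        server.serf_status == "alive"
        and server.last_contact_ms < last_contact_threshold_ms
        and server.last_term == leader_term
        and leader_index - server.last_index <= max_trailing_logs
    )


if __name__ == "__main__":
    follower = ServerStats("alive", 35.4, 2, 10)
    print(is_healthy(follower, leader_term=2, leader_index=10))
```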
The status of these health checks can be viewed through the [`/v1/operator/autopilot/health`](/api/operator#autopilot-health) HTTP endpoint, with a top level
`Healthy` field indicating the overall status of the cluster:
```
$ curl localhost:8500/v1/operator/autopilot/health
{
  "Healthy": true,
  "FailureTolerance": 0,
  "Servers": [
    {
      "ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
      "Name": "node1",
      "Address": "127.0.0.1:8300",
      "SerfStatus": "alive",
      "Version": "0.8.0",
      "Leader": true,
      "LastContact": "0s",
      "LastTerm": 2,
      "LastIndex": 10,
      "Healthy": true,
      "Voter": true,
      "StableSince": "2017-03-28T18:28:52Z"
    },
    {
      "ID": "e35bde83-4e9c-434f-a6ef-453f44ee21ea",
      "Name": "node2",
      "Address": "127.0.0.1:8705",
      "SerfStatus": "alive",
      "Version": "0.8.0",
      "Leader": false,
      "LastContact": "35.371007ms",
      "LastTerm": 2,
      "LastIndex": 10,
      "Healthy": true,
      "Voter": false,
      "StableSince": "2017-03-28T18:29:10Z"
    }
  ]
}
```
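A response like the one above is easy to consume from a script. The sketch below, with a hypothetical function name, takes the raw JSON text and lists the servers whose `Healthy` field is false; it relies only on the fields visible in the sample response.

```python
import json


def unhealthy_servers(health_json):
    """Return the names of servers that the health response marks as unhealthy."""
    report = json.loads(health_json)
    return [server["Name"] for server in report["Servers"] if not server["Healthy"]]


if __name__ == "__main__":
    sample = """
    {
      "Healthy": false,
      "FailureTolerance": 0,
      "Servers": [
        {"Name": "node1", "Healthy": true},
        {"Name": "node2", "Healthy": false}
      ]
    }
    """
    print(unhealthy_servers(sample))
```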
## Summary
In this guide we configured most of the Autopilot features: dead server cleanup, server
stabilization, redundancy zone tags, upgrade migration, and upgrade version tag.
To learn more about the Autopilot settings we did not configure,
[last_contact_threshold](https://www.consul.io/docs/agent/options.html#last_contact_threshold)
and [max_trailing_logs](https://www.consul.io/docs/agent/options.html#max_trailing_logs),
either read the agent configuration documentation or run the `set-config` subcommand
with the help flag: `consul operator autopilot set-config -h`.


@ -1,89 +0,0 @@
---
layout: docs
page_title: Datacenter Backups
description: >-
Consul provides the snapshot tool for backing up and restoring data. In this
guide you will learn how to use both.
---
# Datacenter Backups
Creating datacenter backups is an important step in production deployments. Backups provide a mechanism for the Consul servers to recover from an outage (network loss, operator error, or a corrupted data directory). All servers write to the `-data-dir` before commit on write requests. The same directory is used on client agents to persist local state too, but this is not critical and can be rebuilt when recreating an agent. This guide backs up only the servers' Raft store state; local client state is not backed up and, in general, does not need to be.
Consul provides the [snapshot](https://consul.io/docs/commands/snapshot.html) command which can be run using the CLI or the API. The `snapshot` command saves a point-in-time snapshot of the state of the Consul servers which includes, but is not limited to:
- KV entries
- the service catalog
- prepared queries
- sessions
- ACLs
With [Consul Enterprise](/docs/commands/snapshot/agent), the `snapshot agent` command runs periodically and writes to local or remote storage (such as Amazon S3).
By default, all snapshots are taken using `consistent` mode where requests are forwarded to the leader which verifies that it is still in power before taking the snapshot. Snapshots will not be saved if the datacenter is degraded or if no leader is available. To reduce the burden on the leader, it is possible to [run the snapshot](/docs/commands/snapshot/save) on any non-leader server using `stale` consistency mode.
This spreads the load across nodes at the possible expense of losing full consistency guarantees. Typically this means that a very small number of recent writes may not be included. The omitted writes are typically limited to data written in the last `100ms` or less from the recovery point. This is usually suitable for disaster recovery. However, the system cannot guarantee how stale this may be if executed against a partitioned server.
## Create Your First Backup
The `snapshot save` command for backing up the datacenter state has many configuration options. In a production environment, you will want to configure ACL tokens and client certificates for security. The configuration options also allow you to specify the datacenter and server to collect the backup data from. Below are several examples.
First, we will run the basic snapshot command on one of our servers with all the defaults, including `consistent` mode.
```shell
$ consul snapshot save backup.snap
Saved and verified snapshot to index 1176
```
The backup will be saved locally in the directory where we ran the command.
You can view metadata about the backup with the `inspect` subcommand.
```shell
$ consul snapshot inspect backup.snap
ID 2-1182-1542056499724
Size 4115
Index 1182
Term 2
Version 1
```
To understand each field review the inspect [documentation](https://www.consul.io/docs/commands/snapshot/inspect.html). Notably, the `Version` field does not correspond to the version of the data. Rather it is the snapshot format version.
Next, let's collect the datacenter data from a non-leader server by specifying stale mode.
```shell
$ consul snapshot save -stale backup.snap
Saved and verified snapshot to index 2276
```
Once ACLs and agent certificates are configured, they can be passed in as environment variables or flags.
```shell
$ export CONSUL_HTTP_TOKEN=<your ACL token>
$ consul snapshot save -stale -ca-file=</path/to/file> backup.snap
Saved and verified snapshot to index 2287
```
In the above example, we set the token as an environment variable and the CA file with a command line flag.
For production use, the `snapshot save` command or [API](https://www.consul.io/api/snapshot.html) should be scripted and run frequently. In addition to frequently backing up the datacenter state, there are several cases where you would also want to run `snapshot save` manually. First, you should always back up the datacenter before upgrading. If the upgrade does not go according to plan, it is often not possible to downgrade due to changes in the state store format. Restoring from a backup is then the only option, so taking one just before the upgrade ensures you have the latest data. Second, if the datacenter loses quorum, it may be beneficial to save the state before the servers become divergent. Finally, you can manually snapshot a datacenter and use that snapshot to bootstrap a new datacenter with the same state.
Operationally, the backup process does not need to be executed on every server. Additionally, you can use the configuration options to save the backups to a mounted filesystem. The mounted filesystem can even be cloud storage, such as Amazon S3. The enterprise command `snapshot agent` automates this process.
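As a starting point for scripting regular backups, the sketch below wraps `consul snapshot save` and writes each snapshot to a timestamped file. It is a minimal example under stated assumptions: the function names and the file naming scheme are hypothetical, the `consul` binary is assumed to be on the `PATH`, and credentials are assumed to come from the environment (for example `CONSUL_HTTP_TOKEN`).

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def snapshot_command(backup_dir, now, stale=False):
    """Build the `consul snapshot save` command for a timestamped backup file."""
    name = f"consul-{now.strftime('%Y%m%dT%H%M%SZ')}.snap"
    command = ["consul", "snapshot", "save"]
    if stale:
        command.append("-stale")  # allow a non-leader server to answer
    command.append(str(Path(backup_dir) / name))
    return command


def take_snapshot(backup_dir, stale=False):
    """Run the snapshot command and return the path of the file it wrote."""
    command = snapshot_command(backup_dir, datetime.now(timezone.utc), stale)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command[-1]


if __name__ == "__main__":
    when = datetime(2020, 4, 13, 20, 8, 13, tzinfo=timezone.utc)
    print(" ".join(snapshot_command("/var/backups/consul", when, stale=True)))
```

A scheduler such as cron could call `take_snapshot` at a regular interval; pruning of old files is left out of the sketch.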
## Restore from Backup
Running the `restore` process should be straightforward. However, there are a couple of actions you can take to ensure the process goes smoothly. First, make sure the datacenter you are restoring is stable and has a leader. You can see this using `consul operator raft list-peers` and checking server logs and telemetry for signs of leader elections or network issues.
You will only need to run the process once, on the leader. The Raft consensus protocol ensures that all servers restore the same state.
```shell
$ consul snapshot restore backup.snap
Restored snapshot
```
Like the `save` subcommand, restore has many configuration options. In production, you would again want to use ACLs and certificates for security.
## Summary
In this guide, we learned about the `snapshot save` and `snapshot restore` commands. If you are testing the backup and restore process, you can add an extra dummy value to Consul KV. Another indicator that the backup was saved correctly is the size of the backup artifact.


@ -1,263 +0,0 @@
---
layout: docs
page_title: Consul Cluster Monitoring & Metrics
description: >-
After setting up your first datacenter, it is an ideal time to make sure your
cluster is healthy and establish a baseline.
---
# Consul Cluster Monitoring and Metrics
After setting up your first datacenter, it is an ideal time to make sure your cluster is healthy and establish a baseline. This guide will cover several types of metrics in two sections: Consul health and server health.
**Consul health**:
- Transaction timing
- Leadership changes
- Autopilot
- Garbage collection
**Server health**:
- File descriptors
- CPU usage
- Network activity
- Disk activity
- Memory usage
For each type of metric, we will review their importance and help identify when a metric is indicating a healthy or unhealthy state.
First, we need to understand the three methods for collecting metrics. We will briefly cover using SIGUSR1, the HTTP API, and telemetry.
Before starting this guide, we recommend configuring [ACLs](/docs/guides/acl).
## How to Collect Metrics
There are three methods for collecting metrics. The first, and simplest, is to use `SIGUSR1` for a one-time dump of current telemetry values. The second method is to get a similar one-time dump using the HTTP API. The third method, and the one most commonly used for long-term monitoring, is to enable telemetry in the Consul configuration file.
### SIGUSR1 for Local Use
To get a one-time dump of current metric values, we can send the `SIGUSR1` signal to the Consul process.
```shell
$ kill -USR1 <process_id>
```
This will send the output to the system logs, such as `/var/log/messages` or to `journald`. If you are monitoring the Consul process in the terminal via `consul monitor`, you will see the metrics in the output.
Although this is the easiest way to get a quick read of a single Consul agent's health, it is much more useful to look at how the values change over time.
### API GET Request
Next, let's use the HTTP API to quickly collect metrics with curl.
```shell
$ curl http://127.0.0.1:8500/v1/agent/metrics
```
In production you will want to set up credentials with an ACL token and [enable TLS](/docs/agent/encryption) for secure communications. Once ACLs have been configured, you can pass a token with the request.
```shell
$ curl \
--header "X-Consul-Token: <YOUR_ACL_TOKEN>" \
https://127.0.0.1:8500/v1/agent/metrics
```
In addition to being a good way to quickly collect metrics, it can be added to a script or it can be used with monitoring agents that support HTTP scraping, such as Prometheus, to visualize the data.
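For scripted collection, the same request can be made from Python's standard library. The sketch below builds the request with the ACL token header used in the curl example and looks up a single gauge by name. The function names are hypothetical, and the response shape (a top-level `Gauges` list whose entries have `Name` and `Value` fields) is an assumption to verify against the API documentation for your Consul version.

```python
import json
import urllib.request


def build_metrics_request(base_url, token=None):
    """Create the GET request for the agent metrics endpoint."""
    headers = {"X-Consul-Token": token} if token else {}
    return urllib.request.Request(f"{base_url}/v1/agent/metrics", headers=headers)


def fetch_metrics(base_url, token=None):
    """Fetch and decode the metrics document from a running agent."""
    request = build_metrics_request(base_url, token)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def gauge_value(metrics, name):
    """Return the value of the named gauge, or None if it is not present."""
    for gauge in metrics.get("Gauges", []):  # assumed response shape
        if gauge["Name"] == name:
            return gauge["Value"]
    return None


if __name__ == "__main__":
    sample = {"Gauges": [{"Name": "consul.autopilot.healthy", "Value": 1}]}
    print(gauge_value(sample, "consul.autopilot.healthy"))
```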
### Enable Telemetry
Finally, Consul can be configured to send telemetry data to a remote monitoring system. This allows you to monitor the health of agents over time, spot trends, and plan for future needs. You will need a monitoring agent and console for this.
Consul supports the following telemetry agents:
- Circonus
- DataDog (via `dogstatsd`)
- StatsD (via `statsd`, `statsite`, `telegraf`, etc.)
If you are using StatsD, you will also need a compatible database and server, such as Grafana, Chronograf, or Prometheus.
Telemetry can be enabled in the agent configuration file, for example `server.hcl`. Telemetry can be enabled on any agent, client or server. Normally, you would at least enable it on all the servers (both voting and non-voting) to monitor the health of the entire cluster.
An example snippet of `server.hcl` to send telemetry to DataDog looks like this:
```hcl
telemetry {
  dogstatsd_addr = "localhost:8125"
  disable_hostname = true
}
```
When enabling telemetry on an existing cluster, the Consul process will need to be reloaded. This can be done with `consul reload` or `kill -HUP <process_id>`. It is recommended to reload the servers one at a time, starting with the non-leaders.
## Consul Health
The Consul health metrics reveal information about the Consul cluster. They include performance metrics for the key value store, transactions, raft, leadership changes, autopilot tuning, and garbage collection.
### Transaction Timing
The following metrics indicate how long it takes to complete write operations
in various parts of the Consul server, including Consul KV and Raft. Generally, these values should remain reasonably consistent and no more than a few milliseconds each.
| Metric Name | Description |
| :----------------------- | :------------------------------------------------------------------------------ |
| `consul.kvs.apply` | Measures the time it takes to complete an update to the KV store. |
| `consul.txn.apply` | Measures the time spent applying a transaction operation. |
| `consul.raft.apply` | Counts the number of Raft transactions occurring over the interval. |
| `consul.raft.commitTime` | Measures the time it takes to commit a new entry to the Raft log on the leader. |
Sudden changes in any of the timing values could be due to unexpected load on the Consul servers or due to problems on the hosts themselves. Specifically, if any of these metrics deviate more than 50% from the baseline over the previous hour, this indicates an issue. Below are examples of healthy transaction metrics.
```shell
'consul.raft.apply': Count: 1 Sum: 1.000 LastUpdated: 2018-11-16 10:55:03.673805766 -0600 CST m=+97598.238246167
'consul.raft.commitTime': Count: 1 Sum: 0.017 LastUpdated: 2018-11-16 10:55:03.673840104 -0600 CST m=+97598.238280505
```
### Leadership Changes
In a healthy environment, your Consul cluster should have a stable leader. There shouldn't be any leadership changes unless you manually change leadership (by taking a server out of the cluster, for example). If there are unexpected elections or leadership changes, you should investigate possible network issues between the Consul servers. Another possible cause could be that the Consul servers are unable to keep up with the transaction load.
Note: These metrics are reported by the follower nodes, not by the leader.
| Metric Name | Description |
| :------------------------------- | :------------------------------------------------------------------------------------------------------------- |
| `consul.raft.leader.lastContact` | Measures the time since the leader was last able to contact the follower nodes when checking its leader lease. |
| `consul.raft.state.candidate` | Increments when a Consul server starts an election process. |
| `consul.raft.state.leader` | Increments when a Consul server becomes a leader. |
If the `candidate` or `leader` metrics are greater than 0 or the `lastContact` metric is greater than 200ms, you should look into one of the possible causes described above. Below are examples of healthy leadership metrics.
```shell
'consul.raft.leader.lastContact': Count: 4 Min: 10.000 Mean: 31.000 Max: 50.000 Stddev: 17.088 Sum: 124.000 LastUpdated: 2018-12-17 22:06:08.872973122 +0000 UTC m=+3553.639379498
'consul.raft.state.leader': Count: 1 Sum: 1.000 LastUpdated: 2018-12-17 22:05:49.104580236 +0000 UTC m=+3533.870986584
'consul.raft.state.candidate': Count: 1 Sum: 1.000 LastUpdated: 2018-12-17 22:05:49.097186444 +0000 UTC m=+3533.863592815
```
### Autopilot
The autopilot metric is a boolean. A value of 1 indicates a healthy cluster and 0 indicates an unhealthy state.
| Metric Name | Description |
| :------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `consul.autopilot.healthy` | Tracks the overall health of the local server cluster. If all servers are considered healthy by autopilot, this will be set to 1. If any are unhealthy, this will be 0. |
An alert should be set up for a returned value of 0. Below is an example of a healthy cluster according to the autopilot metric.
```shell
[2018-12-17 13:03:40 -0500 EST][G] 'consul.autopilot.healthy': 1.000
```
### Garbage Collection
Garbage collection (GC) pauses are a "stop-the-world" event: all runtime threads are blocked until GC completes. In a healthy environment these pauses should only last a few nanoseconds. If memory usage is high, the Go runtime may start the GC process so frequently that it will slow down Consul. You might observe more frequent leader elections or longer write times.
| Metric Name | Description |
| :--------------------------------- | :---------------------------------------------------------------------------------------------------- |
| `consul.runtime.total_gc_pause_ns` | Number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started. |
If the value returned is more than 2 seconds per minute, you should start investigating the cause. If it exceeds 5 seconds per minute, you should consider the cluster to be in a critical state, make sure your failure recovery procedures are up to date, and investigate immediately. Below is an example of a healthy GC pause.
```shell
'consul.runtime.total_gc_pause_ns': 136603664.000
```
Note, `total_gc_pause_ns` is a cumulative counter, so in order to calculate rates, such as GC/minute, you will need to apply a function such as [non_negative_difference](https://docs.influxdata.com/influxdb/v1.5/query_language/functions/#non-negative-difference).
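If your monitoring system does not offer such a function, the same calculation is straightforward to do by hand. The sketch below assumes the counter is sampled once per minute, takes differences between consecutive samples (dropping negative ones, which occur when the counter resets after a restart), and applies the 2 and 5 seconds per minute thresholds from this section. The function names and severity labels are hypothetical.

```python
def non_negative_difference(samples):
    """Differences between consecutive samples, skipping counter resets."""
    differences = []
    for previous, current in zip(samples, samples[1:]):
        delta = current - previous
        if delta >= 0:  # a negative delta means the counter restarted
            differences.append(delta)
    return differences


def gc_pause_severity(pause_ns_per_minute):
    """Classify one minute of GC pause time using this guide's thresholds."""
    seconds = pause_ns_per_minute / 1e9
    if seconds > 5:
        return "critical"
    if seconds > 2:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical once-per-minute readings of total_gc_pause_ns.
    readings = [136603664, 140000000, 2640000000, 9000000000]
    for delta in non_negative_difference(readings):
        print(delta, gc_pause_severity(delta))
```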
## Server Health
The server metrics provide information about the health of your cluster including file handles, CPU usage, network activity, disk activity, and memory usage.
### File Descriptors
The majority of Consul operations require a file descriptor handle, including receiving a connection from another host, sending data between servers, and writing snapshots to disk. If Consul runs out of handles, it will stop accepting connections.
| Metric Name | Description |
| :------------------------- | :------------------------------------------------------------------ |
| `linux_sysctl_fs.file-nr` | Number of file handles being used across all processes on the host. |
| `linux_sysctl_fs.file-max` | Total number of available file handles. |
By default, process and kernel limits are conservative, so you may want to increase the limits beyond the defaults. If the `linux_sysctl_fs.file-nr` value exceeds 80% of `linux_sysctl_fs.file-max`, the file handles should be increased. Below is an example of a file handle metric.
```shell
linux_sysctl_fs, host=statsbox, file-nr=768i, file-max=96763i
```
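The 80% rule above is a simple ratio check. The following sketch, with a hypothetical function name, applies it to the two values from the example metric line.

```python
def needs_more_file_handles(file_nr, file_max, threshold=0.8):
    """Return True when handle usage exceeds the given fraction of the limit."""
    return file_nr / file_max > threshold


if __name__ == "__main__":
    file_nr, file_max = 768, 96763  # values from the example metric above
    print(f"usage: {file_nr / file_max:.2%}")
    print("increase limits:", needs_more_file_handles(file_nr, file_max))
```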
### CPU Usage
Consul should not be demanding of CPU time on either servers or clients. A spike in CPU usage could indicate too many operations taking place at once.
| Metric Name | Description |
| :--------------- | :------------------------------------------------------------------------ |
| `cpu.user_cpu` | Percentage of CPU being used by user processes (such as Vault or Consul). |
| `cpu.iowait_cpu` | Percentage of CPU time spent waiting for I/O tasks to complete. |
If `cpu.iowait_cpu` is greater than 10%, it should be considered critical as Consul is waiting for data to be written to disk. This could be a sign that Raft is writing snapshots to disk too often. Below is an example of a healthy CPU metric.
```shell
cpu, cpu=cpu-total, usage_idle=99.298, usage_user=0.400, usage_system=0.300, usage_iowait=0, usage_steal=0
```
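To evaluate example lines like the one above programmatically, a small parser is enough. The sketch below assumes the `name, key=value, key=value` layout exactly as printed in this guide (which is not necessarily the wire format your agent emits), treats a trailing `i` as an integer marker, and keeps non-numeric values such as `cpu-total` as strings. The function names are hypothetical, and the `usage_iowait` field name is taken from the example line.
```python
def parse_metric_line(line):
    """Parse 'name, key=value, key=value' as printed in this guide."""
    parts = [p.strip() for p in line.split(",") if p.strip()]
    name, fields = parts[0], {}
    for part in parts[1:]:
        key, _, raw = part.partition("=")
        raw = raw.strip()
        try:
            if raw.endswith("i") and raw[:-1].lstrip("-").isdigit():
                value = int(raw[:-1])  # trailing 'i' marks an integer
            else:
                value = float(raw)
        except ValueError:
            value = raw  # tags such as cpu=cpu-total or host=statsbox
        fields[key.strip()] = value
    return name, fields


def iowait_is_critical(fields, threshold=10.0):
    """True when the I/O wait percentage exceeds the 10% threshold."""
    return fields.get("usage_iowait", 0.0) > threshold
```
Applied to the healthy example above, `iowait_is_critical` returns `False` because `usage_iowait` is 0.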
### Network Activity
Network activity should be consistent. A sudden spike in network traffic to Consul might be the result of a misconfigured client, such as Vault, that is causing too many requests.
Most agents will report separate metrics for each network interface, so be sure you are monitoring the right one.
| Metric Name | Description |
| :--------------- | :------------------------------------------- |
| `net.bytes_recv` | Bytes received on each network interface. |
| `net.bytes_sent` | Bytes transmitted on each network interface. |
Sudden increases in the `net` metrics, greater than 50% deviation from baseline, indicate too many requests that are not being handled. Below is an example of a network activity metric.
```shell
net, interface=enp0s5, bytes_sent=6183357i, bytes_recv=262313256i
```
Note: The `net` metrics are counters, so in order to calculate rates, such as bytes/second,
you will need to apply a function such as [non_negative_difference](https://docs.influxdata.com/influxdb/v1.5/query_language/functions/#non-negative-difference).
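If your monitoring tool does not provide such a function, the same idea can be sketched in Python: take the difference between two counter samples, discard negative differences, and compare the resulting rate against a baseline. The function names and the way the baseline is obtained are assumptions made for illustration.
```python
def bytes_per_second(prev_bytes, curr_bytes, interval_seconds):
    """Rate between two samples of a cumulative byte counter.

    Returns None when the interval is invalid or the counter went
    backwards (for example after an interface reset)."""
    delta = curr_bytes - prev_bytes
    if delta < 0 or interval_seconds <= 0:
        return None
    return delta / interval_seconds


def exceeds_baseline(rate, baseline_rate, max_increase=0.5):
    """True when the rate is more than 50% above the baseline."""
    if baseline_rate <= 0:
        return rate > 0
    return (rate - baseline_rate) / baseline_rate > max_increase
```
For instance, a rate of 600 bytes/second against a baseline of 350 bytes/second is roughly a 71% increase and would be flagged.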
### Disk Activity
Normally, there is low disk activity, because Consul keeps everything in memory. If the Consul host is writing a large amount of data to disk, it could mean that Consul is under heavy write load and consequently is checkpointing Raft snapshots to disk frequently. It could also mean that debug/trace logging has accidentally been enabled in production, which can impact performance.
| Metric Name | Description |
| :------------------- | :-------------------------------------------------------- |
| `diskio.read_bytes` | Bytes read from each block device. |
| `diskio.write_bytes` | Bytes written to each block device. |
| `diskio.read_time` | Time spent reading from disk, in cumulative milliseconds. |
| `diskio.write_time` | Time spent writing to disk, in cumulative milliseconds. |
Sudden, large changes to the `diskio` metrics, greater than 50% deviation from baseline
or more than 3 standard deviations from baseline, indicate that Consul has too much disk I/O. Too much disk I/O can cause the rest of the system to slow down or become unavailable, as the kernel spends all its time waiting for I/O to complete. Below is an example of disk activity metrics.
```shell
diskio, name=sda5, read_bytes=522298368i, write_bytes=1726865408i, read_time=7248i, write_time=133364i
```
Note: The `diskio` metrics are counters, so in order to calculate rates (such as bytes/second), you will need to apply a function such as [non_negative_difference](https://docs.influxdata.com/influxdb/v1.5/query_language/functions/#non-negative-difference).
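The two disk I/O conditions (50% deviation, or 3 standard deviations from baseline) can be sketched with Python's standard `statistics` module. This assumes you already have a list of recent per-second rates to serve as the baseline; the function name and parameters are hypothetical.
```python
import statistics


def is_disk_io_anomalous(baseline_rates, current_rate,
                         max_fraction=0.5, max_sigmas=3.0):
    """Flag a rate that deviates more than 50% from the baseline mean
    or lies more than 3 standard deviations away from it."""
    if len(baseline_rates) < 2:
        raise ValueError("need at least two baseline samples")
    mean = statistics.mean(baseline_rates)
    stdev = statistics.stdev(baseline_rates)
    diff = abs(current_rate - mean)
    over_fraction = mean > 0 and diff / mean > max_fraction
    over_sigmas = stdev > 0 and diff > max_sigmas * stdev
    return over_fraction or over_sigmas
```
A longer baseline window makes the standard deviation more stable, at the cost of reacting more slowly to a genuine change in workload.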
### Memory Usage
As noted previously, Consul keeps all of its data (the KV store, the catalog, etc.) in memory. If Consul consumes all available memory, it will crash. You should monitor total available RAM to make sure some remains available for other system processes. Swap usage should remain at 0% for best performance.
| Metric Name | Description |
| :--------------------------- | :------------------------------------------------------------- |
| `consul.runtime.alloc_bytes` | Measures the number of bytes allocated by the Consul process. |
| `consul.runtime.sys_bytes` | The total number of bytes of memory obtained from the OS. |
| `mem.total` | Total amount of physical memory (RAM) available on the server. |
| `mem.used_percent` | Percentage of physical memory in use. |
| `swap.used_percent` | Percentage of swap space in use. |
Consul servers are running low on memory if `consul.runtime.sys_bytes` exceeds 90% of `mem.total`, `mem.used_percent` is over 90%, or `swap.used_percent` is greater than 0. You should increase the memory available to Consul if any of these three conditions is met. Below are examples of memory usage metrics.
```shell
'consul.runtime.alloc_bytes': 11199928.000
'consul.runtime.sys_bytes': 24627448.000
mem, used_percent=31.492, total=1036312576i
swap, used_percent=1.343
```
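The three memory conditions can be combined into one check. The following Python sketch assumes that the total used for the 90% comparison is the `mem.total` value from the table above; the function name and the warning labels are hypothetical.
```python
def memory_warnings(sys_bytes, mem_total, mem_used_percent, swap_used_percent):
    """Return the list of low-memory conditions from this guide that are met."""
    warnings = []
    if mem_total > 0 and sys_bytes > 0.9 * mem_total:
        warnings.append("consul_sys_bytes_over_90_percent")
    if mem_used_percent > 90:
        warnings.append("mem_used_over_90_percent")
    if swap_used_percent > 0:
        warnings.append("swap_in_use")
    return warnings
```
With the example values above, only the swap condition is reported, because `swap, used_percent=1.343` is greater than 0.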
## Summary
In this guide we reviewed the three methods for collecting metrics. SIGUSR1 and the agent HTTP API are both quick methods for collecting metrics, but enabling telemetry is the best method for moving data into monitoring software. Additionally, we outlined the various metrics collected and their significance.

@ -1,252 +0,0 @@
---
layout: docs
page_title: Using Envoy with Connect
description: This guide walks through getting started running Envoy as a Connect Proxy.
---
# Using Connect with Envoy Proxy
Consul Connect has first-class support for using
[Envoy](https://www.envoyproxy.io/) as a proxy. This guide will describe how to
set up a development-mode Consul server and two services that use Envoy proxies
on a single machine with [Docker](https://www.docker.com/). The aim of this
guide is to demonstrate a minimal working setup and the moving parts involved;
it is not intended for production deployments.
For reference documentation on how the integration works and is configured,
please see our [Envoy documentation](/docs/connect/proxies/envoy).
## Setup Overview
We'll start all containers using Docker's `host` network mode and will have a
total of five containers running by the end of this guide.
1. A single Consul server
2. An example TCP `echo` service as a destination
3. An Envoy sidecar proxy for the `echo` service
4. An Envoy sidecar proxy for the `client` service
5. An example `client` service (netcat)
We choose to run in Docker since Envoy is only distributed as a Docker image so
it's the quickest way to get a demo running. The same commands used here will
work in just the same way outside of Docker if you build an Envoy binary
yourself.
## Building an Envoy Image
Starting Envoy requires a bootstrap configuration file that points Envoy to the
local agent for discovering the rest of its configuration. The Consul binary
includes the [`consul connect envoy` command](/docs/commands/connect/envoy)
which can generate the bootstrap configuration for Envoy and optionally run it
directly.
Envoy's official Docker image can be used with Connect directly; however, it
requires some additional steps to generate bootstrap configuration and inject it
into the container.
Instead, we'll use Docker multi-stage builds (added in version 17.05) to make a
local image that has both `envoy` and `consul` binaries. First, create a
`Dockerfile` containing the following:
```dockerfile
FROM consul:latest
FROM envoyproxy/envoy:v1.10.0
COPY --from=0 /bin/consul /bin/consul
ENTRYPOINT ["dumb-init", "consul", "connect", "envoy"]
```
This takes the Consul binary from the latest release image and copies it into a
new image based on the official Envoy image.
This can be built locally with:
```shell
docker build -t consul-envoy .
```
We will use the `consul-envoy` image we just made to configure and run Envoy
processes later.
## Deploying a Consul Server
Next we need a Consul server. We'll work with a single Consul server in `-dev`
mode for simplicity.
-> **Note:** `-dev` mode enables the gRPC server on port 8502 by default. For a
production agent you'll need to [explicitly configure the gRPC
port](/docs/agent/options#grpc_port).
In order to start a proxy instance, a [proxy service
definition](/docs/connect/proxies) must exist on the local Consul agent.
We'll create one using the [sidecar service
registration](/docs/connect/proxies/sidecar-service) syntax.
Create a configuration file called `envoy_demo.hcl` containing the following
service definitions.
```hcl
services {
  name = "client"
  port = 8080
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "echo"
          local_bind_port = 9191
        }
      }
    }
  }
}
services {
  name = "echo"
  port = 9090
  connect {
    sidecar_service {}
  }
}
```
The Consul container can now be started with that configuration.
```shell
$ docker run --rm -d -v$(pwd)/envoy_demo.hcl:/etc/consul/envoy_demo.hcl \
--network host --name consul-agent consul:latest \
agent -dev -config-file /etc/consul/envoy_demo.hcl
1c90f7fcc83f5390332d7a4fdda2f1bf74cf62762de9ea2f67cd5a09c0573641
```
Running with `-d` like this puts the container into the background so we can
continue in the same terminal. Log output can be seen using the name we gave.
```shell
docker logs -f consul-agent
```
Note that the Consul server has registered two services, `client` and `echo`, and
has also registered two proxies, `client-sidecar-proxy` and `echo-sidecar-proxy`.
Next we'll need to run those services and proxies.
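If you prefer to confirm the registrations through the agent HTTP API rather than the logs, something like the following Python sketch could be used. It assumes the dev agent is listening on `http://localhost:8500` and that the `/v1/agent/services` response marks sidecar proxies with a `Kind` of `connect-proxy`; the helper names are hypothetical.
```python
import json
import urllib.request


def fetch_agent_services(base_url="http://localhost:8500"):
    """Fetch the services registered with the local agent (needs a running agent)."""
    with urllib.request.urlopen(base_url + "/v1/agent/services", timeout=5) as resp:
        return json.load(resp)


def split_services_and_proxies(services):
    """Separate plain services from Connect proxies by their Kind field."""
    proxies = sorted(
        s["Service"] for s in services.values() if s.get("Kind") == "connect-proxy"
    )
    apps = sorted(
        s["Service"] for s in services.values() if s.get("Kind", "") != "connect-proxy"
    )
    return apps, proxies
```
Calling `split_services_and_proxies(fetch_agent_services())` against the agent started above should list `client` and `echo` as services and the two sidecar entries as proxies, if the assumptions hold.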
## Running the Echo Service
Next we'll run the `echo` service. We can use an existing TCP echo utility image
for this.
Start the echo service on port 9090 as registered before.
```shell
$ docker run -d --network host abrarov/tcp-echo --port 9090
1a0b0c569016d00aadc4fc2b2954209b32b510966083f2a9e17d3afc6d185d87
```
## Running the Proxies
We can now run "sidecar" proxy instances.
```shell
$ docker run --rm -d --network host --name echo-proxy \
consul-envoy -sidecar-for echo
3f213a3cf9b7583a194dd0507a31e0188a03fc1b6e165b7f9336b0b1bb2baccb
$ docker run --rm -d --network host --name client-proxy \
consul-envoy -sidecar-for client -admin-bind localhost:19001
d8399b54ee0c1f67d729bc4c8b6e624e86d63d2d9225935971bcb4534233012b
```
The `-admin-bind` flag on the second proxy command is needed because both
proxies are running on the host network and so can't bind to the same port for
their admin API (which cannot be disabled).
Again we can see the output using `docker logs`. To see more verbose information
from Envoy you can add `-- -l debug` to the end of the commands above. This
passes the `-l` (log level) option directly through to Envoy. With debug level
logs you should see the config being delivered to the proxy in the output.
The [`consul connect envoy` command](/docs/commands/connect/envoy) here is
connecting to the local agent, getting the proxy configuration from the proxy
service registration and generating the required Envoy bootstrap configuration
before `exec`ing the envoy binary directly to run it with the generated
configuration.
Envoy uses the bootstrap configuration to connect to the local agent directly
via gRPC and use its xDS protocol to retrieve the actual configuration for
listeners, TLS certificates, upstream service instances and so on. The xDS API
allows the Envoy instance to watch for any changes so certificate rotations or
changes to the upstream service instances are immediately sent to the proxy.
## Running the Client Service
Finally, we can see the connectivity by running a dummy "client" service. Rather
than run a full service that itself can listen, we'll simulate the service with
a simple netcat process that will only talk to the `client-sidecar-proxy` Envoy
instance.
Recall that we configured the `client` sidecar with one declared "upstream"
dependency (the `echo` service). In that declaration we also requested that the
`echo` service should be exposed to the client on local port 9191.
This configuration causes the `client-sidecar-proxy` to start a TCP proxy
listening on `localhost:9191` and proxying to the `echo` service. Importantly,
the listener will use the correct `client` service mTLS certificate to authorize
the connection. It discovers the IP addresses of instances of the echo service
via Consul service discovery.
We can now see this working if we run netcat.
```shell
$ docker run -ti --rm --network host gophernet/netcat localhost 9191
Hello World!
Hello World!
^C
```
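The same check can be scripted instead of typed interactively. The Python sketch below is a stand-in for the netcat client: it connects to the upstream listener on `localhost:9191` (the `local_bind_port` configured earlier), sends one message, and reads back the echoed bytes. The function name is hypothetical.
```python
import socket


def echo_roundtrip(message, host="localhost", port=9191, timeout=5.0):
    """Send a message to the local upstream listener and return what comes back."""
    payload = message.encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = sock.recv(4096)
            if not chunk:
                break  # the peer closed the connection before echoing everything
            received += chunk
    return received.decode("utf-8")
```
When the proxies are running and the connection is allowed, `echo_roundtrip("Hello World!\n")` should return the same string. If the connection is closed without a response, the function returns whatever was received, possibly an empty string.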
## Testing Authorization
To demonstrate that Connect is controlling authorization for the echo service,
we can add an explicit deny rule.
```shell
$ docker run -ti --rm --network host consul:latest intention create -deny client echo
Created: client => echo (deny)
```
Now, new connections will be denied. Depending on a few factors, netcat may not
see the connection being closed but will not get a response from the service.
```shell
$ docker run -ti --rm --network host gophernet/netcat localhost 9191
Hello?
Anyone there?
^C
```
-> **Note:** Envoy will not currently re-authenticate already established TCP
connections so if you still have the netcat terminal open from before, that will
still be able to communicate with "echo". _New_ connections should be denied
though.
Removing the intention restores connectivity.
```shell
$ docker run -ti --rm --network host consul:latest intention delete client echo
Intention deleted.
$ docker run -ti --rm --network host gophernet/netcat localhost 9191
Hello?
Hello?
^C
```
## Summary
In this guide we walked through getting a minimal working example of two plain
TCP processes communicating over mTLS using Envoy sidecars configured by
Connect.
For more details on how the Envoy integration works, please see the [Envoy
reference documentation](/docs/connect/proxies/envoy).
To see how to get Consul Connect working in different environments like
Kubernetes, see the [Connect Getting
Started](/docs/connect#getting-started-with-connect) overview.

@ -1,184 +0,0 @@
---
layout: docs
page_title: Connect in Production
description: This guide describes best practices for running Consul Connect in production.
---
# Running Connect in Production
Consul Connect can secure all inter-service communication with mutual TLS. It's
designed to work with [minimal configuration out of the
box](https://learn.hashicorp.com/consul/getting-started/connect); however, completing the [security
checklist](/docs/connect/security) and understanding the [Consul security
model](/docs/internals/security) are prerequisites for production
deployments.
After completing this guide, you will be able to configure Connect to
secure services. First, you will secure your Consul cluster with ACLs and
TLS encryption. Next, you will configure Connect on the servers and host.
Finally, you will configure your services to use Connect.
~> Note: To complete this guide you should already have a Consul cluster
with an appropriate number of servers and
clients deployed according to the other reference material including the
[deployment](/docs/guides/deployment) and
[performance](/docs/install/performance) guides.
The steps we need to get to a secure Connect cluster are:
1. [Configure ACLs](#configure-acls)
1. [Configure Agent Transport Encryption](#configure-agent-transport-encryption)
1. [Bootstrap Connect's Certificate Authority](#bootstrap-certificate-authority)
1. [Setup Host Firewall](#setup-host-firewall)
1. [Configure Service Instances](#configure-service-instances)
For existing Consul deployments, it may be necessary to incrementally adopt Connect
service-by-service. In this case, steps one and two should already be complete.
However, we recommend reviewing all steps since the final deployment goal is to be compliant with all the security recommendations in this guide.
## Configure ACLs
Consul Connect's security is based on service identity. In practice, the identity
of the service is only enforceable with sufficiently restrictive ACLs.
This section will not replace reading the full [ACL
guide](/docs/guides/acl) but will highlight the specific requirements
Connect relies on to ensure its security properties.
A service's identity, in the form of an X.509 certificate, will only be issued
to an API client that has `service:write` permission for that service. In other
words, any client that has permission to _register_ an instance of a service
will be able to identify as that service and access all of the resources that that
service is allowed to access.
A secure ACL setup must meet the following criteria.
1. **[ACL default
policy](/docs/agent/options#acl_default_policy)
must be `deny`.** If for any reason you cannot use the default policy of
`deny`, you must add an explicit ACL denying anonymous `service:write`. Note, in this case the Connect intention graph will also default to
`allow` and explicit `deny` intentions will be needed to restrict service
access. Also note that explicit rules to limit who can manage intentions are
necessary in this case. It is assumed for the remainder of this guide that
ACL policy defaults to `deny`.
2. **Each service must have a unique ACL token** that is restricted to
`service:write` only for the named service. You can review the [Securing Consul with ACLs](https://learn.hashicorp.com/consul/advanced/day-1-operations/production-acls#apply-individual-tokens-to-the-services) guide for a
service token example. Note, it is best practice for each instance to get a unique token as described below.
~> Individual Service Tokens: It is best practice to create a unique ACL token per service _instance_ because
it limits the blast radius of a compromise. However, since Connect intentions manage access based only on service identity, it is
possible to create only one ACL token per _service_ and share it between
instances.
In practice, managing per-instance tokens requires automated ACL provisioning,
for example using [HashiCorp's
Vault](https://www.vaultproject.io/docs/secrets/consul).
## Configure Agent Transport Encryption
Consul's gossip (UDP) and RPC (TCP) communications need to be encrypted;
otherwise, attackers may be able to see ACL tokens while in flight
between the server and client agents (RPC) or between client agent and
application (HTTP). Certificate private keys never leave the host they
are used on but are delivered to the application or proxy over local
HTTP, so local agent traffic should be encrypted where potentially
untrusted parties might be able to observe localhost agent API traffic.
Follow the [encryption guide](https://learn.hashicorp.com/consul/advanced/day-1-operations/agent-encryption) to ensure
both gossip encryption and RPC/HTTP TLS are configured securely.
## Bootstrap Connect's Certificate Authority
Consul Connect comes with a built-in Certificate Authority (CA) that will
bootstrap by default when you first [enable](https://www.consul.io/docs/agent/options.html#connect_enabled) Connect on your servers.
To use the built-in CA, enable it in the server's configuration.
```hcl
connect {
  enabled = true
}
```
This configuration change requires a Consul server restart, which you can perform one server at a time
to maintain availability in an existing cluster.
As soon as a server that has Connect enabled becomes the leader, it will
bootstrap a new CA and generate its own private key, which is written to the
Raft state.
Alternatively, an external private key can be provided via the [CA
configuration](/docs/connect/ca#specifying-a-private-key-and-root-certificate).
~> External CAs: Connect has been designed with a pluggable CA component so external CAs can be
integrated. For production workloads we recommend using [Vault or another external
CA](/docs/connect/ca#external-ca-certificate-authority-providers) once
available such that the root key is not stored within Consul state at all.
## Setup Host Firewall
In order to enable inbound connections to Connect proxies, you may need to
configure host or network firewalls to allow incoming connections to proxy
ports.
In addition to Consul agent's [communication
ports](/docs/agent/options#ports) any
[proxies](/docs/connect/proxies) will need to have
ports open to accept incoming connections.
If using [sidecar service
registration](/docs/connect/proxies/sidecar-service), Consul will by default
assign ports from [a configurable
range](/docs/agent/options#sidecar_min_port). The default range is 21000 to 21255.
If this feature is used, the agent assumes all ports in that range are
both free to use (no other processes listening on them) and are exposed in the
firewall to accept connections from other service hosts.
It is possible to prevent automated port selection by [configuring
`sidecar_min_port` and
`sidecar_max_port`](/docs/agent/options#sidecar_min_port) to both be `0`,
forcing any sidecar service registrations to need an explicit port configured.
It then becomes the same problem as opening ports necessary for any other
application and might be managed by configuration management or a scheduler.
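When generating firewall rules from configuration management, it can help to compute the sidecar port range explicitly. The Python sketch below uses the default range described above and treats a minimum and maximum of `0` as "automatic assignment disabled". The function name is hypothetical and the validation is an assumption of this sketch, not Consul's own logic.
```python
def sidecar_ports(min_port=21000, max_port=21255):
    """Return the inclusive range of ports the agent may assign to sidecars."""
    if min_port == 0 and max_port == 0:
        return range(0)  # automatic port assignment is disabled
    if not (0 < min_port <= max_port <= 65535):
        raise ValueError("invalid sidecar port range")
    return range(min_port, max_port + 1)
```
The default range covers 256 ports, all of which need to accept inbound connections from other service hosts if automatic assignment is used.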
## Configure Service Instances
With [necessary ACL tokens](#configure-acls) in place, all service registrations
need to have an appropriate ACL token present.
For on-disk configuration the `token` parameter of the service definition must
be set.
```json
{
  "service": {
    "name": "cassandra_db",
    "port": 9002,
    "token": "<your_token_here>"
  }
}
```
For registration via the API the token is passed in the [request
header](/api#authentication), `X-Consul-Token`, or by using the [Go
client configuration](https://godoc.org/github.com/hashicorp/consul/api#Config).
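For API registration, the request might be built as in the following Python sketch. It assumes the agent's HTTP API is reachable at `http://localhost:8500` and that services are registered with a `PUT` to `/v1/agent/service/register` using `Name` and `Port` fields; the helper name is hypothetical and the token value is a placeholder.
```python
import json
import urllib.request


def build_register_request(name, port, token, base_url="http://localhost:8500"):
    """Build (but do not send) a service registration request carrying the ACL token."""
    body = json.dumps({"Name": name, "Port": port}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/agent/service/register",
        data=body,
        method="PUT",
        headers={
            "X-Consul-Token": token,  # ACL token with service:write for this service
            "Content-Type": "application/json",
        },
    )
```
Passing the returned object to `urllib.request.urlopen` sends the registration. Keeping the token out of the URL and in the header avoids it being recorded in access logs.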
To avoid the overhead of a proxy, applications may [natively
integrate](/docs/connect/native) with Connect.
~> Protect Application Listener: If using any kind of proxy for Connect, the application must ensure no untrusted
connections can be made to its unprotected listening port. This is typically
done by binding to `localhost` and only allowing loopback traffic, but may also
be achieved using firewall rules or network namespacing.
For examples of proxy service definitions see the [proxy
documentation](/docs/connect/proxies).
## Summary
After securing your Consul cluster with ACLs and TLS encryption, you
can use Connect to secure service-to-service communication. If you
encounter any issues while setting up Consul Connect, there are
many [community](https://www.consul.io/community.html) resources where you can find help.

@ -1,89 +0,0 @@
---
layout: docs
page_title: Consul-AWS
description: >-
Consul-AWS provides a tool, which syncs Consul's and AWS Cloud Map's service
catalog
---
# Consul-AWS
[Consul-AWS](https://github.com/hashicorp/consul-aws/) syncs the services in an AWS Cloud Map namespace to a Consul datacenter. Consul services will be created in AWS Cloud Map and the other way around. This enables native service discovery across Consul and AWS Cloud Map.
This guide will describe how to configure and how to start the sync.
## Authentication
`consul-aws` needs access to Consul and AWS for uni- and bidirectional sync.
For Consul, the process accepts both the standard CLI flag `-token` and the environment variable `CONSUL_HTTP_TOKEN`. This should be set to a Consul ACL token if ACLs are enabled.
For AWS, `consul-aws` uses the default credential provider chain to find AWS credentials. The default provider chain looks for credentials in the following order:
1. Environment variables.
2. Shared credentials file.
3. If your application is running on an Amazon EC2 instance, IAM role for Amazon EC2.
## Configuration
There are two subcommands available on `consul-aws`:
- version: display version number
- sync-catalog: start syncing the catalogs
The version subcommand doesn't do anything besides showing the version, so let's focus on sync-catalog. The following flags are available:
- A set of parameters to connect to your Consul Cluster like `-http-addr`, `-token`, `-ca-file`, `-client-cert`, and everything else you might need in order to do that
- `-aws-namespace-id`: The AWS namespace to sync with Consul services.
- `-aws-service-prefix`: A prefix to prepend to all services written to AWS from Consul. If this is not set then services will have no prefix.
- `-consul-service-prefix`: A prefix to prepend to all services written to Consul from AWS. If this is not set then services will have no prefix.
- `-to-aws`: If true, Consul services will be synced to AWS (defaults to false).
- `-to-consul`: If true, AWS services will be synced to Consul (defaults to false).
- `-aws-pull-interval`: The interval between fetching from AWS Cloud Map. Accepts a sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "10s", "1.5m" (defaults to 30s).
- `-aws-dns-ttl`: DNS TTL for services created in AWS Cloud Map in seconds (defaults to 60).
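The interval format accepted by `-aws-pull-interval` can be validated before launching the tool. The Python sketch below parses strings such as "300ms", "10s", or "1.5m" into seconds. It assumes the units match those of Go's `time.ParseDuration` (`ns`, `us`, `µs`, `ms`, `s`, `m`, `h`), does not handle signs, and uses a hypothetical function name.
```python
import re

_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# A decimal number with an optional fraction, followed by a unit suffix.
_TOKEN = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_duration_seconds(text):
    """Parse a sequence like '1m30s' into a number of seconds."""
    if not text:
        raise ValueError("empty duration")
    pos, total = 0, 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if not match:
            raise ValueError("invalid duration: %r" % text)
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return total
```
A bare number without a unit, such as "30", is rejected by this sketch.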
Independent of how you want to use `consul-aws`, it needs to be able to connect to Consul and AWS. Apart from making sure you set up authenticated access, `-aws-namespace-id` is mandatory.
## Syncing Consul services to AWS Cloud Map
Assuming authenticated access is set up, there is little left to do before starting the sync. Using the `-to-aws` command-line flag will start the sync to AWS Cloud Map. If `-aws-service-prefix` is provided, every imported service from Consul will be prefixed. For example:
```shell
$ consul-aws sync-catalog -aws-namespace-id ns-hjrgt3bapp7phzff -to-aws -aws-service-prefix consul_
```
At this point `consul-aws` will start importing services into AWS Cloud Map. A service in Consul named `web` will end up becoming `consul_web` in AWS. The individual service instances from Consul will be created in AWS as well.
Services in AWS Cloud Map that were imported from Consul have the following properties:
- Description: “Imported from Consul”
- Record types: A and SRV
- DNS routing policy: Multivalue answer routing
## Syncing AWS Cloud Map services to Consul
Similar to the previous section, there are two relevant flags: `-to-consul` to turn on the sync and, optionally, `-consul-service-prefix` to prefix every service imported into Consul. For example:
```shell
$ consul-aws sync-catalog -aws-namespace-id ns-hjrgt3bapp7phzff -to-consul -consul-service-prefix aws_
```
At this point `consul-aws` will start importing services into Consul. A service in AWS named `redis` will end up becoming `aws_redis` in Consul. The individual service instances from AWS will be created in Consul as well.
Services in Consul that were imported from AWS Cloud Map have the following properties:
- Tag: aws
- Meta-Data: the source is set to aws, along with the aws-id, the aws-namespace, and every custom attribute the instance had in AWS Cloud Map
- Node: the node name is consul-aws
## Syncing both directions
To enable bidirectional sync, combine the previous two sections: provide both `-to-consul` and `-to-aws` and, optionally, `-aws-service-prefix` and `-consul-service-prefix`:
```shell
$ consul-aws sync-catalog -aws-namespace-id ns-hjrgt3bapp7phzff -to-consul -consul-service-prefix aws_ -to-aws -aws-service-prefix consul_
```
At this point `consul-aws` will start importing services into Consul from AWS Cloud Map and into AWS Cloud Map from Consul.
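If the sync is launched from a script, the command line can be assembled and validated first. The Python sketch below assumes the flags are passed to the `sync-catalog` subcommand as described in the Configuration section, and enforces that `-aws-namespace-id` is present and at least one direction is enabled. The function name is hypothetical.
```python
def build_sync_command(namespace_id, to_aws=False, to_consul=False,
                       aws_service_prefix="", consul_service_prefix=""):
    """Assemble the argument list for a consul-aws catalog sync."""
    if not namespace_id:
        raise ValueError("-aws-namespace-id is mandatory")
    if not (to_aws or to_consul):
        raise ValueError("enable at least one of -to-aws or -to-consul")
    args = ["consul-aws", "sync-catalog", "-aws-namespace-id", namespace_id]
    if to_aws:
        args.append("-to-aws")
        if aws_service_prefix:
            # Prefix for services written to AWS from Consul.
            args += ["-aws-service-prefix", aws_service_prefix]
    if to_consul:
        args.append("-to-consul")
        if consul_service_prefix:
            # Prefix for services written to Consul from AWS.
            args += ["-consul-service-prefix", consul_service_prefix]
    return args
```
The resulting list can be handed to `subprocess.run` without going through a shell, which avoids quoting problems with prefixes.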
## Summary
At this point, either uni- or bidirectional sync is set up and service discovery is available across Consul and AWS seamlessly. If you haven't enabled [ACLs](/docs/guides/acl), now is a good time to read about them.

@ -1,120 +0,0 @@
---
layout: docs
page_title: Using Consul with Containers
description: >-
This guide describes how to run Consul on containers, with Docker as the
primary focus. It also describes best practices when running a Consul cluster
in production on Docker.
---
# Consul with Containers
This guide describes critical aspects of operating a Consul cluster that's run inside containers. It primarily focuses on the Docker container runtime, but the principles largely apply to rkt, oci, and other container runtimes as well.
## Consul Official Docker Image
Consul's official Docker images are tagged with version numbers. For example, `docker pull consul:1.4.4` will pull the 1.4.4 Consul release image.
For major releases, make sure to read our [upgrade guides](/docs/upgrade-specific) before upgrading a cluster.
To get a development mode Consul instance running the latest version, run `docker run consul`.
More instructions on how to get started using this image are available at the [official Docker repository page](https://store.docker.com/images/consul).
## Data Directory Persistence
The container exposes its data directory, `/consul/data`, as a [volume](https://docs.docker.com/engine/tutorials/dockervolumes/). This is where Consul will store its persisted state.
For clients, this stores some information about the cluster and the client's services and health checks in case the container is restarted. If the volume on a client disappears, it doesn't affect cluster operations.
For servers, this stores the client information plus snapshots and data related to the consensus algorithm and other state like Consul's key/value store and catalog. **Servers need the volume's data to be available when restarting containers to recover from outage scenarios.** Therefore, operators must take care that volumes containing Consul cluster data are not destroyed during container restarts.
~> We also recommend taking additional backups via [`consul snapshot`](/docs/commands/snapshot), and storing them externally.
## Configuration
The container has a Consul configuration directory set up at `/consul/config` and the agent will load any configuration files placed here by binding a volume or by composing a new image and adding files.
Note that the configuration directory is not exposed as a volume, and will not persist. Consul uses it only during start up and does not store any state there.
Configuration can also be added by passing the configuration JSON via the environment variable `CONSUL_LOCAL_CONFIG`. Example:
```shell
$ docker run \
-d \
-e CONSUL_LOCAL_CONFIG='{
"datacenter":"us_west",
"server":true,
"enable_debug":true
}' \
consul agent -server -bootstrap-expect=3
```
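When the container is started from a script, generating the JSON programmatically avoids quoting mistakes in the environment variable. The Python sketch below builds the same `docker run` argument list as the example above; the function name is hypothetical, and the image name and agent arguments are simply the ones used in this guide.
```python
import json


def docker_run_args(local_config, image="consul", agent_args=()):
    """Build a `docker run` argument list that passes Consul configuration
    through the CONSUL_LOCAL_CONFIG environment variable."""
    config_json = json.dumps(local_config, separators=(",", ":"))
    return [
        "docker", "run", "-d",
        "-e", "CONSUL_LOCAL_CONFIG=" + config_json,
        image, *agent_args,
    ]
```
For example, `docker_run_args({"datacenter": "us_west", "server": True, "enable_debug": True}, agent_args=("agent", "-server", "-bootstrap-expect=3"))` yields a list that can be passed to `subprocess.run`.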
## Networking
When running inside a container, Consul must be configured with an appropriate _cluster address_ and _client address_. In some cases, it may also require configuring an _advertise address_.
- **Cluster Address** - The address at which other Consul agents may contact a given agent. This is also referred to as the bind address.
- **Client Address** - The address where other processes on the host contact Consul in order to make HTTP or DNS requests. Consider setting this to localhost or `127.0.0.1` to only allow processes in the same container to make HTTP/DNS requests.
- **Advertise Address** - The advertise address is used to change the address that we advertise to other nodes in the cluster. This defaults to the bind address. Consider using this if you use NAT in your environment, or in scenarios where you have a routable address that cannot be bound.
You will need to tell Consul what its cluster address is when starting so that it binds to the correct interface and advertises a workable interface to the rest of the Consul agents. There are two ways of doing this:
1. Environment Variables: Use the `CONSUL_CLIENT_INTERFACE` and `CONSUL_BIND_INTERFACE` environment variables. In the following example `eth0` is the network interface of the container.
```shell
$ docker run \
-d \
-e CONSUL_CLIENT_INTERFACE='eth0' \
-e CONSUL_BIND_INTERFACE='eth0' \
consul agent -server -bootstrap-expect=3
```
2. Address Templates: You can declaratively specify the client and cluster addresses using the formats described in the [go-sockaddr](https://github.com/hashicorp/go-sockaddr) library.
In the following example, the client and bind addresses are declaratively specified for the container network interface `eth0`.
```shell
$ docker run \
consul agent -server \
-client='{{ GetInterfaceIP "eth0" }}' \
-bind='{{ GetInterfaceIP "eth0" }}' \
-bootstrap-expect=3
```
## Stopping and Restarting Containers
The official Consul container supports stopping, starting, and restarting. To stop a container, run `docker stop`:
```shell
$ docker stop <container_id>
```
To start a container, run `docker start`:
```shell
$ docker start <container_id>
```
To do an in-memory reload, send a SIGHUP to the container:
```shell
$ docker kill --signal=HUP <container_id>
```
As long as there are enough servers in the cluster to maintain [quorum](/docs/internals/consensus#deployment-table), Consul's [Autopilot](/docs/guides/autopilot) feature will handle removing servers whose containers were stopped. Autopilot's default settings are already configured correctly. If you override them, make sure that the following [settings](/docs/agent/options#autopilot) are appropriate.
- `cleanup_dead_servers` must be set to true to make sure that a stopped container is removed from the cluster.
- `last_contact_threshold` should be reasonably small, so that dead servers are removed quickly.
- `server_stabilization_time` should be sufficiently large (on the order of several seconds) so that unstable servers are not added to the cluster until they stabilize.
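As a minimal sketch, the three settings above live under the `autopilot` block of a server's configuration. The snippet below writes them to a file using what we understand to be the default values (`200ms` and `10s`). The `./consul.d` directory and the `autopilot.json` file name are assumptions chosen for illustration.

```shell
# Hypothetical config directory; point this at your agent's -config-dir.
CONFIG_DIR="${CONFIG_DIR:-./consul.d}"
mkdir -p "$CONFIG_DIR"

# Write the Autopilot settings discussed above. The values are believed
# to match Consul's defaults; adjust them to your environment.
cat > "$CONFIG_DIR/autopilot.json" <<'EOF'
{
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "200ms",
    "server_stabilization_time": "10s"
  }
}
EOF
```

If you do not override Autopilot at all, you do not need this file.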
If the container running the currently-elected Consul server leader is stopped, a leader election will trigger. This event will cause a new Consul server in the cluster to assume leadership.
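An election can only complete while a quorum of servers remains available. As a quick worked example, a cluster of N servers needs floor(N/2) + 1 of them, which is the rule behind the deployment table linked above. The helper names `quorum` and `failure_tolerance` are ours and are not part of Consul.

```shell
# Number of servers that must be available in a cluster of $1 servers.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of server containers that can be stopped before quorum is lost.
failure_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}

# Print the figures for a few common cluster sizes.
for n in 1 3 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerance=$(failure_tolerance "$n")"
done
```

With three servers you can therefore stop one container at a time, and with five you can stop two.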
When a previously stopped server container is restarted using `docker start <container_id>`, and it is configured to obtain a new IP, Autopilot will add it back to the set of Raft peers with the same node-id and the new IP address, after which it can participate as a server again.
## Known Issues
**All nodes changing IP addresses** Prior to Consul 0.9.3, Consul did not gracefully handle the situation where all nodes in the cluster running inside a container are restarted at the same time, and they all obtain new IP addresses. This has been [fixed](https://github.com/hashicorp/consul/issues/1580) since Consul 0.9.3, and requires `"raft_protocol"` to be set to `"3"` in the configs in Consul 0.9.3. Consul 1.0 makes raft protocol 3 the default.
**Snapshot close error** Due to a [known issue](https://github.com/docker/libnetwork/issues/1204) with half close support in Docker, you will see an error message `[ERR] consul: Failed to close snapshot: write tcp <source>-><destination>: write: broken pipe` when saving snapshots. This does not affect saving and restoring snapshots when running in Docker.

@ -1,191 +0,0 @@
---
layout: docs
page_title: Consul Template
description: >-
Consul template provides a programmatic method for rendering configuration
files from Consul data.
---
# Consul Template
The Consul template tool provides a programmatic method
for rendering configuration files from a variety of locations,
including Consul KV. It is an ideal option for replacing complicated API
queries that often require custom formatting.
The template tool is based on Go templates and shares many
of the same attributes.
Consul template is a useful tool with several uses. We will focus on two
of its use cases.
1. _Update configuration files_. The Consul template tool can be used
to update service configuration files. A common use case is managing load
balancer configuration files that need to be updated regularly in a dynamic
infrastructure on machines which may not be able to directly connect to the Consul cluster.
1. _Discover data about the Consul cluster and service_. It is possible to collect
information about the services in your Consul cluster. For example, you could
collect a list of all services running on the cluster or you could discover all
service addresses for the Redis service. Note, this use case has limited
scope for production.
In this guide we will briefly discuss how `consul-template` works,
how to install it, and two use cases.
Before completing this guide, we assume some familiarity with
[Consul KV](https://learn.hashicorp.com/consul/getting-started/kv)
and [Go templates](https://golang.org/pkg/text/template/).
## Introduction to Consul Template
Consul template is a simple, yet powerful tool. When initiated, it
reads one or more template files and queries Consul for all
data needed to render them. Typically, you run `consul-template` as a
daemon which will fetch the initial values and then continue to watch
for updates, re-rendering the template whenever there are relevant changes in
the cluster. You can alternatively use the `-once` flag to fetch and render
the template once, which is useful for testing and for
setup scripts that are triggered by other automation, for example a
provisioning tool. Finally, `consul-template` can also run arbitrary commands after the update
process completes. For example, it can send the HUP signal to the
load balancer service after a configuration change has been made.
The Consul template tool is flexible and can fit into many
different environments and workflows. Depending on the use case, you
may have a single `consul-template` instance on a handful of hosts
or may need to run several instances on every host. Each `consul-template`
process can manage multiple unrelated files, however, and de-duplicates
fetches when those files share data dependencies, so sharing a process
where possible reduces the load on Consul servers.
## Install Consul Template
For this guide, we are using a local Consul agent in development
mode which can be started with `consul agent -dev`. To quickly set
up a local Consul agent, refer to the getting started [guide](https://learn.hashicorp.com/consul/getting-started/install). The
Consul agent must be running to complete all of the following
steps.
The Consul template tool is not included with the Consul binary and will
need to be installed separately. It can be installed from a precompiled
binary or compiled from source. We will be installing the precompiled binary.
First, download the binary from the [Consul Template releases page](https://releases.hashicorp.com/consul-template/).
```shell
curl -O https://releases.hashicorp.com/consul-template/0.19.5/consul-template<_version_OS>.tgz
```
Next, extract the binary and move it into a directory on your `$PATH`.
```shell
tar -zxf consul-template<_version_OS>.tgz
```
To compile from source, please see the instructions in the
[contributing section in GitHub](https://github.com/hashicorp/consul-template#contributing).
## Use Case: Consul KV
In this first use case example, we will render a template that pulls the HashiCorp address
from Consul KV. To do this we will create a simple template that reads the HashiCorp
address key, run `consul-template`, add a value to Consul KV for HashiCorp's address, and
finally view the rendered file.
First, we will need to create a template file `find_address.tpl` to query
Consul's KV store:
```liquid
{{ key "/hashicorp/street_address" }}
```
Next, we will run `consul-template` specifying both
the template to use and the file to update.
```shell
$ consul-template -template "find_address.tpl:hashicorp_address.txt"
```
The `consul-template` process will continue to run until you kill it with `CTRL+c`.
For now, we will leave it running.
Finally, open a new terminal so we can write data to the key in Consul using the command
line interface.
```shell
$ consul kv put hashicorp/street_address "101 2nd St"
Success! Data written to: hashicorp/street_address
```
We can ensure the data was written by viewing the `hashicorp_address.txt`
file which will be located in the same directory where `consul-template`
was run.
```shell
$ cat hashicorp_address.txt
101 2nd St
```
If you update the key `hashicorp/street_address`, you can see the changes to the file
immediately. Go ahead and try `consul kv put hashicorp/street_address "22b Baker ST"`.
You can see that this simple process can have powerful implications. For example, it is
possible to use this same process for updating your [HAProxy load balancer
configuration](https://github.com/hashicorp/consul-template/blob/master/examples/haproxy.md).
You can now kill the `consul-template` process with `CTRL+c`.
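In the walkthrough above, the rendered file only has content once the key is written, because `key` waits for the key to exist before rendering. As a small sketch, `keyOrDefault` lets the template render straight away with a fallback value. The file name `find_address_default.tpl` and the fallback text are made up for illustration.

```shell
# Write a template that falls back to a placeholder until the key exists.
# The quoted 'EOF' keeps the shell from touching the template syntax.
cat > find_address_default.tpl <<'EOF'
{{ keyOrDefault "hashicorp/street_address" "address not set" }}
EOF

# To render it once against a running agent (not executed here):
#   consul-template -template "find_address_default.tpl:hashicorp_address.txt" -once
```

This variant is handy in setup scripts that use `-once`, where blocking on a missing key would otherwise stall the script.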
## Use Case: Discover All Services
In this use case example, we will discover all the services running in the Consul cluster.
To follow along, you can use the local development agent from the previous example.
First, we will need to create a new template `all-services.tpl` to query all services.
```liquid
{{range services}}# {{.Name}}{{range service .Name}}
{{.Address}}{{end}}
{{end}}
```
Next, run Consul template specifying the template we just created and the `-once` flag.
The `-once` flag will tell the process to run once and then quit.
```shell
$ consul-template -template="all-services.tpl:all-services.txt" -once
```
If you complete this on your local development agent, you should
still see the `consul` service when viewing `all-services.txt`.
```text
# consul
127.0.0.1
```
On a development or production cluster, you would see a list of all the services.
For example:
```text
# consul
104.131.121.232
# redis
104.131.86.92
104.131.109.224
104.131.59.59
# web
104.131.86.92
104.131.109.224
104.131.59.59
```
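The template in this section prints only addresses. As a hedged variation, each entry returned by `service` also carries a port, so a template can emit `address:port` pairs, which is closer to what a load balancer configuration needs. The file name `all-services-ports.tpl` is an assumption for illustration.

```shell
# Write a variant of all-services.tpl that also prints each instance's port.
cat > all-services-ports.tpl <<'EOF'
{{range services}}# {{.Name}}{{range service .Name}}
{{.Address}}:{{.Port}}{{end}}
{{end}}
EOF

# To render it once against a running agent (not executed here):
#   consul-template -template="all-services-ports.tpl:all-services-ports.txt" -once
```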
## Conclusion
In this guide we learned how to set up and use the Consul template tool.
To see additional examples, refer to the examples folder
in [GitHub](https://github.com/hashicorp/consul-template/tree/master/examples).

@ -1,427 +0,0 @@
---
layout: docs
page_title: Creating and Configuring TLS Certificates
description: Learn how to create certificates for Consul.
---
# Creating and Configuring TLS Certificates
Setting your cluster up with TLS is an important step towards a secure
deployment. Correct TLS configuration is a prerequisite of our [Security
Model](/docs/internals/security). Correctly configuring TLS can be a
complex process, however, especially given the wide range of deployment
methodologies. This guide will provide you with a production-ready TLS
configuration.
~> More advanced topics like key management and rotation are not covered by this
guide. [Vault][vault] is the suggested solution for key generation and
management.
This guide has the following chapters:
1. [Creating Certificates](#creating-certificates)
1. [Configuring Agents](#configuring-agents)
1. [Configuring the Consul CLI for HTTPS](#configuring-the-consul-cli-for-https)
1. [Configuring the Consul UI for HTTPS](#configuring-the-consul-ui-for-https)
This guide is structured in a way that you build knowledge with every step. We
recommend reading the whole guide before starting the actual work,
because being aware of the more advanced topics
in chapters [3](#configuring-the-consul-cli-for-https) and
[4](#configuring-the-consul-ui-for-https) can save you time.
### Reference Material
- [Encryption](/docs/agent/encryption)
- [Security Model](/docs/internals/security)
## Creating Certificates
### Estimated Time to Complete
2 minutes
### Prerequisites
This guide assumes you have Consul 1.4.1 (or newer) in your PATH.
### Introduction
The first step to configuring TLS for Consul is generating certificates. In
order to prevent unauthorized cluster access, Consul requires all certificates
be signed by the same Certificate Authority (CA). This should be a _private_ CA
and not a public one like [Let's Encrypt][letsencrypt] as any certificate
signed by this CA will be allowed to communicate with the cluster.
### Step 1: Create a Certificate Authority
There are a variety of tools for managing your own CA, [like the PKI secret
backend in Vault][vault-pki], but for the sake of simplicity this guide will
use Consul's built-in TLS helpers:
```shell
$ consul tls ca create
==> Saved consul-agent-ca.pem
==> Saved consul-agent-ca-key.pem
```
The CA certificate (`consul-agent-ca.pem`) contains the public key necessary to
validate Consul certificates and therefore must be distributed to every node
that runs a consul agent.
~> The CA key (`consul-agent-ca-key.pem`) will be used to sign certificates for Consul
nodes and must be kept private. Possession of this key allows anyone to run Consul as
a trusted server and access all Consul data including ACL tokens.
### Step 2: Create individual Server Certificates
Create a server certificate for datacenter `dc1` and domain `consul`. If your
datacenter or domain is different, please use the appropriate flags:
```shell
$ consul tls cert create -server
==> WARNING: Server Certificates grants authority to become a
server and access all state in the cluster including root keys
and all ACL tokens. Do not distribute them to production hosts
that are not server nodes. Store them as securely as CA keys.
==> Using consul-agent-ca.pem and consul-agent-ca-key.pem
==> Saved dc1-server-consul-0.pem
==> Saved dc1-server-consul-0-key.pem
```
Please repeat this process until there is an _individual_ certificate for each
server. The command can be called repeatedly and automatically increments the
numeric suffix in the file names.
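Repeating the command by hand is easy to get wrong for larger clusters, so a small loop can help. This is a sketch only: the function name `create_server_certs` is ours, and we are assuming that `-dc` and `-domain` are the flags for a non-default datacenter or domain.

```shell
# Create one server certificate per server by calling the helper repeatedly.
# $1 = number of servers, $2 = datacenter (default dc1), $3 = domain (default consul)
create_server_certs() {
  count="$1"
  dc="${2:-dc1}"
  domain="${3:-consul}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # Each call is expected to write the next numbered certificate and key.
    consul tls cert create -server -dc "$dc" -domain "$domain" || return 1
    i=$(( i + 1 ))
  done
}

# Example (requires the CA files from Step 1 in the current directory):
#   create_server_certs 3
```

Run it in the directory that holds `consul-agent-ca.pem` and `consul-agent-ca-key.pem`, then distribute each certificate and key pair to exactly one server.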
In order to authenticate Consul servers, servers are provided with a special
certificate, one that contains `server.dc1.consul` in the `Subject Alternative Name`. If you enable
[`verify_server_hostname`](/docs/agent/options#verify_server_hostname),
only agents that provide such a certificate are allowed to boot as a server.
Without `verify_server_hostname = true` an attacker could compromise a Consul
client agent and restart the agent as a server in order to get access to all the
data in your cluster! This is why server certificates are special, and only
servers should have them provisioned.
~> Server keys, like the CA key, must be kept private - they effectively allow
access to all Consul data.
### Step 3: Create Client Certificates
Create a client certificate:
```shell
$ consul tls cert create -client
==> Using consul-agent-ca.pem and consul-agent-ca-key.pem
==> Saved dc1-client-consul-0.pem
==> Saved dc1-client-consul-0-key.pem
```
Client certificates are also signed by your CA, but they do not have that
special `Subject Alternative Name` which means that if `verify_server_hostname`
is enabled, they cannot start as a server.
## Configuring Agents
### Prerequisites
For this section you need access to your existing or new Consul cluster and have
the certificates from the previous chapters available.
### Notes on example configurations
The example configurations in this and the following chapters are in
JSON. You can copy each example into its own file in a directory
([`-config-dir`](/docs/agent/options#_config_dir)) from which Consul will
load all the configuration. This is just one way to do it; you can also put them
all into one file if you prefer.
### Introduction
By now you have created the certificates you need to enable TLS in your cluster.
The next steps show how to configure TLS for a brand new cluster. If you already
have a cluster in production without TLS please see the [encryption
guide][guide] for the steps needed to introduce TLS without downtime.
### Step 1: Setup Consul servers with certificates
This step describes how to set up one of your Consul servers. Make
sure to repeat the process for the other servers as well, using their individual
certificates.
The following files need to be copied to your Consul server:
- `consul-agent-ca.pem`: CA public certificate.
- `dc1-server-consul-0.pem`: Consul server node public certificate for the `dc1` datacenter.
- `dc1-server-consul-0-key.pem`: Consul server node private key for the `dc1` datacenter.
Here is an example agent TLS configuration for Consul servers which mentions the
copied files:
```json
{
"verify_incoming": true,
"verify_outgoing": true,
"verify_server_hostname": true,
"ca_file": "consul-agent-ca.pem",
"cert_file": "dc1-server-consul-0.pem",
"key_file": "dc1-server-consul-0-key.pem",
"ports": {
"http": -1,
"https": 8501
}
}
```
This configuration disables the HTTP port to make sure there is only encrypted
communication. Existing clients that are not yet prepared to talk HTTPS won't be
able to connect afterwards. This also affects built-in tooling like `consul members` and the UI. The next chapters will demonstrate how to set up secure
access.
After a Consul agent restart, your servers should be only talking TLS.
### Step 2: Setup Consul clients with certificates
Now copy the following files to your Consul clients:
- `consul-agent-ca.pem`: CA public certificate.
- `dc1-client-consul-0.pem`: Consul client node public certificate.
- `dc1-client-consul-0-key.pem`: Consul client node private key.
Here is an example agent TLS configuration for Consul clients which references the
copied files:
```json
{
"verify_incoming": true,
"verify_outgoing": true,
"verify_server_hostname": true,
"ca_file": "consul-agent-ca.pem",
"cert_file": "dc1-client-consul-0.pem",
"key_file": "dc1-client-consul-0-key.pem",
"ports": {
"http": -1,
"https": 8501
}
}
```
This configuration disables the HTTP port to make sure there is only encrypted
communication. Existing clients that are not yet prepared to talk HTTPS won't be
able to connect afterwards. This also affects built-in tooling like `consul members` and the UI. The next chapters will demonstrate how to set up secure
access.
After a Consul agent restart, your agents should be only talking TLS.
## Configuring the Consul CLI for HTTPS
If your cluster is configured to only communicate via HTTPS, you will need to
create additional certificates in order to be able to continue to access the API
and the UI:
```shell
$ consul tls cert create -cli
==> Using consul-agent-ca.pem and consul-agent-ca-key.pem
==> Saved dc1-cli-consul-0.pem
==> Saved dc1-cli-consul-0-key.pem
```
If you try to list the members of your cluster now, the CLI will return an error:
```shell
$ consul members
Error retrieving members:
Get http://127.0.0.1:8500/v1/agent/members?segment=_all:
dial tcp 127.0.0.1:8500: connect: connection refused
$ consul members -http-addr="https://localhost:8501"
Error retrieving members:
Get https://localhost:8501/v1/agent/members?segment=_all:
x509: certificate signed by unknown authority
```
But it will work again if you provide the certificates you created:
```shell
$ consul members -ca-file=consul-agent-ca.pem -client-cert=dc1-cli-consul-0.pem \
-client-key=dc1-cli-consul-0-key.pem -http-addr="https://localhost:8501"
Node Address Status Type Build Protocol DC Segment
...
```
This process can be cumbersome to type each time, so the Consul CLI also
searches environment variables for default values. Set the following
environment variables in your shell:
```shell
$ export CONSUL_HTTP_ADDR=https://localhost:8501
$ export CONSUL_CACERT=consul-agent-ca.pem
$ export CONSUL_CLIENT_CERT=dc1-cli-consul-0.pem
$ export CONSUL_CLIENT_KEY=dc1-cli-consul-0-key.pem
```
- `CONSUL_HTTP_ADDR` is the URL of the Consul agent and sets the default for
`-http-addr`.
- `CONSUL_CACERT` is the location of your CA certificate and sets the default
for `-ca-file`.
- `CONSUL_CLIENT_CERT` is the location of your CLI certificate and sets the
default for `-client-cert`.
- `CONSUL_CLIENT_KEY` is the location of your CLI key and sets the default for
`-client-key`.
After these environment variables are correctly configured, the CLI will
respond as expected.
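The exports above use relative file names, so they only work from the directory that holds the certificates. A minimal sketch of one way around that is a helper that prints the exports with a directory prefix. The function name `consul_tls_env` and the `/etc/consul.d/certs` path in the usage comment are assumptions, not Consul conventions.

```shell
# Print export lines for the CLI TLS variables.
# $1 = directory that holds the CA and CLI certificate files (default: .)
consul_tls_env() {
  dir="${1:-.}"
  printf "export CONSUL_HTTP_ADDR='https://localhost:8501'\n"
  printf "export CONSUL_CACERT='%s/consul-agent-ca.pem'\n" "$dir"
  printf "export CONSUL_CLIENT_CERT='%s/dc1-cli-consul-0.pem'\n" "$dir"
  printf "export CONSUL_CLIENT_KEY='%s/dc1-cli-consul-0-key.pem'\n" "$dir"
}

# Load the variables into the current shell, for example from ~/.profile:
#   eval "$(consul_tls_env /etc/consul.d/certs)"
```

Printing the lines rather than exporting directly makes it easy to review the values before loading them.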
### Note on SANs for Server and Client Certificates
Using `localhost` and `127.0.0.1` as `Subject Alternative Names` in server
and client certificates allows tools like `curl` to be able to communicate with
Consul's HTTPS API when run on the same host. Other SANs may be added during
server/client certificates creation with `-additional-dnsname` or
`-additional-ipaddress` to allow remote HTTPS requests from other hosts.
## Configuring the Consul UI for HTTPS
If your servers and clients are now configured as above, you won't be able to
access the built-in UI anymore. We recommend that you pick one (or two for
availability) Consul agent to run the UI on and follow the instructions
below to get the UI up and running again.
### Step 1: Which interface to bind to?
Depending on your setup, you might need to change which interface the UI
binds to, because it is `127.0.0.1` by default. You can do this via either the
[`addresses.https`](/docs/agent/options#https) or the
[`client_addr`](/docs/agent/options#client_addr) option; the latter also impacts
the DNS server. The Consul UI is unprotected, which means you need to put some
authentication in front of it if you want to make it publicly available!
Binding to `0.0.0.0` should work:
```json
{
"ui": true,
"client_addr": "0.0.0.0",
"enable_script_checks": false,
"disable_remote_exec": true
}
```
~> Since your Consul agent is now available to the network, please make sure
that [`enable_script_checks`](/docs/agent/options#_enable_script_checks) is
set to `false` and
[`disable_remote_exec`](https://www.consul.io/docs/agent/options.html#disable_remote_exec)
is set to `true`.
### Step 2: verify_incoming_rpc
Your Consul agent will deny the connection straight away because
`verify_incoming` is enabled.
> If set to true, Consul requires that all incoming connections make use of TLS
> and that the client provides a certificate signed by a Certificate Authority
> from the ca_file or ca_path. This applies to both server RPC and to the HTTPS
> API.
Since the browser doesn't present a certificate signed by our CA, you cannot
access the UI. If you `curl` your HTTPS UI the following happens:
```shell
$ curl https://localhost:8501/ui/ -k -I
curl: (35) error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate
```
This is the Consul HTTPS server denying your connection because you are not
presenting a client certificate signed by your Consul CA. There is a combination
of options however that allows us to keep using `verify_incoming` for RPC, but
not for HTTPS:
```json
{
"verify_incoming": false,
"verify_incoming_rpc": true
}
```
~> This is the only time we are changing the value of the existing option
`verify_incoming` to false. Make sure to only change it on the agent running the
UI!
With the new configuration, it should work:
```shell
$ curl https://localhost:8501/ui/ -k -I
HTTP/2 200
...
```
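Putting Steps 1 and 2 together, the agent that serves the UI ends up with the settings sketched below in addition to its TLS file settings. The `./consul.d` directory and the `ui-agent.json` file name are assumptions. We also believe files in a config directory are loaded in lexical order with later values winning, so check which value of `verify_incoming` takes effect if another file sets it to `true`.

```shell
# Hypothetical config directory; point this at your agent's -config-dir.
CONFIG_DIR="${CONFIG_DIR:-./consul.d}"
mkdir -p "$CONFIG_DIR"

# Merge the UI settings from Step 1 with the verify_incoming override
# from Step 2. Apply this only on the agent that serves the UI.
cat > "$CONFIG_DIR/ui-agent.json" <<'EOF'
{
  "ui": true,
  "client_addr": "0.0.0.0",
  "enable_script_checks": false,
  "disable_remote_exec": true,
  "verify_incoming": false,
  "verify_incoming_rpc": true
}
EOF
```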
### Step 3: Subject Alternative Name
This step will take care of setting up the domain you want to use to access the
Consul UI. Unless you only need to access the UI over localhost or 127.0.0.1 you
will need to complete this step.
```shell
$ curl https://consul.example.com:8501/ui/ \
--resolve 'consul.example.com:8501:127.0.0.1' \
--cacert consul-agent-ca.pem
curl: (51) SSL: no alternative certificate subject name matches target host name 'consul.example.com'
...
```
The above command simulates the request a browser makes when you try to
use the domain `consul.example.com` to access your UI. The problem this time is
that your domain is not in the `Subject Alternative Name` of the certificate. We can
fix that by creating a certificate that has our domain:
```shell
$ consul tls cert create -server -additional-dnsname consul.example.com
...
```
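Before deploying the new certificate, it can be worth confirming that the domain really is listed in it. The sketch below uses the `openssl` command line tool, which is an assumption about your environment, and the helper name `show_sans` is ours.

```shell
# Print the Subject Alternative Name entries of a PEM certificate.
# $1 = path to the certificate file.
show_sans() {
  openssl x509 -in "$1" -noout -text | grep -A1 'Subject Alternative Name'
}

# Example (file name assumed; use whatever `consul tls cert create` printed):
#   show_sans dc1-server-consul-1.pem
```

If `consul.example.com` does not appear among the `DNS:` entries, the browser will reject the certificate for that domain.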
And if you put your new cert into the configuration of the agent you picked to
serve the UI and restart Consul, it works now:
```shell
$ curl https://consul.example.com:8501/ui/ \
--resolve 'consul.example.com:8501:127.0.0.1' \
--cacert consul-agent-ca.pem -I
HTTP/2 200
...
```
### Step 4: Trust the Consul CA
So far we have provided curl with our CA so that it can verify the connection,
but if we stop doing that it will complain and so will our browser if you visit
your UI on https://consul.example.com:
```shell
$ curl https://consul.example.com:8501/ui/ \
--resolve 'consul.example.com:8501:127.0.0.1'
curl: (60) SSL certificate problem: unable to get local issuer certificate
...
```
You can fix that by trusting your Consul CA (`consul-agent-ca.pem`) on your machine.
The procedure differs between operating systems, so please consult the documentation for your OS.
```shell
$ curl https://consul.example.com:8501/ui/ \
--resolve 'consul.example.com:8501:127.0.0.1' -I
HTTP/2 200
...
```
## Summary
When you have completed this guide, your Consul cluster will have TLS enabled
and will encrypt all RPC and HTTP traffic (assuming you disabled the HTTP port).
The other pre-requisites for a secure Consul deployment are:
- [Enable gossip encryption](/docs/agent/encryption#gossip-encryption)
- [Configure ACLs][acl] with default deny
[letsencrypt]: https://letsencrypt.org/
[vault]: https://www.vaultproject.io/
[vault-pki]: https://www.vaultproject.io/docs/secrets/pki
[guide]: /docs/agent/encryption.html#configuring-tls-on-an-existing-cluster
[acl]: /docs/guides/acl.html

@ -1,182 +0,0 @@
---
layout: docs
page_title: Multiple Datacenters - Basic Federation with the WAN Gossip Pool
description: >-
One of the key features of Consul is its support for multiple datacenters. The
architecture of Consul is designed to promote low coupling of datacenters so
that connectivity issues or failure of any datacenter does not impact the
availability of Consul in other datacenters. This means each datacenter runs
independently, each having a dedicated group of servers and a private LAN
gossip pool.
---
# Multiple Datacenters: Basic Federation with the WAN Gossip
One of the key features of Consul is its support for multiple datacenters.
The [architecture](/docs/internals/architecture) of Consul is designed to
promote a low coupling of datacenters so that connectivity issues or
failure of any datacenter does not impact the availability of Consul in other
datacenters. This means each datacenter runs independently, each having a dedicated
group of servers and a private LAN [gossip pool](/docs/internals/gossip).
## The WAN Gossip Pool
This guide covers the basic form of federating Consul clusters using a single
WAN gossip pool, interconnecting all Consul servers.
[Consul Enterprise](https://www.hashicorp.com/products/consul/) version 0.8.0 added support
for an advanced multiple datacenter capability. Please see the
[Advanced Federation Guide](/docs/guides/advanced-federation) for more details.
## Setup Two Datacenters
To get started, follow the
[Deployment guide](https://learn.hashicorp.com/consul/advanced/day-1-operations/deployment-guide/) to
start each datacenter. After bootstrapping, we should now have two datacenters, which
we can refer to as `dc1` and `dc2`. Note that datacenter names are opaque to Consul;
they are simply labels that help human operators reason about the Consul clusters.
To query the known WAN nodes, we use the [`members`](/docs/commands/members)
command with the `-wan` parameter on either datacenter.
```shell
$ consul members -wan
```
This will provide a list of all known members in the WAN gossip pool. In
this case, we have not connected the servers so there will be no output.
The output of `consul members -wan` should
only contain server nodes. Client nodes send requests to a datacenter-local server,
so they do not participate in WAN gossip. Client requests are forwarded by local
servers to a server in the target datacenter as necessary.
## Join the Servers
The next step is to ensure that all the server nodes join the WAN gossip pool (including all the servers in all the datacenters).
```shell
$ consul join -wan <server 1> <server 2> ...
```
The [`join`](/docs/commands/join) command is used with the `-wan` flag to indicate
we are attempting to join a server in the WAN gossip pool. As with LAN gossip, you only
need to join a single existing member, and the gossip protocol will be used to exchange
information about all known members. For the initial setup, however, each server
will only know about itself and must be added to the cluster. Consul 0.8.0 added WAN join
flooding, so if one Consul server in a datacenter joins the WAN, it will automatically
join the other servers in its local datacenter that it knows about via the LAN.
### Persist Join with Retry Join
In order to persist the `join` information, the following can be added to each server's configuration file, in both datacenters. For example, on the `dc1` server nodes:
```json
"retry_join_wan":[
"dc2-server-1",
"dc2-server-2"
],
```
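The fragment above is meant to sit inside a larger configuration object. If you prefer a separate file in the config directory, it needs its own enclosing braces and no trailing comma. Here is a minimal sketch for a `dc1` server, where the `./consul.d` directory and the `wan-join.json` file name are assumptions.

```shell
# Hypothetical config directory; point this at your agent's -config-dir.
CONFIG_DIR="${CONFIG_DIR:-./consul.d}"
mkdir -p "$CONFIG_DIR"

# Standalone file for a dc1 server that keeps retrying the WAN join
# against the dc2 servers until it succeeds.
cat > "$CONFIG_DIR/wan-join.json" <<'EOF'
{
  "retry_join_wan": [
    "dc2-server-1",
    "dc2-server-2"
  ]
}
EOF
```

The `dc2` servers would carry the mirror image of this file, listing the `dc1` servers instead.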
## Verify Multi-DC Configuration
Once the join is complete, the [`members`](/docs/commands/members) command can be
used to verify that all server nodes are gossiping over the WAN.
```shell
$ consul members -wan
Node Address Status Type Build Protocol DC Segment
dc1-server-1 127.0.0.1:8701 alive server 1.4.3 2 dc1 <all>
dc2-server-1 127.0.0.1:8702 alive server 1.4.3 2 dc2 <all>
```
We can also verify that both datacenters are known using the
[HTTP Catalog API](/api/catalog#catalog_datacenters):
```shell
$ curl http://localhost:8500/v1/catalog/datacenters
["dc1", "dc2"]
```
As a simple test, you can try to query the nodes in each datacenter:
```shell
$ curl http://localhost:8500/v1/catalog/nodes?dc=dc1
{
"ID": "ee8b5f7b-9cc1-a382-978c-5ce4b1219a55",
"Node": "dc1-server-1",
"Address": "127.0.0.1",
"Datacenter": "dc1",
"TaggedAddresses": {
"lan": "127.0.0.1",
"wan": "127.0.0.1"
},
"Meta": {
"consul-network-segment": ""
},
"CreateIndex": 12,
"ModifyIndex": 14
}
```
```shell
$ curl http://localhost:8500/v1/catalog/nodes?dc=dc2
{
"ID": "ee8b5f7b-9cc1-a382-978c-5ce4b1219a55",
"Node": "dc2-server-1",
"Address": "127.0.0.1",
"Datacenter": "dc2",
"TaggedAddresses": {
"lan": "127.0.0.1",
"wan": "127.0.0.1"
},
"Meta": {
"consul-network-segment": ""
},
"CreateIndex": 11,
"ModifyIndex": 16
}
```
## Network Configuration
There are a few networking requirements that must be satisfied for this to
work. Of course, all server nodes must be able to talk to each other. Otherwise,
the gossip protocol as well as RPC forwarding will not work. If service discovery
is to be used across datacenters, the network must be able to route traffic
between IP addresses across regions as well. Usually, this means that all datacenters
must be connected using a VPN or other tunneling mechanism. Consul does not handle
VPN or NAT traversal for you.
Note that for RPC forwarding to work the bind address must be accessible from remote nodes.
Configuring `serf_wan`, `advertise_addr_wan` and `translate_wan_addrs` can lead to a
situation where `consul members -wan` lists remote nodes but RPC operations fail with one
of the following errors:
- `No path to datacenter`
- `rpc error getting client: failed to get conn: dial tcp <LOCAL_ADDR>:0-><REMOTE_ADDR>:<REMOTE_RPC_PORT>: i/o timeout`
The most likely cause of these errors is that `bind_addr` is set to a private address preventing
the RPC server from accepting connections across the WAN. Setting `bind_addr` to a public
address (or one that can be routed across the WAN) will resolve this issue. Be aware that
exposing the RPC server on a public port should only be done **after** firewall rules have
been established.
The [`translate_wan_addrs`](/docs/agent/options#translate_wan_addrs) configuration
provides a basic address rewriting capability.
## Data Replication
In general, data is not replicated between different Consul datacenters. When a
request is made for a resource in another datacenter, the local Consul servers forward
an RPC request to the remote Consul servers for that resource and return the results.
If the remote datacenter is not available, then those resources will also not be
available, but that won't otherwise affect the local datacenter. There are some special
situations where a limited subset of data can be replicated, such as with Consul's built-in
[ACL replication](/docs/guides/acl#outages-and-acl-replication) capability, or
external tools like [consul-replicate](https://github.com/hashicorp/consul-replicate/).
## Summary
In this guide you set up WAN gossip across two datacenters to create
basic federation. You also used the Consul HTTP API to ensure the
datacenters were properly configured.

@ -1,279 +0,0 @@
---
layout: docs
page_title: Consul Deployment Guide
description: |-
This deployment guide covers the steps required to install and
configure a single HashiCorp Consul cluster as defined in the
Consul Reference Architecture.
ea_version: 1.4
---
# Consul Deployment Guide
This deployment guide covers the steps required to install and configure a single HashiCorp Consul cluster as defined in the [Consul Reference Architecture](/docs/guides/deployment).
These instructions are for installing and configuring Consul on Linux hosts running the systemd system and service manager.
## Reference Material
This deployment guide is designed to work in combination with the [Consul Reference Architecture](/docs/guides/deployment). Although following the Consul Reference Architecture is not a strict requirement, please ensure you are familiar with the overall architecture design, for example installing Consul server agents on multiple physical or virtual (with correct anti-affinity) hosts for high availability.
## Overview
To provide a highly-available single cluster architecture, we recommend Consul server agents be deployed to more than one host, as shown in the [Consul Reference Architecture](/docs/guides/deployment).
![Reference Diagram](/img/consul-arch-single.png 'Reference Diagram')
These setup steps should be completed on all Consul hosts.
- [Download Consul](#download-consul)
- [Install Consul](#install-consul)
- [Configure systemd](#configure-systemd)
- Configure Consul [(server)](#configure-consul-server-) or [(client)](#configure-consul-client-)
- [Start Consul](#start-consul)
## Download Consul
Precompiled Consul binaries are available for download at [https://releases.hashicorp.com/consul/](https://releases.hashicorp.com/consul/) and Consul Enterprise binaries are available for download by following the instructions made available to HashiCorp Consul customers.
You should perform checksum verification of the zip packages using the SHA256SUMS and SHA256SUMS.sig files available for the specific release version. HashiCorp provides [a guide on checksum verification](https://www.hashicorp.com/security.html) for precompiled binaries.
```text
CONSUL_VERSION="x.x.x"
curl --silent --remote-name https://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_linux_amd64.zip
curl --silent --remote-name https://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_SHA256SUMS
curl --silent --remote-name https://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_SHA256SUMS.sig
```
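The checksum comparison can be scripted. The following is a minimal sketch, assuming GNU coreutils `sha256sum` is installed; `verify_release_checksum` is a hypothetical helper name, not part of Consul, and it does not verify the GPG signature in the `SHA256SUMS.sig` file.

```shell
# Hypothetical helper: compare a downloaded archive against its SHA256SUMS entry.
# Assumes GNU coreutils `sha256sum`; it does not check the .sig signature.
verify_release_checksum() {
  archive="$1"   # e.g. consul_${CONSUL_VERSION}_linux_amd64.zip
  sums="$2"      # e.g. consul_${CONSUL_VERSION}_SHA256SUMS
  name=$(basename "$archive")
  # SHA256SUMS lines look like "<hex digest>  <file name>".
  expected=$(awk -v f="$name" '$2 == f { print $1 }' "$sums")
  actual=$(sha256sum "$archive" | awk '{ print $1 }')
  # Succeed only if an entry exists and the digests are identical.
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, `verify_release_checksum consul_${CONSUL_VERSION}_linux_amd64.zip consul_${CONSUL_VERSION}_SHA256SUMS && echo "checksum OK"` would print a message only when the digests match.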
## Install Consul
Unzip the downloaded package and move the `consul` binary to `/usr/local/bin/`. Check `consul` is available on the system path.
```text
unzip consul_${CONSUL_VERSION}_linux_amd64.zip
sudo chown root:root consul
sudo mv consul /usr/local/bin/
consul --version
```
The `consul` command features opt-in autocompletion for flags, subcommands, and arguments (where supported). Enable autocompletion.
```text
consul -autocomplete-install
complete -C /usr/local/bin/consul consul
```
Create a unique, non-privileged system user to run Consul and create its data directory.
```text
sudo useradd --system --home /etc/consul.d --shell /bin/false consul
sudo mkdir --parents /opt/consul
sudo chown --recursive consul:consul /opt/consul
```
## Configure systemd
Systemd uses [documented sane defaults](https://www.freedesktop.org/software/systemd/man/systemd.directives.html) so only non-default values must be set in the configuration file.
Create a Consul service file at /etc/systemd/system/consul.service.
```text
sudo touch /etc/systemd/system/consul.service
```
Add this configuration to the Consul service file:
```text
[Unit]
Description="HashiCorp Consul - A service mesh solution"
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl
[Service]
Type=notify
User=consul
Group=consul
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/
ExecReload=/usr/local/bin/consul reload
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
```
The following parameters are set for the `[Unit]` stanza:
- [`Description`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Description=) - Free-form string describing the consul service
- [`Documentation`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Documentation=) - Link to the consul documentation
- [`Requires`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Requires=) - Configure a requirement dependency on the network service
- [`After`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Before=) - Configure an ordering dependency on the network service being started before the consul service
- [`ConditionFileNotEmpty`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#ConditionArchitecture=) - Check for a non-zero sized configuration file before consul is started
The following parameters are set for the `[Service]` stanza:
- [`User`, `Group`](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#User=) - Run consul as the consul user
- [`ExecStart`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStart=) - Start consul with the `agent` argument and path to the configuration directory
- [`ExecReload`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecReload=) - Send consul a reload signal to trigger a configuration reload in consul
- [`KillMode`](https://www.freedesktop.org/software/systemd/man/systemd.kill.html#KillMode=) - Treat consul as a single process
- [`Restart`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart=) - Restart consul unless it returned a clean exit code
- [`LimitNOFILE`](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Process%20Properties) - Set an increased Limit for File Descriptors
The following parameters are set for the `[Install]` stanza:
- [`WantedBy`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#WantedBy=) - Creates a weak dependency on consul being started by the multi-user run level
## Configure Consul (server)
Consul uses [documented sane defaults](/docs/agent/options) so only non-default values must be set in the configuration file. Configuration can be read from multiple files and is loaded in lexical order. See the [full description](/docs/agent/options) for more information about configuration loading and merge semantics.
Consul server agents typically require a superset of configuration required by Consul client agents. We will specify common configuration used by all Consul agents in `consul.hcl` and server specific configuration in `server.hcl`.
### General configuration
Create a configuration file at `/etc/consul.d/consul.hcl`:
```text
sudo mkdir --parents /etc/consul.d
sudo touch /etc/consul.d/consul.hcl
sudo chown --recursive consul:consul /etc/consul.d
sudo chmod 640 /etc/consul.d/consul.hcl
```
Add this configuration to the `consul.hcl` configuration file:
~> **NOTE** Replace the `datacenter` parameter value with the identifier you will use for the datacenter this Consul cluster is deployed in. Replace the `encrypt` parameter value with the output from running `consul keygen` on any host with the `consul` binary installed.
```hcl
datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s="
```
- [`datacenter`](/docs/agent/options#_datacenter) - The datacenter in which the agent is running.
- [`data_dir`](/docs/agent/options#_data_dir) - The data directory for the agent to store state.
- [`encrypt`](/docs/agent/options#_encrypt) - Specifies the secret key to use for encryption of Consul network traffic.
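Before placing a key in `consul.hcl`, it can help to confirm that it decodes cleanly, since a truncated copy-paste is an easy mistake to make. The sketch below assumes GNU coreutils `base64`; `gossip_key_bytes` is a hypothetical helper name. It only reports the decoded length and does not replace `consul keygen`.

```shell
# Hypothetical helper: report how many bytes a base64-encoded gossip key decodes to.
# Assumes GNU coreutils `base64`, which accepts -d for decoding.
gossip_key_bytes() {
  printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}

# The example key from this guide; replace it with your own `consul keygen` output.
gossip_key_bytes "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s="
```

The example key used in this guide decodes to 32 bytes.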
### ACL configuration
The [ACL](/docs/guides/acl) guide provides instructions on configuring and enabling ACLs.
### Cluster auto-join
The `retry_join` parameter allows you to configure all Consul agents to automatically form a cluster using a common Consul server accessed via DNS address, IP address or using Cloud Auto-join. This removes the need to manually join the Consul cluster nodes together.
Add the `retry_join` parameter to the `consul.hcl` configuration file:
~> **NOTE** Replace the `retry_join` parameter value with the correct DNS address, IP address or [cloud auto-join configuration](/docs/agent/cloud-auto-join) for your environment.
```hcl
retry_join = ["172.16.0.11"]
```
- [`retry_join`](/docs/agent/options#retry-join) - Address of another agent to join upon starting up.
### Performance stanza
The [`performance`](/docs/agent/options#performance) stanza allows tuning the performance of different subsystems in Consul.
Add the performance configuration to the `consul.hcl` configuration file:
```hcl
performance {
raft_multiplier = 1
}
```
- [`raft_multiplier`](/docs/agent/options#raft_multiplier) - An integer multiplier used by Consul servers to scale key Raft timing parameters. Setting this to a value of 1 will configure Raft to its highest-performance mode, equivalent to the default timing of Consul prior to 0.7, and is recommended for production Consul servers.
For more information on Raft tuning and the `raft_multiplier` setting, see the [server performance](/docs/install/performance) documentation.
### Telemetry stanza
The [`telemetry`](/docs/agent/options#telemetry) stanza specifies various configurations for Consul to publish metrics to upstream systems.
If you decide to configure Consul to publish telemetry data, you should review the [telemetry configuration section](/docs/agent/options#telemetry) of our documentation.
### TLS configuration
The [Creating Certificates](/docs/guides/creating-certificates) guide provides instructions on configuring and enabling TLS.
### Server configuration
Create a configuration file at `/etc/consul.d/server.hcl`:
```text
sudo mkdir --parents /etc/consul.d
sudo touch /etc/consul.d/server.hcl
sudo chown --recursive consul:consul /etc/consul.d
sudo chmod 640 /etc/consul.d/server.hcl
```
Add this configuration to the `server.hcl` configuration file:
~> **NOTE** Replace the `bootstrap_expect` value with the number of Consul servers you will use; three or five [is recommended](/docs/internals/consensus#deployment-table).
```hcl
server = true
bootstrap_expect = 3
```
- [`server`](/docs/agent/options#_server) - This flag is used to control if an agent is in server or client mode.
- [`bootstrap_expect`](/docs/agent/options#_bootstrap_expect) - This flag provides the number of expected servers in the datacenter. Either this value should not be provided or the value must agree with other servers in the cluster.
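The recommendation of three or five servers follows from Raft quorum arithmetic: a quorum is a majority of the servers, and the cluster tolerates losing whatever remains beyond that majority. The helper names below are hypothetical and exist only to illustrate the calculation.

```shell
# Quorum is a majority of the servers: floor(n / 2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while the cluster keeps quorum.
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerance=$(failure_tolerance "$n")"
done
```

A single server tolerates no failures, three servers tolerate one, and five servers tolerate two.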
### Consul UI
Consul features a web-based user interface, allowing you to easily view all services, nodes, intentions and more using a graphical user interface, rather than the CLI or API.
~> **NOTE** You should consider running the Consul UI on select Consul hosts rather than all hosts.
Optionally, add the UI configuration to the `server.hcl` configuration file to enable the Consul UI:
```hcl
ui = true
```
## Configure Consul (client)
Consul client agents typically require a subset of configuration required by Consul server agents. All Consul clients can use the `consul.hcl` file created when [configuring the Consul servers](#general-configuration). If you have added host-specific configuration such as identifiers, you will need to set these individually.
## Start Consul
Enable and start Consul using the systemctl command responsible for controlling systemd managed services. Check the status of the consul service using systemctl.
```text
sudo systemctl enable consul
sudo systemctl start consul
sudo systemctl status consul
```
## Backups
Creating server backups is an important step in production deployments. Backups provide a mechanism for the server to recover from an outage (network loss, operator error, or a corrupted data directory). All agents write to the `-data-dir` before commit. This directory persists the local agent's state and, in the case of servers, it also holds the Raft information.
Consul provides the [snapshot](/docs/commands/snapshot) command which can be run using the CLI command or the API. The `snapshot` command saves the point-in-time snapshot of the state of the Consul servers which includes KV entries, the service catalog, prepared queries, sessions, and ACL.
With [Consul Enterprise](/docs/commands/snapshot/agent), the `snapshot agent` command runs periodically and writes to local or remote storage (such as Amazon S3).
By default, all snapshots are taken using `consistent` mode, where requests are forwarded to the leader, which verifies that it is still in power before taking the snapshot. Snapshots will not be saved if the cluster is degraded or if no leader is available. To reduce the burden on the leader, it is possible to [run the snapshot](/docs/commands/snapshot/save) on any non-leader server using `stale` consistency mode:
```text
consul snapshot save -stale backup.snap
```
This spreads the load across nodes at the possible expense of losing full consistency guarantees. Typically this means that a very small number of recent writes may not be included. The omitted writes are typically limited to data written in the last `100ms` or less from the recovery point. This is usually suitable for disaster recovery. However, the system cannot guarantee how stale this may be if executed against a partitioned server.
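For scheduled backups, the stale snapshot command can be wrapped so that each run writes a timestamped file. This is a minimal sketch: `save_stale_snapshot`, the file naming scheme, and the `CONSUL_BIN` override are assumptions of this example and not part of Consul.

```shell
# Hypothetical wrapper around `consul snapshot save -stale`.
# CONSUL_BIN is an assumption of this sketch so the binary path can be overridden.
save_stale_snapshot() {
  backup_dir="$1"
  # Use a UTC timestamp so file names sort chronologically.
  snapshot="$backup_dir/consul-$(date -u +%Y%m%dT%H%M%SZ).snap"
  mkdir -p "$backup_dir" || return 1
  "${CONSUL_BIN:-consul}" snapshot save -stale "$snapshot" >/dev/null || return 1
  # Print the path of the snapshot that was written.
  printf '%s\n' "$snapshot"
}
```

Running `save_stale_snapshot /opt/consul-backups` from a cron job or systemd timer would be one way to use it; the backup directory shown here is illustrative.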
## Next Steps
- Read [Monitoring Consul with Telegraf](/docs/guides/monitoring-telegraf)
for an example guide to monitoring Consul for improved operational visibility.
- Read [Outage Recovery](/docs/guides/outage) to learn the steps required
for recovery from a Consul outage due to a majority of server nodes in a
datacenter being lost.
- Read [Server Performance](/docs/install/performance) to learn about
additional configuration that benefits production deployments.

View File

@ -1,121 +0,0 @@
---
layout: docs
page_title: Consul Reference Architecture
description: |-
This document provides recommended practices and a reference
architecture for HashiCorp Consul production deployments.
ea_version: 1.4
---
# Consul Reference Architecture
As applications are migrated to dynamically provisioned infrastructure, scaling services and managing the communications between them becomes challenging. Consul's service discovery capabilities provide the connectivity between dynamic applications. Consul also monitors the health of each node and its applications to ensure that only healthy service instances are discovered. Consul's distributed runtime configuration store allows updates across global infrastructure.
This document provides recommended practices and a reference architecture, including system requirements, datacenter design, networking, and performance optimizations for Consul production deployments.
## Infrastructure Requirements
### Consul Servers
Consul server agents are responsible for maintaining the cluster state, responding to RPC queries (read operations), and for processing all write operations. Given that Consul server agents do most of the heavy lifting, server sizing is critical for the overall performance efficiency and health of the Consul cluster.
The following table provides high-level server guidelines. Of particular
note is the strong recommendation to avoid non-fixed performance CPUs,
or "Burstable CPU".
| Type | CPU | Memory | Disk | Typical Cloud Instance Types |
| ----- | -------- | ------------ | ----- | ----------------------------------------- |
| Small | 2 core | 8-16 GB RAM | 50GB | **AWS**: m5.large, m5.xlarge |
| | | | | **Azure**: Standard_A4_v2, Standard_A8_v2 |
| | | | | **GCE**: n1-standard-8, n1-standard-16 |
| Large | 4-8 core | 32-64 GB RAM | 100GB | **AWS**: m5.2xlarge, m5.4xlarge |
| | | | | **Azure**: Standard_D4_v3, Standard_D5_v3 |
| | | | | **GCE**: n1-standard-32, n1-standard-64 |
#### Hardware Sizing Considerations
- The small size would be appropriate for most initial production
deployments, or for development/testing environments.
- The large size is for production environments where there is a
consistently high workload.
~> **NOTE** For large workloads, ensure that the disks support a high number of IOPS to keep up with the rapid Raft log update rate.
For more information on server requirements, review the [server performance](/docs/install/performance) documentation.
## Infrastructure Diagram
![Reference Diagram](/img/consul-arch.png 'Reference Diagram')
## Datacenter Design
A Consul cluster (typically three or five servers plus client agents) may be deployed in a single physical datacenter or it may span multiple datacenters. For a large cluster with high runtime reads and writes, deploying servers in the same physical location improves performance. In cloud environments, a single datacenter may be deployed across multiple availability zones i.e. each server in a separate availability zone on a single host. Consul also supports multi-datacenter deployments via separate clusters joined by WAN links. In some cases, one may also deploy two or more Consul clusters in the same LAN environment.
### Single Datacenter
A single Consul cluster is recommended for applications deployed in the same datacenter. Consul supports traditional three-tier applications as well as microservices.
Typically, there must be three or five servers to balance between availability and performance. These servers together run the Raft-driven consistent state store for catalog, session, prepared query, ACL, and KV updates.
The recommended maximum cluster size for a single datacenter is 5,000 nodes. For a write-heavy and/or a read-heavy cluster, the maximum number of nodes may need to be reduced further, considering the impact of the number and the size of KV pairs and the number of watches. The time taken for gossip to converge increases as more client machines are added. Similarly, the time a new server takes to join an existing multi-thousand node cluster with a large KV store and a high update rate may increase, because the existing data must be replicated to the new server's log.
-> **TIP** For write-heavy clusters, consider scaling vertically with larger machine instances and lower latency storage.
One must take care to use service tags in a way that assists with the kinds of queries that will be run against the cluster. If two services (e.g. blue and green) are running on the same cluster, appropriate service tags must be used to distinguish between them. If a query is made without tags, nodes running both blue and green services may show up in the results of the query.
In cases where a full mesh among all agents cannot be established due to network segmentation, Consul's own [network segments](/docs/enterprise/network-segments) can be used. Network segments is a Consul Enterprise feature that allows the creation of multiple tenants which share Raft servers in the same cluster. Each tenant has its own gossip pool and doesn't communicate with the agents outside this pool. The KV store, however, is shared between all tenants. If Consul network segments cannot be used, isolation between agents can be accomplished by creating discrete [Consul datacenters](/docs/guides/datacenters).
### Multiple Datacenters
Consul clusters in different datacenters running the same service can be joined by WAN links. The clusters operate independently and only communicate over the WAN on port `8302`. Unless explicitly configured via CLI or API, the Consul server will only return results from the local datacenter. Consul does not replicate data between multiple datacenters. The [consul-replicate](https://github.com/hashicorp/consul-replicate) tool can be used to replicate the KV data periodically.
-> A good practice is to enable TLS server name checking to avoid accidental cross-joining of agents.
Advanced federation can be achieved with the [network areas](/api/operator/area) feature in Consul Enterprise.
A typical use case is where datacenter1 (dc1) hosts shared services like LDAP (or ACL datacenter) which are leveraged by all other datacenters. However, due to compliance issues, servers in dc2 must not connect with servers in dc3. This cannot be accomplished with the basic WAN federation. Basic federation requires that all the servers in dc1, dc2 and dc3 are connected in a full mesh and opens both gossip (`8302 tcp/udp`) and RPC (`8300`) ports for communication.
Network areas allow peering between datacenters to make the services discoverable over WAN. With network areas, servers in dc1 can communicate with those in dc2 and dc3. However, no connectivity needs to be established between dc2 and dc3, which meets the compliance requirement of the organization in this use case. Servers that are part of the network area communicate over RPC only. This removes the overhead of sharing and maintaining the symmetric key used by the gossip protocol across datacenters. It also reduces the attack surface at the gossip ports since they no longer need to be opened in security gateways or firewalls.
#### Prepared Queries
Consul's [prepared queries](/api/query) allow clients to do a datacenter failover for service discovery. For example, if a service `payment` in the local datacenter dc1 goes down, a prepared query lets users define a geographic fallback order to the nearest datacenter to check for healthy instances of the same service.
~> **NOTE** Consul clusters must be WAN linked for a prepared query to work across datacenters.
Prepared queries, by default, resolve the query in the local datacenter first. Querying KV store features is not supported by the prepared query. Prepared queries work with ACL. Prepared query config/templates are maintained consistently in Raft and are executed on the servers.
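As an illustration of the failover behavior described above, the following sketch builds the JSON body for a prepared query on the `payment` service that falls back to the nearest remote datacenters. The helper name `failover_query_payload` and the `NearestN` value of 2 are assumptions made for this example.

```shell
# Build the JSON body for a prepared query that fails over to the nearest
# datacenters. The helper name and the NearestN value are illustrative.
failover_query_payload() {
  service="$1"
  nearest="$2"
  cat <<EOF
{
  "Name": "${service}",
  "Service": {
    "Service": "${service}",
    "Failover": { "NearestN": ${nearest} }
  }
}
EOF
}

failover_query_payload payment 2
# To register it against a local agent (requires a running Consul agent):
#   failover_query_payload payment 2 | curl -X POST -d @- http://127.0.0.1:8500/v1/query
```

Clients would then resolve the query by name, and results come from remote datacenters only when no healthy local instances exist.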
#### Connect
Consul [Connect](/docs/guides/connect-production) supports multi-datacenter connections and replicates [intentions](/docs/connect/intentions). This allows WAN federated DCs to provide connections from source and destination proxies in any DC.
## Network Connectivity
LAN gossip occurs between all agents in a single datacenter, with each agent sending a periodic probe to random agents from its member list. Agents run in either client or server mode, and both participate in the gossip. The initial probe is sent over UDP every second. If a node fails to acknowledge within `200ms`, the agent pings over TCP. If the TCP probe fails (10 second timeout), it asks a configurable number of random nodes to probe the same node (also known as an indirect probe). If there is no response from the peers regarding the status of the node, that agent is marked as down.
The agent's status directly affects the service discovery results. If an agent is down, the services it is monitoring will also be marked as down.
In addition, the agent also periodically performs a full state sync over TCP, which gossips each agent's understanding of the member list around it (node names, IP addresses, and health status). These operations are expensive relative to the standard gossip protocol mentioned above and are synced at a rate determined by cluster size to keep overhead low. It's typically between 30 seconds and 5 minutes. For more details, refer to the [Serf Gossip docs](https://www.serf.io/docs/internals/gossip.html).
In a larger network that spans L3 segments, traffic typically traverses through a firewall and/or a router. ACL or firewall rules must be updated to allow the following ports:
| Name | Port | Flag | Description |
| ------------- | ---- | ------------------------------------------- | ----------------------------------------------------------------------------- |
| Server RPC | 8300 | | Used by servers to handle incoming requests from other agents. TCP only. |
| Serf LAN | 8301 | | Used to handle gossip in the LAN. Required by all agents. TCP and UDP. |
| Serf WAN | 8302 | `-1` to disable (available in Consul 1.0.7) | Used by servers to gossip over the LAN and WAN to other servers. TCP and UDP. |
| HTTP API | 8500 | `-1` to disable | Used by clients to talk to the HTTP API. TCP only. |
| DNS Interface | 8600 | `-1` to disable | Used to resolve DNS queries. TCP and UDP. |
-> As mentioned in the [datacenter design section](#datacenter-design), network areas and network segments can be used to prevent opening up firewall ports between different subnets.
By default agents will only listen for HTTP and DNS traffic on the local interface.
## Next steps
- Read [Deployment Guide](/docs/guides/deployment-guide) to learn
the steps required to install and configure a single HashiCorp Consul cluster.
- Read [Server Performance](/docs/install/performance) to learn about
additional configuration that benefits production deployments.

View File

@ -1,181 +0,0 @@
---
layout: docs
page_title: DNS Caching
description: >-
One of the main interfaces to Consul is DNS. Using DNS is a simple way to
integrate Consul into an existing infrastructure without any high-touch
integration.
---
# DNS Caching
One of the main interfaces to Consul is DNS. Using DNS is a simple way to
integrate Consul into an existing infrastructure without any high-touch
integration.
By default, Consul serves all DNS results with a 0 TTL value. This prevents
any caching. The advantage is that each DNS lookup is always re-evaluated,
so the most timely information is served. However, this adds a latency hit
for each lookup and can potentially exhaust the query throughput of a cluster.
For this reason, Consul provides a number of tuning parameters that can
customize how DNS queries are handled.
In this guide, we will review important parameters for tuning
stale reads, negative response caching, and TTL. All of the DNS config
parameters must be set in the agent's configuration file.
<a name="stale"></a>
## Stale Reads
Stale reads can be used to reduce latency and increase the throughput
of DNS queries. The [settings](/docs/agent/options) used to control stale reads
are:
- [`dns_config.allow_stale`](/docs/agent/options#allow_stale) must be
set to true to enable stale reads.
- [`dns_config.max_stale`](/docs/agent/options#max_stale) limits how stale results
are allowed to be when querying DNS.
With these two settings you can allow or prevent stale reads. Below we will discuss
the advantages and disadvantages of both.
### Allow Stale Reads
Since Consul 0.7.1, `allow_stale` is enabled by default and uses a `max_stale`
value that defaults to a near-indefinite threshold (10 years).
This allows DNS queries to continue to be served in the event
of a long outage with no leader. A new telemetry counter has also been added at
`consul.dns.stale_queries` to track when agents serve DNS queries that are stale
by more than 5 seconds.
```hcl
dns_config {
  allow_stale = true
  max_stale = "87600h"
}
```
~> NOTE: The above example is the default setting. You do not need to set it explicitly.
Doing a stale read allows any Consul server to
service a query, but non-leader nodes may return data that is
out-of-date. By allowing data to be slightly stale, we get horizontal
read scalability. Now any Consul server can service the request, so we
increase throughput by the number of servers in a cluster.
### Prevent Stale Reads
If you want to prevent stale reads or limit how stale they can be, you can set `allow_stale`
to false or use a lower value for `max_stale`. Doing the first will ensure that
all reads are serviced by a [single leader node](/docs/internals/consensus).
The reads will then be strongly consistent but will be limited by the throughput
of a single node.
```hcl
dns_config {
  allow_stale = false
}
```
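The example above disables stale reads entirely. To keep stale reads but bound how old the results may be, lower `max_stale` instead. The sketch below prints such a fragment; the helper name `bounded_stale_dns_config` and the 5 second bound are illustrative assumptions, not recommended values.

```shell
# Print a DNS config fragment that keeps stale reads but bounds their age.
# The helper name and the 5s bound are illustrative assumptions.
bounded_stale_dns_config() {
  cat <<'EOF'
dns_config {
  allow_stale = true
  max_stale = "5s"
}
EOF
}

bounded_stale_dns_config
```

The output could be merged into the agent's configuration file, or written to a separate file in the agent's configuration directory.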
## Negative Response Caching
Some DNS clients cache negative responses - that is, Consul returning a "not
found" style response because a service exists but there are no healthy
endpoints. In practice, this could mean that the cached negative responses may
cause that service to appear "down" for longer than it is actually unavailable
when using DNS for service discovery.
### Configure SOA
In Consul 1.3.0 and newer, it is now possible to tune SOA
responses and modify the negative TTL cache for some resolvers. It can
be achieved using the [`soa.min_ttl`](/docs/agent/options#soa_min_ttl)
configuration within the [`soa`](/docs/agent/options#soa) configuration.
```hcl
dns_config {
  soa {
    min_ttl = "60s"
  }
}
```
One common example is that Windows will default to caching negative responses
for 15 minutes. DNS forwarders may also cache negative responses, with the same
effect. To avoid this problem, check the negative response cache defaults for
your client operating system and any DNS forwarder on the path between the
client and Consul and set the cache values appropriately. In many cases
"appropriately" simply means turning negative response caching off to get the best
recovery time when a service becomes available again.
<a name="ttl"></a>
## TTL Values
TTL values can be set to allow DNS results to be cached downstream of Consul. Higher
TTL values reduce the number of lookups on the Consul servers and speed lookups for
clients, at the cost of increasingly stale results. By default, all TTLs are zero,
preventing any caching.
```javascript
{
  "dns_config": {
    "service_ttl": {
      "*": "0s"
    },
    "node_ttl": "0s"
  }
}
```
### Enable Caching
To enable caching of node lookups (e.g. "foo.node.consul"), we can set the
[`dns_config.node_ttl`](/docs/agent/options#node_ttl) value. This can be set to
"10s" for example, and all node lookups will serve results with a 10 second TTL.
Service TTLs can be specified in a more granular fashion. You can set TTLs
per-service, with a wildcard TTL as the default. This is specified using the
[`dns_config.service_ttl`](/docs/agent/options#service_ttl) map. The "\*" wildcard
is supported at the end of any prefix and has a lower precedence than a strict match,
so 'my-service-x' has precedence over 'my-service-\*'. When performing a wildcard
match, the longest path is taken into account, thus the 'my-service-\*' TTL will
be used instead of 'my-\*' or '\*'. With the same rule, '\*' is the default value
when nothing else matches. If no match is found, the TTL defaults to 0.
For example, a [`dns_config`](/docs/agent/options#dns_config) that provides
a wildcard TTL and a specific TTL for a service might look like this:
```javascript
{
"dns_config": {
"service_ttl": {
"*": "5s",
"web": "30s",
"db*": "10s",
"db-master": "3s"
}
}
}
```
This sets all lookups to "web.service.consul" to use a 30 second TTL
while lookups to "api.service.consul" will use the 5 second TTL from the wildcard.
All lookups matching "db\*" would get a 10 second TTL, except "db-master",
which would have a 3 second TTL.
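The precedence rules described above can be sketched as a small lookup function. This is an illustration of the documented behavior and not Consul's implementation; `resolve_service_ttl` and its `pattern=ttl` argument format are hypothetical.

```shell
# Sketch of the documented precedence, not Consul's implementation:
# a strict match wins, then the longest matching "prefix*" rule, else 0.
resolve_service_ttl() {
  service="$1"; shift
  best_ttl="0"      # default when nothing matches
  best_len=-1       # length of the longest wildcard prefix matched so far
  for rule in "$@"; do
    pattern="${rule%%=*}"   # text before the first "="
    ttl="${rule#*=}"        # text after the first "="
    if [ "$pattern" = "$service" ]; then
      echo "$ttl"           # strict match always wins
      return 0
    fi
    case "$pattern" in
      *\*)
        prefix="${pattern%?}"   # drop the trailing "*"
        case "$service" in
          "$prefix"*)
            if [ "${#prefix}" -gt "$best_len" ]; then
              best_len="${#prefix}"
              best_ttl="$ttl"
            fi
            ;;
        esac
        ;;
    esac
  done
  echo "$best_ttl"
}

# The same rules as the example configuration above.
for svc in web api db-replica db-master; do
  echo "$svc -> $(resolve_service_ttl "$svc" '*=5s' 'web=30s' 'db*=10s' 'db-master=3s')"
done
```

With the rules from the example configuration, this prints 30s for `web`, 5s for `api`, 10s for `db-replica`, and 3s for `db-master`.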
### Prepared Queries
[Prepared Queries](/api/query) provide an additional
level of control over TTL. They allow for the TTL to be defined along with
the query, and they can be changed on the fly by updating the query definition.
If a TTL is not configured for a prepared query, then it will fall back to the
service-specific configuration defined in the Consul agent as described above,
and ultimately to 0 if no TTL is configured for the service in the Consul agent.
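As a sketch of defining the TTL along with the query, the following builds a prepared query body that sets a DNS TTL. The helper name `dns_ttl_query_payload`, the `web` service, and the 10 second TTL are assumptions made for this example.

```shell
# Build the JSON body for a prepared query that carries its own DNS TTL.
# The helper name, service name, and TTL value are illustrative.
dns_ttl_query_payload() {
  service="$1"
  ttl="$2"
  cat <<EOF
{
  "Name": "${service}",
  "Service": { "Service": "${service}" },
  "DNS": { "TTL": "${ttl}" }
}
EOF
}

dns_ttl_query_payload web 10s
# To register it against a local agent (requires a running Consul agent):
#   dns_ttl_query_payload web 10s | curl -X POST -d @- http://127.0.0.1:8500/v1/query
```

Updating the query definition later with a different `TTL` value changes the TTL served for that query without touching the agent configuration.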
## Summary
In this guide we covered several of the parameters for tuning DNS queries. We reviewed
how to enable or disable stale reads and how to configure the amount of time when stale
reads are allowed. We also looked at the minimum TTL configuration options
for negative responses from services. Finally, we reviewed how to set up TTLs
for service lookups.

View File

@ -1,81 +0,0 @@
---
layout: docs
page_title: External Services
description: >-
Very few infrastructures are entirely self-contained. Most rely on a multitude
of external service providers. Consul supports this by allowing for the
definition of external services, services that are not provided by a local
node.
---
# Registering an External Service
Very few infrastructures are entirely self-contained. Most rely on a multitude
of external service providers. Consul supports this by allowing for the definition
of external services, services that are not provided by a local node. There's also a
companion project called [Consul ESM](https://github.com/hashicorp/consul-esm) which
is a daemon that functions as an external service monitor that can help run health
checks for external services.
Most services are registered in Consul through the use of a
[service definition](/docs/agent/services). However, this approach registers
the local node as the service provider. In the case of external services, we must
instead register the service with the catalog rather than as part of a standard
node service definition.
Once registered, the DNS interface will be able to return the appropriate A
records or CNAME records for the service. The service will also appear in standard
queries against the API. Consul must be configured with a list of
[recursors](/docs/agent/options#recursors) for it to be able to resolve
external service addresses.
Let us suppose we want to register a "search" service that is provided by
"www.google.com". We might accomplish that like so:
```text
$ curl -X PUT -d '{"Datacenter": "dc1", "Node": "google",
"Address": "www.google.com",
"Service": {"Service": "search", "Port": 80}}' \
http://127.0.0.1:8500/v1/catalog/register
```
Add an upstream DNS server to the list of recursors in Consul's configuration. For example, using Google's public DNS server:
```text
"recursors":["8.8.8.8"]
```
If we do a DNS lookup now, we can see the new search service:
```text
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 search.service.consul.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13313
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;search.service.consul. IN A
;; ANSWER SECTION:
search.service.consul. 0 IN CNAME www.google.com.
www.google.com. 264 IN A 74.125.239.114
www.google.com. 264 IN A 74.125.239.115
www.google.com. 264 IN A 74.125.239.116
;; Query time: 41 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Feb 25 17:45:12 2014
;; MSG SIZE rcvd: 178
```
If at any time we want to deregister the service, we simply do:
```text
$ curl -X PUT -d '{"Datacenter": "dc1", "Node": "google"}' http://127.0.0.1:8500/v1/catalog/deregister
```
This will deregister the `google` node along with all services it provides.
For more information, please see the [HTTP Catalog API](/api/catalog).

View File

@ -1,354 +0,0 @@
---
layout: docs
page_title: Forwarding
description: >-
By default, DNS is served from port 53. On most operating systems, this
requires elevated privileges. Instead of running Consul with an administrative
or root account, it is possible to instead forward appropriate queries to
Consul, running on an unprivileged port, from another DNS server or port
redirect.
---
# Forwarding DNS
By default, DNS is served from port 53. On most operating systems, this
requires elevated privileges. Instead of running Consul with an administrative
or root account, it is possible to instead forward appropriate queries to Consul,
running on an unprivileged port, from another DNS server or port redirect.
In this guide, we will demonstrate forwarding from:
- [BIND](#bind-setup)
- [dnsmasq](#dnsmasq-setup)
- [Unbound](#unbound-setup)
- [systemd-resolved](#systemd-resolved-setup)
- [iptables](#iptables-setup)
- [macOS](#macos-setup)
After configuring forwarding, we will demonstrate how to test the configuration. Finally, we will also provide some troubleshooting
guidance.
~> Note, by default, Consul does not resolve DNS
records outside the `.consul.` zone unless the
[recursors](/docs/agent/options#recursors) configuration option
has been set. As an example of how this changes Consul's behavior,
suppose a Consul DNS reply includes a CNAME record pointing outside
the `.consul` TLD. The DNS reply will only include CNAME records by
default. By contrast, when `recursors` is set and the upstream resolver is
functioning correctly, Consul will try to resolve CNAMEs and include
any records (e.g. A, AAAA, PTR) for them in its DNS reply.
## BIND Setup
Note, in this example, BIND and Consul are running on the same machine.
First, you have to disable DNSSEC so that Consul and [BIND](https://www.isc.org/downloads/bind/) can communicate. Here is an example of such a configuration:
```text
options {
  listen-on port 53 { 127.0.0.1; };
  listen-on-v6 port 53 { ::1; };
  directory "/var/named";
  dump-file "/var/named/data/cache_dump.db";
  statistics-file "/var/named/data/named_stats.txt";
  memstatistics-file "/var/named/data/named_mem_stats.txt";
  allow-query { localhost; };
  recursion yes;

  dnssec-enable no;
  dnssec-validation no;

  /* Path to ISC DLV key */
  bindkeys-file "/etc/named.iscdlv.key";

  managed-keys-directory "/var/named/dynamic";
};

include "/etc/named/consul.conf";
```
### Zone File
Then we set up a zone for our Consul managed records in `consul.conf`:
```text
zone "consul" IN {
  type forward;
  forward only;
  forwarders { 127.0.0.1 port 8600; };
};
```
Here we assume Consul is running with default settings and is serving
DNS on port 8600.
## Dnsmasq Setup
[Dnsmasq](http://www.thekelleys.org.uk/dnsmasq/doc.html) is typically configured via a `dnsmasq.conf` or a series of files in
the `/etc/dnsmasq.d` directory. In Dnsmasq's configuration file
(e.g. `/etc/dnsmasq.d/10-consul`), add the following:
```text
# Enable forward lookup of the 'consul' domain:
server=/consul/127.0.0.1#8600
# Uncomment and modify as appropriate to enable reverse DNS lookups for
# common netblocks found in RFC 1918, 5735, and 6598:
#rev-server=0.0.0.0/8,127.0.0.1#8600
#rev-server=10.0.0.0/8,127.0.0.1#8600
#rev-server=100.64.0.0/10,127.0.0.1#8600
#rev-server=127.0.0.0/8,127.0.0.1#8600
#rev-server=169.254.0.0/16,127.0.0.1#8600
#rev-server=172.16.0.0/12,127.0.0.1#8600
#rev-server=192.168.0.0/16,127.0.0.1#8600
#rev-server=224.0.0.0/4,127.0.0.1#8600
#rev-server=240.0.0.0/4,127.0.0.1#8600
```
Once that configuration is created, restart the `dnsmasq` service.
Additional useful settings in `dnsmasq` to consider include (see
[`dnsmasq(8)`](http://www.thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html)
for additional details):
```
# Accept DNS queries only from hosts whose address is on a local subnet.
#local-service
# Don't poll /etc/resolv.conf for changes.
#no-poll
# Don't read /etc/resolv.conf. Get upstream servers only from the command
# line or the dnsmasq configuration file (see the "server" directive below).
#no-resolv
# Specify IP address(es) of other DNS servers for queries not handled
# directly by consul. There is normally one 'server' entry set for every
# 'nameserver' parameter found in '/etc/resolv.conf'. See dnsmasq(8)'s
# 'server' configuration option for details.
#server=1.2.3.4
#server=208.67.222.222
#server=8.8.8.8
# Set the size of dnsmasq's cache. The default is 150 names. Setting the
# cache size to zero disables caching.
#cache-size=65536
```
## Unbound Setup
[Unbound](https://www.unbound.net/) is typically configured via an `unbound.conf` or a series of files in
the `/etc/unbound/unbound.conf.d` directory. In an Unbound configuration file
(e.g. `/etc/unbound/unbound.conf.d/consul.conf`), add the following:
```text
#Allow insecure queries to local resolvers
server:
  do-not-query-localhost: no
  domain-insecure: "consul"

#Add consul as a stub-zone
stub-zone:
  name: "consul"
  stub-addr: 127.0.0.1@8600
```
You may have to add the following line to the bottom of your
`/etc/unbound/unbound.conf` file for the new configuration to be included:
```text
include: "/etc/unbound/unbound.conf.d/*.conf"
```
## systemd-resolved Setup
[`systemd-resolved`](https://www.freedesktop.org/wiki/Software/systemd/resolved/) is typically configured with `/etc/systemd/resolved.conf`.
To configure systemd-resolved to send queries for the consul domain to
Consul, configure resolved.conf to contain the following:
```
DNS=127.0.0.1
Domains=~consul
```
The main limitation with this configuration is that the DNS field
cannot contain ports. So for this to work either Consul must be
[configured to listen on port 53](https://www.consul.io/docs/agent/options.html#dns_port)
instead of 8600 or you can use iptables to map port 53 to 8600.
The following iptables commands are sufficient to do the port
mapping.
```
[root@localhost ~]# iptables -t nat -A OUTPUT -d localhost -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
[root@localhost ~]# iptables -t nat -A OUTPUT -d localhost -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600
```
Binding to port 53 will usually require running as a privileged user (or, on Linux, running with the
`CAP_NET_BIND_SERVICE` capability). If using the Consul Docker image, you will need to add the following to the
environment to allow Consul to use the port: `CONSUL_ALLOW_PRIVILEGED_PORTS=yes`
Note: With this setup, PTR record queries will still be sent out to the other configured resolvers in
addition to Consul. If you wish to restrict this behavior, your `resolved.conf` should be modified to:
```
DNS=127.0.0.1
Domains=~consul ~0.10.in-addr.arpa
```
where the example corresponds to reverse lookups of addresses in the IP range `10.0.0.0/16`. Your
configuration should match your networks.
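
To match the `Domains=` line to your own networks, the reverse zone for each network can be derived mechanically. The following is a minimal sketch, assuming IPv4 networks whose prefix length falls on an octet boundary; the function names are hypothetical.

```python
import ipaddress


def reverse_routing_domain(cidr):
    """Return the in-addr.arpa zone covering an IPv4 network.

    Only prefix lengths that are a non-zero multiple of 8 map to a single zone.
    """
    network = ipaddress.ip_network(cidr)
    if network.version != 4 or network.prefixlen == 0 or network.prefixlen % 8 != 0:
        raise ValueError("expected an IPv4 network with a prefix length that is a multiple of 8")
    # Keep only the octets fixed by the prefix, then reverse them.
    octets = str(network.network_address).split(".")[: network.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def domains_line(cidrs):
    """Render a Domains= line routing 'consul' and the given reverse zones."""
    zones = ["~consul"] + ["~" + reverse_routing_domain(cidr) for cidr in cidrs]
    return "Domains=" + " ".join(zones)
```

For the network in the example, `domains_line(["10.0.0.0/16"])` produces the `Domains=` line shown above.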
## iptables Setup
Note, for iptables, the rules must be set on the same host as the Consul
instance and relay hosts should not be on the same host or the redirects will
intercept the traffic.
On Linux systems that support it, incoming requests and requests to
the local host can use [`iptables`](http://www.netfilter.org/) to forward ports on the same machine
without a secondary service. Since Consul, by default, only resolves
the `.consul` TLD, it is especially important to use the `recursors`
option if you wish the `iptables` setup to resolve for other domains.
The recursors should not include the local host as the redirects would
just intercept the requests.
The iptables method is suited for situations where an external DNS
service is already running in your infrastructure and is used as the
recursor or if you want to use an existing DNS server as your query
endpoint and forward requests for the consul domain to the Consul
server. In both of those cases you may want to query the Consul server
but not need the overhead of a separate service on the Consul host.
```
[root@localhost ~]# iptables -t nat -A PREROUTING -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
[root@localhost ~]# iptables -t nat -A PREROUTING -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600
[root@localhost ~]# iptables -t nat -A OUTPUT -d localhost -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
[root@localhost ~]# iptables -t nat -A OUTPUT -d localhost -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600
```
## macOS Setup
On macOS systems, you can use the macOS system resolver to point all `.consul` requests to Consul.
Just add a resolver entry in `/etc/resolver/` to point at Consul.
Documentation for this feature is available via `man 5 resolver`.
To set this up, create a new file `/etc/resolver/consul` (you will need sudo/root access) and put the following in the file:
```
nameserver 127.0.0.1
port 8600
```
This tells the macOS resolver daemon to send all `.consul` TLD requests to 127.0.0.1 on port 8600.
## Testing
First, perform a DNS query against Consul directly to be sure that the record exists:
```text
[root@localhost ~]# dig @localhost -p 8600 primary.redis.service.dc-1.consul. A
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.23.rc1.32.amzn1 <<>> @localhost primary.redis.service.dc-1.consul. A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11536
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;primary.redis.service.dc-1.consul. IN A
;; ANSWER SECTION:
primary.redis.service.dc-1.consul. 0 IN A 172.31.3.234
;; Query time: 4 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Apr 9 17:36:12 2014
;; MSG SIZE rcvd: 76
```
Then run the same query against your BIND instance and make sure you get a
valid result:
```text
[root@localhost ~]# dig @localhost -p 53 primary.redis.service.dc-1.consul. A
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.23.rc1.32.amzn1 <<>> @localhost primary.redis.service.dc-1.consul. A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11536
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;primary.redis.service.dc-1.consul. IN A
;; ANSWER SECTION:
primary.redis.service.dc-1.consul. 0 IN A 172.31.3.234
;; Query time: 4 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Apr 9 17:36:12 2014
;; MSG SIZE rcvd: 76
```
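
The two checks above can also be scripted. The sketch below hand-encodes a single-question DNS query with the standard library and reads only the response header, so it reports whether an answer came back but does not decode the records. The function names are illustrative, and the default port assumes Consul is serving DNS on 8600.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Encode a single-question DNS query (qtype 1 = A, 12 = PTR)."""
    # Header: ID, flags (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Question: encoded name, terminating zero byte, QTYPE, QCLASS IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_header(response):
    """Return (rcode, answer_count) from the 12-byte DNS header."""
    _id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, answers


def lookup(name, server="127.0.0.1", port=8600, timeout=2.0):
    """Send the query over UDP and return (rcode, answer_count)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        response, _ = sock.recvfrom(4096)
    return parse_header(response)
```

Running `lookup(name, port=8600)` and `lookup(name, port=53)` for the same name should give the same result when forwarding works; an rcode of `0` corresponds to `NOERROR`.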
If desired, verify reverse DNS using the same methodology:
```text
[root@localhost ~]# dig @127.0.0.1 -p 8600 133.139.16.172.in-addr.arpa. PTR
; <<>> DiG 9.10.3-P3 <<>> @127.0.0.1 -p 8600 133.139.16.172.in-addr.arpa. PTR
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3713
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;133.139.16.172.in-addr.arpa. IN PTR
;; ANSWER SECTION:
133.139.16.172.in-addr.arpa. 0 IN PTR consul1.node.dc1.consul.
;; Query time: 3 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Sun Jan 31 04:25:39 UTC 2016
;; MSG SIZE rcvd: 109
[root@localhost ~]# dig @127.0.0.1 +short -x 172.16.139.133
consul1.node.dc1.consul.
```
## Troubleshooting
If you don't get an answer from your DNS server (e.g. BIND, Dnsmasq) but you
do get an answer from Consul, your best bet is to turn on your DNS server's
query log to see what's happening.
For BIND:
```text
[root@localhost ~]# rndc querylog
[root@localhost ~]# tail -f /var/log/messages
```
The log may show errors like this:
```text
error (no valid RRSIG) resolving
error (no valid DS) resolving
```
This indicates that DNSSEC is not disabled properly.
If you see errors about network connections, verify that there are no firewall
or routing problems between the servers running BIND and Consul.
For Dnsmasq, see the `log-queries` configuration option and the `USR1`
signal.
## Summary
In this guide we provided examples of configuring DNS forwarding with many
common, third-party tools. It is the responsibility of the operator to ensure
that whichever tool they select is configured properly prior to integration
with Consul.


@ -1,191 +0,0 @@
---
layout: docs
page_title: Geo Failover
description: >-
Consul provides a prepared query capability that makes it easy to implement
automatic geo failover policies for services.
---
# Geo Failover with Prepared Queries
Within a single datacenter, Consul provides automatic failover for services by omitting failed service instances from DNS lookups and by providing service health information in APIs.
When there are no more instances of a service available in the local datacenter, it can be challenging to implement failover policies to other datacenters because typically that logic would need to be written into each application. Fortunately, Consul has a [prepared query](/api/query) API
that provides the capability to let users define failover policies in a centralized way. It's easy to expose these to applications using Consul's DNS interface and it's also available to applications that consume Consul's APIs.
Failover policies are flexible and can be applied in a variety of ways including:
- Fully static lists of alternate datacenters.
- Fully dynamic policies that make use of Consul's [network coordinate](/docs/internals/coordinates) subsystem.
- Policies that automatically determine the next best datacenter to fail over to based on network round trip time.
Prepared queries can be made with policies specific to certain services and prepared query templates can allow one policy to apply to many, or even all services, with just a small number of templates.
This guide shows how to build geo failover policies using prepared queries through a set of examples. It also includes information on how to use prepared
query templates to simplify the failover process.
## Prepared Query Introduction
Prepared queries are objects that are defined at the datacenter level. They
only need to be created once and are stored on the Consul servers. In this respect they are similar to values in Consul's KV store.
Once created, prepared queries can then be invoked by applications to perform the query and get the latest results.
Here's an example request to create a prepared query:
```shell
$ curl \
    --request POST \
    --data \
'{
  "Name": "payments",
  "Service": {
    "Service": "payments",
    "Tags": ["v1.2.3"]
  }
}' http://127.0.0.1:8500/v1/query
{"ID":"fe3b8d40-0ee0-8783-6cc2-ab1aa9bb16c1"}
```
This creates a prepared query called "payments" that does a lookup for all instances of the "payments" service with the tag "v1.2.3". This policy could be used to control, in a centralized way, which version of the "payments" service applications should be using. By [updating this prepared query](/api/query#update-prepared-query) to look for the tag "v1.2.4", applications could start to find the newer version of the service without having to reconfigure anything.
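
The same definition can be built and submitted from code. This is a minimal Python sketch using only the standard library; `prepared_query` and `create_query` are hypothetical helper names, and the agent address is assumed to be the default.

```python
import json
import urllib.request


def prepared_query(name, service, tags=None):
    """Build a prepared query definition (hypothetical helper)."""
    service_block = {"Service": service}
    if tags:
        service_block["Tags"] = list(tags)
    return {"Name": name, "Service": service_block}


def create_query(definition, agent="http://127.0.0.1:8500"):
    """POST the definition to /v1/query and return the new query's ID."""
    request = urllib.request.Request(
        agent + "/v1/query",
        data=json.dumps(definition).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["ID"]
```

Changing the tag passed to `prepared_query` is the only difference between the "v1.2.3" and "v1.2.4" definitions.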
Applications can make use of this query in two ways.
1. Since we gave the prepared query a name, they can simply do a DNS lookup for "payments.query.consul" instead of "payments.service.consul". Now with the prepared query, there's the additional filter policy working behind the scenes that the application doesn't have to know about.
1. Queries can also be executed using the [prepared query execute API](/api/query#execute-prepared-query) for applications that integrate with Consul's APIs directly.
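
For the second option, a rough Python sketch of executing the query over HTTP follows. It assumes the execute endpoint accepts the query name in place of the ID and that the response carries a `Nodes` list with `Node` and `Service` objects; check the linked API documentation before relying on those field names. The helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def execute_url(query, agent="http://127.0.0.1:8500"):
    """Return the execute endpoint for a prepared query ID or name."""
    return agent + "/v1/query/" + urllib.parse.quote(query, safe="") + "/execute"


def addresses_from_result(result):
    """Extract (address, port) pairs from an execute response (assumed shape)."""
    pairs = []
    for entry in result["Nodes"]:
        # Fall back to the node address when the service has no address of its own.
        address = entry["Service"].get("Address") or entry["Node"]["Address"]
        pairs.append((address, entry["Service"]["Port"]))
    return pairs


def execute(query, agent="http://127.0.0.1:8500"):
    """Execute the prepared query and return the resulting (address, port) pairs."""
    with urllib.request.urlopen(execute_url(query, agent)) as response:
        return addresses_from_result(json.load(response))
```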
## Failover Policy Types
Using the techniques in this section you will develop prepared queries with failover policies where simply changing application configurations to look up "payments.query.consul" instead of "payments.service.consul" via DNS will result in automatic geo failover to the next closest [federated](/docs/guides/datacenters) Consul datacenters, in order of increasing network round trip time.
Failover is just another policy choice for a prepared query; it works in the same manner as the previous example and is similarly transparent to applications. The failover policy is configured using the `Failover` structure, which contains two optional fields that determine what happens if no healthy nodes are available in the local datacenter when the query is executed.
- `NearestN` `(int: 0)` - Specifies that the query will be forwarded to up to `NearestN` other datacenters based on their estimated network round trip time using [network coordinates](/docs/internals/coordinates).
- `Datacenters` `(array<string>: nil)` - Specifies a fixed list of remote datacenters to forward the query to if there are no healthy nodes in the local datacenter. Datacenters are queried in the order given in the list.
The following examples use those fields to implement different geo failover methods.
### Static Policy
A static failover policy includes a fixed list of datacenters to contact once there are no healthy instances in the local datacenter.
Here's the example from the introduction, expanded with a static failover policy:
```shell
$ curl \
    --request POST \
    --data \
'{
  "Name": "payments",
  "Service": {
    "Service": "payments",
    "Tags": ["v1.2.3"],
    "Failover": {
      "Datacenters": ["dc2", "dc3"]
    }
  }
}' http://127.0.0.1:8500/v1/query
{"ID":"fe3b8d40-0ee0-8783-6cc2-ab1aa9bb16c1"}
```
When this query is executed, such as with a DNS lookup to "payments.query.consul", the following actions will occur:
1. Consul servers in the local datacenter will attempt to find healthy instances of the "payments" service with the required tag.
2. If none are available locally, the Consul servers will make an RPC request to the Consul servers in "dc2" to perform the query there.
3. If none are available in "dc2", then an RPC will be made to the Consul servers in "dc3" to perform the query there.
4. Finally an error will be returned if none of these datacenters had any instances available.
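
The steps above can be modeled in a few lines of Python. This is only a simulation of the ordering described here, not Consul's implementation; `healthy_by_dc` is a hypothetical mapping from datacenter name to the healthy instances found there.

```python
def static_failover(healthy_by_dc, local_dc, datacenters):
    """Return (datacenter, instances) for the first datacenter with healthy
    instances, trying the local one first and then the static list in order."""
    for dc in [local_dc] + list(datacenters):
        instances = healthy_by_dc.get(dc, [])
        if instances:
            return dc, instances
    # Mirrors step 4: nothing was available anywhere.
    raise LookupError("no healthy instances in any datacenter")
```

With no healthy instances in "dc1" or "dc2", `static_failover(healthy, "dc1", ["dc2", "dc3"])` resolves to the instances in "dc3".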
### Dynamic Policy
In a complex federated environment with many Consul datacenters, it can be cumbersome to set static failover policies, so Consul offers a dynamic option based on Consul's [network coordinate](/docs/internals/coordinates) subsystem.
Consul continuously maintains an estimate of the network round trip time from the local datacenter to the servers in other datacenters it is federated with. Each server uses the median round trip time from itself to the servers in the remote datacenter. This means that failover can simply try other remote datacenters in order of increasing network round trip time, and if datacenters come and go, or experience network issues, this order will adjust automatically.
Here's the example from the introduction, expanded with a dynamic failover policy:
```shell
$ curl \
    --request POST \
    --data \
'{
  "Name": "payments",
  "Service": {
    "Service": "payments",
    "Tags": ["v1.2.3"],
    "Failover": {
      "NearestN": 2
    }
  }
}' http://127.0.0.1:8500/v1/query
{"ID":"fe3b8d40-0ee0-8783-6cc2-ab1aa9bb16c1"}
```
This query is resolved in a similar fashion to the previous example, except the choice of "dc2" or "dc3", or possibly some other datacenter, is made automatically.
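
To make the ordering concrete, here is a small Python model of the selection described above: each remote datacenter is ranked by the median round trip time to its servers, and the closest `NearestN` are kept. The RTT values are made up for illustration, and real estimates come from Consul's network coordinates rather than a dictionary.

```python
from statistics import median


def nearest_datacenters(rtt_by_dc, nearest_n):
    """Order remote datacenters by median RTT and keep the closest nearest_n.

    rtt_by_dc maps a datacenter name to RTT estimates (in seconds) from the
    local server to each server in that datacenter.
    """
    ranked = sorted(rtt_by_dc, key=lambda dc: median(rtt_by_dc[dc]))
    return ranked[:nearest_n]
```

Because the median is used, a single slow server in an otherwise nearby datacenter does not push that datacenter down the list.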
### Hybrid Policy
It is possible to combine `Datacenters` and `NearestN` in the same policy. The `NearestN` queries will be performed first, followed by the list given by `Datacenters`.
```shell
$ curl \
    --request POST \
    --data \
'{
  "Name": "payments",
  "Service": {
    "Service": "payments",
    "Tags": ["v1.2.3"],
    "Failover": {
      "NearestN": 2,
      "Datacenters": ["dc2", "dc3"]
    }
  }
}' http://127.0.0.1:8500/v1/query
{"ID":"fe3b8d40-0ee0-8783-6cc2-ab1aa9bb16c1"}
```
Note, a given datacenter will only be queried one time during a failover, even if it is both selected by `NearestN` and listed in `Datacenters`. This is useful for allowing a limited number of round trip-based attempts, followed by a static configuration for some known datacenter to fail over to.
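
The resulting order of attempts can be sketched as follows. This models only the rule stated here (nearest datacenters first, then the static list, each datacenter at most once); the function name is hypothetical.

```python
def failover_order(nearest, datacenters):
    """Combine NearestN picks with the static list, keeping each datacenter once."""
    order = []
    for dc in list(nearest) + list(datacenters):
        if dc not in order:
            order.append(dc)
    return order
```

If `NearestN` selects "dc3" and "dc2" and the static list is "dc2" then "dc4", the attempts are "dc3", "dc2", "dc4".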
### Prepared Query Template
For datacenters with many services, it can be challenging to define a geo failover policy for each service. To relieve this challenge, Consul provides a [prepared query template](/api/query#prepared-query-templates) that allows one prepared query to apply to many, and even all, services.
Templates can match on prefixes or use full regular expressions to determine which services they match.
Below is an example request to create a prepared query template that applies a dynamic geo failover policy to all services. We've chosen the `name_prefix_match` type and given it an empty name, which means that it will match any service.
```shell
$ curl \
    --request POST \
    --data \
'{
  "Name": "",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.full}",
    "Failover": {
      "NearestN": 2
    }
  }
}' http://127.0.0.1:8500/v1/query
{"ID":"fe3b8d40-0ee0-8783-6cc2-ab1aa9bb16c1"}
```
~> Note: If multiple queries are registered, the most specific one will be selected, so it's possible to have a template like this as a catch-all, and then apply more specific policies to certain services.
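
As a rough illustration of how a catch-all and a more specific template coexist, the sketch below assumes that "most specific" means the longest matching name prefix among `name_prefix_match` templates. That interpretation and the function name are assumptions; consult the prepared query template documentation for the exact rules.

```python
def select_template(prefixes, name):
    """Pick the template prefix that matches the name most specifically.

    prefixes is a list of template names; an empty string matches everything.
    Returns None when no template matches.
    """
    matches = [prefix for prefix in prefixes if name.startswith(prefix)]
    if not matches:
        return None
    return max(matches, key=len)
```

With templates named `""` and `"pay"`, a lookup for "payments" would use the `"pay"` template while "search" would fall through to the catch-all.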
With this one prepared query template in place, simply changing application configurations to look up "payments.query.consul" instead of "payments.service.consul" via DNS will result in automatic geo failover to the next closest federated Consul datacenters, in order of increasing network round trip time.
## Summary
In this guide you learned how to use three different policy types for failover:
static, dynamic, and hybrid. You also learned how to create a prepared query template, which will help you reduce some of the complexity of creating policies for
services.


@ -1,188 +0,0 @@
---
layout: docs
page_title: Kubernetes Consul Reference Architecture
description: This document provides recommended practices and a reference architecture.
---
# Consul and Kubernetes Reference Architecture
Preparing your Kubernetes cluster to successfully deploy and run Consul is an
important first step in your production deployment process. In this guide you
will prepare your Kubernetes cluster, which can be running on any platform
(AKS, EKS, GKE, etc.). We will call out cloud-specific differences when
applicable. Before starting this guide you should have experience with
Kubernetes, and have `kubectl` and `helm` configured locally.
By the end of this guide, you will be able to select the right resource limits
for Consul pods, select the Consul datacenter design that meets your use case,
and understand the minimum networking requirements.
## Infrastructure Requirements
Consul server agents are responsible for the cluster state, responding to RPC
queries, and processing all write operations. Since the Consul servers are
highly active and are responsible for maintaining the cluster state, server
sizing is critical for the overall performance, efficiency, and health of the
Consul cluster. Review the [Consul Reference
Architecture](/consul/advanced/day-1-operations/reference-architecture#consul-servers)
guide for sizing recommendations for small and large Consul datacenters.
The CPU and memory recommendations can be used when you select the resource
limits for the Consul pods. The disk recommendations can also be used when
selecting the resource limits and configuring persistent volumes. You will
need to set both `limits` and `requests` in the Helm chart. Below is an example
snippet of Helm config for a Consul server in a large environment.
```yaml
# values.yaml
server:
  resources: |
    requests:
      memory: "32Gi"
      cpu: "4"
    limits:
      memory: "32Gi"
      cpu: "4"
  storage: 50Gi
...
```
You should also set [resource limits for Consul
clients](https://www.consul.io/docs/platform/k8s/helm.html#v-client-resources),
so that the client pods do not consume more resources than
expected.
[Persistent
volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PV)
allow you to have a fixed disk location for the Consul data. This ensures that
if a Consul server is lost, the data will not be lost. This is an important
feature of Kubernetes, but may take some additional configuration. If you are
running Kubernetes on one of the major cloud platforms, persistent volumes
should already be configured for you; be sure to read their documentation for more
details. If you are setting up the persistent volumes resource in Kubernetes, you may need
to map the Consul server to that volume with the [storage class
parameter](https://www.consul.io/docs/platform/k8s/helm.html#v-server-storageclass).
Finally, you will need to enable RBAC on your Kubernetes cluster. Review
the [Kubernetes
RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) documentation. You
should also review RBAC and authentication documentation if your Kubernetes cluster
is running on a major cloud platform.
- [AWS](https://docs.aws.amazon.com/eks/latest/userguide/managing-auth.html).
- [GCP](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control).
- [Azure](https://docs.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-create). In Azure, RBAC is enabled by default.
## Datacenter Design
There are many possible configurations for running Consul with Kubernetes. In this guide
we will cover three of the most common.
1. Consul agents can be solely deployed within Kubernetes.
1. Consul servers can be deployed outside of Kubernetes and clients inside of Kubernetes.
1. Multiple Consul datacenters with agents inside and outside of Kubernetes.
Review the Consul Kubernetes-specific
[documentation](https://www.consul.io/docs/platform/k8s#use-cases)
for additional use case information.
Since all three use cases will also need catalog sync, review the
implementation [details for catalog sync](https://www.consul.io/docs/platform/k8s/service-sync.html).
### Consul Datacenter Deployed in Kubernetes
Deploying a Consul cluster, servers and clients, in Kubernetes can be done with
the official [Helm
chart](https://www.consul.io/docs/platform/k8s/helm.html#using-the-helm-chart).
This configuration is useful for managing services within Kubernetes and is
common for users who do not already have a production Consul datacenter.
![Reference Diagram](/img/k8s-consul-simple.png 'Consul in Kubernetes Reference Diagram')
The Consul datacenter in Kubernetes will function the same as a platform
independent Consul datacenter, such as Consul clusters deployed on bare metal servers
or virtual machines. Agents will communicate over LAN gossip, servers
will participate in the Raft consensus, and client requests will be
forwarded to the servers via RPCs.
### Consul Datacenter with a Kubernetes Cluster
To use an existing Consul cluster to manage services in Kubernetes, Consul
clients can be deployed within the Kubernetes cluster. This will also allow
Kubernetes-defined services to be synced to Consul. This design allows Consul tools
such as envconsul, consul-template, and more to work on Kubernetes.
![Reference Diagram](/img/k8s-cluster-consul-datacenter.png 'Consul and Kubernetes Reference Diagram')
This type of deployment in Kubernetes can also be set up with the official Helm
chart.
### Multiple Consul Clusters with a Kubernetes Cluster
Consul clusters in different datacenters running the same service can be joined
by WAN links. The clusters can operate independently and only communicate over
the WAN. This type of datacenter design is detailed in the [Reference Architecture
guide](/consul/advanced/day-1-operations/reference-architecture#multiple-datacenters).
In this setup, you can have a Consul cluster running outside of Kubernetes and
a Consul cluster running inside of Kubernetes.
### Catalog Sync
To use catalog sync, you must enable it in the [Helm
chart](https://www.consul.io/docs/platform/k8s/helm.html#v-synccatalog).
Catalog sync allows you to sync services between Consul and Kubernetes. The
sync can be unidirectional in either direction or bidirectional. Read the
[documentation](https://www.consul.io/docs/platform/k8s/service-sync.html) to
learn more about the configuration.
Services synced from Kubernetes to Consul will be discoverable, like any other
service within the Consul datacenter. See the [network
connectivity](#networking-connectivity) section to learn more about related
Kubernetes configuration. Services synced from Consul to Kubernetes will be
discoverable with the built-in Kubernetes DNS once a [Consul stub
domain](https://www.consul.io/docs/platform/k8s/dns.html) is deployed. When
bidirectional catalog sync is enabled, it will behave like the two
unidirectional setups.
## Networking Connectivity
When running Consul as a pod inside of Kubernetes, the Consul servers will be
automatically configured with the appropriate addresses. However, when running
Consul servers outside of the Kubernetes cluster and clients inside Kubernetes
as pods, there are additional [networking
considerations](/consul/advanced/day-1-operations/reference-architecture#network-connectivity).
### Network Connectivity for Services
When using Consul catalog sync to sync Kubernetes services to Consul, you will
need to ensure the Kubernetes services are of supported [service
types](https://www.consul.io/docs/platform/k8s/service-sync.html#kubernetes-service-types)
and are configured correctly in Kubernetes. If the service is configured correctly,
it will be discoverable by Consul like any other service in the datacenter.
~> Warning: You are responsible for ensuring that external services can communicate
with services deployed in the Kubernetes cluster. For example, `ClusterIP` type services
may not be directly accessible by IP address from outside the Kubernetes cluster
for some Kubernetes configurations.
### Network Security
Finally, you should consider securing your Consul datacenter with
[ACLs](/consul/advanced/day-1-operations/production-acls). ACLs should be used with [Consul
Connect](https://www.consul.io/docs/platform/k8s/connect.html) to secure
service to service communication. The Kubernetes cluster should also be
secured.
## Summary
You are now prepared to deploy Consul with Kubernetes. In this
guide, you were introduced to several datacenter designs for a variety of use
cases. This guide also outlined the Kubernetes prerequisites, resource
requirements for Consul, and networking considerations. Continue on to the
[Deploying Consul with Kubernetes
guide](/consul/getting-started-k8s/helm-deploy) for
information on deploying Consul with the official Helm chart or continue
reading about Consul Operations in the [Day 1 Path](https://learn.hashicorp.com/consul/?track=advanced#advanced).


@ -1,164 +0,0 @@
---
layout: docs
page_title: Application Leader Election with Sessions
description: >-
This guide describes how to build client-side leader election using Consul. If
you are interested in the leader election used internally to Consul, please
refer to the consensus protocol documentation instead.
---
# Application Leader Election with Sessions
For some applications, like HDFS, it is necessary to set one instance as
a leader. This ensures the application data is current and stable.
This guide describes how to build client-side leader elections for service
instances, using Consul. Consul's support for
[sessions](/docs/internals/sessions) allows you to build a system that can gracefully handle failures.
If you
are interested in the leader election used internally by Consul, please refer to the
[consensus protocol](/docs/internals/consensus) documentation instead.
## Contending Service Instances
Imagine you have a set of MySQL service instances that are attempting to acquire leadership. All participating service instances should agree on a given
key to coordinate. A good pattern is simply:
```text
service/<service name>/leader
```
This key will be used for all requests to the Consul KV API.
We will use the same, simple pattern for the MySQL services for the remainder of the guide.
```text
service/mysql/leader
```
### Create a Session
The first step is to create a session using the
[Session HTTP API](/api/session#session_create).
```shell
$ curl -X PUT -d '{"Name": "mysql-session"}' http://localhost:8500/v1/session/create
```
This will return a JSON object containing the session ID:
```json
{
"ID": "4ca8e74b-6350-7587-addf-a18084928f3c"
}
```
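The same request can be issued from application code. The following is a minimal Python sketch using only the standard library. The helper names `build_session_request`, `parse_session_id`, and `create_session` are hypothetical, and the agent address is assumed to be the local default used throughout this guide.

```python
import json
import urllib.request

CONSUL_ADDR = "http://localhost:8500"  # assumed local agent address


def build_session_request(name, addr=CONSUL_ADDR):
    """Build the PUT request that creates a named session."""
    body = json.dumps({"Name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{addr}/v1/session/create", data=body, method="PUT"
    )


def parse_session_id(response_body):
    """Extract the session ID from the JSON body returned by Consul."""
    return json.loads(response_body)["ID"]


def create_session(name):
    """Send the request to a running agent and return the new session ID."""
    with urllib.request.urlopen(build_session_request(name)) as resp:
        return parse_session_id(resp.read())
```

Only `create_session` touches the network, so the request construction and response parsing can be checked without a running agent.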
### Acquire a Session
The next step is to acquire the lock on the given key for this instance
using the PUT method on a [KV entry](/api/kv) with the
`?acquire=<session>` query parameter.
The `<body>` of the PUT should be a
JSON object representing the local instance. This value is opaque to
Consul, but it should contain whatever information clients require to
communicate with your application (e.g., it could be a JSON object
that contains the node's name and the application's port).
```shell
$ curl -X PUT -d '<body>' http://localhost:8500/v1/kv/service/mysql/leader?acquire=4ca8e74b-6350-7587-addf-a18084928f3c
```
This will either return `true` or `false`. If `true`, the lock has been acquired and
the local service instance is now the leader. If `false` is returned, some other node has acquired
the lock.
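As a rough illustration, the acquire step might look like this in Python. The body shape (`node` and `port`) is only an assumption for the example, since the value is opaque to Consul, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CONSUL_ADDR = "http://localhost:8500"  # assumed local agent address
LEADER_KEY = "service/mysql/leader"


def build_acquire_request(session_id, node, port, addr=CONSUL_ADDR):
    """Build the PUT request that tries to lock LEADER_KEY with a session."""
    query = urllib.parse.urlencode({"acquire": session_id})
    # The value is opaque to Consul; this shape is only an illustration.
    body = json.dumps({"node": node, "port": port}).encode("utf-8")
    return urllib.request.Request(
        f"{addr}/v1/kv/{LEADER_KEY}?{query}", data=body, method="PUT"
    )


def lock_acquired(response_body):
    """Consul answers with the JSON literal true or false."""
    return json.loads(response_body) is True
```

An instance would send the request with `urllib.request.urlopen` and treat itself as leader only when `lock_acquired` returns `True`.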
### Watch the Session
All instances now remain in an idle waiting state. In this state, they watch for changes
on the key `service/mysql/leader`. This is because the lock may be released or the instance could fail, etc.
The leader must also watch for changes since its lock may be released by an operator
or automatically released due to a false positive in the failure detector.
By default, the session makes use of only the gossip failure detector. That
is, the session is considered held by a node as long as the default Serf health check
has not declared the node unhealthy. Additional checks can be specified if desired.
Watching for changes is done via a blocking query against the key. If an instance
notices that the `Session` field in the response is blank, there is no leader, and it should
retry lock acquisition. Each attempt to acquire the key should be separated by a timed
wait. This is because Consul may be enforcing a [`lock-delay`](/docs/internals/sessions).
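The decision each instance makes after a read can be sketched as below. Blocking queries pass the last seen index in the `index` query parameter (Consul returns the current index in the `X-Consul-Index` response header) along with an optional `wait` duration. The function names and the returned action strings are hypothetical, and the 30 second wait is an arbitrary choice for the example.

```python
import urllib.parse

LEADER_KEY = "service/mysql/leader"


def blocking_query_url(last_index, wait="30s", addr="http://localhost:8500"):
    """URL for a read that returns once the key changes past last_index
    or the wait time elapses."""
    query = urllib.parse.urlencode({"index": last_index, "wait": wait})
    return f"{addr}/v1/kv/{LEADER_KEY}?{query}"


def next_action(entries, my_session):
    """Decide what an instance should do after each read of the key.

    entries is the decoded JSON list from the KV API, or None when the key
    does not exist yet (the API answers 404 in that case).
    """
    holder = entries[0].get("Session") if entries else None
    if not holder:
        return "retry-acquire"  # no leader: wait briefly, then try the lock
    if holder == my_session:
        return "lead"
    return "follow"
```

A watcher would loop over `blocking_query_url`, feed each decoded response to `next_action`, and sleep before retrying the acquire so that any lock-delay has time to expire.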
### Release the Session
If the leader ever wishes to step down voluntarily, this should be done by simply
releasing the lock:
```shell
$ curl -X PUT http://localhost:8500/v1/kv/service/mysql/leader?release=4ca8e74b-6350-7587-addf-a18084928f3c
```
## Discover the Leader
It is possible to identify the leader of a set of service instances participating in the election process.
As with leader election, all instances that are participating should agree on the key being used to coordinate.
### Retrieve the Key
Instances have a very simple role: they read the Consul KV key to discover the current leader. If the key has an associated `Session`, then there is a leader.
```shell
$ curl -X GET http://localhost:8500/v1/kv/service/mysql/leader
[
{
"Session": "4ca8e74b-6350-7587-addf-a18084928f3c",
"Value": "Ym9keQ==",
"Flags": 0,
"Key": "service/mysql/leader",
"LockIndex": 1,
"ModifyIndex": 29,
"CreateIndex": 29
}
]
```
If there is a leader then the value of the key will provide all the
application-dependent information required as a Base64 encoded blob in
the `Value` field.
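For example, the `Value` shown above decodes as follows. This is a small Python sketch; `leader_value` is a hypothetical helper, and the response is trimmed to the fields it uses.

```python
import base64
import json


def leader_value(entries):
    """Return the decoded value of the leader key, or None if there is no leader."""
    if not entries or not entries[0].get("Session"):
        return None
    return base64.b64decode(entries[0]["Value"]).decode("utf-8")


response = json.loads(
    '[{"Session": "4ca8e74b-6350-7587-addf-a18084928f3c", '
    '"Value": "Ym9keQ==", "Key": "service/mysql/leader"}]'
)
print(leader_value(response))  # prints: body
```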
### Retrieve Session Information
You can query the
[`/v1/session/info`](/api/session#session_info)
endpoint to get details about the session:
```shell
$ curl -X GET http://localhost:8500/v1/session/info/4ca8e74b-6350-7587-addf-a18084928f3c
[
{
"LockDelay": 1.5e+10,
"Checks": [
"serfHealth"
],
"Node": "consul-primary-bjsiobmvdij6-node-lhe5ihreel7y",
"Name": "mysql-session",
"ID": "4ca8e74b-6350-7587-addf-a18084928f3c",
"CreateIndex": 28
}
]
```
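The `LockDelay` field is expressed in nanoseconds, so the value above corresponds to 15 seconds. A short Python sketch of the conversion, with `lock_delay_seconds` as a hypothetical helper name:

```python
def lock_delay_seconds(session):
    """Convert the LockDelay field (nanoseconds) to seconds."""
    return session["LockDelay"] / 1e9


session = {"LockDelay": 1.5e10, "Checks": ["serfHealth"]}
print(lock_delay_seconds(session))  # prints: 15.0
```

Contending instances can use this value as a lower bound for the timed wait between acquisition attempts.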
## Summary
In this guide you used a session to initiate manual leader election for a
set of service instances. To fully benefit from this process, instances should also watch the key using a blocking query for any
changes. If the leader steps down or fails, the `Session` associated
with the key will be cleared. When a new leader is elected, the key
value will also be updated.
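Using the `acquire` parameter is optional: Consul locks are advisory, so a write
without `acquire` still succeeds even while another session holds the lock. This means
that if you use leader election to coordinate updates to a key, you must not update the key
without the `acquire` parameter.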


@ -1,256 +0,0 @@
---
layout: docs
page_title: Minikube
description: Consul can be installed to the Kubernetes minikube tool for local development.
---
# Consul Installation to Minikube via Helm
In this guide, you'll start a local Kubernetes cluster with minikube. You'll install Consul with only a few commands, then deploy two custom services that use Consul to discover each other over encrypted TLS via Consul Connect. Finally, you'll tighten down Consul Connect so that only the approved applications can communicate with each other.
[Demo code](https://github.com/hashicorp/demo-consul-101) is available.
- [Task 1: Start Minikube and Install Consul with Helm](#task-1-start-minikube-and-install-consul-with-helm)
- [Task 2: Deploy a Consul-aware Application to the Cluster](#task-2-deploy-a-consul-aware-application-to-the-cluster)
- [Task 3: Use Consul Connect](#task-3-use-consul-connect)
## Prerequisites
Let's install Consul on Kubernetes with minikube. This is a relatively quick and easy way to try out Consul on your local machine without the need for any cloud credentials. You'll be able to use most Consul features right away.
First, you'll need to follow the directions for [installing minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/), including VirtualBox or similar.
You'll also need to install `kubectl` and `helm`.
Mac users can install `helm` and `kubectl` with Homebrew.
```shell
$ brew install kubernetes-cli
$ brew install kubernetes-helm
```
Windows users can use Chocolatey with the same package names:
```shell
$ choco install kubernetes-cli
$ choco install kubernetes-helm
```
For more on Helm, see [helm.sh](https://helm.sh/).
## Task 1: Start Minikube and Install Consul with Helm
### Step 1: Start Minikube
Start minikube. You can use the `--memory` option with the equivalent of 4GB to 8GB so there is plenty of memory for all the pods we will run. This may take several minutes. It will download 100-300MB of dependencies and container images.
```
$ minikube start --memory 4096
```
Next, let's view the local Kubernetes dashboard with `minikube dashboard`. Even if the previous step completed successfully, you may have to wait a minute or two for minikube to be available. If you see an error, try again after a few minutes.
Once it spins up, you'll see the dashboard in your web browser. You can view pods, nodes, and other resources.
```
$ minikube dashboard
```
![Minikube Dashboard](/img/guides/minikube-dashboard.png 'Minikube Dashboard')
### Step 2: Install the Consul Helm Chart to the Cluster
To perform the steps in this lab exercise, clone the [hashicorp/demo-consul-101](https://github.com/hashicorp/demo-consul-101) repository from GitHub. Go into the `demo-consul-101/k8s` directory.
```
$ git clone https://github.com/hashicorp/demo-consul-101.git
$ cd demo-consul-101/k8s
```
Now we're ready to install Consul to the cluster, using the `helm` tool. Initialize Helm with `helm init`. You'll see a note that Tiller (the server-side component) has been installed. You can ignore the policy warning.
```
$ helm init
$HELM_HOME has been configured at /Users/geoffrey/.helm.
```
Now we need to install Consul with Helm. To get the freshest copy of the Helm chart, clone the [hashicorp/consul-helm](https://github.com/hashicorp/consul-helm) repository.
```
$ git clone https://github.com/hashicorp/consul-helm.git
```
The chart works on its own, but we'll override a few values to help things go more smoothly with minikube and to enable useful features.
We've created `helm-consul-values.yaml` for you with overrides. See `values.yaml` in the Helm chart repository for other possible values.
We've given a name to the datacenter running this Consul cluster. We've enabled the Consul web UI via a `NodePort`. When deploying to a hosted cloud that implements load balancers, we could use `LoadBalancer` instead. We'll enable secure communication between pods with Connect. We also need to enable `grpc` on the client for Connect to work properly. Finally, specify that this Consul cluster should only run one server (suitable for local development).
```yaml
# Choose an optional name for the datacenter
global:
  datacenter: minidc
# Enable the Consul Web UI via a NodePort
ui:
  service:
    type: 'NodePort'
# Enable Connect for secure communication between nodes
connectInject:
  enabled: true
client:
  enabled: true
  grpc: true
# Use only one Consul server for local development
server:
  replicas: 1
  bootstrapExpect: 1
  disruptionBudget:
    enabled: true
    maxUnavailable: 0
```
Now, run `helm install` together with our overrides file and the cloned `consul-helm` chart. It will print a list of all the resources that were created.
```
$ helm install -f helm-consul-values.yaml --name hedgehog ./consul-helm
```
~> NOTE: If no `--name` is provided, the chart will create a random name for the installation. To reduce confusion, consider specifying a `--name`.
## Task 2: Deploy a Consul-aware Application to the Cluster
### Step 1: View the Consul Web UI
Verify the installation by going back to the Kubernetes dashboard in your web browser. Find the list of services. Several include `consul` in the name and have the `app: consul` label.
![Minikube Dashboard with Consul](/img/guides/minikube-dashboard-consul.png 'Minikube Dashboard with Consul')
There are a few differences between running Kubernetes on a hosted cloud vs locally with minikube. You may find that any load balancer resources don't work as expected on a local cluster. But we can still view the Consul UI and other deployed resources.
Run `minikube service list` to see your services. Find the one with `consul-ui` in the name.
```
$ minikube service list
```
Run `minikube service` with the `consul-ui` service name as the argument. It will open the service in your web browser.
```
$ minikube service hedgehog-consul-ui
```
You can now view the Consul web UI with a list of Consul's services, nodes, and other resources.
![Minikube Consul UI](/img/guides/minikube-consul-ui.png 'Minikube Consul UI')
### Step 2: Deploy Custom Applications
Now let's deploy our application. It consists of two services: a backend data service that returns a number (the `counting` service) and a front-end `dashboard` that pulls from the `counting` service over HTTP and displays the number. The Kubernetes part is a single line: `kubectl create -f 04-yaml-connect-envoy`. This is a directory with several YAML files, each defining one or more resources (pods, containers, etc.).
```
$ kubectl create -f 04-yaml-connect-envoy
```
The output shows that they have been created. In reality, they may take a few seconds to spin up. Refresh the Kubernetes dashboard a few times and you'll see that the `counting` and `dashboard` services are running. You can also click a resource to view more data about it.
![Services](/img/guides/minikube-services.png 'Services')
### Step 3: View the Web Application
For the last step in this initial task, use the Kubernetes `port-forward` feature for the dashboard service running on port `9002`. We already know that the pod is named `dashboard` thanks to the metadata specified in the YAML we deployed.
```
$ kubectl port-forward dashboard 9002:9002
```
Visit http://localhost:9002 in your web browser. You'll see a running `dashboard` container in the Kubernetes cluster that displays a number retrieved from the `counting` service using Consul service discovery and secured over the network by TLS via an Envoy proxy.
![Application Dashboard](/img/guides/minikube-app-dashboard.png 'Application Dashboard')
### Addendum: Review the Code
Let's take a peek at the code. Relevant to this Kubernetes deployment are two YAML files in the `04` directory. The `counting` service defines an `annotation` in the `metadata` section that instructs Consul to spin up a Consul Connect proxy for this service: `connect-inject`. The relevant port number is found in the `containerPort` section (`9001`). This Pod registers a Consul service that will be available via a secure proxy.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: counting
  annotations:
    'consul.hashicorp.com/connect-inject': 'true'
spec:
  containers:
    - name: counting
      image: hashicorp/counting-service:0.0.2
      ports:
        - containerPort: 9001
          name: http
# ...
```
The other side is on the `dashboard` service. This declares the same `connect-inject` annotation but also adds another. The `connect-service-upstreams` in the `annotations` section configures Connect such that this Pod will have access to the `counting` service on `localhost` port `9001`. All the rest of the configuration and communication is taken care of by Consul and the Consul Helm chart.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dashboard
  labels:
    app: 'dashboard'
  annotations:
    'consul.hashicorp.com/connect-inject': 'true'
    'consul.hashicorp.com/connect-service-upstreams': 'counting:9001'
spec:
  containers:
    - name: dashboard
      image: hashicorp/dashboard-service:0.0.3
      ports:
        - containerPort: 9002
          name: http
      env:
        - name: COUNTING_SERVICE_URL
          value: 'http://localhost:9001'
# ...
```
Within our `dashboard` application, we can access the `counting` service by communicating with `localhost:9001` as seen on the last line of this snippet. Here we are looking at an environment variable that is specific to the Go application running in a container in this Pod. Instead of providing an IP address or even a Consul service URL, we tell the application to talk to `localhost:9001` where our local end of the proxy is ready and listening. Because of the annotation to `counting:9001` earlier, we know that an instance of the `counting` service is on the other end.
This is what is happening in the cluster and over the network when we view the `dashboard` service in the browser.
-> TIP: The full source code for the Go-based web services and all code needed to build the Docker images are available in the [repo](https://github.com/hashicorp/demo-consul-101).
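To make the idea concrete, here is a small Python sketch of how an application could consume this configuration. The demo services themselves are written in Go, so this is an illustration and not the repository's code. The function names are hypothetical, and the default URL simply mirrors the value in the Pod spec.

```python
import os
import urllib.request

DEFAULT_COUNTING_URL = "http://localhost:9001"


def counting_service_url(environ=os.environ):
    """Read the upstream address the same way the Pod spec provides it."""
    return environ.get("COUNTING_SERVICE_URL", DEFAULT_COUNTING_URL)


def parse_upstream(annotation):
    """Split a 'name:port' upstream annotation into (service, local_port)."""
    name, port = annotation.split(":")
    return name, int(port)


def fetch_count(environ=os.environ):
    """Call the counting service through the local proxy listener."""
    with urllib.request.urlopen(counting_service_url(environ)) as resp:
        return resp.read().decode("utf-8")
```

The application never needs the address of a `counting` instance. It only needs the local port named in the upstream annotation.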
## Task 3: Use Consul Connect
### Step 1: Create an Intention that Denies All Service Communication by Default
For a final task, let's take this a step further by restricting service communication with intentions. We don't want any service to be able to communicate with any other service; only the ones we specify.
Begin by navigating to the _Intentions_ screen in the Consul web UI. Click the "Create" button and define an initial intention that blocks all communication between any services by default. Choose `*` as the source and `*` as the destination. Choose the _Deny_ radio button and add an optional description. Click "Save."
![Connect Deny](/img/guides/minikube-connect-deny.png 'Connect Deny')
Verify this by returning to the application dashboard where you will see that the "Counting Service is Unreachable."
![Application is Unreachable](/img/guides/minikube-connect-unreachable.png 'Application is Unreachable')
### Step 2: Allow the Application Dashboard to Connect to the Counting Service
Finally, the easy part. Click the "Create" button again and create an intention that allows the `dashboard` source service to talk to the `counting` destination service. Ensure that the "Allow" radio button is selected. Optionally add a description. Click "Save."
![Allow](/img/guides/minikube-connect-allow.png 'Allow')
This action does not require a reboot. It takes effect so quickly that by the time you visit the application dashboard, you'll see that it's successfully communicating with the backend `counting` service again.
And there we have Consul running on a Kubernetes cluster, as demonstrated by two services which communicate with each other via Consul Connect and an Envoy proxy.
![Success](/img/guides/minikube-connect-success.png 'Success')
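The effect of the two intentions can be pictured with a simplified Python model. It assumes that an exact service name outranks a wildcard, with the destination weighed before the source, and that connections are allowed when no intention matches (the behavior when ACLs are disabled). It is a sketch of the idea, not Consul's implementation, and all names in it are hypothetical.

```python
def specificity(intention):
    """Exact names outrank wildcards; the destination is weighed first."""
    return (intention["destination"] != "*", intention["source"] != "*")


def is_allowed(intentions, source, destination, default=True):
    """Return True if the most specific matching intention allows the call.

    default=True assumes ACLs are disabled, so unmatched traffic is allowed.
    """
    matches = [
        i for i in intentions
        if i["source"] in ("*", source) and i["destination"] in ("*", destination)
    ]
    if not matches:
        return default
    return max(matches, key=specificity)["action"] == "allow"


intentions = [
    {"source": "*", "destination": "*", "action": "deny"},
    {"source": "dashboard", "destination": "counting", "action": "allow"},
]
```

With both intentions in place, `dashboard` can reach `counting`, while every other pairing falls through to the wildcard deny.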
## Reference
For more on Consul's integration with Kubernetes (including multi-cloud, service sync, and other features), see the [Consul with Kubernetes](/docs/platform/k8s) documentation.


@ -1,317 +0,0 @@
---
layout: docs
page_title: Monitoring Consul with Telegraf
description: >-
Best practice approaches for monitoring a production Consul cluster with
Telegraf
---
# Monitoring Consul with Telegraf
Consul makes a range of metrics in various formats available so operators can
measure the health and stability of a cluster, and diagnose or predict potential
issues.
There are a number of monitoring tools and options available, but for the purposes
of this guide we are going to use the [telegraf_plugin][] in conjunction with
the StatsD protocol supported by Consul.
You can read the full list of metrics available with Consul in the [telemetry
documentation](/docs/agent/telemetry).
In this guide you will:
- Configure Telegraf to collect StatsD and host level metrics
- Configure Consul to send metrics to Telegraf
- See an example of metrics visualization
- Understand important metrics to aggregate and alert on
## Installing Telegraf
The process for installing Telegraf depends on your operating system. We
recommend following the [official Telegraf installation
documentation][telegraf-install].
## Configuring Telegraf
Telegraf acts as a StatsD agent and can collect additional metrics about the
hosts where Consul agents are running. Telegraf itself ships with a wide range
of [input plugins][telegraf-input-plugins] to collect data from lots of sources
for this purpose.
We're going to enable some of the most common input plugins to monitor CPU,
memory, disk I/O, networking, and process status, since these are useful for
debugging Consul cluster issues.
The `telegraf.conf` file starts with global options:
```toml
[agent]
interval = "10s"
flush_interval = "10s"
omit_hostname = false
```
We set the default collection interval to 10 seconds and ask Telegraf to include
a `host` tag in each metric.
Telegraf also allows you to set additional tags on the
metrics that pass through it. In this case, we are adding tags for the server
role and datacenter. We can then use these tags in Grafana to filter queries
(for example, to create a dashboard showing only servers with the
`consul-server` role, or only servers in the `us-east-1` datacenter).
```toml
[global_tags]
role = "consul-server"
datacenter = "us-east-1"
```
Next, we set up a StatsD listener on UDP port 8125, with instructions to
calculate percentile metrics and to parse DogStatsD-compatible tags, when
they're sent:
```toml
[[inputs.statsd]]
protocol = "udp"
service_address = ":8125"
delete_gauges = true
delete_counters = true
delete_sets = true
delete_timings = true
percentiles = [90]
metric_separator = "_"
parse_data_dog_tags = true
allowed_pending_messages = 10000
percentile_limit = 1000
```
The full reference to all the available StatsD-related options in Telegraf is
[here][telegraf-statsd-input].
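To check that the listener is receiving data, you can send it a test datagram. The Python sketch below formats a metric in the DogStatsD line format (`name:value|type|#tag:value`) and sends it over UDP. The function names and any metric names you pass in are hypothetical, and the host and port are assumed to match the listener configured above.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None):
    """Render one metric in the DogStatsD line format: name:value|type|#tags."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


def send_metric(line, host="localhost", port=8125):
    """Send a single datagram to the StatsD listener configured above."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

Because UDP is connectionless, `send_metric` does not report whether Telegraf received the datagram. Confirm delivery in Telegraf's output instead.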
Now we can configure inputs for things like CPU, memory, network I/O, and disk
I/O. Most of them don't require any configuration, but make sure the
`interfaces` list in `inputs.net` matches the interface names you see in
`ifconfig`.
```toml
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
[[inputs.disk]]
# mount_points = ["/"]
# ignore_fs = ["tmpfs", "devtmpfs"]
[[inputs.diskio]]
# devices = ["sda", "sdb"]
# skip_serial_number = false
[[inputs.kernel]]
# no configuration
[[inputs.linux_sysctl_fs]]
# no configuration
[[inputs.mem]]
# no configuration
[[inputs.net]]
interfaces = ["enp0s*"]
[[inputs.netstat]]
# no configuration
[[inputs.processes]]
# no configuration
[[inputs.swap]]
# no configuration
[[inputs.system]]
# no configuration
```
Another useful plugin is the [procstat][telegraf-procstat-input] plugin, which
reports metrics for processes you select:
```toml
[[inputs.procstat]]
pattern = "(consul)"
```
Telegraf even includes a [plugin][telegraf-consul-input] that monitors the
health checks associated with the Consul agent, using the Consul API to query the
data.
Note that this plugin reports only health check status. It does not report
Consul's own telemetry, which Consul already sends using the StatsD protocol.
```toml
[[inputs.consul]]
address = "localhost:8500"
scheme = "http"
```
## Telegraf Configuration for Consul
Asking Consul to send telemetry to Telegraf is as simple as adding a `telemetry`
section to your agent configuration:
```json
{
"telemetry": {
"dogstatsd_addr": "localhost:8125",
"disable_hostname": true
}
}
```
As you can see, we only need to specify two options. The `dogstatsd_addr`
specifies the hostname and port of the StatsD daemon.
Note that we specify DogStatsD format instead of plain StatsD, which tells
Consul to send [tags][tagging] with each metric. Tags can be used by Grafana to
filter data on your dashboards (for example, displaying only the data for which
`role=consul-server`). Telegraf is compatible with the DogStatsD format and
allows us to add our own tags too.
The second option tells Consul not to insert the hostname in the names of the
metrics it sends to StatsD, since the hostnames will be sent as tags. Without
this option, the single metric `consul.raft.apply` would become multiple
metrics:
- `consul.server1.raft.apply`
- `consul.server2.raft.apply`
- `consul.server3.raft.apply`
If you are using a different agent (e.g. Circonus, Statsite, or plain StatsD),
you may want to change this configuration, and you can find the configuration
reference [here][consul-telemetry-config].
## Visualizing Telegraf Consul Metrics
You can use a tool like [Grafana][] or [Chronograf][] to visualize metrics from
Telegraf.
Here is an example Grafana dashboard:
[![Grafana Consul Cluster](/img/grafana-screenshot.png)](/img/grafana-screenshot.png)
## Metric Aggregates and Alerting from Telegraf
### Memory usage
| Metric Name | Description |
| :------------------ | :------------------------------------------------------------- |
| `mem.total` | Total amount of physical memory (RAM) available on the server. |
| `mem.used_percent` | Percentage of physical memory in use. |
| `swap.used_percent` | Percentage of swap space in use. |
**Why they're important:** Consul keeps all of its data in memory. If Consul
consumes all available memory, it will crash. You should also monitor total
available RAM to make sure some RAM is available for other processes, and swap
usage should remain at 0% for best performance.
**What to look for:** If `mem.used_percent` is over 90%, or if
`swap.used_percent` is greater than 0.
### File descriptors
| Metric Name | Description |
| :------------------------- | :------------------------------------------------------------------ |
| `linux_sysctl_fs.file-nr` | Number of file handles being used across all processes on the host. |
| `linux_sysctl_fs.file-max` | Total number of available file handles. |
**Why it's important:** Practically anything Consul does -- receiving a
connection from another host, sending data between servers, writing snapshots to
disk -- requires a file descriptor handle. If Consul runs out of handles, it
will stop accepting connections. See [the Consul FAQ][consul_faq_fds] for more
details.
By default, process and kernel limits are fairly conservative. You will want to
increase these beyond the defaults.
**What to look for:** If `file-nr` exceeds 80% of `file-max`.
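The same check can be expressed in a few lines of Python. This sketch parses the three fields of the Linux file `/proc/sys/fs/file-nr` (allocated, unused, and maximum handles) as a stand-in for the Telegraf metrics. The function names are hypothetical, and the 80% threshold comes from the guidance above.

```python
def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated, unused, and maximum handles."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def file_handle_alert(text, threshold=0.80):
    """Return True when allocated handles exceed the threshold share of the max."""
    allocated, _unused, maximum = parse_file_nr(text)
    return allocated / maximum > threshold
```

On a Linux host, the input could come from `open("/proc/sys/fs/file-nr").read()`.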
### CPU usage
| Metric Name | Description |
| :--------------- | :--------------------------------------------------------------- |
| `cpu.user_cpu` | Percentage of CPU being used by user processes (such as Consul). |
| `cpu.iowait_cpu` | Percentage of CPU time spent waiting for I/O tasks to complete. |
**Why they're important:** Consul is not particularly demanding of CPU time, but
a spike in CPU usage might indicate too many operations taking place at once,
and `iowait_cpu` is critical -- it means Consul is waiting for data to be
written to disk, a sign that Raft might be writing snapshots to disk too often.
**What to look for:** If `cpu.iowait_cpu` is greater than 10%.
### Network activity - Bytes Received
| Metric Name | Description |
| :--------------- | :------------------------------------------- |
| `net.bytes_recv` | Bytes received on each network interface. |
| `net.bytes_sent` | Bytes transmitted on each network interface. |
**Why they're important:** A sudden spike in network traffic to Consul might be
the result of a misconfigured application client causing too many requests to
Consul. This is the raw data from the system, rather than a specific Consul
metric.
**What to look for:** Sudden large changes to the `net` metrics (greater than
50% deviation from baseline).
**NOTE:** The `net` metrics are counters, so in order to calculate rates (such
as bytes/second), you will need to apply a function such as
[non_negative_difference][].
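The idea behind that calculation can be sketched in Python. This is not the InfluxQL implementation. It takes differences between consecutive counter samples, drops negative ones (for example after a counter reset), and divides by the elapsed time to get a per-second rate. The function name and sample data are made up for the example.

```python
def non_negative_rates(samples):
    """Turn (timestamp_seconds, counter_value) samples into per-second rates.

    Negative differences (for example after a counter reset) are dropped.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta >= 0 and t1 > t0:
            rates.append((t1, delta / (t1 - t0)))
    return rates


samples = [(0, 1000), (10, 6000), (20, 500), (30, 2500)]
print(non_negative_rates(samples))  # prints: [(10, 500.0), (30, 200.0)]
```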
### Disk activity
| Metric Name | Description |
| :------------------- | :---------------------------------- |
| `diskio.read_bytes` | Bytes read from each block device. |
| `diskio.write_bytes` | Bytes written to each block device. |
**Why they're important:** If the Consul host is writing a lot of data to disk,
such as under high volume workloads, there may be frequent major I/O spikes
during leader elections. This is because under heavy load, Consul is
checkpointing Raft snapshots to disk frequently.
It may also be caused by Consul having debug/trace logging enabled in
production, which can impact performance.
Too much disk I/O can cause the rest of the system to slow down or become
unavailable, as the kernel spends all its time waiting for I/O to complete.
**What to look for:** Sudden large changes to the `diskio` metrics (greater than
50% deviation from baseline, or more than 3 standard deviations from baseline).
**NOTE:** The `diskio` metrics are counters, so in order to calculate rates
(such as bytes/second), you will need to apply a function such as
[non_negative_difference][].
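The "deviation from baseline" rule used for the `net` and `diskio` metrics can be sketched as a small Python function. The function name is hypothetical, the thresholds are the ones suggested above, and the baseline is assumed to be a list of at least two recent rate samples.

```python
import statistics


def deviates(baseline, current, pct=0.50, sigmas=3.0):
    """Flag a value that is more than pct away from the baseline mean,
    or more than `sigmas` standard deviations away from it."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # needs at least two samples
    pct_exceeded = mean != 0 and abs(current - mean) / abs(mean) > pct
    sigma_exceeded = stdev > 0 and abs(current - mean) > sigmas * stdev
    return bool(pct_exceeded or sigma_exceeded)
```

How long a baseline window to keep is a judgment call that depends on how bursty your normal traffic is.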
## Summary
In this guide you learned how to set up Telegraf with Consul to collect metrics,
and considered your options for visualizing, aggregating, and alerting on those
metrics. To learn about other factors (in addition to monitoring) that you
should consider when running Consul in production, see the [Production Checklist][prod-checklist].
[non_negative_difference]: https://docs.influxdata.com/influxdb/v1.5/query_language/functions/#non-negative-difference
[consul_faq_fds]: https://www.consul.io/docs/faq.html#q-does-consul-require-certain-user-process-resource-limits-
[telegraf_plugin]: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/consul
[telegraf-install]: https://docs.influxdata.com/telegraf/v1.6/introduction/installation/
[telegraf-consul-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/consul
[telegraf-statsd-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/statsd
[telegraf-procstat-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/procstat
[telegraf-input-plugins]: https://docs.influxdata.com/telegraf/v1.6/plugins/inputs/
[tagging]: https://docs.datadoghq.com/getting_started/tagging/
[consul-telemetry-config]: https://www.consul.io/docs/agent/options.html#telemetry
[consul-telemetry-ref]: https://www.consul.io/docs/agent/telemetry.html
[grafana]: https://www.influxdata.com/partners/grafana/
[chronograf]: https://www.influxdata.com/time-series-platform/chronograf/
[prod-checklist]: https://learn.hashicorp.com/consul/advanced/day-1-operations/production-checklist


@ -1,273 +0,0 @@
---
layout: docs
page_title: Partial LAN Connectivity - Configuring Network Segments
description: >-
Many advanced Consul users have the need to run clusters with segmented
networks, meaning that
not all agents can be in a full mesh. This is usually the result of business
policies enforced
via network rules or firewalls. Prior to Consul 0.9.3 this was only possible
through federation,
which for some users is too heavyweight or expensive as it requires running
multiple servers per
segment.
---
# Network Segments [Enterprise Only]
~> Note, the network segment functionality described here is available only in [Consul Enterprise](https://www.hashicorp.com/products/consul/) version 0.9.3 and later.
Many advanced Consul users have the need to run clusters with segmented networks, meaning that
not all agents can be in a full mesh. This is usually the result of business policies enforced
via network rules or firewalls. Prior to Consul 0.9.3 this was only possible through federation,
which for some users is too heavyweight or expensive as it requires running multiple servers per
segment.
This guide will cover the basic configuration for setting up multiple segments, as well as
how to configure a prepared query to limit service discovery to the services in the local agent's
network segment.
To complete this guide you will need to complete the
[Deployment Guide](https://learn.hashicorp.com/consul/advanced/day-1-operations/deployment-guide).
## Partial LAN Connectivity with Network Segments
By default, all Consul agents in one datacenter are part of a shared gossip pool over the LAN;
this means that the partial connectivity caused by segmented networks would cause health flapping
as nodes failed to communicate. In this guide we will cover the Network Segments feature, added
in [Consul Enterprise](https://www.hashicorp.com/products/consul/) version 0.9.3, which allows users
to configure Consul to support this kind of segmented network topology.
### Network Segments Overview
All Consul agents are part of the default network segment, unless a segment is specified in
their configuration. In a standard cluster setup, all agents will normally be part of this default
segment and as a result, part of one shared LAN gossip pool.
Network segments can be used to break
up the LAN gossip pool into multiple isolated smaller pools by specifying the configuration for segments
on the servers. Each desired segment must be given a name and port, as well as optionally a custom
bind and advertise address for that segment's gossip listener to bind to on the server.
A few things to note:
1. Servers will be a part of all segments they have been configured with. They are the common point
linking the different segments together. The configured list of segments is specified by the
[`segments`](/docs/agent/options#segments) option.
2. Client agents can only be part of one segment at a given time, specified by the [`-segment`](/docs/agent/options#_segment) option.
3. Clients can only join agents in the same segment as them. If they attempt to join a client in
another segment, or the listening port of another segment on a server, they will get a segment mismatch error.
Once the servers have been configured with the correct segment info, the clients only need to specify
their own segment in the [Agent Config](/docs/agent/options#_segment) and join by connecting to another
agent within the same segment. When joining a Consul server, the client must provide the server's
port for their segment along with the address of the server when performing the join (for example,
`consul agent -retry-join "consul.domain.internal:1234"`).
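The relationship between a client's segment and the address it joins can be sketched in Python. The segment names and ports below match the server configuration used later in this guide. The function names are hypothetical, and the generated dictionary uses the `segment` and `retry_join` agent configuration keys.

```python
SERVER_SEGMENTS = [
    {"name": "alpha", "port": 8303},
    {"name": "beta", "port": 8304},
]


def join_address(server_host, segment, segments=SERVER_SEGMENTS):
    """Return host:port for the server listener that serves `segment`."""
    for entry in segments:
        if entry["name"] == segment:
            return f"{server_host}:{entry['port']}"
    raise ValueError(f"segment {segment!r} is not configured on the server")


def client_config(server_host, segment):
    """Client agent settings: its single segment and where to join."""
    return {"segment": segment, "retry_join": [join_address(server_host, segment)]}
```

Raising an error for an unknown segment mirrors the rule that a client cannot join a segment the servers have not been configured with.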
## Setup Network Segments
### Configure Consul Servers
To get started,
start a server or group of servers with the following section added to the configuration. You may need to
adjust the bind and advertise addresses for your setup.
```json
{
  "segments": [
    {
      "name": "alpha",
      "bind": "{{GetPrivateIP}}",
      "advertise": "{{GetPrivateIP}}",
      "port": 8303
    },
    {
      "name": "beta",
      "bind": "{{GetPrivateIP}}",
      "advertise": "{{GetPrivateIP}}",
      "port": 8304
    }
  ]
}
```
You should see a log message on the servers for each segment's listener as the agent starts up.
```shell
2017/08/30 19:05:13 [INFO] serf: EventMemberJoin: server1.dc1 192.168.0.4
2017/08/30 19:05:13 [INFO] serf: EventMemberJoin: server1 192.168.0.4
2017/08/30 19:05:13 [INFO] consul: Started listener for LAN segment "alpha" on 192.168.0.4:8303
2017/08/30 19:05:13 [INFO] serf: EventMemberJoin: server1 192.168.0.4
2017/08/30 19:05:13 [INFO] consul: Started listener for LAN segment "beta" on 192.168.0.4:8304
2017/08/30 19:05:13 [INFO] serf: EventMemberJoin: server1 192.168.0.4
```
Running `consul members` should show the server as being part of all segments.
```shell
(server1) $ consul members
Node     Address           Status  Type    Build      Protocol  DC   Segment
server1  192.168.0.4:8301  alive   server  0.9.3+ent  2         dc1  <all>
```
### Configure Consul Clients in Different Network Segments
Next, start a client agent in the 'alpha' segment, with `-join` set to the server's segment
address/port for that segment.
```shell
(client1) $ consul agent ... -join 192.168.0.4:8303 -node client1 -segment alpha
```
After the join is successful, we should see the client show up by running the `consul members` command
on the server again.
```shell
(server1) $ consul members
Node     Address           Status  Type    Build      Protocol  DC   Segment
server1  192.168.0.4:8301  alive   server  0.9.3+ent  2         dc1  <all>
client1  192.168.0.5:8301  alive   client  0.9.3+ent  2         dc1  alpha
```
Now join another client in segment 'beta' and run the `consul members` command another time.
```shell
(client2) $ consul agent ... -join 192.168.0.4:8304 -node client2 -segment beta
```
```shell
(server1) $ consul members
Node     Address           Status  Type    Build      Protocol  DC   Segment
server1  192.168.0.4:8301  alive   server  0.9.3+ent  2         dc1  <all>
client1  192.168.0.5:8301  alive   client  0.9.3+ent  2         dc1  alpha
client2  192.168.0.6:8301  alive   client  0.9.3+ent  2         dc1  beta
```
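The join addresses used above pair the server's address with the port of the client's segment. A minimal Python sketch of that lookup, assuming the server's `segments` configuration from this guide is available as a list of dictionaries. The helper name `segment_join_address` is hypothetical and not part of Consul.

```python
# Server-side segment definitions, mirroring the configuration in this guide.
SEGMENTS = [
    {"name": "alpha", "port": 8303},
    {"name": "beta", "port": 8304},
]


def segment_join_address(server_ip, segment, segments=SEGMENTS):
    """Return the "ip:port" a client in `segment` should join on the server.

    Hypothetical helper for illustration; Consul does not provide it.
    """
    for entry in segments:
        if entry["name"] == segment:
            return f"{server_ip}:{entry['port']}"
    raise ValueError(f"server has no segment named {segment!r}")


print(segment_join_address("192.168.0.4", "alpha"))  # 192.168.0.4:8303
```

Raising on an unknown segment name mirrors the fact that a client cannot join a segment the server was not configured with.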
### Filter Segmented Nodes
If we pass the `-segment` flag when running `consul members`, we can limit the view to agents
in a specific segment.
```shell
(server1) $ consul members -segment alpha
Node     Address           Status  Type    Build      Protocol  DC   Segment
client1  192.168.0.5:8301  alive   client  0.9.3+ent  2         dc1  alpha
server1  192.168.0.4:8303  alive   server  0.9.3+ent  2         dc1  alpha
```
Using the `consul catalog nodes` command, we can filter on an internal metadata key,
`consul-network-segment`, which stores the network segment of the node.
```shell
(server1) $ consul catalog nodes -node-meta consul-network-segment=alpha
Node     ID        Address      DC
client1  4c29819c  192.168.0.5  dc1
```
With this metadata key, we can construct a [Prepared Query](/api/query) that can be used
for DNS to return only services within the same network segment as the local agent.
## Configure a Prepared Query to Limit Service Discovery
### Create Services
First, register a service on each of the client nodes.
```shell
(client1) $ curl \
--request PUT \
--data '{"Name": "redis", "Port": 8000}' \
localhost:8500/v1/agent/service/register
```
```shell
(client2) $ curl \
--request PUT \
--data '{"Name": "redis", "Port": 9000}' \
localhost:8500/v1/agent/service/register
```
### Create the Prepared Query
Next, create the prepared query by posting the following definition to the HTTP endpoint.
```shell
(server1) $ curl \
    --request POST \
    --data \
'{
  "Name": "",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.full}",
    "NodeMeta": {"consul-network-segment": "${agent.segment}"}
  }
}' localhost:8500/v1/query
{"ID":"6f49dd24-de9b-0b6c-fd29-525eca069419"}
```
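To make the template easier to follow, here is a rough Python sketch of how its two variables could be filled in when an agent in segment `alpha` looks up a service named `redis`. It illustrates the substitution only and is not Consul's implementation; `render_query` is a hypothetical helper.

```python
import json

# The "Service" portion of the prepared query template from this guide.
TEMPLATE = {
    "Service": "${name.full}",
    "NodeMeta": {"consul-network-segment": "${agent.segment}"},
}


def render_query(template, query_name, agent_segment):
    """Substitute the template variables with the values for one lookup.

    Illustrative only: plain string replacement on the serialized template.
    """
    rendered = json.dumps(template)
    rendered = rendered.replace("${name.full}", query_name)
    rendered = rendered.replace("${agent.segment}", agent_segment)
    return json.loads(rendered)


# A lookup for "redis" made through an agent in segment "alpha".
print(render_query(TEMPLATE, "redis", "alpha"))
```

Because the template's `Name` is empty, the prefix match applies to every query name, so the same template serves any service.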
### Test the Segments with DNS Lookups
Now, we can replace any DNS lookups of the form `<service>.service.consul` with
`<service>.query.consul` to look up only services within the same network segment.
**Client 1:**
```shell
(client1) $ dig @127.0.0.1 -p 8600 redis.query.consul SRV
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 redis.query.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3149
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;redis.query.consul. IN SRV
;; ANSWER SECTION:
redis.query.consul. 0 IN SRV 1 1 8000 client1.node.dc1.consul.
;; ADDITIONAL SECTION:
client1.node.dc1.consul. 0 IN A 192.168.0.5
```
**Client 2:**
```shell
(client2) $ dig @127.0.0.1 -p 8600 redis.query.consul SRV
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 redis.query.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3149
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;redis.query.consul. IN SRV
;; ANSWER SECTION:
redis.query.consul. 0 IN SRV 1 1 9000 client2.node.dc1.consul.
;; ADDITIONAL SECTION:
client2.node.dc1.consul. 0 IN A 192.168.0.6
```
## Summary
In this guide you configured the Consul agents to participate in partial
LAN gossip based on network segments. You then set up a couple of services and
a prepared query to test the segments.

@ -1,255 +0,0 @@
---
layout: docs
page_title: Outage Recovery
description: >-
  Don't panic! This is a critical first step. Depending on your deployment
  configuration, it may take only a single server failure for cluster
  unavailability. Recovery requires an operator to intervene, but recovery is
  straightforward.
---
# Outage Recovery
Don't panic! This is a critical first step.
Depending on your
[deployment configuration](/docs/internals/consensus#deployment_table), it
may take only a single server failure for cluster unavailability. Recovery
requires an operator to intervene, but the process is straightforward.
This guide is for recovery from a Consul outage due to a majority
of server nodes in a datacenter being lost. There are several types
of outages, depending on the number of server nodes and number of failed
server nodes. We will outline how to recover from:
- Failure of a Single Server Cluster. This is when you have a single Consul
server and it fails.
- Failure of a Server in a Multi-Server Cluster. This is when one server fails
in a Consul cluster of 3 or more servers.
- Failure of Multiple Servers in a Multi-Server Cluster. This is when more than one
Consul server fails in a cluster of 3 or more servers. This scenario is potentially
the most serious, because it can result in data loss.
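
How many server failures a cluster tolerates follows from Raft's majority requirement: a quorum is more than half of the servers. As a quick sketch in Python (the function names are illustrative only):

```python
def quorum_size(servers):
    """Smallest majority of `servers` voting members."""
    return servers // 2 + 1


def failure_tolerance(servers):
    """Number of servers that can fail while a quorum remains."""
    return servers - quorum_size(servers)


# Prints: servers, quorum size, tolerated failures.
for n in (1, 3, 5):
    print(n, quorum_size(n), failure_tolerance(n))
```

A single-server cluster tolerates no failures, a 3-server cluster tolerates one, and a 5-server cluster tolerates two, which is why the scenarios above differ so much in severity.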
## Failure of a Single Server Cluster
If you had only a single server and it has failed, simply restart it. A
single server configuration requires the
[`-bootstrap`](/docs/agent/options#_bootstrap) or
[`-bootstrap-expect=1`](/docs/agent/options#_bootstrap_expect)
flag.
```shell
consul agent -bootstrap-expect=1
```
If the server cannot be recovered, you need to bring up a new
server using the [deployment guide](https://www.consul.io/docs/guides/deployment-guide.html).
In the case of an unrecoverable server failure in a single server cluster and
no backup procedure, data loss is inevitable since data was not replicated
to any other servers. This is why a single server deploy is **never** recommended.
Any services registered with agents will be re-populated when the new server
comes online as agents perform [anti-entropy](/docs/internals/anti-entropy).
## Failure of a Server in a Multi-Server Cluster
If you think the failed server is recoverable, the easiest option is to bring
it back online and have it rejoin the cluster with the same IP address, returning
the cluster to a fully healthy state. Similarly, even if you need to rebuild a
new Consul server to replace the failed node, you may wish to do that immediately.
Keep in mind that the rebuilt server needs to have the same IP address as the failed
server. Again, once this server is online and has rejoined, the cluster will return
to a fully healthy state.
```shell
consul agent -bootstrap-expect=3 -bind=192.172.2.4 -retry-join=192.172.2.3
```
Both of these strategies involve a potentially lengthy time to reboot or rebuild
a failed server. If this is impractical or if building a new server with the same
IP isn't an option, you need to remove the failed server. Usually, you can issue
a [`consul force-leave`](/docs/commands/force-leave) command to remove the failed
server if it's still a member of the cluster.
```shell
consul force-leave <node.name.consul>
```
If [`consul force-leave`](/docs/commands/force-leave) isn't able to remove the
server, you have two methods available to remove it, depending on your version of Consul:
- In Consul 0.7 and later, you can use the [`consul operator`](/docs/commands/operator#raft-remove-peer) command to remove the stale peer server on the fly with no downtime if the cluster has a leader.
- In versions of Consul prior to 0.7, you can manually remove the stale peer
server using the `raft/peers.json` recovery file on all remaining servers. See
the [section below](#peers.json) for details on this procedure. This process
requires a Consul downtime to complete.
In Consul 0.7 and later, you can use the [`consul operator`](/docs/commands/operator#raft-list-peers)
command to inspect the Raft configuration:
```
$ consul operator raft list-peers
Node   ID              Address         State     Voter  RaftProtocol
alice  10.0.1.8:8300   10.0.1.8:8300   follower  true   3
bob    10.0.1.6:8300   10.0.1.6:8300   leader    true   3
carol  10.0.1.7:8300   10.0.1.7:8300   follower  true   3
```
## Failure of Multiple Servers in a Multi-Server Cluster
In the event that multiple servers are lost, causing a loss of quorum and a
complete outage, partial recovery is possible using data on the remaining
servers in the cluster. There may be data loss in this situation because multiple
servers were lost, so information about what's committed could be incomplete.
The recovery process implicitly commits all outstanding Raft log entries, so
it's also possible to commit data that was uncommitted before the failure.
See the section below on manual recovery using peers.json for details of the recovery procedure. You
simply include just the remaining servers in the `raft/peers.json` recovery file.
The cluster should be able to elect a leader once the remaining servers are all
restarted with an identical `raft/peers.json` configuration.
Any new servers you introduce later can be fresh with totally clean data directories
and joined using Consul's `join` command or, as shown here, the agent's `-join` option.
```shell
consul agent -join=192.172.2.3
```
In extreme cases, it should be possible to recover with just a single remaining
server by starting that single server with itself as the only peer in the
`raft/peers.json` recovery file.
Prior to Consul 0.7 it wasn't always possible to recover from certain
types of outages with `raft/peers.json` because this was ingested before any Raft
log entries were played back. In Consul 0.7 and later, the `raft/peers.json`
recovery file is final, and a snapshot is taken after it is ingested, so you are
guaranteed to start with your recovered configuration. This does implicitly commit
all Raft log entries, so should only be used to recover from an outage, but it
should allow recovery from any situation where there's some cluster data available.
<a name="peers.json"></a>
### Manual Recovery Using peers.json
To begin, stop all remaining servers. You can attempt a graceful leave,
but it will not work in most cases. Do not worry if the leave exits with an
error. The cluster is in an unhealthy state, so this is expected.
In Consul 0.7 and later, the `peers.json` file is no longer present
by default and is only used when performing recovery. This file will be deleted
after Consul starts and ingests this file. Consul 0.7 also uses a new, automatically-
created `raft/peers.info` file to avoid ingesting the `raft/peers.json` file on the
first start after upgrading. Be sure to leave `raft/peers.info` in place for proper
operation.
Using `raft/peers.json` for recovery can cause uncommitted Raft log entries to be
implicitly committed, so this should only be used after an outage where no
other option is available to recover a lost server. Make sure you don't have
any automated processes that will put the peers file in place on a
periodic basis.
The next step is to go to the [`-data-dir`](/docs/agent/options#_data_dir)
of each Consul server. Inside that directory, there will be a `raft/`
sub-directory. We need to create a `raft/peers.json` file. The format of this file
depends on what the server has configured for its
[Raft protocol](/docs/agent/options#_raft_protocol) version.
For Raft protocol version 2 and earlier, this should be formatted as a JSON
array containing the address and port of each Consul server in the cluster, like
this:
```json
["10.1.0.1:8300", "10.1.0.2:8300", "10.1.0.3:8300"]
```
For Raft protocol version 3 and later, this should be formatted as a JSON
array containing the node ID, address:port, and suffrage information of each
Consul server in the cluster, like this:
```json
[
  {
    "id": "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
    "address": "10.1.0.1:8300",
    "non_voter": false
  },
  {
    "id": "8b6dda82-3103-11e7-93ae-92361f002671",
    "address": "10.1.0.2:8300",
    "non_voter": false
  },
  {
    "id": "97e17742-3103-11e7-93ae-92361f002671",
    "address": "10.1.0.3:8300",
    "non_voter": false
  }
]
```
- `id` `(string: <required>)` - Specifies the [node ID](/docs/agent/options#_node_id)
of the server. This can be found in the logs when the server starts up if it was auto-generated,
and it can also be found inside the `node-id` file in the server's data directory.
- `address` `(string: <required>)` - Specifies the IP and port of the server. The port is the
server's RPC port used for cluster communications.
- `non_voter` `(bool: <false>)` - This controls whether the server is a non-voter, which is used
in some advanced [Autopilot](/docs/guides/autopilot) configurations. If omitted, it will
default to false, which is typical for most clusters.
Simply create entries for all servers. You must confirm that servers you do not include here have
indeed failed and will not later rejoin the cluster. Ensure that this file is the same across all
remaining server nodes.
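
Because the file must be identical on every remaining server, it can help to generate it once rather than write it by hand. The following Python sketch produces the Raft protocol version 3 format described above, assuming you have already collected each remaining server's node ID and RPC address. `build_peers_json` is a hypothetical helper, not a Consul tool.

```python
import json


def build_peers_json(servers):
    """Build Raft protocol 3 peers.json content.

    `servers` is a list of (node_id, "ip:port") tuples for the remaining
    servers. Hypothetical helper for illustration only.
    """
    entries = []
    for node_id, address in servers:
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected ip:port, got {address!r}")
        entries.append({"id": node_id, "address": address, "non_voter": False})
    return json.dumps(entries, indent=2)


# Two surviving servers, using IDs and addresses from the example above.
remaining = [
    ("adf4238a-882b-9ddc-4a9d-5b6758e4159e", "10.1.0.1:8300"),
    ("8b6dda82-3103-11e7-93ae-92361f002671", "10.1.0.2:8300"),
]
print(build_peers_json(remaining))
```

The printed text would then be saved as `raft/peers.json` in the data directory of each remaining server.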
At this point, you can restart all the remaining servers. In Consul 0.7 and
later you will see them ingest the recovery file:
```text
...
2016/08/16 14:39:20 [INFO] consul: found peers.json file, recovering Raft configuration...
2016/08/16 14:39:20 [INFO] consul.fsm: snapshot created in 12.484µs
2016/08/16 14:39:20 [INFO] snapshot: Creating new snapshot at /tmp/peers/raft/snapshots/2-5-1471383560779.tmp
2016/08/16 14:39:20 [INFO] consul: deleted peers.json file after successful recovery
2016/08/16 14:39:20 [INFO] raft: Restored from snapshot 2-5-1471383560779
2016/08/16 14:39:20 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:10.212.15.121:8300 Address:10.212.15.121:8300}]
...
```
If any servers managed to perform a graceful leave, you may need to have them
rejoin the cluster using the [`join`](/docs/commands/join) command:
```text
$ consul join <Node Address>
Successfully joined cluster by contacting 1 nodes.
```
It should be noted that any existing member can be used to rejoin the cluster
as the gossip protocol will take care of discovering the server nodes.
At this point, the cluster should be in an operable state again. One of the
nodes should claim leadership and emit a log like:
```text
[INFO] consul: cluster leadership acquired
```
In Consul 0.7 and later, you can use the [`consul operator`](/docs/commands/operator#raft-list-peers)
command to inspect the Raft configuration:
```
$ consul operator raft list-peers
Node   ID              Address         State     Voter  RaftProtocol
alice  10.0.1.8:8300   10.0.1.8:8300   follower  true   3
bob    10.0.1.6:8300   10.0.1.6:8300   leader    true   3
carol  10.0.1.7:8300   10.0.1.7:8300   follower  true   3
```
## Summary
In this guide, we reviewed how to recover from a Consul server outage. Depending on the
quorum size and number of failed servers, the recovery process will vary. In the event of
complete failure it is beneficial to have a
[backup process](https://www.consul.io/docs/guides/deployment-guide.html#backups).

@ -1,177 +0,0 @@
---
layout: docs
page_title: Semaphore
description: >-
  This guide demonstrates how to implement a distributed semaphore using the
  Consul KV store.
---
# Semaphore
A distributed semaphore can be useful when you want to coordinate many services while
restricting access to certain resources. In this guide we will focus on using Consul's support for
sessions and Consul KV to build a distributed
semaphore. There are a number of ways to build a semaphore; this guide does not cover all of them.
To complete this guide successfully, you should have familiarity with
[Consul KV](/docs/agent/kv) and Consul [sessions](/docs/internals/sessions).
~> If you only need mutual exclusion or leader election,
[this guide](/docs/guides/leader-election)
provides a simpler algorithm that can be used instead.
## Contending Nodes in the Semaphore
Let's imagine we have a set of nodes that are attempting to acquire a slot in the
semaphore. All participating nodes should agree on three decisions:
- the prefix in the KV store used to coordinate.
- a single key to use as a lock.
- a limit on the number of slot holders.
### Session
The first step is for each contending node to create a session. Sessions allow us to build a system that
can gracefully handle failures.
This is done using the
[Session HTTP API](/api/session#session_create).
```shell
curl -X PUT -d '{"Name": "db-semaphore"}' \
http://localhost:8500/v1/session/create
```
This will return a JSON object containing the session ID.
```json
{
"ID": "4ca8e74b-6350-7587-addf-a18084928f3c"
}
```
-> **Note:** Sessions by default only make use of the gossip failure detector. That is, the session is considered held by a node as long as the default Serf health check has not declared the node unhealthy. Additional checks can be specified at session creation if desired.
### KV Entry for Node Locks
Next, we create a lock contender entry. Each contender creates a kv entry that is tied
to a session. This is done so that if a contender is holding a slot and fails, its session
is detached from the key, which can then be detected by the other contenders.
Create the contender key by doing an `acquire` on `<prefix>/<session>` via `PUT`.
```shell
curl -X PUT -d <body> http://localhost:8500/v1/kv/<prefix>/<session>?acquire=<session>
```
`body` can be used to associate a meaningful value with the contender, such as its node's name.
This body is opaque to Consul but can be useful for human operators.
The `<session>` value is the ID returned by the call to
[`/v1/session/create`](/api/session#session_create).
The call will return either `true` or `false`. If `true`, the contender entry has been
created. If `false`, the contender entry was not created; this likely indicates
a session invalidation.
### Single Key for Coordination
The next step is to create a single key to coordinate which holders are currently
reserving a slot. A good choice for this lock key is simply `<prefix>/.lock`. We will
refer to this special coordinating key as `<lock>`.
```shell
curl -X PUT -d <body> http://localhost:8500/v1/kv/<lock>?cas=0
```
Since the lock is being created, a `cas` index of 0 is used so that the key is only put if it does not exist.
The `body` of the request should contain both the intended slot limit for the semaphore and the session IDs
of the current holders (initially only that of the creator). A simple JSON body like the following works.
```json
{
  "Limit": 2,
  "Holders": ["<session>"]
}
```
## Semaphore Management
The current state of the semaphore is read by doing a `GET` on the entire `<prefix>`.
```shell
curl http://localhost:8500/v1/kv/<prefix>?recurse
```
Within the list of the entries, we should find two keys: the `<lock>` and the
contender key `<prefix>/<session>`.
```json
[
  {
    "LockIndex": 0,
    "Key": "<lock>",
    "Flags": 0,
    "Value": "eyJMaW1pdCI6IDIsIkhvbGRlcnMiOlsiPHNlc3Npb24+Il19",
    "Session": "",
    "CreateIndex": 898,
    "ModifyIndex": 901
  },
  {
    "LockIndex": 1,
    "Key": "<prefix>/<session>",
    "Flags": 0,
    "Value": null,
    "Session": "<session>",
    "CreateIndex": 897,
    "ModifyIndex": 897
  }
]
```
Note that the `Value` we embedded into `<lock>` is Base64 encoded when returned by the API.
When the `<lock>` is read and its `Value` is decoded, we can verify the `Limit` agrees with the `Holders` count.
This is used to detect a potential conflict. The next step is to determine which of the current
slot holders are still alive. As part of the results of the `GET`, we also have all the contender
entries. By scanning those entries, we create a set of all the `Session` values. Any of the
`Holders` that are not in that set are pruned. In effect, we are creating a set of live contenders
based on the list results and doing a set difference with the `Holders` to detect and prune
any potentially failed holders. In this example `<session>` is present in `Holders` and
is attached to the key `<prefix>/<session>`, so no pruning is required.
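
The decoding and pruning steps described above can be sketched in Python as follows. This is an illustration under the assumption that the `?recurse` response has already been parsed into a list of dictionaries; `decode_lock` and `prune_holders` are hypothetical helper names.

```python
import base64
import json


def decode_lock(entry):
    """Decode the Base64 `Value` of the `<lock>` entry into a dict."""
    return json.loads(base64.b64decode(entry["Value"]))


def prune_holders(lock, entries, lock_key):
    """Keep only holders whose session is still attached to a contender key."""
    live_sessions = {
        e["Session"] for e in entries if e["Key"] != lock_key and e.get("Session")
    }
    return [h for h in lock["Holders"] if h in live_sessions]


# Entries shaped like the `?recurse` response shown above (fields trimmed).
entries = [
    {"Key": "<lock>", "Session": "", "ModifyIndex": 901,
     "Value": "eyJMaW1pdCI6IDIsIkhvbGRlcnMiOlsiPHNlc3Npb24+Il19"},
    {"Key": "<prefix>/<session>", "Session": "<session>", "ModifyIndex": 897,
     "Value": None},
]
lock = decode_lock(entries[0])
print(prune_holders(lock, entries, "<lock>"))  # ['<session>']
```

If the contender key for a holder has lost its session, or has disappeared entirely, that holder drops out of the returned list.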
If the number of holders after pruning is less than the limit, a contender attempts acquisition
by adding its own session to the `Holders` list and doing a Check-And-Set update of the `<lock>`.
This performs an optimistic update.
This is done with:
```shell
curl -X PUT -d <Updated Lock Body> http://localhost:8500/v1/kv/<lock>?cas=<lock-modify-index>
```
`lock-modify-index` is the latest `ModifyIndex` value known for `<lock>`, 901 in this example.
If this request succeeds with `true`, the contender now holds a slot in the semaphore.
If this fails with `false`, then likely there was a race with another contender to acquire the slot.
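
As a sketch of the acquisition decision described above, the following Python function works out whether a slot is free and, if so, which `cas` index and body to send. It only prepares the request and does not call Consul; `plan_acquire` and the session IDs are hypothetical.

```python
import json


def plan_acquire(lock, live_holders, my_session, lock_modify_index):
    """Return (cas_index, body) for the Check-And-Set PUT, or None if no slot is free."""
    if len(live_holders) >= lock["Limit"]:
        return None  # semaphore is full; watch the prefix and retry later
    body = json.dumps(
        {"Limit": lock["Limit"], "Holders": live_holders + [my_session]}
    )
    return lock_modify_index, body


# "session-a" and "session-b" are placeholder session IDs.
lock = {"Limit": 2, "Holders": ["session-a"]}
print(plan_acquire(lock, ["session-a"], "session-b", 901))
```

Passing the pruned holder list rather than the raw `Holders` value means failed holders are dropped in the same update that adds the new one.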
To re-attempt the acquisition, we watch for changes on `<prefix>`. This is because a slot
may be released, a node may fail, etc. Watching for changes is done via a blocking query
against `/kv/<prefix>?recurse`.
Slot holders **must** continuously watch for changes to `<prefix>` since their slot can be
released by an operator or automatically released due to a false positive in the failure detector.
On changes to `<prefix>`, the lock's `Holders` list must be re-checked to ensure the slot
is still held. Additionally, if the watch fails to connect, the slot should be considered lost.
This semaphore system is purely _advisory_. Therefore it is up to the client to verify
that a slot is held before (and during) execution of some critical operation.
Lastly, if a slot holder ever wishes to release its slot voluntarily, it should be done by doing a
Check-And-Set operation against `<lock>` to remove its session from the `Holders` list.
Once that is done, both its contender key `<prefix>/<session>` and session should be deleted.
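
The re-check and the voluntary release described above might look like this in Python. As before, this is a sketch that only prepares the Check-And-Set request; `holds_slot`, `plan_release`, and the session IDs are hypothetical.

```python
import json


def holds_slot(lock, my_session):
    """True if `my_session` is still listed in the lock's Holders."""
    return my_session in lock["Holders"]


def plan_release(lock, my_session, lock_modify_index):
    """Return (cas_index, body) for a Check-And-Set PUT that drops `my_session`."""
    remaining = [h for h in lock["Holders"] if h != my_session]
    body = json.dumps({"Limit": lock["Limit"], "Holders": remaining})
    return lock_modify_index, body


# "session-a" and "session-b" are placeholder session IDs.
lock = {"Limit": 2, "Holders": ["session-a", "session-b"]}
print(holds_slot(lock, "session-b"))
print(plan_release(lock, "session-b", 905))
```

If the Check-And-Set update returns `false`, the lock changed in the meantime, so the holder should re-read `<lock>` and retry the release with the new index.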
## Summary
In this guide we created a distributed semaphore using Consul KV and Consul sessions. We also learned how to manage the newly created semaphore.

@ -1,75 +0,0 @@
---
layout: docs
page_title: Windows Service
description: >-
  By using the _sc_ command either on Powershell or
  the Windows command line, you can make Consul run as a service. For more
  details about the _sc_ command
  the Windows page for
  [sc](https://msdn.microsoft.com/en-us/library/windows/desktop/ms682107(v=vs.85).aspx)
  should help you get started.
---
# Run Consul as a Service on Windows
By using the _sc_ command, either in PowerShell or
at the Windows command line, you can run Consul as a service. For more details about the _sc_ command,
the Windows page for [sc](<https://msdn.microsoft.com/en-us/library/windows/desktop/ms682107(v=vs.85).aspx>)
should help you get started.
Before installing Consul, you will need to create a permanent directory for storing the configuration files. Once that directory is created, you will set it when starting Consul with the `-config-dir` option.
In this guide, you will download the Consul binary, register the Consul service
with the Service Manager, and finally start Consul.
The steps presented here assume that you have launched PowerShell with _Administrator_ privileges.
## Installing Consul as a Service
Download the Consul binary for your architecture.
Use the _sc_ command to create a service named **Consul** that loads its configuration files from the directory passed to `-config-dir`. Read the agent configuration
[documentation](/docs/agent/options#configuration-files) to learn more about configuration options.
```text
sc.exe create "Consul" binPath= "<path to the Consul.exe> agent -config-dir <path to configuration directory>" start= auto
[SC] CreateService SUCCESS
```
If you get an output that is similar to the one above, then your service is
registered with the Service Manager.
If you get an error, please check that
you have specified the proper path to the binary and check if you've entered the arguments correctly for the Consul service.
## Running Consul as a Service
You have two options for starting the service.
The first option is to use the Windows Service Manager, and look for **Consul** under the service name. Click the _start_ button to start the service.
The second option is to use the _sc_ command.
```text
sc.exe start "Consul"
SERVICE_NAME: Consul
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
PID : 8008
FLAGS :
```
The service automatically starts up during/after boot, so you don't need to
launch Consul from the command-line again.
## Summary
In this guide you set up a Consul service on Windows. This process can be repeated to set up an entire cluster of agents.