The Consul agent is the core process of Consul. The agent maintains membership information, registers services, runs checks, responds to queries and more. The agent must run on every node that is part of a Consul cluster.
---
# Consul Agent
@ -25,7 +27,7 @@ running forever or until told to quit. The agent command takes a variety
of configuration options but the defaults are usually good enough. When
running `consul agent`, you should see output similar to that below:
One of the primary roles of the agent is the management of system and application level health checks. A health check is considered to be application level if it is associated with a service. A check is defined in a configuration file, or added at runtime over the HTTP interface.
---
# Checks
@ -30,25 +32,29 @@ There are two different kinds of checks:
A check definition that runs a script looks like:
```javascript
{
"check": {
"id": "mem-util",
"name": "Memory utilization",
"script": "/usr/local/bin/check_mem.py",
"interval": "10s"
}
}
```
A TTL-based check is very similar:
```javascript
{
"check": {
"id": "web-app",
"name": "Web App Status",
"notes": "Web app does a curl internally every 10 seconds",
"ttl": "30s"
}
}
```
Both types of definitions must include a `name`, and may optionally
provide an `id` and `notes` field. The `id` is set to the `name` if not
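Checks can also be registered against a running agent over the HTTP interface instead of a configuration file. A minimal sketch, assuming the agent's HTTP API on the default `localhost:8500` and the `/v1/agent/check/register` and `/v1/agent/check/pass` endpoints (the field casing below follows the HTTP API rather than the file format shown above):

```text
# Register the script check from the first example at runtime
$ curl -X PUT http://localhost:8500/v1/agent/check/register \
    -d '{"ID": "mem-util", "Name": "Memory utilization", "Script": "/usr/local/bin/check_mem.py", "Interval": "10s"}'

# A TTL check must be refreshed periodically by the application,
# for example by hitting the "pass" endpoint for its check ID
$ curl http://localhost:8500/v1/agent/check/pass/web-app
```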
One of the primary query interfaces for Consul is using DNS. The DNS interface allows applications to make use of service discovery without any high-touch integration with Consul.
---
# DNS Interface
@ -50,25 +52,26 @@ DNS lookup for nodes in other datacenters, with no additional effort.
For a node lookup, the only records returned are A records with the IP address of
the node.
```text
$ dig @127.0.0.1 -p 8600 foobar.node.consul ANY
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 foobar.node.consul ANY
The Consul agent supports encrypting all of its network traffic. The exact method of this encryption is described on the encryption internals page. There are two separate systems, one for gossip traffic and one for RPC.
---
# Encryption
@ -19,7 +21,7 @@ in a configuration file for the agent. The key must be 16-bytes that are base64
encoded. The easiest method to obtain a cryptographically suitable key is by
using `consul keygen`.
```text
$ consul keygen
cg8StVXbQJ0gPvMd9o7yrg==
```
@ -27,7 +29,7 @@ cg8StVXbQJ0gPvMd9o7yrg==
With that key, you can enable encryption on the agent. You can verify
encryption is enabled because the output will include "Encrypted: true".
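For example, the generated key can be placed in an agent configuration file under the `encrypt` option; a minimal sketch of such a file (reusing the key from above) is:

```javascript
{
  "encrypt": "cg8StVXbQJ0gPvMd9o7yrg=="
}
```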
The agent has various configuration options that can be specified via the command-line or via configuration files. All of the configuration options are completely optional and their defaults will be specified with their descriptions.
The Consul agent provides a complete RPC mechanism that can be used to control the agent programmatically. This RPC mechanism is the same one used by the CLI, but can be used by other applications to easily leverage the power of Consul without directly embedding.
---
# RPC Protocol
@ -24,15 +26,21 @@ that is broadly available across languages.
All RPC requests have a request header, and some requests have
a request body. The request header looks like:
```javascript
{
"Command": "Handshake",
"Seq": 0
}
```
All responses have a response header, and some may contain
a response body. The response header looks like:
```javascript
{
"Seq": 0,
"Error": ""
}
```
The `Command` is used to specify what command the server should
@ -65,8 +73,10 @@ the server which version the client is using.
The request header must be followed with a handshake body, like:
```javascript
{
"Version": 1
}
```
The body specifies the IPC version being used; however, only version
@ -81,8 +91,10 @@ response and check for an error.
This command is used to remove failed nodes from a cluster. It takes
the following body:
```javascript
{
"Node": "failed-node-name"
}
```
There is no special response body.
@ -92,8 +104,14 @@ There is no special response body.
This command is used to join an existing cluster using a known node.
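A sketch of the request body, with the field names (`Existing` for the addresses to contact and `WAN` for selecting the WAN pool) assumed rather than confirmed here:

```javascript
{
  "Existing": [
    "existing-node-address:8301"
  ],
  "WAN": false
}
```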
One of the main goals of service discovery is to provide a catalog of available services. To that end, the agent provides a simple service definition format to declare the availability of a service, and to potentially associate it with a health check. A health check is considered to be application level if it is associated with a service. A service is defined in a configuration file, or added at runtime over the HTTP interface.
---
# Services
@ -17,17 +19,19 @@ or added at runtime over the HTTP interface.
A service definition with a script-based check looks like:
```javascript
{
"service": {
"name": "redis",
"tags": ["master"],
"port": 8000,
"check": {
"script": "/usr/local/bin/check_redis.py",
"interval": "10s"
}
}
}
```
A service definition must include a `name`, and may optionally provide
an `id`, `tags`, `port`, and `check`. The `id` is set to the `name` if not
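Like checks, a service can also be registered against a running agent over the HTTP interface. A minimal sketch, assuming the default HTTP address and the `/v1/agent/service/register` endpoint (field casing follows the HTTP API rather than the file format shown above):

```text
$ curl -X PUT http://localhost:8500/v1/agent/service/register \
    -d '{"ID": "redis", "Name": "redis", "Tags": ["master"], "Port": 8000}'
```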
The Consul agent collects various metrics data at runtime about the performance of different libraries and sub-systems. These metrics are aggregated on a ten second interval and are retained for one minute.
---
# Telemetry
@ -25,7 +27,7 @@ aggregate and flushed to Graphite or any other metrics store.
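On Unix systems, the aggregated telemetry of a running agent can be dumped to stderr by sending the process a `USR1` signal. A sketch, assuming `pidof` is available to find the agent's process ID:

```text
$ kill -USR1 $(pidof consul)
```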
Watches are a way of specifying a view of data (list of nodes, KV pairs, health checks, etc) which is monitored for any updates. When an update is detected, an external handler is invoked. A handler can be any executable. As an example, you could watch the status of health checks and notify an external system when a check is critical.
---
# Watches
@ -74,11 +76,13 @@ This maps to the `/v1/kv/` API internally.
Here is an example configuration:
```javascript
{
"type": "key",
"key": "foo/bar/baz",
"handler": "/usr/bin/my-key-handler.sh"
}
```
Or, using the watch command:
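A sketch of the equivalent invocation, assuming the watch command's `-type` and `-key` flags:

```text
$ consul watch -type key -key foo/bar/baz /usr/bin/my-key-handler.sh
```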
@ -86,15 +90,17 @@ Or, using the watch command:
An example of the output of this command:
```javascript
{
"Key": "foo/bar/baz",
"CreateIndex": 1793,
"ModifyIndex": 1793,
"LockIndex": 0,
"Flags": 0,
"Value": "aGV5",
"Session": ""
}
```
### Type: keyprefix
@ -105,11 +111,13 @@ This maps to the `/v1/kv/` API internally.
Here is an example configuration:
```javascript
{
"type": "keyprefix",
"prefix": "foo/",
"handler": "/usr/bin/my-prefix-handler.sh"
}
```
Or, using the watch command:
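A sketch of the equivalent invocation, assuming the watch command's `-type` and `-prefix` flags:

```text
$ consul watch -type keyprefix -prefix foo/ /usr/bin/my-prefix-handler.sh
```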
@ -117,36 +125,37 @@ Or, using the watch command:
An example of the output of this command:
```javascript
[
{
"Key": "foo/bar",
"CreateIndex": 1796,
"ModifyIndex": 1796,
"LockIndex": 0,
"Flags": 0,
"Value": "TU9BUg==",
"Session": ""
},
{
"Key": "foo/baz",
"CreateIndex": 1795,
"ModifyIndex": 1795,
"LockIndex": 0,
"Flags": 0,
"Value": "YXNkZg==",
"Session": ""
},
{
"Key": "foo/test",
"CreateIndex": 1793,
"ModifyIndex": 1793,
"LockIndex": 0,
"Flags": 0,
"Value": "aGV5",
"Session": ""
}
]
```
### Type: services
@ -157,11 +166,13 @@ This maps to the `/v1/catalog/services` API internally.
An example of the output of this command:
```javascript
{
"consul": [],
"redis": [],
"web": []
}
```
### Type: nodes
@ -172,32 +183,34 @@ This maps to the `/v1/catalog/nodes` API internally.
An example of the output of this command:
```javascript
[
{
"Node": "nyc1-consul-1",
"Address": "192.241.159.115"
},
{
"Node": "nyc1-consul-2",
"Address": "192.241.158.205"
},
{
"Node": "nyc1-consul-3",
"Address": "198.199.77.133"
},
{
"Node": "nyc1-worker-1",
"Address": "162.243.162.228"
},
{
"Node": "nyc1-worker-2",
"Address": "162.243.162.226"
},
{
"Node": "nyc1-worker-3",
"Address": "162.243.162.229"
}
]
```
### Type: service
@ -211,11 +224,13 @@ This maps to the `/v1/health/service` API internally.
Here is an example configuration:
```javascript
{
"type": "service",
"key": "redis",
"handler": "/usr/bin/my-service-handler.sh"
}
```
Or, using the watch command:
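A sketch of the equivalent invocation, assuming the watch command's `-type` and `-service` flags:

```text
$ consul watch -type service -service redis /usr/bin/my-service-handler.sh
```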
@ -223,42 +238,44 @@ Or, using the watch command:
An example of the output of this command:
```javascript
[
{
"Node": {
"Node": "foobar",
"Address": "10.1.10.12"
},
"Service": {
"ID": "redis",
"Service": "redis",
"Tags": null,
"Port": 8000
},
"Checks": [
{
"Node": "foobar",
"CheckID": "service:redis",
"Name": "Service 'redis' check",
"Status": "passing",
"Notes": "",
"Output": "",
"ServiceID": "redis",
"ServiceName": "redis"
},
{
"Node": "foobar",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "",
"ServiceID": "",
"ServiceName": ""
}
]
}
]
```
### Type: checks
@ -272,19 +289,20 @@ or `/v1/health/checks/` if monitoring by service.
An example of the output of this command:
```javascript
[
{
"Node": "foobar",
"CheckID": "service:redis",
"Name": "Service 'redis' check",
"Status": "passing",
"Notes": "",
"Output": "",
"ServiceID": "redis",
"ServiceName": "redis"
}
]
```
### Type: event
@ -297,11 +315,13 @@ This maps to the `/v1/event/list` API internally.
Here is an example configuration:
```javascript
{
"type": "event",
"name": "web-deploy",
"handler": "/usr/bin/my-deploy-handler.sh"
}
```
Or, using the watch command:
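A sketch of the equivalent invocation, assuming the watch command's `-type` and `-name` flags:

```text
$ consul watch -type event -name web-deploy /usr/bin/my-deploy-handler.sh
```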
@ -309,21 +329,22 @@ Or, using the watch command:
An example of the output of this command:
```javascript
[
{
"ID": "f07f3fcc-4b7d-3a7c-6d1e-cf414039fcee",
"Name": "web-deploy",
"Payload": "MTYwOTAzMA==",
"NodeFilter": "",
"ServiceFilter": "",
"TagFilter": "",
"Version": 1,
"LTime": 18
},
...
]
```
To fire a new `web-deploy` event the following could be used:
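A sketch of such an invocation, assuming the payload is passed as the final argument to `consul event` (the `MTYwOTAzMA==` payload shown above is simply base64 for `1609030`):

```text
$ consul event -name web-deploy 1609030
```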
The `consul agent` command is the heart of Consul: it runs the agent that performs the important task of maintaining membership information, running checks, announcing services, handling queries, etc.
The event command provides a mechanism to fire a custom user event to an entire datacenter. These events are opaque to Consul, but they can be used to build scripting infrastructure to do automated deploys, restart services, or perform any other orchestration action. Events can be handled by using a watch.
---
# Consul Event
Command: `consul event`
The `event` command provides a mechanism to fire a custom user event to an
entire datacenter. These events are opaque to Consul, but they can be used
to build scripting infrastructure to do automated deploys, restart services,
or perform any other orchestration action. Events can be handled by
The exec command provides a mechanism for remote execution. For example, this can be used to run the `uptime` command across all machines providing the `web` service.
---
# Consul Exec
Command: `consul exec`
The `exec` command provides a mechanism for remote execution. For example,
this can be used to run the `uptime` command across all machines providing
The `force-leave` command forces a member of a Consul cluster to enter the left state. Note that if the member is still actually alive, it will eventually rejoin the cluster. The true purpose of this method is to force remove failed nodes.
Consul is controlled via a very easy to use command-line interface (CLI). Consul is only a single command-line application: `consul`. This application then takes a subcommand such as agent or members. The complete list of subcommands is in the navigation to the left.
---
# Consul Commands (CLI)
@ -19,7 +21,7 @@ as you'd most likely expect. And some commands that expect input accept
To view a list of the available commands at any time, just run `consul` with
no arguments:
```text
$ consul
usage: consul [--version] [--help] <command> [<args>]
@ -42,7 +44,7 @@ Available commands are:
To get help for any specific command, pass the `-h` flag to the relevant
subcommand. For example, to see help about the `join` subcommand:
```text
$ consul join -h
Usage: consul join [options] address ...
@ -53,5 +55,4 @@ Options:
-rpc-addr=127.0.0.1:8400 RPC address of the Consul agent.
-wan Joins a server to another server in the WAN pool
The `info` command provides various debugging information that can be useful to operators. Depending on if the agent is a client or server, information about different sub-systems will be returned.
---
# Consul Info
Command: `consul info`
The `info` command provides various debugging information that can be
useful to operators. Depending on if the agent is a client or server,
information about different sub-systems will be returned.
@ -22,47 +24,49 @@ There are currently the top-level keys for:
The `join` command tells a Consul agent to join an existing cluster. A new Consul agent must join with at least one existing member of a cluster in order to join an existing cluster. After joining that one member, the gossip layer takes over, propagating the updated membership state across the cluster.
---
# Consul Join
Command: `consul join`
The `join` command tells a Consul agent to join an existing cluster.
A new Consul agent must join with at least one existing member of a cluster
in order to join an existing cluster. After joining that one member,
the gossip layer takes over, propagating the updated membership state across
The `keygen` command generates an encryption key that can be used for Consul agent traffic encryption. The keygen command uses a cryptographically strong pseudo-random number generator to generate the key.
---
# Consul Keygen
Command: `consul keygen`
The `keygen` command generates an encryption key that can be used for
The `leave` command triggers a graceful leave and shutdown of the agent. It is used to ensure other nodes see the agent as left instead of failed. Nodes that leave will not attempt to re-join the cluster on restarting with a snapshot.
---
# Consul Leave
Command: `consul leave`
The `leave` command triggers a graceful leave and shutdown of the agent.
It is used to ensure other nodes see the agent as "left" instead of
"failed". Nodes that leave will not attempt to re-join the cluster
The `members` command outputs the current list of members that a Consul agent knows about, along with their state. The state of a node can only be alive, left, or failed.
---
# Consul Members
Command: `consul members`
The `members` command outputs the current list of members that a Consul
agent knows about, along with their state. The state of a node can only
The `monitor` command is used to connect and follow the logs of a running Consul agent. Monitor will show the recent logs and then continue to follow the logs, not exiting until interrupted or until the remote agent quits.
---
# Consul Monitor
Command: `consul monitor`
The `monitor` command is used to connect and follow the logs of a running
Consul agent. Monitor will show the recent logs and then continue to follow
the logs, not exiting until interrupted or until the remote agent quits.
The `watch` command provides a mechanism to watch for changes in a particular data view (list of nodes, service members, key value, etc) and to invoke a process with the latest values of the view. If no process is specified, the current values are dumped to stdout which can be a useful way to inspect data in Consul.
---
# Consul Watch
Command: `consul watch`
The `watch` command provides a mechanism to watch for changes in a particular
data view (list of nodes, service members, key value, etc) and to invoke
a process with the latest values of the view. If no process is specified,
the current values are dumped to stdout which can be a useful way to inspect
We expect Consul to run in large clusters as long-running agents. Because upgrading agents in this sort of environment relies heavily on protocol compatibility, this page makes clear our promise to keep different Consul versions compatible with each other.
---
# Protocol Compatibility Promise
@ -28,25 +30,24 @@ upgrading, see the [upgrading page](/docs/upgrading.html).
Before a Consul cluster can begin to service requests, it is necessary for a server node to be elected leader. For this reason, the first nodes that are started are generally the server nodes. Remember that an agent can run in both client and server mode. Server nodes are responsible for running the consensus protocol, and storing the cluster state. The client nodes are mostly stateless and rely on the server nodes, so they can be started easily.
---
# Bootstrapping a Datacenter
@ -26,22 +28,28 @@ discouraged as data loss is inevitable in a failure scenario.
Suppose we are starting a 3 server cluster. We can start `Node A`, `Node B` and `Node C`, providing
the `-bootstrap-expect 3` flag. Once the nodes are started, you should see a message to the effect of:
```text
[WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
```
This indicates that the nodes are expecting 2 peers, but none are known yet. The servers will not elect
themselves leader to prevent a split-brain. We can now join these machines together. Since a join operation
is symmetric, it does not matter which node initiates it. From any node you can do the following:
```text
$ consul join <NodeAAddress> <NodeBAddress> <NodeCAddress>
Successfully joined cluster by contacting 3 nodes.
```
Once the join is successful, one of the nodes will output something like:
```text
[INFO] consul: adding server foo (Addr: 127.0.0.2:8300) (DC: dc1)
[INFO] consul: adding server bar (Addr: 127.0.0.1:8300) (DC: dc1)
[INFO] consul: Attempting bootstrap with nodes: [127.0.0.3:8300 127.0.0.2:8300 127.0.0.1:8300]
...
[INFO] consul: cluster leadership acquired
```
As a sanity check, the `consul info` command is a useful tool. It can be used to
verify `raft.num_peers` is now 2, and you can view the latest log index under `raft.last_log_index`.
@ -64,4 +72,3 @@ In versions of Consul previous to 0.4, bootstrapping was a more manual process.
For a guide on using the `-bootstrap` flag directly, see the [manual bootstrapping guide](/docs/guides/manual-bootstrap.html).
This is not recommended, as it is more error prone than automatic bootstrapping.
One of the key features of Consul is its support for multiple datacenters. The architecture of Consul is designed to promote a low-coupling of datacenters, so that connectivity issues or failure of any datacenter does not impact the availability of Consul in other regions. This means each datacenter runs independently, with a dedicated group of servers and a private LAN gossip pool.
---
# Multi-Datacenter Deploys
@ -20,7 +22,7 @@ we can refer to as `dc1` and `dc2`, although the names are opaque to Consul.
The next step is to ensure that all the server nodes join the WAN gossip pool.
To query the known WAN nodes, we use the `members` command:
```text
$ consul members -wan
...
```
@ -31,7 +33,7 @@ to a datacenter-local server, which then forwards the request to a server in the
The next step is to simply join all the servers in the WAN pool:
```text
$ consul join -wan <server1> <server2> ...
...
```
@ -46,14 +48,14 @@ Once this is done the `members` command can be used to verify that
all server nodes are known about. We can also verify that both datacenters
One of the main interfaces to Consul is DNS. Using DNS is a simple way to integrate Consul into an existing infrastructure without any high-touch integration.
---
# DNS Caching
@ -57,18 +59,17 @@ that matches if there is no specific service TTL provided.
This is specified using the `dns_config.service_ttl` map. The "*" service
is the wildcard service. For example, if we specify:
```javascript
{
"dns_config": {
"service_ttl": {
"*": "5s",
"web": "30s"
}
}
}
```
This sets all lookups to "web.service.consul" to use a 30 second TTL,
while lookups to "db.service.consul" or "api.service.consul" will use the
Very few infrastructures are entirely self-contained, and often rely on a multitude of external service providers. Most services are registered in Consul through the use of a service definition, however that registers the local node as the service provider. In the case of external services, we want to register a service as being provided by an external provider.
---
# Registering an External Service
@ -21,36 +23,41 @@ also appear in standard queries against the API.
Let us suppose we want to register a "search" service that is provided by
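As a sketch of what such a registration can look like, using the catalog's `/v1/catalog/register` endpoint with a hypothetical external node name and address:

```text
$ curl -X PUT http://localhost:8500/v1/catalog/register \
    -d '{"Datacenter": "dc1", "Node": "search-provider", "Address": "search.example.com",
         "Service": {"Service": "search", "Port": 80}}'
```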
By default, DNS is served from port 53 which requires root privileges. Instead of running Consul as root, it is possible to instead run Bind and forward queries to Consul as appropriate.
---
# Forwarding DNS
@ -18,36 +20,40 @@ simplicity but this is not required.
First, you have to disable DNSSEC so that Consul and Bind can communicate.
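A sketch of the Bind side of this setup, assuming Consul's DNS interface is listening on its default `127.0.0.1:8600`; the fragment below is illustrative rather than a complete `named.conf`:

```text
// DNSSEC is assumed to be disabled in the options block, as described above.
zone "consul" IN {
  type forward;
  forward only;
  forwarders { 127.0.0.1 port 8600; };
};
```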
This section provides various guides for common actions. Due to the nature of Consul, some of these procedures can be complex, so our goal is to provide guidance to do them safely.
---
# Consul Guides
@ -12,21 +14,14 @@ guidance to do them safely.
The following guides are available:
* [Adding/Removing Servers](/docs/guides/servers.html) - This guide covers how to safely add and remove Consul servers from the cluster. This should be done carefully to avoid availability outages.
* [Bootstrapping](/docs/guides/bootstrapping.html) - This guide covers bootstrapping a new datacenter. This covers safely adding the initial Consul servers.
* [DNS Forwarding](/docs/guides/forwarding.html) - Forward DNS queries from Bind to Consul
* [External Services](/docs/guides/external.html) - This guide covers registering an external service. This allows using 3rd party services within the Consul framework.
* [Multiple Datacenters](/docs/guides/datacenters.html) - Configuring Consul to support multiple datacenters.
* [Outage Recovery](/docs/guides/outage.html) - This guide covers recovering a cluster that has become unavailable due to server failures.
The goal of this guide is to cover how to build client-side leader election using Consul. If you are interested in the leader election used internally to Consul, you want to read about the consensus protocol instead.
---
# Leader Election
@ -21,7 +23,9 @@ The first flow we cover is for nodes who are attempting to acquire leadership
for a given service. All nodes that are participating should agree on a given
key being used to coordinate. A good choice is simply:
```text
service/<servicename>/leader
```
We will refer to this as just `key` for simplicity.
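Acquiring the key requires a session, so each contender typically creates one first. A minimal sketch, assuming the default HTTP address and the `/v1/session/create` endpoint (the session name is illustrative):

```text
# Returns a JSON body containing the session ID used in the acquire step below
$ curl -X PUT -d '{"Name": "service-lock"}' http://localhost:8500/v1/session/create
```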
@ -35,7 +39,9 @@ that may be needed.
Attempt to `acquire` the `key` by doing a `PUT`. This is something like:
```text
curl -X PUT -d body http://localhost:8500/v1/kv/key?acquire=session
```
This will either return `true` or `false`. If `true` is returned, the lock
has been acquired and the local node is now the leader. If `false` is returned,
@ -54,7 +60,9 @@ wait. This is because Consul may be enforcing a [`lock-delay`](/docs/internals/s
If the leader ever wishes to step down voluntarily, this should be done by simply
releasing the lock:
```text
curl -X PUT http://localhost:8500/v1/kv/key?release=session
```
## Discovering a Leader
@ -70,4 +78,3 @@ the value of the key will provide all the application-dependent information requ
Clients should also watch the key using a blocking query for any changes. If the leader
steps down, or fails, then the `Session` associated with the key will be cleared. When
a new leader is elected, the key value will also be updated.
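A sketch of such a blocking query, assuming the `index` and `wait` query parameters on the KV endpoint and an illustrative `ModifyIndex` of 1350:

```text
$ curl "http://localhost:8500/v1/kv/key?index=1350&wait=5m"
```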
When deploying Consul to a datacenter for the first time, there is an initial bootstrapping that must be done. As of Consul 0.4, an automatic bootstrapping is available and is the recommended approach. However, older versions only support a manual bootstrap that is documented here.
---
# Manually Bootstrapping a Datacenter
When deploying Consul to a datacenter for the first time, there is an initial bootstrapping that
must be done. As of Consul 0.4, an [automatic bootstrapping](/docs/guides/bootstrapping.html) is
available and is the recommended approach. However, older versions only support a manual bootstrap
that is documented here.
Generally, the first nodes that are started are the server nodes. Remember that an
agent can run in both client and server mode. Server nodes are responsible for running
the [consensus protocol](/docs/internals/consensus.html), and storing the cluster state.
The client nodes are mostly stateless and rely on the server nodes, so they can be started easily.
Manual bootstrapping requires that the first server that is deployed in a new datacenter provide
the `-bootstrap` [configuration option](/docs/agent/options.html). This option allows the server to
assert leadership of the cluster without agreement from any other server. This is necessary because
at this point, there are no other servers running in the datacenter! Let's call this first server `Node A`.
When starting `Node A` something like the following will be logged:
Do not panic! This is a critical first step. Depending on your deployment configuration, it may take only a single server failure for cluster unavailability. Recovery requires an operator to intervene, but is straightforward.
---
# Outage Recovery
@ -11,15 +13,14 @@ Do not panic! This is a critical first step. Depending on your
take only a single server failure for cluster unavailability. Recovery
requires an operator to intervene, but is straightforward.
~> This page covers recovery from Consul becoming unavailable due to a majority
of server nodes in a datacenter being lost. If you are just looking to
add or remove a server [see this page](/docs/guides/servers.html).
If you had only a single server and it has failed, simply restart it.
Note that a single server configuration requires the `-bootstrap` or
`-bootstrap-expect 1` flag. If that server cannot be recovered, you need to
bring up a new server.
See the [bootstrapping guide](/docs/guides/bootstrapping.html). Data loss
is inevitable, since data was not replicated to any other servers. This
is why a single server deploy is never recommended. Any services registered
@ -35,8 +36,12 @@ The next step is to go to the `-data-dir` of each Consul server. Inside
that directory, there will be a `raft/` sub-directory. We need to edit
the `raft/peers.json` file. It should be something like:
```javascript
[
"10.0.1.8:8300",
"10.0.1.6:8300",
"10.0.1.7:8300"
]
```
Simply delete the entries for all the failed servers. You must confirm
@ -47,7 +52,7 @@ At this point, you can restart all the remaining servers. If any servers
managed to perform a graceful leave, you may need to have them rejoin
the cluster using the `join` command:
```text
$ consul join <NodeAddress>
Successfully joined cluster by contacting 1 nodes.
```
@ -58,13 +63,13 @@ as the gossip protocol will take care of discovering the server nodes.
At this point the cluster should be in an operable state again. One of the
nodes should claim leadership and emit a log like:
```text
[INFO] consul: cluster leadership acquired
```
Additionally, the `info` command can be a useful debugging tool:
```text
$ consul info
...
raft:
@ -85,4 +90,3 @@ You should verify that one server claims to be the `Leader`, and all the
others should be in the `Follower` state. All the nodes should agree on the
peer count as well. This count is (N-1), since a server does not count itself
Consul is designed to require minimal operator involvement, however any changes to the set of Consul servers must be handled carefully. To better understand why, reading about the consensus protocol will be useful. In short, the Consul servers perform leader election and replication. For changes to be processed, a minimum quorum of servers (N/2)+1 must be available. That means if there are 3 server nodes, at least 2 must be available.
---
# Adding/Removing Servers
@ -22,26 +24,32 @@ Adding new servers is generally straightforward. Simply start the new
server with the `-server` flag. At this point, the server will not be a member of
any cluster, and should emit something like:
```text
[WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
```
This means that it does not know about any peers and is not configured to elect itself.
This is expected, and we can now add this node to the existing cluster using `join`.
From the new server, we can join any member of the existing cluster:
```text
$ consul join <NodeAddress>
Successfully joined cluster by contacting 1 nodes.
```
It is important to note that any node, including a non-server, may be specified for
join. The gossip protocol is used to properly discover all the nodes in the cluster.
Once the node has joined, the existing cluster leader should log something like:
Welcome to the Consul documentation! This documentation is more of a reference guide for all available features and options of Consul. If you're just getting started with Consul, please start with the introduction and getting started guide instead.
Consul provides an optional Access Control List (ACL) system which can be used to control access to data and APIs. The ACL system is a Capability-based system that relies on tokens which can have fine grained rules applied to them. It is very similar to AWS IAM in many ways.
---
# ACL System
@ -76,37 +78,40 @@ with JSON making it easy to machine generate.
As of Consul 0.4, it is only possible to specify policies for the
KV store. Specification in the HCL format looks like:
```javascript
# Default all keys to read-only
key "" {
policy = "read"
}
key "foo/" {
policy = "write"
}
key "foo/private/" {
# Deny access to the private dir
policy = "deny"
}
```
This is equivalent to the following JSON input:
```javascript
{
"key": {
"": {
"policy": "read",
},
"foo/": {
"policy": "write",
},
"foo/private": {
"policy": "deny",
}
}
}
```
Key policies provide both a prefix and a policy. The rules are enforced
using a longest-prefix match policy. This means we pick the most specific
policy possible. The policy is either "read", "write" or "deny". A "write"
policy implies "read", and there is no way to specify write-only. If there
is no applicable rule, the `acl_default_policy` is applied.
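As a sketch of how these policies tie into the agent configuration, the authoritative datacenter and the default policy are set with the `acl_datacenter` and `acl_default_policy` options (the values below are illustrative):

```javascript
{
  "acl_datacenter": "dc1",
  "acl_default_policy": "deny"
}
```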
Consul is a complex system that has many different moving parts. To help users and developers of Consul form a mental model of how it works, this page documents the system architecture.
---
# Consul Architecture
@ -10,12 +12,10 @@ Consul is a complex system that has many different moving parts. To help
users and developers of Consul form a mental model of how it works, this
page documents the system architecture.
~> **Advanced Topic!** This page covers technical details of
the internals of Consul. You don't need to know these details to effectively
operate and use Consul. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.
## Glossary
@ -70,7 +70,7 @@ allowing a client to make a request from a server.
From a 10,000 foot altitude the architecture of Consul looks like this:
Consul uses a consensus protocol to provide Consistency as defined by CAP. This page documents the details of this internal protocol. The consensus protocol is based on "Raft: In search of an Understandable Consensus Algorithm". For a visual explanation of Raft, see The Secret Lives of Data.
---
# Consensus Protocol
@ -11,12 +13,10 @@ to provide [Consistency](http://en.wikipedia.org/wiki/CAP_theorem) as defined by
This page documents the details of this internal protocol. The consensus protocol is based on
["Raft: In search of an Understandable Consensus Algorithm"](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf). For a visual explanation of Raft, see the [The Secret Lives of Data](http://thesecretlivesofdata.com/raft).
~> **Advanced Topic!** This page covers technical details of
the internals of Consul. You don't need to know these details to effectively
operate and use Consul. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.
## Raft Protocol Overview
@ -139,7 +139,7 @@ supports 3 different consistency modes for reads.
The three read modes are:
* `default` - Raft makes use of leader leasing, providing a time window
in which the leader assumes its role is stable. However, if a leader
is partitioned from the remaining peers, a new leader may be elected
while the old leader is holding the lease. This means there are 2 leader
@ -151,12 +151,12 @@ The three read modes are:
only stale in a hard to trigger situation. The time window of stale reads
is also bounded, since the leader will step down due to the partition.
* `consistent` - This mode is strongly consistent without caveats. It requires
that a leader verify with a quorum of peers that it is still leader. This
introduces an additional round-trip to all server nodes. The trade off is
always consistent reads, but increased latency due to an extra round trip.
* `stale` - This mode allows any server to service the read, regardless of if
it is the leader. This means reads can be arbitrarily stale, but are generally
within 50 milliseconds of the leader. The trade off is very fast and scalable
reads but values will be stale. This mode allows reads without a leader, meaning
@ -172,45 +172,44 @@ recommended deployment is either 3 or 5 servers. A single server deployment
is _**highly**_ discouraged as data loss is inevitable in a failure scenario.
Consul uses a gossip protocol to manage membership and broadcast messages to the cluster. All of this is provided through the use of the Serf library. The gossip protocol used by Serf is based on SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol, with a few minor adaptations.
---
# Gossip Protocol
@ -13,12 +15,10 @@ used by Serf is based on
["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf),
with a few minor adaptations. There are more details about [Serf's protocol here](http://www.serfdom.io/docs/internals/gossip.html).
~> **Advanced Topic!** This page covers technical details of
the internals of Consul. You don't need to know these details to effectively
operate and use Consul. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.
## Gossip in Consul
@ -41,4 +41,3 @@ All of these features are provided by leveraging [Serf](http://www.serfdom.io/).
is used as an embedded library to provide these features. From a user perspective,
this is not important, since the abstraction should be masked by Consul. It can be useful
however as a developer to understand how this library is leveraged.
Jepsen is a tool written by Kyle Kingsbury that is designed to test the partition tolerance of distributed systems. It creates network partitions while fuzzing the system with random operations. The results are analyzed to see if the system violates any of the consistency properties it claims to have.
---
# Jepsen Testing
@ -30,7 +32,7 @@ Below is the output captured from Jepsen. We ran Jepsen multiple times,
and it passed each time. This output is only representative of a single
run.
```text
$ lein test :only jepsen.system.consul-test
lein test jepsen.system.consul-test
@ -4018,4 +4020,3 @@ INFO jepsen.system.consul - :n5 consul nuked
Consul relies on both a lightweight gossip mechanism and an RPC system to provide various features. Both of the systems have different security mechanisms that stem from their designs. However, the goals of Consul's security are to provide confidentiality, integrity and authentication.
---
# Security Model
@ -23,12 +25,10 @@ This means Consul communication is protected against eavesdropping, tampering,
or spoofing. This makes it possible to run Consul over untrusted networks such
as EC2 and other shared hosting providers.
~> **Advanced Topic!** This page covers the technical details of
the security model of Consul. You don't need to know these details to
operate and use Consul. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.
## Threat Model
@ -50,4 +50,3 @@ When designing security into a system you design it to fit the threat model.
Our goal is not to protect top secret data but to provide a "reasonable"
level of security that would require an attacker to commit a considerable
Consul provides a session mechanism which can be used to build distributed locks. Sessions act as a binding layer between nodes, health checks, and key/value data. They are designed to provide granular locking, and are heavily inspired by The Chubby Lock Service for Loosely-Coupled Distributed Systems.
---
# Sessions
@ -11,12 +13,10 @@ Sessions act as a binding layer between nodes, health checks, and key/value data
They are designed to provide granular locking, and are heavily inspired
by [The Chubby Lock Service for Loosely-Coupled Distributed Systems](http://research.google.com/archive/chubby.html).
~> **Advanced Topic!** This page covers technical details of
the internals of Consul. You don't need to know these details to effectively
operate and use Consul. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.
## Session Design
@ -28,7 +28,7 @@ store to acquire locks, which are advisory mechanisms for mutual exclusion.
Below is a diagram showing the relationship between these components:
Consul is meant to be a long-running agent on any nodes participating in a Consul cluster. These nodes consistently communicate with each other. As such, protocol level compatibility and ease of upgrades is an important thing to keep in mind when using Consul.
---
# Upgrading Consul
@ -85,9 +87,7 @@ only specifies the protocol version to _speak_. Every Consul agent can
always understand the entire range of protocol versions it claims to
on `consul -v`.
~> **By running a previous protocol version**, some features
of Consul, especially newer features, may not be available. If this is the
case, Consul will typically warn you. In general, you should always upgrade
your cluster so that you can run the latest protocol version.
After Consul is installed, the agent must be run. The agent can either run in a server or client mode. Each datacenter must have at least one server, although 3 or 5 is recommended. A single server deployment is highly discouraged as data loss is inevitable in a failure scenario.
---
# Run the Consul Agent
@ -19,7 +21,7 @@ will be part of the cluster.
For simplicity, we'll run a single Consul agent in server mode right now:
```text
$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul
==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
@ -53,12 +55,10 @@ data. From the log data, you can see that our agent is running in server mode,
and has claimed leadership of the cluster. Additionally, the local member has
been marked as a healthy member of the cluster.
~> **Note for OS X Users:** Consul uses your hostname as the
default node name. If your hostname contains periods, DNS queries to
that node will not work with Consul. To avoid this, explicitly set
the name of your node with the `-node` flag.
## Cluster Members
@ -66,7 +66,7 @@ If you run `consul members` in another terminal, you can see the members of
the Consul cluster. You should only see one member (yourself). We'll cover
joining clusters in the next section.
```text
$ consul members
Node Address Status Type Build Protocol
Armons-MacBook-Air 10.1.10.38:8301 alive server 0.3.0 2
@ -82,7 +82,7 @@ For a strongly consistent view of the world, use the
[HTTP API](/docs/agent/http.html), which forwards the request to the
We've now seen how simple it is to run Consul, add nodes and services, and query those nodes and services. In this section we will continue by adding health checks to both nodes and services, a critical component of service discovery that prevents using services that are unhealthy.
---
# Health Checks
@ -29,7 +31,7 @@ the second node.
The first file will add a host-level check, and the second will modify the web
Consul must first be installed on every node that will be a member of a Consul cluster. To make installation easy, Consul is distributed as a binary package for all supported platforms and architectures. This page will not cover how to compile Consul from source.
---
# Install Consul
@ -28,13 +30,15 @@ you would like.
If you are using [homebrew](http://brew.sh/#install) as a package manager,
then you can install Consul as simply as:
```text
$ brew cask install consul
```
If you are missing the [cask plugin](http://caskroom.io/), you can install it with:
```text
$ brew install caskroom/cask/brew-cask
```
## Verifying the Installation
@ -43,7 +47,7 @@ After installing Consul, verify the installation worked by opening a new
terminal session and checking that `consul` is available. By executing
`consul` you should see help output similar to that below:
```text
$ consul
usage: consul [--version] [--help] <command> [<args>]
By this point, we've started our first agent and registered and queried one or more services on that agent. This showed how easy it is to use Consul, but didn't show how this could be extended to a scalable production service discovery infrastructure. On this page, we'll create our first real cluster with multiple members.
---
# Consul Cluster
@ -33,7 +35,7 @@ and it *must* be accessible by all other nodes in the cluster. The first node
will act as our server in this cluster. We're still not making a cluster
of servers.
```text
$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul \
-node=agent-one -bind=172.20.20.10
...
@ -44,7 +46,7 @@ This time, we set the bind address to match the IP of the second node
as specified in the Vagrantfile. In production, you will generally want
to provide a bind address or interface as well.
```text
$ consul agent -data-dir /tmp/consul -node=agent-two -bind=172.20.20.11
...
```
@ -59,7 +61,7 @@ against each agent and noting that only one member is a part of each.
Now, let's tell the first agent to join the second agent by running
the following command in a new terminal:
```text
$ consul join 172.20.20.11
Successfully joined cluster by contacting 1 nodes.
```
@ -69,19 +71,16 @@ carefully, you'll see that they received join information. If you
run `consul members` against each agent, you'll see that both agents now
In addition to providing service discovery and integrated health checking, Consul provides an easy to use Key/Value store. This can be used to hold dynamic configuration, assist in service coordination, build leader election, and anything else a developer can think to build.
---
# Key/Value Data
@ -22,7 +24,7 @@ in the K/V store.
Querying the agent we started in a prior page, we can first verify that
there are no existing keys in the k/v store:
```text
$ curl -v http://localhost:8500/v1/kv/?recurse
* About to connect() to localhost port 8500 (#0)
* Trying 127.0.0.1... connected
@ -68,7 +70,7 @@ keys using the `?recurse` parameter.
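A sketch of writing and then reading back a key against the same agent, using the `/v1/kv/` endpoint (the key name and value are illustrative):

```text
$ curl -X PUT -d 'test' http://localhost:8500/v1/kv/web/key1
true

$ curl http://localhost:8500/v1/kv/web/key1
```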
That concludes the getting started guide for Consul. Hopefully you're able to see that while Consul is simple to use, it has a powerful set of features. We've covered the basics for all of these features in this guide.
---
# Next Steps
@ -26,4 +28,3 @@ As a next step, the following resources are available:
The work-in-progress examples folder within the GitHub
repository for Consul contains functional examples of various use cases
of Consul to help you get started with exactly what you need.
In the previous page, we ran our first agent, saw the cluster members, and queried that node. On this page, we'll register our first service and query that service. We're not yet running a cluster of Consul agents.
---
# Registering Services
@ -26,7 +28,7 @@ First, create a directory for Consul configurations. A good directory
is typically `/etc/consul.d`. Consul loads all configuration files in the
configuration directory.
```text
$ sudo mkdir /etc/consul.d
```
@ -35,14 +37,14 @@ pretend we have a service named "web" running on port 80. Additionally,
we'll give it some tags, which we can use as additional ways to query
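A sketch of that service definition, written to a file such as `/etc/consul.d/web.json` (the tag value is illustrative):

```javascript
{
  "service": {
    "name": "web",
    "tags": ["rails"],
    "port": 80
  }
}
```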
Consul comes with support for a beautiful, functional web UI out of the box. This UI can be used for viewing all services and nodes, viewing all health checks and their current status, and for reading and setting key/value data. The UI automatically supports multi-datacenter.
---
# Consul Web UI
@ -33,7 +35,7 @@ A screenshot of one page of the demo is shown below so you can get an
idea of what the web UI is like. Click the screenshot for the full size.
Welcome to the intro guide to Consul! This guide is the best place to start with Consul. We cover what Consul is, what problems it can solve, how it compares to existing software, and a quick start for using Consul. If you are already familiar with the basics of Consul, the documentation provides more of a reference for all available features.
It is not uncommon to find people using Chef, Puppet, and other configuration management tools to build service discovery mechanisms. This is usually done by querying global state to construct configuration files on each node during a periodic convergence run.
As a code base grows, a monolithic app usually evolves into a Service Oriented Architecture (SOA). A universal pain point for SOA is service discovery and configuration. In many cases, this leads to organizations building home grown solutions. It is an undisputed fact that distributed systems are hard; building one is error prone and time consuming. Most systems cut corners by introducing single points of failure such as a single Redis or RDBMS to maintain cluster state. These solutions may work in the short term, but they are rarely fault tolerant or scalable. Besides these limitations, they require time and resources to build and maintain.
---
# Consul vs. Custom Solutions
As a code base grows, a monolithic app usually evolves into a Service Oriented
Architecture (SOA). A universal pain point for SOA is service discovery and
configuration. In many cases, this leads to organizations building home grown
solutions. It is an undisputed fact that distributed systems are hard; building
one is error prone and time consuming. Most systems cut corners by introducing
single points of failure such as a single Redis or RDBMS to maintain cluster
state. These solutions may work in the short term, but they are rarely fault
tolerant or scalable. Besides these limitations, they require time and resources
to build and maintain.
Consul provides the core set of features needed by a SOA out of the box. By
using Consul, organizations can leverage open source work to reduce their time
and resource commitment to re-inventing the wheel and focus on their business
applications.
Consul is built on well-cited research, and is designed with the constraints of
distributed systems in mind. At every step, Consul takes efforts to provide a
robust and scalable solution for organizations of any size.
The problems Consul solves are varied, but each individual feature has been solved by many different systems. Although there is no single system that provides all the features of Consul, there are other options available to solve some of these problems.
---
# Consul vs. Other Software
The problems Consul solves are varied, but each individual feature has been
solved by many different systems. Although there is no single system that
provides all the features of Consul, there are other options available to solve
some of these problems.
In this section, we compare Consul to some other options. In most cases, Consul
is not mutually exclusive with any other system.
Use the navigation to the left to read the comparison of Consul to specific
Serf is a node discovery and orchestration tool and is the only tool discussed so far that is built on an eventually consistent gossip model, with no centralized servers. It provides a number of features, including group membership, failure detection, event broadcasts and a query mechanism. However, Serf does not provide any high-level features such as service discovery, health checking or key/value storage. To clarify, the discovery feature of Serf is at a node level, while Consul provides a service and node level abstraction.
---
# Consul vs. Serf
@ -43,4 +45,3 @@ general purpose tool. Consul uses a CP architecture, favoring consistency over
availability. Serf is an AP system, and sacrifices consistency for availability.
This means Consul cannot operate if the central servers cannot form a quorum,
while Serf will continue to function under almost all circumstances.
SkyDNS is a relatively new tool designed to solve service discovery. It uses multiple central servers that are strongly consistent and fault tolerant. Nodes register services using an HTTP API, and queries can be made over HTTP or DNS to perform discovery.
SmartStack is another tool which tackles the service discovery problem. It has a rather unique architecture, and has 4 major components: ZooKeeper, HAProxy, Synapse, and Nerve. The ZooKeeper servers are responsible for storing cluster state in a consistent and fault tolerant manner. Each node in the SmartStack cluster then runs both Nerves and Synapses. The Nerve is responsible for running health checks against a service, and registering with the ZooKeeper servers. Synapse queries ZooKeeper for service providers and dynamically configures HAProxy. Finally, clients speak to HAProxy, which does health checking and load balancing across service providers.
---
# Consul vs. SmartStack
@ -54,4 +57,3 @@ an integrated key/value store for configuration and multi-datacenter support.
While it may be possible to configure SmartStack for multiple datacenters,
the central ZooKeeper cluster would be a serious impediment to a fault tolerant
ZooKeeper, doozerd and etcd are all similar in their architecture. All three have server nodes that require a quorum of nodes to operate (usually a simple majority). They are strongly consistent, and expose various primitives that can be used through client libraries within applications to build complex distributed systems.
---
# Consul vs. ZooKeeper, doozerd, etcd
@ -58,4 +60,3 @@ all these other systems require additional tools and libraries to be built on
top. By using client nodes, Consul provides a simple API that only requires thin clients.
Additionally, the API can be avoided entirely by using configuration files and the
DNS interface to have a complete service discovery solution with no development at all.