|
|
There are several important components that `consul agent` outputs:
|
|
|
agent. This is also the address other applications can use over [RPC to control Consul](/docs/agent/rpc.html).
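
As an illustrative sketch (the address and port are placeholders, assuming the default RPC address of `127.0.0.1:8400` used by older Consul releases), other `consul` commands reach a running agent through this RPC address:

```shell
# Query the members of a running agent over its RPC interface.
$ consul members -rpc-addr=127.0.0.1:8400
```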
|
|
|
|
|
|
|
|
|
|
|
* **Cluster Addr**: This is the address and ports used for communication between
  Consul agents in a cluster. Not all Consul agents in a cluster have to
  use the same port, but this address **MUST** be reachable by all other nodes.
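
As a minimal sketch (the address below is a placeholder), the cluster address is what the `-bind` flag controls when starting an agent:

```shell
# Bind internal cluster communication to an address reachable by every other node.
$ consul agent -server -data-dir=/tmp/consul -bind=10.0.0.10
```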
|
|
|
|
|
|
|
|
|
|
|
## Stopping an Agent |
|
|
|
## Lifecycle

Understanding this lifecycle is useful for building a mental model of an agent's interactions
with a cluster, and how the cluster treats a node.
|
When an agent is first started, it does not know about any other node in the cluster.
To discover its peers, it must _join_ the cluster. This is done with the `join`
command or by providing the proper configuration to auto-join on start. Once a node
joins, this information is gossiped to the entire cluster, meaning all nodes will
eventually be aware of each other. If the agent is a server, existing servers will
begin replicating to the new node.
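
As a sketch (the address and data directory are placeholders), a join can happen at startup or be issued against an agent that is already running:

```shell
# Attempt to join an existing member as the agent starts.
$ consul agent -data-dir=/tmp/consul -join=10.0.0.10

# Or instruct a running agent to join the cluster.
$ consul join 10.0.0.10
```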
|
|
In the case of a network failure, some nodes may be unreachable by other nodes.
In this case, unreachable nodes are marked as _failed_. It is impossible to distinguish
between a network failure and an agent crash, so both cases are handled the same.
Once a node is marked as failed, this information is updated in the service catalog.
There is some nuance here since this update is only possible if the servers can
still [form a quorum](/docs/internals/consensus.html). Once the network recovers,
or a crashed agent restarts, the cluster will repair itself, and unmark
a node as failed. The health check in the catalog will also be updated to reflect
this.
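
One way to observe these states is `consul members`, which reports the gossip status of each node; the output below is illustrative, with hypothetical node names, addresses, and versions:

```shell
$ consul members
Node    Address         Status  Type    Build  Protocol
node-a  10.0.0.10:8301  alive   server  0.6.4  2
node-b  10.0.0.11:8301  failed  client  0.6.4  2
```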
|
|
|
|
|
|
|
|
|
|
|
When a node _leaves_, it specifies its intent to do so, and so the cluster
marks that node as having _left_. Unlike the _failed_ case, all of the
services provided by a node are immediately deregistered. If the agent was
a server, replication to it will stop. To prevent an accumulation
of dead nodes, Consul will automatically reap _failed_ nodes out of the
catalog as well. This is currently done on a non-configurable interval of
72 hours. Reaping is similar to leaving, causing all associated services
to be deregistered.
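
A graceful leave can be triggered on the node itself; as a sketch:

```shell
# Tell the local agent to leave the cluster and shut down gracefully.
$ consul leave
```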
|
|
|
|
|
|
|
|
|
|
|