---
layout: docs
page_title: Anti-Entropy Enforcement
description: >-
Anti-entropy keeps distributed systems consistent. Learn how Consul uses an anti-entropy mechanism to periodically sync agent states with the service catalog to prevent the catalog from becoming stale.
---
# Anti-Entropy Enforcement
Consul uses an anti-entropy mechanism to maintain service and health information.
This page details how services and checks are registered, how the catalog is
populated, and how health status information is updated as it changes.
### Components
It is important to first understand the moving pieces involved in services and
health checks: the [agent](#agent) and the [catalog](#catalog). These are
described conceptually below to make anti-entropy easier to understand.
#### Agent
Each Consul agent maintains its own set of service and check registrations as
well as health information. The agents are responsible for executing their own
health checks and updating their local state.
Services and checks within the context of an agent have a rich set of
configuration options available. This is because the agent is responsible for
generating information about its services and their health through the use of
[health checks](/consul/docs/services/usage/checks).
#### Catalog
Consul's service discovery is backed by a service catalog. This catalog is
formed by aggregating information submitted by the agents. The catalog maintains
the high-level view of the cluster, including which services are available,
which nodes run those services, health information, and more. The catalog is
used to expose this information via the various interfaces Consul provides,
including DNS and HTTP.
Services and checks within the context of the catalog have a much more limited
set of fields when compared with the agent. This is because the catalog is only
responsible for recording and returning information _about_ services, nodes, and
health.
The catalog is maintained only by server nodes. This is because the catalog is
replicated via the [Raft log](/consul/docs/architecture/consensus) to provide a
consolidated and consistent view of the cluster.
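
The difference in field sets between the agent and the catalog can be pictured with a small sketch. The class names, the `to_catalog` helper, and the exact fields chosen here are illustrative assumptions, not Consul's internal types: the agent-side record carries what is needed to run a check, while the catalog-side record carries only the result.

```python
from dataclasses import dataclass


@dataclass
class AgentCheck:
    """Hypothetical agent-side check: includes how to run the check."""
    check_id: str
    name: str
    http: str       # endpoint the agent polls
    interval: str   # how often the agent runs the check
    timeout: str
    status: str = "critical"
    output: str = ""


@dataclass
class CatalogCheck:
    """Hypothetical catalog-side check: records only the outcome."""
    node: str
    check_id: str
    name: str
    status: str
    output: str


def to_catalog(node: str, check: AgentCheck) -> CatalogCheck:
    """Project an agent check onto the smaller set of fields the catalog keeps."""
    return CatalogCheck(node, check.check_id, check.name, check.status, check.output)
```

In this sketch, execution details such as `interval` and `timeout` never leave the agent; only identity, status, and output are reported.
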
### Anti-Entropy
Entropy is the tendency of systems to become increasingly disordered. Consul's
anti-entropy mechanisms are designed to counter this tendency, to keep the
state of the cluster ordered even through failures of its components.
Consul has a clear separation between the global service catalog and the agent's
local state as discussed above. The anti-entropy mechanism reconciles these two
views of the world: anti-entropy is a synchronization of the local agent state and
the catalog. For example, when a user registers a new service or check with the
agent, the agent in turn notifies the catalog that the new service or check exists.
Similarly, when a check is deleted from the agent, it is consequently removed from
the catalog as well.
Anti-entropy is also used to update availability information. As agents run
their health checks, a check's status may change, in which case the new status
is synced to the catalog. Using this information, the catalog can respond
intelligently to queries about its nodes and services based on their
availability.
During this synchronization, the catalog is also checked for correctness. If
any services or checks exist in the catalog that the agent is not aware of, they
will be automatically removed to make the catalog reflect the proper set of
services and health information for that agent. Consul treats the state of the
agent as authoritative; if there are any differences between the agent
and catalog view, the agent-local view will always be used.
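
The reconciliation rule described above, where the agent's view wins, can be sketched in a few lines. This is a simplified illustration rather than Consul's implementation; the `plan_sync` function and the dictionary shapes are assumptions made for the example. Each dictionary maps a service or check ID to its definition for a single agent.

```python
def plan_sync(local: dict, catalog: dict) -> list:
    """Return the operations that make `catalog` match `local`.

    The local (agent) state is treated as authoritative: entries that are
    missing or different in the catalog are registered again, and entries
    the agent does not know about are deregistered.
    """
    ops = []
    for entry_id, definition in sorted(local.items()):
        if catalog.get(entry_id) != definition:
            ops.append(("register", entry_id))
    for entry_id in sorted(catalog):
        if entry_id not in local:
            ops.append(("deregister", entry_id))
    return ops


local_state = {"web": {"port": 80}, "db": {"port": 5432}}
catalog_state = {"web": {"port": 8080}, "cache": {"port": 6379}}
print(plan_sync(local_state, catalog_state))
```

Here `db` is missing from the catalog and `web` differs, so both are registered from the agent's copy, while `cache` exists only in the catalog and is removed.
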
### Periodic Synchronization
In addition to running when changes to the agent occur, anti-entropy is also a
long-running process which periodically wakes up to sync service and check
status to the catalog. This ensures that the catalog closely matches the agent's
true state. This also allows Consul to re-populate the service catalog even in
the case of complete data loss.
To avoid saturation, the amount of time between periodic anti-entropy runs will
vary based on cluster size. The table below defines the relationship between
cluster size and sync interval:
| Cluster Size | Periodic Sync Interval |
| ------------ | ---------------------- |
| 1 - 128 | 1 minute |
| 129 - 256 | 2 minutes |
| 257 - 512 | 3 minutes |
| 513 - 1024 | 4 minutes |
| ... | ... |
The intervals above are approximate. Each Consul agent will choose a randomly
staggered start time within the interval window to avoid a thundering herd.
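
One way to reproduce the rows of the table is to add one minute each time the cluster size doubles past 128 nodes. The sketch below assumes that doubling rule continues beyond the rows shown, which the table does not state explicitly; the function names and constants are chosen for illustration only.

```python
import random
from typing import Optional

BASE_INTERVAL_SECONDS = 60   # interval for the smallest clusters in the table
SCALE_THRESHOLD = 128        # largest cluster size that still uses the base interval


def scale_factor(cluster_size: int) -> int:
    """Return the multiplier for the base interval: 1 up to 128 nodes,
    then one more for each doubling of the cluster size."""
    factor = 1
    upper_bound = SCALE_THRESHOLD
    while upper_bound < cluster_size:
        upper_bound *= 2
        factor += 1
    return factor


def sync_interval(cluster_size: int) -> int:
    """Approximate periodic sync interval in seconds."""
    return BASE_INTERVAL_SECONDS * scale_factor(cluster_size)


def staggered_delay(cluster_size: int, rng: Optional[random.Random] = None) -> float:
    """Pick a random start offset within the interval window so that
    agents do not all sync at the same moment."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(0, sync_interval(cluster_size))
```

For example, `sync_interval(300)` falls in the 257 - 512 row and yields 180 seconds, and `staggered_delay(300)` returns an offset somewhere between 0 and 180 seconds.
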
### Best-effort sync
Anti-entropy can fail for a number of reasons, including misconfiguration of the
agent or its operating environment, I/O problems (such as a full disk or
filesystem permission errors), and networking problems (such as the agent being
unable to communicate with a server). Because of this, the agent attempts to
sync on a best-effort basis.
If an error is encountered during an anti-entropy run, the error is logged and
the agent continues to run. Because the anti-entropy mechanism runs
periodically, it automatically recovers from these types of transient failures.
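
The log-and-continue behavior can be sketched as a loop that never lets a failed run stop later runs. This is a minimal illustration under assumptions: `run_sync_rounds` and the `sync_once` callback are hypothetical names, and the wait between runs that a real agent would perform is omitted.

```python
import logging

logger = logging.getLogger("anti_entropy_sketch")


def run_sync_rounds(sync_once, rounds: int) -> int:
    """Call `sync_once` for each periodic run; log failures and keep going.

    Returns the number of runs that completed without an error.
    """
    succeeded = 0
    for round_number in range(1, rounds + 1):
        try:
            sync_once()
            succeeded += 1
        except Exception as err:
            # Best effort: record the problem and wait for the next run.
            logger.warning("sync round %d failed: %s", round_number, err)
    return succeeded
```

If the server is unreachable for the first two runs and then comes back, the remaining runs succeed without any manual intervention.
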
### Enable Tag Override
Synchronization of service registration can be partially modified to
allow external agents to change the tags for a service. This can be
useful in situations where an external monitoring service needs to be
the source of truth for tag information. For example, the Redis
database and its monitoring service Redis Sentinel have this kind of
relationship. Redis instances are responsible for much of their
configuration, but Sentinels determine whether the Redis instance is a
primary or a secondary. Enable the
[`enable_tag_override`](/consul/docs/services/configuration/services-configuration-reference#enable_tag_override)
parameter in the service definition file to tell the Consul agent on the node
where the Redis database is running to bypass tags during anti-entropy
synchronization. Refer to
[Modify anti-entropy synchronization](/consul/docs/services/usage/define-services#modify-anti-entropy-synchronization) for additional information.
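
The effect of bypassing tags can be pictured as a comparison that ignores the `tags` field when the override is enabled. The sketch below is a simplified model, not Consul's implementation; `needs_sync` and the dictionary layout are assumptions made for the example, and only the `enable_tag_override` name comes from the service definition.

```python
def _without_tags(service: dict) -> dict:
    """Copy of a service definition with the tags field left out."""
    return {key: value for key, value in service.items() if key != "tags"}


def needs_sync(local: dict, remote: dict) -> bool:
    """Report whether the agent should push `local` over the catalog's `remote`.

    When enable_tag_override is set on the local definition, a difference in
    tags alone does not count, so externally written tags are left in place.
    """
    if local.get("enable_tag_override"):
        return _without_tags(local) != _without_tags(remote)
    return local != remote


redis_local = {"name": "redis", "port": 6379, "tags": ["secondary"], "enable_tag_override": True}
redis_remote = dict(redis_local, tags=["primary"])  # tag changed by an external agent
print(needs_sync(redis_local, redis_remote))
```

In this model, the tag written by the external monitoring service survives the sync, while any other difference, such as a changed port, still causes the agent's definition to be pushed.
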