mirror of https://github.com/hashicorp/consul
202 lines
8.9 KiB
Markdown
202 lines
8.9 KiB
Markdown
---
|
|
layout: docs
|
|
page_title: Mesh Gateways for WAN Federation
|
|
description: >-
|
|
You can use mesh gateways to simplify the networking requirements for WAN federated Consul datacenters. Mesh gateways reduce cross-datacenter connection paths, ports, and communication protocols.
|
|
---
|
|
|
|
# Mesh Gateways for WAN Federation
|
|
|
|
-> **1.8.0+:** This feature is available in Consul versions 1.8.0 and higher
|
|
|
|
~> This topic requires familiarity with [mesh gateways](/docs/connect/gateways/mesh-gateway/service-to-service-traffic-datacenters).
|
|
|
|
WAN federation via mesh gateways allows for Consul servers in different datacenters
|
|
to be federated exclusively through mesh gateways.
|
|
|
|
When setting up a
|
|
[multi-datacenter](https://learn.hashicorp.com/consul/security-networking/datacenters)
|
|
Consul cluster, operators must ensure that all Consul servers in every
|
|
datacenter must be directly connectable over their WAN-advertised network
|
|
address from each other.
|
|
|
|
[![WAN federation without mesh gateways](/img/wan-federation-connectivity-traditional.png)](/img/wan-federation-connectivity-traditional.png)
|
|
|
|
This requires that operators setting up the virtual machines or containers
|
|
hosting the servers take additional steps to ensure the necessary routing and
|
|
firewall rules are in place to allow the servers to speak to each other over
|
|
the WAN.
|
|
|
|
Sometimes this prerequisite is difficult or undesirable to meet:
|
|
|
|
- **Difficult:** The datacenters may exist in multiple Kubernetes clusters that
|
|
unfortunately have overlapping pod IP subnets, or may exist in different
|
|
cloud provider VPCs that have overlapping subnets.
|
|
|
|
- **Undesirable:** Network security teams may not approve of granting so many
|
|
firewall rules. When using platform autoscaling, keeping rules up to date becomes untenable.
|
|
|
|
Operators looking to simplify their WAN deployment and minimize the exposed
|
|
security surface area can elect to join these datacenters together using [mesh
|
|
gateways](/docs/connect/gateways/mesh-gateway/service-to-service-traffic-datacenters) to do so.
|
|
|
|
[![WAN federation with mesh gateways](/img/wan-federation-connectivity-mesh-gateways.png)](/img/wan-federation-connectivity-mesh-gateways.png)
|
|
|
|
## Architecture
|
|
|
|
There are two main kinds of communication that occur over the WAN link spanning
|
|
the gulf between disparate Consul datacenters:
|
|
|
|
- **WAN gossip:** We leverage the serf and memberlist libraries to gossip
|
|
around failure detector knowledge about Consul servers in each datacenter.
|
|
By default this operates point to point between servers over `8302/udp` with
|
|
a fallback to `8302/tcp` (which logs a warning indicating the network is
|
|
misconfigured).
|
|
|
|
- **Cross-datacenter RPCs:** Consul servers expose a special multiplexed port
|
|
over `8300/tcp`. Several distinct kinds of messages can be received on this
|
|
port, such as RPC requests forwarded from servers in other datacenters.
|
|
|
|
In this network topology individual Consul client agents on a LAN in one
|
|
datacenter never need to directly dial servers in other datacenters. This
|
|
means you could introduce a set of firewall rules prohibiting `10.0.0.0/24`
|
|
from sending any traffic at all to `10.1.2.0/24` for security isolation.
|
|
|
|
You may already have configured [mesh
|
|
gateways](https://learn.hashicorp.com/tutorials/consul/service-mesh-gateways)
|
|
to allow for services in the service mesh to freely connect between datacenters
|
|
regardless of the lateral connectivity of the nodes hosting the Consul client
|
|
agents.
|
|
|
|
By activating WAN federation via mesh gateways the servers
|
|
can similarly use the existing mesh gateways to reach each other without
|
|
themselves being directly reachable.
|
|
|
|
## Configuration
|
|
|
|
### TLS
|
|
|
|
All Consul servers in all datacenters should have TLS configured with certificates containing
|
|
these SAN fields:
|
|
|
|
server.<this_datacenter>.<domain> (normal)
|
|
<node_name>.server.<this_datacenter>.<domain> (needed for wan federation)
|
|
|
|
This can be achieved using any number of tools, including `consul tls cert create` with the `-node` flag.
|
|
|
|
### Mesh Gateways
|
|
|
|
There needs to be at least one mesh gateway configured to opt-in to exposing
|
|
the servers in its configuration. When using the `consul connect envoy` CLI
|
|
this is done by using the flag `-expose-servers`. All this does is to register
|
|
the mesh gateway into the catalog with the additional piece of service metadata
|
|
of `{"consul-wan-federation":"1"}`. If you are registering the mesh gateways
|
|
into the catalog out of band you may simply add this to your existing
|
|
registration payload.
|
|
|
|
!> Before activating the feature on an existing cluster you should ensure that
|
|
there is at least one mesh gateway prepared to expose the servers registered in
|
|
each datacenter otherwise the WAN will become only partly connected.
|
|
|
|
### Consul Server Options
|
|
|
|
There are a few necessary additional pieces of configuration beyond those
|
|
required for standing up a
|
|
[multi-datacenter](https://learn.hashicorp.com/consul/security-networking/datacenters)
|
|
Consul cluster.
|
|
|
|
Consul servers in the _primary_ datacenter should add this snippet to the
|
|
configuration file:
|
|
|
|
```hcl
|
|
connect {
|
|
enabled = true
|
|
enable_mesh_gateway_wan_federation = true
|
|
}
|
|
```
|
|
|
|
Consul servers in all _secondary_ datacenters should add this snippet to the
|
|
configuration file:
|
|
|
|
```hcl
|
|
primary_gateways = [ "<primary-mesh-gateway-ip>:<primary-mesh-gateway-port>", ... ]
|
|
connect {
|
|
enabled = true
|
|
enable_mesh_gateway_wan_federation = true
|
|
}
|
|
```
|
|
|
|
The [`start_join_wan`](/docs/agent/config/config-files#start_join_wan) or
|
|
[`retry_join_wan`](/docs/agent/config/config-files#retry_join_wan) are
|
|
only used for the [traditional federation process](/docs/k8s/deployment-configurations/multi-cluster#traditional-wan-federation).
|
|
They must be omitted when federating Consul servers via gateways.
|
|
|
|
-> The `primary_gateways` configuration can also use `go-discover` syntax just
|
|
like `retry_join_wan`.
|
|
|
|
### Bootstrapping
|
|
|
|
For ease of debugging (such as avoiding a flurry of misleading error messages)
|
|
when intending to activate WAN federation via mesh gateways it is best to
|
|
follow this general procedure:
|
|
|
|
### New secondary
|
|
|
|
1. Upgrade to the desired version of the consul binary for all servers,
|
|
clients, and CLI.
|
|
2. Start all consul servers and clients on the new version in the primary
|
|
datacenter.
|
|
3. Ensure the primary datacenter has at least one running, registered mesh gateway with
|
|
the service metadata key of `{"consul-wan-federation":"1"}` set.
|
|
4. Ensure you are _prepared_ to launch corresponding mesh gateways in all
|
|
secondaries. When ACLs are enabled actually registering these requires
|
|
upstream connectivity to the primary datacenter to authorize catalog
|
|
registration.
|
|
5. Ensure all servers in the primary datacenter have updated configuration and
|
|
restart.
|
|
6. Ensure all servers in the secondary datacenter have updated configuration.
|
|
7. Start all consul servers and clients on the new version in the secondary
|
|
datacenter.
|
|
8. When ACLs are enabled, shortly afterwards it should become possible to
|
|
resolve ACL tokens from the secondary, at which time it should be possible
|
|
to launch the mesh gateways in the secondary datacenter.
|
|
|
|
### Existing secondary
|
|
|
|
1. Upgrade to the desired version of the consul binary for all servers,
|
|
clients, and CLI.
|
|
2. Restart all consul servers and clients on the new version.
|
|
3. Ensure each datacenter has at least one running, registered mesh gateway with the
|
|
service metadata key of `{"consul-wan-federation":"1"}` set.
|
|
4. Ensure all servers in the primary datacenter have updated configuration and
|
|
restart.
|
|
5. Ensure all servers in the secondary datacenter have updated configuration and
|
|
restart.
|
|
|
|
### Verification
|
|
|
|
From any two datacenters joined together double check the following give you an
|
|
expected result:
|
|
|
|
- Check that `consul members -wan` lists all servers in all datacenters with
|
|
their _local_ ip addresses and are listed as `alive`.
|
|
|
|
- Ensure any API request that activates datacenter request forwarding. such as
|
|
[`/v1/catalog/services?dc=<OTHER_DATACENTER_NAME>`](/api-docs/catalog#dc-1)
|
|
succeeds.
|
|
|
|
### Upgrading the primary gateways
|
|
|
|
Once federation is established, secondary datacenters will continuously request
|
|
updated mesh gateway addresses from the primary datacenter. Consul routes the requests
|
|
through the primary datacenter's mesh gateways. This is because
|
|
secondary datacenters cannot directly dial the primary datacenter's Consul servers.
|
|
If the primary gateways are upgraded, and their previous instances are decommissioned
|
|
before the updates are propagated, then the primary datacenter will become unreachable.
|
|
|
|
To safely upgrade primary gateways, we recommend that you apply one of the following policies:
|
|
- Avoid decommissioning primary gateway IP addresses. This is because the [primary_gateways](/docs/agent/config/config-files#primary_gateways) addresses configured on the secondary servers act as a fallback mechanism for re-establishing connectivity to the primary.
|
|
|
|
- Verify that addresses of the new mesh gateways in the primary were propagated
|
|
to the secondary datacenters before decommissioning the old mesh gateways in the primary.
|