diff --git a/website/data/docs-navigation.js b/website/data/docs-navigation.js
index 51fe093bc9..b383332b82 100644
--- a/website/data/docs-navigation.js
+++ b/website/data/docs-navigation.js
@@ -155,6 +155,7 @@ export default [
       content: ['envoy', 'built-in', 'integrate'],
     },
     'mesh_gateway',
+    'wan-federation-via-mesh-gateways',
     {
       category: 'registration',
       content: ['service-registration', 'sidecar-service'],
diff --git a/website/pages/docs/connect/wan-federation-via-mesh-gateways.mdx b/website/pages/docs/connect/wan-federation-via-mesh-gateways.mdx
new file mode 100644
index 0000000000..f526d4c370
--- /dev/null
+++ b/website/pages/docs/connect/wan-federation-via-mesh-gateways.mdx
@@ -0,0 +1,185 @@
+---
+layout: docs
+page_title: Connect - WAN Federation via Mesh Gateways
+sidebar_title: WAN Federation via Mesh Gateways Beta
+description: |-
+  WAN federation via mesh gateways allows for Consul servers in different datacenters to be federated exclusively through mesh gateways.
+---
+
+# WAN Federation via Mesh Gateways Beta
+
+-> **1.8.0+:** This feature is available in Consul versions 1.8.0 and higher
+
+~> This topic requires familiarity with [mesh gateways](/docs/connect/mesh_gateway).
+
+WAN federation via mesh gateways allows for Consul servers in different datacenters
+to be federated exclusively through mesh gateways.
+
+When setting up a
+[multi-datacenter](https://learn.hashicorp.com/consul/security-networking/datacenters)
+Consul cluster, operators must ensure that all Consul servers in every
+datacenter are directly reachable from each other over their WAN-advertised
+network addresses.
+
+If you are using Kubernetes, refer to our [Kubernetes Multi Cluster](/docs/k8s/installation/multi-cluster) documentation.
+
+Meeting this requirement means that operators setting up the virtual machines or
+containers hosting the servers must take additional steps to ensure the necessary
+routing and firewall rules are in place to allow the servers to speak to each
+other over the WAN.
+
+Sometimes this prerequisite is difficult or undesirable to meet:
+
+* **Difficult:** The datacenters may exist in multiple Kubernetes clusters that
+  have overlapping pod IP subnets, or may exist in different
+  cloud provider VPCs that have overlapping subnets.
+
+* **Undesirable:** Network security teams may not approve of granting so many
+  firewall rules. When using platform autoscaling, keeping rules up to date becomes untenable.
+
+Operators looking to simplify their WAN deployment and minimize the exposed
+security surface area can elect to join these datacenters together using [mesh
+gateways](/docs/connect/mesh_gateway).
+
+## Architecture
+
+There are two main kinds of communication that occur over the WAN link spanning
+the gulf between disparate Consul datacenters:
+
+* **WAN gossip:** We leverage the serf and memberlist libraries to gossip
+  failure-detector knowledge about Consul servers in each datacenter.
+  By default this operates point to point between servers over `8302/udp` with
+  a fallback to `8302/tcp` (which logs a warning indicating the network is
+  misconfigured).
+
+* **Cross-datacenter RPCs:** Consul servers expose a special multiplexed port
+  over `8300/tcp`. Several distinct kinds of messages can be received on this
+  port, such as RPC requests forwarded from servers in other datacenters.
+
+
+In this network topology individual Consul client agents on a LAN in one
+datacenter never need to directly dial servers in other datacenters. This
+means you could introduce a set of firewall rules prohibiting `10.0.0.0/24`
+from sending any traffic at all to `10.1.2.0/24` for security isolation.
+
+You may already have configured [mesh
+gateways](https://learn.hashicorp.com/consul/developer-mesh/connect-gateways)
+to allow services in the service mesh to freely connect between datacenters
+regardless of the lateral connectivity of the nodes hosting the Consul client
+agents.
+
+By activating WAN federation via mesh gateways the servers
+can similarly use the existing mesh gateways to reach each other without
+themselves being directly reachable.
+
+## Configuration
+
+### TLS
+
+All Consul servers in all datacenters should have TLS configured with certificates containing
+these SAN fields:
+
+    server.. (normal)
+    .server.. (needed for WAN federation)
+
+This can be achieved using any number of tools, including `consul tls cert
+create` with the `-node` flag.
+
+### Mesh Gateways
+
+At least one mesh gateway must be configured to opt in to exposing
+the servers in its configuration. When using the `consul connect envoy` CLI
+this is done by using the flag `-expose-servers`. This flag simply registers
+the mesh gateway into the catalog with the additional piece of service metadata
+of `{"consul-wan-federation":"1"}`. If you are registering the mesh gateways
+into the catalog out of band you may simply add this to your existing
+registration payload.
+
+!> Before activating the feature on an existing cluster you should ensure that
+there is at least one mesh gateway prepared to expose the servers registered in
+each datacenter; otherwise the WAN will become only partly connected.
+
+### Consul Server Options
+
+There are a few necessary additional pieces of configuration beyond those
+required for standing up a
+[multi-datacenter](https://learn.hashicorp.com/consul/security-networking/datacenters)
+Consul cluster.
+
+Consul servers in the _primary_ datacenter should add this snippet to the
+configuration file:
+
+```hcl
+connect {
+  enabled = true
+  enable_mesh_gateway_wan_federation = true
+}
+```
+
+Consul servers in all _secondary_ datacenters should add this snippet to the
+configuration file:
+
+```hcl
+primary_gateways = [ ":", ... ]
+connect {
+  enabled = true
+  enable_mesh_gateway_wan_federation = true
+}
+```
+
+Any references to [`start_join_wan`](/docs/agent/options#start_join_wan) or [`retry_join_wan`](/docs/agent/options#retry_join_wan) should be omitted.
+
+-> The `primary_gateways` configuration can also use `go-discover` syntax just
+like `retry_join_wan`.
+
+### Bootstrapping
+
+For ease of debugging (such as avoiding a flurry of misleading error messages)
+when intending to activate WAN federation via mesh gateways it is best to
+follow this general procedure:
+
+#### New secondary
+
+1. Upgrade to the desired version of the Consul binary for all servers,
+   clients, and CLI.
+2. Start all Consul servers and clients on the new version in the primary
+   datacenter.
+3. Ensure the primary datacenter has at least one running, registered mesh gateway with
+   the service metadata key of `{"consul-wan-federation":"1"}` set.
+4. Ensure you are _prepared_ to launch corresponding mesh gateways in all
+   secondaries. When ACLs are enabled, actually registering these requires
+   upstream connectivity to the primary datacenter to authorize catalog
+   registration.
+5. Ensure all servers in the primary datacenter have updated configuration and
+   restart.
+6. Ensure all servers in the secondary datacenter have updated configuration.
+7. Start all Consul servers and clients on the new version in the secondary
+   datacenter.
+8. When ACLs are enabled, shortly afterwards it should become possible to
+   resolve ACL tokens from the secondary, at which time it should be possible
+   to launch the mesh gateways in the secondary datacenter.
+
+
+#### Existing secondary
+
+1. Upgrade to the desired version of the Consul binary for all servers,
+   clients, and CLI.
+2. Restart all Consul servers and clients on the new version.
+3. Ensure each datacenter has at least one running, registered mesh gateway with the
+   service metadata key of `{"consul-wan-federation":"1"}` set.
+4. Ensure all servers in the primary datacenter have updated configuration and
+   restart.
+5. Ensure all servers in the secondary datacenter have updated configuration and
+   restart.
+
+### Verification
+
+From any two datacenters joined together, double-check that the following give
+the expected results:
+
+* Check that `consul members -wan` lists all servers in all datacenters with
+  their _local_ IP addresses and shows each of them as `alive`.
+
+* Ensure any API request that activates datacenter request forwarding, such as
+  [`/v1/catalog/services?dc=`](/api/catalog.html#dc-1)
+  succeeds.
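
The two checks in the Verification section of the new page could be scripted roughly as follows. This is a minimal sketch and not part of the patch: it assumes `consul members -wan` prints a whitespace-separated table whose header row contains `Node` and `Status` columns, and that the local agent serves its HTTP API at `http://127.0.0.1:8500`. The function names, the sample rows, and the datacenter names `dc1` and `dc2` are hypothetical and chosen only for illustration.

```python
from typing import List
from urllib.parse import urlencode


def non_alive_wan_members(members_output: str) -> List[str]:
    """Return the node names whose Status column is anything other than 'alive'.

    Assumes a whitespace-separated table with a header row that names the
    'Node' and 'Status' columns (an assumption about the CLI output layout).
    """
    lines = [ln for ln in members_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    node_col = header.index("Node")
    status_col = header.index("Status")
    unhealthy = []
    for line in lines[1:]:
        fields = line.split()
        if fields[status_col] != "alive":
            unhealthy.append(fields[node_col])
    return unhealthy


def forwarding_check_url(datacenter: str, base: str = "http://127.0.0.1:8500") -> str:
    """Build a catalog request that must be forwarded to another datacenter."""
    return f"{base}/v1/catalog/services?{urlencode({'dc': datacenter})}"


# Made-up sample standing in for captured `consul members -wan` output.
SAMPLE = """\
Node         Address        Status  Type    Build  Protocol  DC   Segment
server1.dc1  10.0.0.1:8302  alive   server  1.8.0  2         dc1  <all>
server2.dc2  10.1.2.1:8302  failed  server  1.8.0  2         dc2  <all>
"""

if __name__ == "__main__":
    print(non_alive_wan_members(SAMPLE))
    print(forwarding_check_url("dc2"))
```

An empty list from `non_alive_wan_members` corresponds to the first check passing; the URL from `forwarding_check_url` can then be requested with any HTTP client (plus an ACL token where ACLs are enabled) to exercise the second check.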