Commit Graph

170 Commits (658c27a6843cbf11fd22f554a68f2874b542817b)

Author SHA1 Message Date
R.B. Boyer 8e22d80e35
connect: fix failover through a mesh gateway to a remote datacenter (#6259)
Failover is pushed entirely down to the data plane by creating envoy
clusters and putting each successive destination in a different load
assignment priority band. For example this shows that normally requests
go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080:

- name: foo
  load_assignment:
    cluster_name: foo
    policy:
      overprovisioning_factor: 100000
    endpoints:
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 1.2.3.4
              port_value: 8080
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 6.7.8.9
              port_value: 8080

Mesh gateways route requests based solely on the SNI header tacked onto
the TLS layer. Envoy currently only lets you configure the outbound SNI
header at the cluster layer.

If you try to failover through a mesh gateway you ideally would
configure the SNI value per endpoint, but that's not possible in envoy
today.

This PR introduces a simpler way around the problem for now:

1. We identify any target of failover that will use mesh gateway mode local or
   remote and then further isolate any resolver node in the compiled discovery
   chain that has a failover destination set to one of those targets.

2. For each of these resolvers we will perform a small measurement of
   comparative healths of the endpoints that come back from the health API for the
   set of primary target and serial failover targets. We walk the list of targets
   in order and if any endpoint is healthy we return that target, otherwise we
   move on to the next target.

3. The CDS and EDS endpoints both perform the measurements in (2) for the
   affected resolver nodes.

4. For CDS this measurement selects which TLS SNI field to use for the cluster
   (note the cluster is always going to be named for the primary target)

5. For EDS this measurement selects which set of endpoints will populate the
   cluster. Priority tiered failover is ignored.

One of the big downsides to this approach to failover is that the failover
detection and correction is going to be controlled by consul rather than
deferring that entirely to the data plane as with the prior version. This also
means that we are bound to only failover using official health signals and
cannot make use of data plane signals like outlier detection to affect
failover.

In this specific scenario the lack of data plane signals is ok because the
effectiveness is already muted by the fact that the ultimate destination
endpoints will have their data plane signals scrambled when they pass through
the mesh gateway wrapper anyway so we're not losing much.

Another related fix is that we now use the endpoint health from the
underlying service, not the health of the gateway (regardless of
failover mode).
2019-08-05 13:30:35 -05:00
R.B. Boyer c395affc93
connect: expose an API endpoint to compile the discovery chain (#6248)
In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand.

Also removed ability to configure the envoy OverprovisioningFactor.
2019-08-02 15:34:54 -05:00
R.B. Boyer f02924fafe
connect: simplify the compiled discovery chain data structures (#6242)
This should make them better for sending over RPC or the API.

Instead of a chain implemented explicitly like a linked list (nodes
holding pointers to other nodes) instead switch to a flat map of named
nodes with nodes linking other other nodes by name. The shipped
structure is just a map and a string to indicate which key to start
from.

Other changes:

* inline the compiler option InferDefaults as true

* introduce compiled target config to avoid needing to send back
  additional maps of Resolvers; future target-specific compiled state
  can go here

* move compiled MeshGateway out of the Resolver and into the
  TargetConfig where it makes more sense.
2019-08-01 22:44:05 -05:00
R.B. Boyer 6393edba53
connect: reconcile how upstream configuration works with discovery chains (#6225)
* connect: reconcile how upstream configuration works with discovery chains

The following upstream config fields for connect sidecars sanely
integrate into discovery chain resolution:

- Destination Namespace/Datacenter: Compilation occurs locally but using
different default values for namespaces and datacenters. The xDS
clusters that are created are named as they normally would be.

- Mesh Gateway Mode (single upstream): If set this value overrides any
value computed for any resolver for the entire discovery chain. The xDS
clusters that are created may be named differently (see below).

- Mesh Gateway Mode (whole sidecar): If set this value overrides any
value computed for any resolver for the entire discovery chain. If this
is specifically overridden for a single upstream this value is ignored
in that case. The xDS clusters that are created may be named differently
(see below).

- Protocol (in opaque config): If set this value overrides the value
computed when evaluating the entire discovery chain. If the normal chain
would be TCP or if this override is set to TCP then the result is that
we explicitly disable L7 Routing and Splitting. The xDS clusters that
are created may be named differently (see below).

- Connect Timeout (in opaque config): If set this value overrides the
value for any resolver in the entire discovery chain. The xDS clusters
that are created may be named differently (see below).

If any of the above overrides affect the actual result of compiling the
discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op
override to "tcp") then the relevant parameters are hashed and provided
to the xDS layer as a prefix for use in naming the Clusters. This is to
ensure that if one Upstream discovery chain has no overrides and
tangentially needs a cluster named "api.default.XXX", and another
Upstream does have overrides for "api.default.XXX" that they won't
cross-pollinate against the operator's wishes.

Fixes #6159
2019-08-01 22:03:34 -05:00
hashicorp-ci a4431da1cc Merge Consul OSS branch 'master' at commit ef257b084d 2019-07-20 02:00:29 +00:00
Christian Muehlhaeuser 16193665ca Fixed a few tautological condition mistakes (#6177)
None of these changes should have any side-effects. They're merely
fixing tautological mistakes.
2019-07-19 07:53:42 -04:00
R.B. Boyer bcd2de3a2e
implement some missing service-router features and add more xDS testing (#6065)
- also implement OnlyPassing filters for non-gateway clusters
2019-07-12 14:16:21 -05:00
Jack Pearkes e6f1b78efb Make cluster names SNI always (#6081)
* Make cluster names SNI always

* Update some tests

* Ensure we check for prepared query types

* Use sni for route cluster names

* Proper mesh gateway mode defaulting when the discovery chain is used

* Ignore service splits from PatchSliceOfMaps

* Update some xds golden files for proper test output

* Allow for grpc/http listeners/cluster configs with the disco chain

* Update stats expectation
2019-07-08 12:48:48 +01:00
Matt Keeler 25f580bcaa Fix a bunch of xds flaky tests
The clusters/endpoints test were still relying on deterministic ordering of clusters/endpoints which cannot be relied upon due to golang purposefully not providing any guarantee about consistent interation ordering of maps.

Also fixed a small bug in the connect proxy cluster generation that was causing the clusters slice to be double the size it needed to with the first half being all nil pointers.
2019-07-02 15:53:06 -04:00
Matt Keeler a7421c160f Implement mesh gateway management of service subsets
Fixup some error handling
2019-07-02 10:29:37 -04:00
R.B. Boyer 4bdb690a25
activate most discovery chain features in xDS for envoy (#6024) 2019-07-01 22:10:51 -05:00
Matt Keeler 8d953f5840 Implement Mesh Gateways
This includes both ingress and egress functionality.
2019-07-01 16:28:30 -04:00
R.B. Boyer 38d76c624e
Allow for both snake_case and CamelCase for config entries written with 'consul config write'. (#6044)
This also has the added benefit of fixing an issue with passing
time.Duration fields through config entries.
2019-06-28 11:35:35 -05:00
Matt Keeler 813e009a2d
Prepare for having different service kinds that are all generic… (#6013)
Default to internal error when service kind is unknown
2019-06-24 15:05:36 -04:00
Paul Banks ffcfdf29fc
Upgrade xDS (go-control-plane) API to support Envoy 1.10. (#5872)
* Upgrade xDS (go-control-plane) API to support Envoy 1.10.

This includes backwards compatibility shim to work around the ext_authz package rename in 1.10.

It also adds integration test support in CI for 1.10.0.

* Fix go vet complaints

* go mod vendor

* Update Envoy version info in docs

* Update website/source/docs/connect/proxies/envoy.md
2019-06-07 07:10:43 -05:00
Paul Banks 421ecd32fc
Connect: allow configuring Envoy for L7 Observability (#5558)
* Add support for HTTP proxy listeners

* Add customizable bootstrap configuration options

* Debug logging for xDS AuthZ

* Add Envoy Integration test suite with basic test coverage

* Add envoy command tests to cover new cases

* Add tracing integration test

* Add gRPC support WIP

* Merged changes from master Docker. get CI integration to work with same Dockerfile now

* Make docker build optional for integration

* Enable integration tests again!

* http2 and grpc integration tests and fixes

* Fix up command config tests

* Store all container logs as artifacts in circle on fail

* Add retries to outer part of stats measurements as we keep missing them in CI

* Only dump logs on failing cases

* Fix typos from code review

* Review tidying and make tests pass again

* Add debug logs to exec test.

* Fix legit test failure caused by upstream rename in envoy config

* Attempt to reduce cases of bad TLS handshake in CI integration tests

* bring up the right service

* Add prometheus integration test

* Add test for denied AuthZ both HTTP and TCP

* Try ANSI term for Circle
2019-04-29 17:27:57 +01:00
Paul Banks 89fa5ec3ba
Connect: Fix Envoy getting stuck during load (#5499)
* Connect: Fix Envoy getting stuck during load

Also in this PR:
 - Enabled outlier detection on upstreams which will mark instances unhealthy after 5 failures (using Envoy's defaults)
 - Enable weighted load balancing where DNS weights are configured

* Fix empty load assignments in the right place

* Fix import names from review

* Move millisecond parse to a helper function
2019-03-22 19:37:14 +00:00
R.B. Boyer f4a3b9d518
fix typos reported by golangci-lint:misspell (#5434) 2019-03-06 11:13:28 -06:00
Nicholas Jackson 99fe9dabce Envoy config cluster (#5308)
* Start adding tests for cluster override

* Refactor tests for clusters

* Passing tests for custom upstream cluster override

* Added capability to customise local app cluster

* Rename config for local cluster override
2019-02-19 13:45:33 +00:00
Paul Banks 1909a95118 xDS Server Implementation (#4731)
* Vendor updates for gRPC and xDS server

* xDS server implementation for serving Envoy as a Connect proxy

* Address initial review comments

* consistent envoy package aliases; typos fixed; override TLS and authz for custom listeners

* Moar Typos

* Moar typos
2018-10-10 16:55:34 +01:00