Commit Graph

51 Commits (7f9a5d0f58d89e86b3d9588b87bb3fcc303e71c8)

Author SHA1 Message Date
freddygv 7f9a5d0f58 Add basic nonce management
This commit adds a monotonically increasing nonce to include in peering
replication response messages. Every ack/nack from the peer handling a
response will include this nonce, allowing to correlate the ack/nack
with a specific resource.

At the moment nothing is done with the nonce when it is received. In the
future we may want to add functionality such as retries on NACKs,
depending on the class of error.
2022-10-11 19:02:04 -06:00
freddygv bf72df7b0e Fixup test 2022-10-10 13:20:14 -06:00
Chris S. Kim b0a4c5c563 Include stream-related information in peering endpoints 2022-10-10 13:20:14 -06:00
freddygv 3034df6a5c Require Connect and TLS to generate peering tokens
By requiring Connect and a gRPC TLS listener we can automatically
configure TLS for all peering control-plane traffic.
2022-10-07 09:06:29 -06:00
Eric Haberkorn 1b565444be
Rename `PeerName` to `Peer` on prepared queries and exported services (#14854) 2022-10-04 14:46:15 -04:00
freddygv a8c4d6bc55 Share mgw addrs in peering stream if needed
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.

The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
2022-10-03 11:42:20 -06:00
Eric Haberkorn 80e51ff907
Add exported services event to cluster peering replication. (#14797) 2022-09-29 15:37:19 -04:00
Chris S. Kim 953808e899 PR feedback on terminated state checking 2022-09-06 10:28:20 -04:00
Chris S. Kim ec36755cc0 Properly assert for ServerAddresses replication request 2022-09-02 11:44:54 -04:00
Chris S. Kim d1d9dbff8e Fix terminate not returning early 2022-09-02 11:44:38 -04:00
Chris S. Kim 560d410c6d Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-30 11:09:25 -04:00
Chris S. Kim 74ddf040dd Add heartbeat timeout grace period when accounting for peering health 2022-08-29 16:32:26 -04:00
Chris S. Kim def529edd3 Rename test 2022-08-29 10:34:50 -04:00
Chris S. Kim 93271f649c Fix test 2022-08-29 10:20:30 -04:00
Chris S. Kim 4d97e2f936 Adjust metrics reporting for peering tracker 2022-08-26 17:34:17 -04:00
Chris S. Kim 937a8ec742 Fix casing 2022-08-26 11:56:26 -04:00
Chris S. Kim 1c43a1a7b4 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-26 10:43:56 -04:00
alex 30ff2e9a35
peering: add peer health metric (#14004)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-08-25 16:32:59 -07:00
Chris S. Kim 8c94d1a80c Update test comment 2022-08-24 13:50:24 -04:00
Chris S. Kim 5f2959329f Add check for zero-length server addresses 2022-08-24 13:30:52 -04:00
Chris S. Kim cdc8b0634d Fix flakes 2022-08-22 14:45:31 -04:00
Chris S. Kim 03e92826aa Increase heartbeat rate to reduce test flakes 2022-08-22 14:24:05 -04:00
Chris S. Kim 06ba9775ee Remove check for ResponseNonce 2022-08-22 13:55:01 -04:00
Chris S. Kim adff2eef16 Fix data race
newMockSnapshotHandler has an assertion on t.Cleanup which gets called before the event publisher is cancelled. This commit reorders the context.WithCancel so it properly gets cancelled before the assertion is made.
2022-08-22 13:55:01 -04:00
Chris S. Kim 4e40e1d222 Handle server addresses update as client 2022-08-22 13:42:12 -04:00
Chris S. Kim 584d3409c4 Send server addresses on update from server 2022-08-22 13:41:44 -04:00
Chris S. Kim c9d8ad3939 Add new subscription for server addresses 2022-08-22 13:40:25 -04:00
Chris S. Kim 028b87d51f Cleanup unused logger 2022-08-22 13:40:23 -04:00
freddygv 1031ffc3c7 Re-validate existing secrets at state store
Previously establishment and pending secrets were only checked at the
RPC layer. However, given that these are Check-and-Set transactions we
should ensure that the given secrets are still valid when persisting a
secret exchange or promotion.

Otherwise it would be possible for concurrent requests to overwrite each
other.
2022-08-08 09:06:07 -06:00
freddygv c04515a844 Use proto message for each secrets write op
Previously there was a field indicating the operation that triggered a
secrets write. Now there is a message for each operation and it contains
the secret ID being persisted.
2022-08-08 01:41:00 -06:00
freddygv 60d6e28c97 Pass explicit signal with op for secrets write
Previously the updates to the peering secrets UUID table relied on
inferring what action triggered the update based on a reconciliation
against the existing secrets.

Instead we now explicitly require the operation to be given so that the
inference isn't necessary. This makes the UUID table logic easier to
reason about and fixes some related bugs.

There is also an update so that the peering secrets get handled on
snapshots/restores.
2022-08-03 17:25:12 -05:00
Freddy 42996411cc
Various peering fixes (#13979)
* Avoid logging StreamSecretID
* Wrap additional errors in stream handler
* Fix flakiness in leader test and rename servers for clarity. There was
  a race condition where the peering was being deleted in the test
  before the stream was active. Now the test waits for the stream to be
  connected on both sides before deleting the associated peering.
* Run flaky test serially
2022-08-01 15:06:18 -06:00
Matt Keeler f74d0cef7a
Implement/Utilize secrets for Peering Replication Stream (#13977) 2022-08-01 10:33:18 -04:00
Luke Kysow 95096e2c03
peering: retry establishing connection more quickly on certain errors (#13938)
When we receive a FailedPrecondition error, retry that more quickly
because we expect it will resolve shortly. This is particularly
important in the context of Consul servers behind a load balancer
because when establishing a connection we have to retry until we
randomly land on a leader node.

The default retry backoff goes from 2s, 4s, 8s, etc. which can result in
very long delays quite quickly. Instead, this backoff retries in 8ms
five times, then goes exponentially from there: 16ms, 32ms, ... up to a
max of 8152ms.
2022-07-29 13:04:32 -07:00
Luke Kysow 465a9801e1
Merge pull request #13924 from hashicorp/lkysow/util-metric-peering
peering: don't track imported services/nodes in usage
2022-07-27 14:49:55 -07:00
Luke Kysow 021b00e321 Remove duplicate comment 2022-07-26 10:19:49 -07:00
Luke Kysow 8c5b70d227
Rename receive to recv in tracker (#13896)
Because it's shorter
2022-07-25 16:08:03 -07:00
Luke Kysow 3530d3782d
peering: read endpoints can now return failing status (#13849)
Track streams that have been disconnected due to an error and
set their statuses to failing.
2022-07-25 14:27:53 -07:00
alex 927cee692b
peering: emit exported services count metric (#13811)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-22 12:05:08 -07:00
Luke Kysow 0c87be0845
peering: Add heartbeating to peering streams (#13806)
* Add heartbeating to peering streams
2022-07-21 10:03:27 -07:00
Luke Kysow c411e6b326
Add send mutex to protect against concurrent sends (#13805) 2022-07-20 15:48:18 -07:00
alex de5a991d8c
peering: refactor reconcile, cleanup (#13795)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-19 11:43:29 -07:00
alex a9ae2ff4fa
peering: track exported services (#13784)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-18 10:20:04 -07:00
R.B. Boyer cd513aeead
peerstream: require a resource subscription to receive updates of that type (#13767)
This mimics xDS's discovery protocol where you must request a resource
explicitly for the exporting side to send those events to you.

As part of this I aligned the overall ResourceURL with the TypeURL that
gets embedded into the encoded protobuf Any construct. The
CheckServiceNodes is now wrapped in a better named "ExportedService"
struct now.
2022-07-15 15:03:40 -05:00
R.B. Boyer c737301093
peerstream: fix test assertions (#13780) 2022-07-15 14:43:24 -05:00
Luke Kysow ca3d7c964c
peerstream: dialer should reconnect when stream closes (#13745)
* peerstream: dialer should reconnect when stream closes

If the stream is closed unexpectedly (i.e. when we haven't received
a terminated message), the dialer should attempt to re-establish the
stream.

Previously, the `HandleStream` would return `nil` when the stream
was closed. The caller then assumed the stream was terminated on purpose
and so didn't reconnect when instead it was stopped unexpectedly and
the dialer should have attempted to reconnect.
2022-07-15 11:58:33 -07:00
R.B. Boyer bb4d4040fb
server: ensure peer replication can successfully use TLS over external gRPC (#13733)
Ensure that the peer stream replication rpc can successfully be used with TLS activated.

Also:

- If key material is configured for the gRPC port but HTTPS is not
  enabled now TLS will still be activated for the gRPC port.

- peerstream replication stream opened by the establishing-side will now
  ignore grpc.WithBlock so that TLS errors will bubble up instead of
  being awkwardly delayed or suppressed
2022-07-15 13:15:50 -05:00
alex adb5ffa1a6
peering: track imported services (#13718) 2022-07-15 10:20:43 -07:00
Matt Keeler 257f88d4df
Use Node Name for peering healthSnapshot instead of ID (#13773)
A Node ID is not a required field with Consul’s data model. Therefore we cannot reliably expect all uses to have it. However the node name is required and must be unique so its equally as good of a key for the internal healthSnapshot node tracking.
2022-07-15 10:51:38 -04:00
Chris S. Kim b4ffa9ae0c Scrub VirtualIPs before exporting 2022-07-13 16:05:10 -04:00