consul

Commit Graph

Author	SHA1	Message	Date
Daniel Upton	653b8c4f9d	proxycfg: server-local config entry data sources This is the OSS portion of enterprise PR 2056. This commit provides server-local implementations of the proxycfg.ConfigEntry and proxycfg.ConfigEntryList interfaces, that source data from streaming events. It makes use of the LocalMaterializer type introduced for peering replication, adding the necessary support for authorization. It also adds support for "wildcard" subscriptions (within a topic) to the event publisher, as this is needed to fetch service-resolvers for all services when configuring mesh gateways. Currently, events will be emitted for just the ingress-gateway, service-resolver, and mesh config entry types, as these are the only entries required by proxycfg. The events will be emitted on topics named IngressGateway, ServiceResolver, and MeshConfig respectively. Though these events will only be consumed "locally" for now, they can also be consumed via the gRPC endpoint (confirmed using grpcurl) so using them from client agents should be a case of swapping the LocalMaterializer for an RPCMaterializer.	2022-07-04 10:48:36 +01:00
alex	cd9ca4290a	peering: add imported/exported counts to peering (#13644 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com>	2022-06-29 14:07:30 -07:00
alex	beb8b03e8a	peering: reconcile/ hint active state for list (#13619 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-06-29 09:43:50 -07:00
R.B. Boyer	0fa828db76	peering: replicate all SpiffeID values necessary for the importing side to do SAN validation (#13612 ) When traversing an exported peered service, the discovery chain evaluation at the other side may re-route the request to a variety of endpoints. Furthermore we intend to terminate mTLS at the mesh gateway for arriving peered traffic that is http-like (L7), so the caller needs to know the mesh gateway's SpiffeID in that case as well. The following new SpiffeID values will be shipped back in the peerstream replication: - tcp: all possible SpiffeIDs resulting from the service-resolver component of the exported discovery chain - http-like: the SpiffeID of the mesh gateway	2022-06-27 14:37:18 -05:00
R.B. Boyer	e8ea3d7c3b	state: peering ID assignment cannot happen inside of the state store (#13525 ) Move peering ID assignment outside of the FSM, so that the ID is written to the raft log and the same ID is used by all voters, and after restarts.	2022-06-21 13:04:08 -05:00
R.B. Boyer	201d1458c3	xds: mesh gateways now have their own leaf certificate when involved in a peering (#13460 ) This is only configured in xDS when a service with an L7 protocol is exported. They also load any relevant trust bundles for the peered services to eventually use for L7 SPIFFE validation during mTLS termination.	2022-06-15 14:36:18 -05:00
freddygv	f3843809da	Avoid deleting peerings marked as terminated. When our peer deletes the peering it is locally marked as terminated. This termination should kick off deleting all imported data, but should not delete the peering object itself. Keeping peerings marked as terminated acts as a signal that the action took place.	2022-06-14 15:37:09 -06:00
freddygv	6453375ab2	Add leader routine to clean up peerings Once a peering is marked for deletion a new leader routine will now clean up all imported resources and then the peering itself. A lot of the logic was grabbed from the namespace/partitions deferred deletions but with a handful of simplifications: - The rate limiting is not configurable. - Deleting imported nodes/services/checks is done by deleting nodes with the Txn API. The services and checks are deleted as a side-effect. - There is no "round rate limiter" like with namespaces and partitions. This is because peerings are purely local, and deleting a peering in the datacenter does not depend on deleting data from other DCs like with WAN-federated namespaces. All rate limiting is handled by the Raft rate limiter.	2022-06-14 15:36:50 -06:00
freddygv	6c8ab1bbac	Fixup stream tear-down steps. 1. Fix a bug where the peering leader routine would not track all active peerings in the "stored" reconciliation map. This could lead to tearing down streams where the token was generated, since the ConnectedStreams() method used for reconciliation returns all streams and not just the ones initiated by this leader routine. 2. Fix a race where stream contexts were being canceled before termination messages were being processed by a peer. Previously the leader routine would tear down streams by canceling their context right after the termination message was sent. This context cancelation could be propagated to the server side faster than the termination message. Now there is a change where the dialing peer uses CloseSend() to signal when no more messages will be sent. Eventually the server peer will read an EOF after receiving and processing the preceding termination message. Using CloseSend() is actually not enough to address the issue mentioned, since it doesn't wait for the server peer to finish processing messages. Because of this now the dialing peer also reads from the stream until an error signals that there are no more messages. Receiving an EOF from our peer indicates that they processed the termination message and have no additional work to do. Given that the stream is being closed, all the messages received by Recv are discarded. We only check for errors to avoid importing new data.	2022-06-13 12:10:42 -06:00
freddygv	cc921a9c78	Update peering state and RPC for deferred deletion When deleting a peering we do not want to delete the peering and all imported data in a single operation, since deleting a large amount of data at once could overload Consul. Instead we defer deletion of peerings so that: 1. When a peering deletion request is received via gRPC the peering is marked for deletion by setting the DeletedAt field. 2. A leader routine will monitor for peerings that are marked for deletion and kick off a throttled deletion of all imported resources before deleting the peering itself. This commit mostly addresses point #1 by modifying the peering service to mark peerings for deletion. Another key change is to add a PeeringListDeleted state store function which can return all peerings marked for deletion. This function is what will be watched by the deferred deletion leader routine.	2022-06-13 12:10:32 -06:00
Freddy	71b254522e	Clean up imported nodes/services/checks as needed (#13367 ) Previously, imported data would never be deleted. As nodes/services/checks were registered and deregistered, resources deleted from the exporting cluster would accumulate in the imported cluster. This commit makes updates to replication so that whenever an update is received for a service name we reconcile what was present in the catalog against what was received. This handleUpdateService method can handle both updates and deletions.	2022-06-13 11:52:28 -06:00
Chris S. Kim	a02e9abcc1	Update RBAC to handle imported services (#13404 ) When converting from Consul intentions to xds RBAC rules, services imported from other peers must encode additional data like partition (from the remote cluster) and trust domain. This PR updates the PeeringTrustBundle to hold the sending side's local partition as ExportedPartition. It also updates RBAC code to encode SpiffeIDs of imported services with the ExportedPartition and TrustDomain.	2022-06-10 17:15:22 -04:00
R.B. Boyer	7001e1151c	peering: rename initiate to establish in the context of the APIs (#13419 )	2022-06-10 11:10:46 -05:00
R.B. Boyer	bba3eb8cdd	peering: mesh gateways are required for cross-peer service mesh communication (#13410 ) Require use of mesh gateways in order for service mesh data plane traffic to flow between peers. This also adds plumbing for envoy integration tests involving peers, and one starter peering test.	2022-06-09 11:05:18 -05:00
R.B. Boyer	7423886136	peering: allow protobuf requests to populate the default partition or namespace (#13398 )	2022-06-08 11:55:18 -05:00
R.B. Boyer	ab758b7b32	peering: allow mesh gateways to proxy L4 peered traffic (#13339 ) Mesh gateways will now enable tcp connections with SNI names including peering information so that those connections may be proxied. Note: this does not change the callers to use these mesh gateways.	2022-06-06 14:20:41 -05:00
alex	bbbc50815a	peering: send leader addr (#13342 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-06-06 10:00:38 -07:00
R.B. Boyer	019aeaa57d	peering: update how cross-peer upstreams are represented in proxycfg and rendered in xds (#13362 ) This removes unnecessary, vestigial remnants of discovery chains.	2022-06-03 16:42:50 -05:00
freddygv	647c57a416	Add agent cache-type for TrustBundleListByService There are a handful of changes in this commit: * When querying trust bundles for a service we need to be able to specify the namespace of the service. * The endpoint needs to track the index because the cache watches use it. * Extracted bulk of the endpoint's logic to a state store function so that index tracking could be tested more easily. * Removed check for service existence, deferring that sort of work to ACL authz * Added the cache type	2022-06-01 17:05:10 -06:00
freddygv	8b58fa8afe	Update assumptions around exported-service config Given that the exported-services config entry can use wildcards, the precedence for wildcards is handled as with intentions. The most exact match is the match that applies for any given service. We do not take the union of all that apply. Another update that was made was to reflect that only one exported-services config entry applies to any given service in a partition. This is a pre-existing constraint that gets enforced by the Normalize() method on that config entry type.	2022-06-01 17:03:51 -06:00
freddygv	870e7c72d7	Return SPIFFE ID for connect proxies in PeerMeta Proxies dialing exporting services need to know the SPIFFE ID of services dialed so that the upstream's SANs can be validated. This commit attaches the SPIFFE ID to all connect proxies exported over the peering stream so that they are available to importing clusters. The data in the SPIFFE ID cannot be re-constructed in peer clusters because the partition of exported services is overwritten on imports.	2022-05-31 09:55:37 -06:00
Freddy	9427700270	[OSS] Add grpc endpoint to fetch a specific trust bundle (#13292 ) Co-authored-by: R.B. Boyer <rb@hashicorp.com>	2022-05-31 09:54:40 -06:00
alex	fd7a403e11	monitor leadership in peering service (#13257 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com> Co-authored-by: Freddy <freddygv@users.noreply.github.com>	2022-05-26 17:55:16 -07:00
Chris S. Kim	6d3bea7129	Add support for streaming CA roots to peers (#13260 ) Sender watches for changes to CA roots and sends them through the replication stream. Receiver saves CA roots to tablePeeringTrustBundle	2022-05-26 15:24:09 -04:00
R.B. Boyer	1a8834e1c8	peering: replicate expected SNI, SPIFFE, and service protocol to peers (#13218 ) The importing peer will need to know what SNI and SPIFFE name corresponds to each exported service. Additionally it will need to know at a high level the protocol in use (L4/L7) to generate the appropriate connection pool and local metrics. For replicated connect synthetic entities we edit the `Connect{}` part of a `NodeService` to have a new section: { "PeerMeta": { "SNI": [ "web.default.default.owt.external.183150d5-1033-3672-c426-c29205a576b8.consul" ], "SpiffeID": [ "spiffe://183150d5-1033-3672-c426-c29205a576b8.consul/ns/default/dc/dc1/svc/web" ], "Protocol": "tcp" } } This data is then replicated and saved as-is at the importing side. Both SNI and SpiffeID are slices for now until I can be sure we don't need them for how mesh gateways will ultimately work.	2022-05-25 12:37:44 -05:00
R.B. Boyer	be631ebdce	peering: disable requirement for mesh gateways initially (#13213 )	2022-05-25 10:13:23 -05:00
alex	876f3bb971	peering: expose IsLeader, hang up on dialer if follower (#13164 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com> Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>	2022-05-23 11:30:58 -07:00
R.B. Boyer	2e72f44fda	peering: accept replication stream of discovery chain information at the importing side (#13151 )	2022-05-19 16:37:52 -05:00
R.B. Boyer	3e4a522882	peering: replicate discovery chains information to importing peers Treat each exported service as a "discovery chain" and replicate one synthetic CheckServiceNode for each chain and remote mesh gateway. The health will be a flattened generated check of the checks for that mesh gateway node.	2022-05-19 14:21:44 -05:00
R.B. Boyer	5a03536040	prefactor some functions out of the monolithic file	2022-05-19 14:21:29 -05:00
Freddy	b38be4c0ed	Patches to peering initiation for POC demo (#13076 ) Co-authored-by: R.B. Boyer <rb@hashicorp.com>	2022-05-13 13:01:00 -06:00
Freddy	e874b860c0	Actually block when syncing subscriptions (#13066 ) By changing to use WatchCtx we will actually block for changes to the peering list. WatchCh creates a goroutine to collect errors from WatchCtx and returns immediately. The existing behavior wouldn't result in a tight loop because of the rate limiting in the surrounding function, but it would still lead to more work than is necessary.	2022-05-12 17:36:14 -06:00
Evan Culver	0fa5e7be5a	peering: add TrustBundleListByService endpoint (#13048 )	2022-05-12 15:58:22 -07:00
Freddy	4e215dc411	[OSS] Add upsert handling for receiving CheckServiceNode (#13061 )	2022-05-12 15:04:44 -06:00
R.B. Boyer	cc15a11f9c	test: ensure this package uses freeport for port allocation (#13036 )	2022-05-11 14:20:50 -05:00
R.B. Boyer	901fd4dd68	remove remaining shim runStep functions (#13015 ) Wraps up the refactor from #13013	2022-05-10 16:24:45 -05:00
R.B. Boyer	0d6d16ddfb	add general runstep test helper instead of copying it all over the place (#13013 )	2022-05-10 15:25:51 -05:00
FFMMM	37a1e33834	expose meta tags for peering (#12964 )	2022-05-09 13:47:37 -07:00
R.B. Boyer	f507f62f3c	peering: initial sync (#12842 ) - Add endpoints related to peering: read, list, generate token, initiate peering - Update node/service/check table indexing to account for peers - Foundational changes for pushing service updates to a peer - Plumb peer name through Health.ServiceNodes path see: ENT-1765, ENT-1280, ENT-1283, ENT-1283, ENT-1756, ENT-1739, ENT-1750, ENT-1679, ENT-1709, ENT-1704, ENT-1690, ENT-1689, ENT-1702, ENT-1701, ENT-1683, ENT-1663, ENT-1650, ENT-1678, ENT-1628, ENT-1658, ENT-1640, ENT-1637, ENT-1597, ENT-1634, ENT-1613, ENT-1616, ENT-1617, ENT-1591, ENT-1588, ENT-1596, ENT-1572, ENT-1555 Co-authored-by: R.B. Boyer <rb@hashicorp.com> Co-authored-by: freddygv <freddy@hashicorp.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com> Co-authored-by: Evan Culver <eculver@hashicorp.com> Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>	2022-04-21 17:34:40 -05:00
FFMMM	a46bbe892d	add more labels to RequestRecorder (#12727 ) Co-authored-by: Daniel Nephin <dnephin@hashicorp.com> Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>	2022-04-12 10:50:25 -07:00
FFMMM	5245251bbf	[rpc/middleware][consul] plumb intercept off, add server level happy test (#12692 )	2022-04-06 14:33:05 -07:00
FFMMM	7ed356b338	lower log to trace (#12708 )	2022-04-06 11:37:08 -07:00
FFMMM	1adfd7b94c	polish rpc.service.call metric behavior (#12624 )	2022-03-31 10:49:37 -07:00
FFMMM	c39854de78	fix bad oss sync, use gauges not counters (#12611 )	2022-03-24 14:41:30 -07:00
FFMMM	a7e5ee005a	factor out recording func, add unit tests (#12585 ) Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>	2022-03-22 09:31:54 -07:00
Dan Upton	7298967070	Restructure gRPC server setup (#12586 ) OSS sync of enterprise changes at 0b44395e	2022-03-22 12:40:24 +00:00
FFMMM	e5ebc47a94	pre register new rpc metric, rename metric (#12582 )	2022-03-21 17:26:32 -07:00
FFMMM	db27ea3484	[sync oss] add net/rpc interceptor implementation (#12573 ) * sync ent changes from 866dcb0667 Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * update oss go.mod Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>	2022-03-17 16:02:26 -07:00
Dan Upton	fdfe079674	streaming: split event buffer by key (#12080 )	2022-01-28 12:27:00 +00:00
Giulio Micheloni	af7b7b5693	Merge branch 'main' into serve-panic-recovery	2021-11-06 16:12:06 +01:00
Daniel Nephin	8ba760a2fc	acl: remove id and revision from Policy constructors The fields were removed in a previous commit. Also remove an unused constructor for PolicyMerger	2021-11-05 15:45:08 -04:00
Daniel Nephin	aea4cc5a6d	acl: remove legacy arg to store.ACLTokenSet And remove the tests for legacy=true	2021-10-25 17:25:14 -04:00
Giulio Micheloni	0c78ddacde	Merge branch 'main' of https://github.com/hashicorp/consul into hashicorp-main	2021-10-16 16:59:32 +01:00
R.B. Boyer	706fc8bcd0	grpc: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#11099 ) Fixes #11086	2021-09-22 13:14:26 -05:00
Giulio Micheloni	655da1fc42	Merge branch 'main' into serve-panic-recovery	2021-08-22 20:31:11 +02:00
Giulio Micheloni	4b0eaa4bff	grpc, xds: recovery middleware to return and log error in case of panic 1) xds and grpc servers: 1.1) to use recovery middleware with callback that prints stack trace to log 1.2) callback turn the panic into a core.Internal error 2) added unit test for grpc server	2021-08-22 19:06:26 +01:00
R.B. Boyer	097e1645e3	agent: ensure that most agent behavior correctly respects partition configuration (#10880 )	2021-08-19 15:09:42 -05:00
R.B. Boyer	310e775a8a	state: partition nodes and coordinates in the state store (#10859 ) Additionally: - partitioned the catalog indexes appropriately for partitioning - removed a stray reference to a non-existent index named "node.checks"	2021-08-17 13:29:39 -05:00
Daniel Nephin	f497d5ab30	acl: remove many instances of authz == nil	2021-07-30 13:58:35 -04:00
R.B. Boyer	fc9b1a277d	sync changes to oss files made in enterprise (#10670 )	2021-07-22 13:58:08 -05:00
R.B. Boyer	188e8dc51f	agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669 )	2021-07-22 13:20:45 -05:00
Daniel Nephin	71b0f0a7a6	structs: remove EnterpriseMeta.GetNamespace I added this recently without realizing that the method already existed and was named NamespaceOrEmpty. Replace all calls to GetNamespace with NamespaceOrEmpty or NamespaceOrDefault as appropriate.	2021-03-09 15:17:26 -05:00
Daniel Nephin	1d2d15b1e1	agent: add a test for streaming in the service health endpoint Co-authored-by: Paul Banks <banks@banksco.de>	2021-02-25 14:08:10 -05:00
Daniel Nephin	d1772ae305	structs: rename EnterpriseMeta constructor To match the Go convention.	2021-02-16 14:45:43 -05:00
Daniel Nephin	b9e60c0775	testing: skip slow tests with -short Add a skip condition to all tests slower than 100ms. This change was made using `gotestsum tool slowest` with data from the last 3 CI runs of master. See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests With this change: ``` $ time go test -count=1 -short ./agent ok github.com/hashicorp/consul/agent 0.743s real 0m4.791s $ time go test -count=1 -short ./agent/consul ok github.com/hashicorp/consul/agent/consul 4.229s real 0m8.769s ```	2020-12-07 13:42:55 -05:00
Daniel Nephin	fb70c8bac2	stream: document that Payload must be immutable If they are sent to EventPublisher.Publish. Also document that PayloadEvents is expected to come from a subscription and that it is not immutable.	2020-11-06 13:00:33 -05:00
Daniel Nephin	868cfe1eac	stream: Add HasReadPermission to Payload Required now that filter is a method on PayloadEvents instead of Event	2020-11-05 19:17:18 -05:00
Daniel Nephin	a33c50ef0d	Merge pull request #9073 from hashicorp/dnephin/backport-streaming-namespaces streaming: backport namespace changes	2020-11-05 14:19:10 -05:00
Daniel Nephin	c82f6ef2d8	Merge pull request #9061 from hashicorp/dnephin/event-fields stream: support filtering by namespace	2020-11-05 14:18:35 -05:00
Daniel Nephin	b532e092dc	structs: add a namespace test for CheckServiceNode.CanRead	2020-10-30 15:07:04 -04:00
Daniel Nephin	c42fe5ae43	subscribe: set the request namespace	2020-10-30 14:34:04 -04:00
Daniel Nephin	a5dd2001cf	stream: remove Event.Key Makes Payload a type with FilterByKey so that Payloads can implement filtering by key. With this approach we don't need to expose a Namespace field on Event, and we don't need to invent micro formats or require a bunch of code to be aware of exactly how the key field is encoded.	2020-10-28 16:48:04 -04:00
Daniel Nephin	68342a0cb5	proto: remove Event.Key field The field is never used, and the value is available from the payload.	2020-10-28 16:33:00 -04:00
Daniel Nephin	9a1e845be8	proto: remove Event.Namespace field All events are part of a single Topic, so we don't need this field.	2020-10-28 16:33:00 -04:00
Daniel Nephin	3dfb7c224b	stream: Use a no-op event publisher if streaming is disabled	2020-10-28 13:54:19 -04:00
Daniel Nephin	fb57d9b26a	stream: close the subscription on Unsubscribe	2020-10-22 13:39:27 -04:00
Daniel Nephin	c5d57c9f07	subscribe: add test cases for newEventFromStreamEvent	2020-10-08 18:48:17 -04:00
Daniel Nephin	f185124320	subscribe: Add steps to rpc/subscribe tests To make them easier to follow	2020-10-08 15:38:01 -04:00
Daniel Nephin	b103568e98	Merge pull request #8818 from hashicorp/streaming/add-subscribe-service-batch-events stream: handle batch events as a special case of Event	2020-10-07 21:25:32 -04:00
Daniel Nephin	21c21191f4	structs: add CheckServiceNode.CanRead And use it from the subscribe endpoint.	2020-10-07 18:15:13 -04:00
Daniel Nephin	b27068b72a	stream: Return a single event from a subscription.Next Handle batch events as a single event	2020-10-06 13:18:20 -04:00
Daniel Nephin	f5d11562f2	subscribe: update to use NewSnapshotToFollow event	2020-10-06 12:49:35 -04:00
Daniel Nephin	e3290f5971	Move agent/subscribe -> agent/rpc/subscribe	2020-10-06 12:49:35 -04:00


133 Commits (5feefa7863d6c836f32693519fa6d1b2134f89d7)