Commit Graph

137 Commits (2a2e7739082d455c4d07edd59d3ad40d5fc691cd)

Author SHA1 Message Date
Derek Menteer 0ac8ae6c3b
Fix xDS deadlock due to syncLoop termination. (#20867)
* Fix xDS deadlock due to syncLoop termination.

This fixes an issue where agentless xDS streams can deadlock permanently until
a server is restarted. When this issue occurs, no new proxies are able to
successfully connect to the server.

Effectively, the trigger for this deadlock stems from the following return
statement:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202

When this happens, the entire `syncLoop()` terminates and stops consuming from
the following channel:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192

Which results in the `ConfigSource.cleanup()` function never receiving a
response and holding a mutex indefinitely:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247

Because this mutex is shared, it effectively deadlocks the server's ability to
process new xDS streams.

----

The fix to this issue involves removing the `chan chan struct{}` used like an
RPC-over-channels pattern and replacing it with two distinct channels:

+ `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon.  +
`syncLoopDoneCh` - indicates that the `syncLoop()` has terminated.

Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the
`syncLoop()` function ensures that the deadlock above should no longer occur.

We also now evict xDS connections of all proxies for the corresponding
`syncLoop()` whenever it encounters an irrecoverable error. This is done by
hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta
processing. Prior to this fix, the behavior was to simply orphan them so they
would never receive catalog-registration or service-defaults updates.

* Add changelog.
2024-03-15 13:57:11 -05:00
Matt Keeler 8fcafb139c
Add `consul snapshot decode` command (#20824)
Add snapshot decoding command
2024-03-14 12:59:06 -04:00
Matt Keeler 1a454b7b47
Make routes controller tests more robust when testing with multiple tenancies (#20743)
When executing the test with multiple tenancies the various mesh controllers remained running across multiple sub-tests. While the resourcetest.Client cleans up resources created with it, resources created by the controllers were not being cleaned up. This could result in subsequent sub tests encountering unexpected values and failing when they should pass.
2024-02-28 11:03:12 -05:00
John Maguire 2ed67ba9ae
[NET-7375] Add Skeleton for APIGW ProxyStateTemplateBuilder (#20628)
* WIP

* got empty state working and cleaned up additional code

* Moved api gateway mapper to it's own file, removed mesh port usage for
api gateway, allow gateway to reconcile without routes

* Update internal/mesh/internal/controllers/gatewayproxy/controller.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Feedback from PR Review:
  - rename referencedTCPRoutes variable to be more descriptive
  - fetchers are in alphabetical order
  - move log statements to be more indicative of internal state

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
2024-02-16 10:38:08 -05:00
R.B. Boyer 671c436415
mesh: use ComputedImplicitDestinations resource in the sidecar controller (#20553)
Wire the ComputedImplicitDestinations resource into the sidecar controller, replacing the inline version already present.

Also:

- Rewrite the controller to use the controller cache
- Rewrite it to no longer depend on ServiceEndpoints
- Remove the fetcher and (local) cache abstraction
2024-02-12 14:10:33 -06:00
John Maguire ec76090be9
[NET-7450] Fix listenerToProtocol function for input (#20536)
* make listenerProtocolToCatalogProtocol function more forgiving for
different cased input

* update tests
2024-02-12 11:24:56 -05:00
R.B. Boyer 6742340878
mesh: add ComputedImplicitDestinations resource for future use (#20547)
Creates a new controller to create ComputedImplicitDestinations resources by 
composing ComputedRoutes, Services, and ComputedTrafficPermissions to 
infer all ParentRef services that could possibly send some portion of traffic to a 
Service that has at least one accessible Workload Identity. A followup PR will 
rewire the sidecar controller to make use of this new resource.

As this is a performance optimization, rather than a security feature the following 
aspects of traffic permissions have been ignored:

- DENY rules
- port rules (all ports are allowed)

Also:

- Add some v2 TestController machinery to help test complex dependency mappers.
2024-02-09 15:42:10 -06:00
John Maguire 7c3a379e48
Fixes gatewayproxy controller tests for ent (#20543)
fix tests for ent
2024-02-08 18:34:44 +00:00
John Maguire 2ee32b1980
Add tests for gw proxy controller (#20510)
* Added basic tests for gatewayproxy controller

* add copyright header

* Clean up tests
2024-02-07 17:01:10 -05:00
Nathan Coleman 45d645471b
[NET-7414] Reconcile PST for mesh gateway workloads on change to ComputedExportedServices (#20271)
* Reconcile ProxyStateTemplate on change to ComputedExportedServices

* gofmt changeset

---------

Co-authored-by: NiniOak <anita.akaeze@hashicorp.com>
2024-02-07 21:27:13 +00:00
skpratt 57bad0df85
add traffic permissions excludes and tests (#20453)
* add traffic permissions tests

* review fixes

* Update internal/mesh/internal/controllers/sidecarproxy/builder/local_app.go

Co-authored-by: John Landa <jonathanlanda@gmail.com>

---------

Co-authored-by: John Landa <jonathanlanda@gmail.com>
2024-02-07 20:21:44 +00:00
John Maguire 24e9603d9b
Fix Gatewayproxy Controller and Re-Enable APIGW v2 Controller (#20508)
re-enable apigw controller, fix typo in key name for metadata for
gatewayproxy
2024-02-06 18:55:55 +00:00
Matt Keeler 49e6c0232d
Panic for unregistered types (#20476)
* Panic when controllers attempt to make invalid requests to the resource service

This will help to catch bugs in tests that could cause infinite errors to be emitted.

* Disable the API GW v2 controller

With the previous commit, this would cause a server to panic due to watching a type which has not yet been created/registered.

* Ensure that a test server gets the full type registry instead of constructing its own

* Skip TestServer_ControllerDependencies

* Fix peering tests so that they use the full resource registry.
2024-02-06 11:23:06 -05:00
John Maguire 54c974748e
[NET-7280] Add APIGW support to the gatewayproxy controller (#20484)
* Add APIGW support to the gatewayproxy controller

* update copywrite headers
2024-02-06 11:03:37 -05:00
Tauhid Anjum 88b8a1cc36
NET-6776 - Update Routes controller to use ComputedFailoverPolicy CE (#20496)
Update Routes controller to use ComputedFailoverPolicy
2024-02-06 13:28:18 +05:30
Eric Haberkorn d0243b618d
Change the multicluster group to v2 (#20430) 2024-02-01 12:08:26 -05:00
Melisa Griffin 7c00d396cf
[NET-6417] Add validation of MeshGateway name + listeners (#20425)
* Add validation of MeshGateway name + listeners

* Adds test for ValidateMeshGateway

* Fixes data fetcher test for gatewayproxy

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
2024-01-31 18:47:57 -05:00
Nathan Coleman 74e4200d07
[NET-6429] Program ProxyStateTemplate to route cross-partition traffi… (#20410)
[NET-6429] Program ProxyStateTemplate to route cross-partition traffic to the correct destination mesh gateway

* Program mesh port to route wildcarded gateway SNI to the appropriate remote partition's mesh gateway

* Update target + route ports in service endpoint refs when building PST

* Use proper name of local datacenter when constructing SNI for gateway target

* Use destination identities for TLS when routing L4 traffic through the mesh gateway

* Use new constants, move comment to correct location

* Use new constants for port names

* Update test assertions

* Undo debug logging change
2024-01-31 10:46:04 -05:00
Nathan Coleman 21b3c18d5d
Use a full EndpointRef on ComputedRoutes targets instead of just the ID (#20400)
* Use a full EndpointRef on ComputedRoutes targets instead of just the ID

Today, the `ComputedRoutes` targets have the appropriate ID set for their `ServiceEndpoints` reference; however, the `MeshPort` and `RoutePort` are assumed to be that of the target when adding the endpoints reference in the sidecar's `ProxyStateTemplate`.

This is problematic when the target lives behind a `MeshGateway` and the `Mesh/RoutePort` used in the sidecar's `ProxyStateTemplate` should be that of the `MeshGateway` instead of the target.

Instead of assuming the `MeshPort` and `RoutePort` when building the `ProxyStateTemplate` for the sidecar, let's just add the full `EndpointRef` -- including the ID and the ports -- when hydrating the computed destinations.

* Make sure the UID from the existing ServiceEndpoints makes it onto ComputedRoutes

* Update test assertions

* Undo confusing whitespace change

* Remove one-line function wrapper

* Use plural name for endpoints ref

* Add constants for gateway name, kind and port names
2024-01-30 16:25:44 -05:00
Matt Keeler 34a32d4ce5
Remove V2 PeerName field from pbresource.Tenancy (#19865)
The peer name will eventually show up elsewhere in the resource. For now though this rips it out of where we don’t want it to be.
2024-01-29 15:08:31 -05:00
Nitya Dhanushkodi 92aab7ea31
[NET-5586][rebased] v2: Support virtual port references in config (#20371)
[OG Author: michael.zalimeni@hashicorp.com, rebase needed a separate PR]

* v2: support virtual port in Service port references

In addition to Service target port references, allow users to specify a
port by stringified virtual port value. This is useful in environments
such as Kubernetes where typical configuration is written in terms of
Service virtual ports rather than workload (pod) target port names.

Retaining the option of referencing target ports by name supports VMs,
Nomad, and other use cases where virtual ports are not used by default.

To support both uses cases at once, we will strictly interpret port
references based on whether the value is numeric. See updated
`ServicePort` docs for more details.

* v2: update service ref docs for virtual port support

Update proto and generated .go files with docs reflecting virtual port
reference support.

* v2: add virtual port references to L7 topo test

Add coverage for mixed virtual and target port references to existing
test.

* update failover policy controller tests to work with computed failover policy and assert error conditions against FailoverPolicy and ComputedFailoverPolicy resources

* accumulate services; don't overwrite them in enterprise

---------

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2024-01-29 10:43:41 -08:00
Nathan Coleman 27aecdb8cc
[NET-5075] Implement mesh gateway mode for explicit destinations (#20361) 2024-01-26 17:17:18 -05:00
Nitya Dhanushkodi 0ec7bddb9a
[Net-5594][Net-7466] v2: Only route to endpoints that implement the port being routed to, and make xdscontroller and xdsv2 golden tests use tenancy (#20356)
* If a workload does not implement a port, it should not be included in the list of endpoints for the Envoy cluster for that port.

* Adds tenancy tests for xds controller and xdsv2 resource generation, and adds all those files.

* The original change in this PR was for filtering the list of endpoints by the port being routed to (bullet 1). Since I made changes to sidecarproxycontroller golden files, I realized some of the golden files were unused because of the tenancy changes, so when I deleted those, that broke xds controller tests which weren't correctly using tenancy. So when I fixed that, then the xdsv2 tests broke, so I added tenancy support there too. So now, from sidecarproxy controller -> xds controller -> xdsv2 we now have tenancy support and all the golden files are lined up.
2024-01-26 10:07:21 -08:00
sarahalsmiller 37ebaa6920
Net 7155- Consul API Gateway Controller Stub Work (#20324)
* API Gateway proto

* fix lint issue

* new line

* run make proto format

* checkpoint

* stub

* Update internal/mesh/internal/controllers/apigateways/controller.go
2024-01-25 23:16:20 +00:00
John Maguire cfe4d59938
[NET-7265] Panic when passing an incorrect type to the data fetcher for gatewayproxy (#20238)
* panic when passing an incorrect type to the data fetcher

* Add assertions for sidecarproxy datafetcher as well

* rename assertion function

* Add in comments to ensure devs know about potential panics for using
invalid types

* fix method call
2024-01-24 14:16:56 -05:00
skpratt 0abf8f8426
Net 5092/internal l7 traffic permissions (#20276)
* wire up L7 Traffic Permissions

* testing

* update comment
2024-01-23 20:07:58 -06:00
Nathan Coleman 995ba32cc0
Use null route cluster for default router when no matches on v2 mesh gateway (#20270)
* Use black hole cluster for default router when no matches

* Update test assertions

* Use null route cluster instead of black hole cluster concept

* Update test assertions
2024-01-22 10:50:04 -08:00
Matt Keeler f9c04881f9
Failover policy cache (#20244)
* Migrate the Failover controller to use the controller cache
* Remove the Catalog FailoverMapper and its usage in the mesh routes controller.
2024-01-19 09:35:34 -05:00
Nathan Coleman c40b59823a
[NET-6431] Remove explicit endpoints function from PST builder (#20262)
This isn't needed since we just populate RequiredEndpoints, which is already done for the base case
2024-01-18 19:13:37 -05:00
Matt Keeler 59cb12c798
Migrate the Endpoints controller to use the controller cache (#20241)
* Add cache resource decoding helpers

* Implement a common package for workload selection facilities. This includes:

   * Controller cache Index
   * ACL hooks
   * Dependency Mapper to go from workload to list of resources which select it
   * Dependency Mapper to go from a resource which selects workloads to all the workloads it selects.

* Update the endpoints controller to use the cache instead of custom mappers.

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2024-01-18 17:52:52 -05:00
John Maguire 7888d00e49
[NET6429] Add listeners for mesh-gateway v2 (#20253)
Add listeners for mesh-gateway v2
2024-01-18 17:52:06 +00:00
Nathan Coleman d2e991ddfc
Remove unnecessary fetching of gateway (#20172)
The fetched gateway isn't currently used anywhere
2024-01-17 14:13:13 -05:00
Michael Zalimeni 76b5de5039
[NET-4968] Upgrade Go to 1.21 (#20062)
* Upgrade Go to 1.21

* ci: detect Go backwards compatibility test version automatically

For our submodules and other places we choose to test against previous
Go versions, detect this version automatically from the current one
rather than hard-coding it.
2024-01-12 09:57:38 -05:00
John Maguire c6c2d8bf82
[NET-6426] Modify Reconcile Loop for Mesh Gateway Resources to Correctly Write Proxy State Template (#20085) 2024-01-08 23:26:00 -05:00
Kumar Kavish 9c8e9cebaa
[NET-6765] Audit the routes controller and add missing tenancy tests (#20016)
- moved resources to different tenancies.
2023-12-28 16:00:18 +05:30
Nathan Coleman ab60fec15a
[NET-6426] Add gateway proxy controller that generates empty proxy state template (#19901)
* NET-6426 Create ProxyStateTemplate when reconciling MeshGateway resource

* Add TODO for switching fetch method based on gateway type

* Use gateway-kind in workload metadata instead of owner reference

* Create ProxyStateTemplate builder for gatewayproxy controller

* Update to use new controller interface

* Add copyright headers

* Set correct name for ProxyStateTemplate identity reference

* Generate empty ProxyStateTemplate by fetching MeshGateway

This cheats and looks up the MeshGateway directly. In the future, we will need a Workload => xGateway mapper

* Specify owner reference when writing ProxyStateTemplate

* Update dependency mapper to account for multiple controllers per resource type

* Regenerate v2 resource dependencies map

* Add helpful trace logs, tag TODOs with ticket identifiers
2023-12-21 16:37:47 -05:00
Nathan Coleman 874e68f1eb
[NET-6899] Create name-aligned Service when reconciling MeshGateway resource (#19900)
* NET-6899 Create name-aligned Service when reconciling MeshGateway resource

The Service has an owner reference added to it indicating that it belongs to a MeshGateway

* Specify port list when creating Service

* Use constants, add TODO w/ ticket reference

* Include gateway-kind in metadata of Service resource
2023-12-21 13:26:25 -05:00
Nathan Coleman 010bf533d1
NET-6663 Modify sidecarproxy controller to skip xGateway resources (#19902)
* NET-6663 Modify sidecarproxy controller to skip xGateway resources

* Check workload metadata after nil-check for workload

* Add test asserting that workloads with meta gateway-kind are ignored

* Use more common pattern for map access to increase readability
2023-12-18 21:54:41 +00:00
aahel a6496898de
added tenancy to TestBuildL4TrafficPermissions (#19932) 2023-12-14 10:41:24 +05:30
Matt Keeler 123bc95e1a
Add Common Controller Caching Infrastructure (#19767)
* Add Common Controller Caching Infrastructure
2023-12-13 10:06:39 -05:00
Matt Keeler efe279f802
Retry lint fixes (#19151)
* Add a make target to run lint-consul-retry on all the modules
* Cleanup sdk/testutil/retry
* Fix a bunch of retry.Run* usage to not use the outer testing.T
* Fix some more recent retry lint issues and pin to v1.4.0 of lint-consul-retry
* Fix codegen copywrite lint issues
* Don’t perform cleanup after each retry attempt by default.
* Use the common testutil.TestingTB interface in test-integ/tenancy
* Fix retry tests
* Update otel access logging extension test to perform requests within the retry block
2023-12-06 12:11:32 -05:00
Semir Patel c1bbda8128
resource: block default namespace deletion + test refactorings (#19822) 2023-12-05 14:00:06 -05:00
Ashesh Vidyut 82f6a8d7f3
Net 6585 (#19797)
Add multi tenancy to sidecar proxy controller
2023-12-01 21:28:57 +05:30
Michael Zalimeni d1f2fa1841
[NET-6725] test: Address occasional flakes in sidecarproxy/controller_test.go (#19760)
test: Address occasional flakes in sidecarproxy/controller_test.go

We've observed an occasional flake in this test where some state check
fails. Adding in some wait wrappers to these state checks will hopefully
address the issue, assuming it is a simple flake.
2023-11-29 16:56:14 +00:00
Thomas Eckert 419677cc9e
[NET-6420] Add MeshConfiguration Controller stub (#19745)
* Add meshconfiguration/controller

* Add MeshConfiguration Registration function

* Fix the TODOs on the RegisterMeshGateway function

* Call RegisterMeshConfiguration

* Add comment to MeshConfigurationRegistration

* Add a test for Reconcile and some comments
2023-11-28 18:56:07 +00:00
Chris S. Kim 5107764115
Move test setup out of subtest (#19753) 2023-11-28 18:39:37 +00:00
Ganesh S ba2422596f
Add tenancy tests for routes controller (#19706) 2023-11-22 21:52:10 +05:30
Ganesh S 4020c002d6
Add tenancy tests for proxy cfg controller (#19649) 2023-11-15 21:36:08 +05:30
Ashesh Vidyut d68a23aa85
NET 6539 - Add tenancy tests for folder - internal/mesh/internal/controllers/sidecarproxy (#19646)
* Add tenancy tests for folder - internal/mesh/internal/controllers/sidecarproxy

* removed rej files

* added missed out file
2023-11-15 13:49:40 +05:30
Ashesh Vidyut 443461318a
NET 6525 (#19645)
Removed resourcetest func
2023-11-15 06:32:15 +00:00