consul

Author	SHA1	Message	Date
John Murret	7c6a3c83f2	NET-7165 - v2 - add service questions (#20390 ) * NET-7165 - v2 - add service questions * removing extraneous copied over code from autogen PR script. * fixing license checking	2024-01-29 22:33:45 +00:00
Melissa Kam	3b9bb8d6f9	[CC-7044] Start HCP manager as part of link creation (#20312 ) * Check for ACL write permissions on write Link eventually will be creating a token, so require acl:write. * Convert Run to Start, only allow to start once * Always initialize HCP components at startup * Support for updating config and client * Pass HCP manager to controller * Start HCP manager in link resource Start as part of link creation rather than always starting. Update the HCP manager with values from the link before starting as well. * Fix metrics sink leaked goroutine * Remove the hardcoded disabled hostname prefix The HCP metrics sink will always be enabled, so the length of sinks will always be greater than zero. This also means that we will also always default to prefixing metrics with the hostname, which is what our documentation states is the expected behavior anyway. * Add changelog * Check and set running status in one method * Check for primary datacenter, add back test * Clarify merge reasoning, fix timing issue in test * Add comment about controller placement * Expand on breaking change, fix typo in changelog	2024-01-29 16:31:44 -06:00
Dan Stough	0ca7313b07	feat(v2dns): add PTR query support (#20362 )	2024-01-29 11:40:10 -05:00
Tauhid Anjum	5d294b26d3	NET-5824 Exported services api (#20015 ) * Exported services api implemented * Tests added, refactored code * Adding server tests * changelog added * Proto gen added * Adding codegen changes * changing url, response object * Fixing lint error by having namespace and partition directly * Tests changes * refactoring tests * Simplified uniqueness logic for exported services, sorted the response in order of service name * Fix lint errors, refactored code	2024-01-23 10:06:59 +05:30
Dan Stough	97ae244d8a	feat(v2dns): add grpc DNS support (#20296 )	2024-01-22 10:10:03 -05:00
John Murret	938d2315e0	DNS v2 - add virtual ip questions (#20245 )	2024-01-17 23:46:18 +00:00
R.B. Boyer	7f9ed032fd	agent: remove data race in agent config (#20200 ) To fix an issue displaying the current reloaded config in the v1/agent/self endpoint, #18681 caused the agent's internal config struct member to be deepcopied and replaced on reload. This is not safe because the field is not protected by a lock, nor should it be due to how it is accessed by the rest of the system. This PR does the same deepcopy, but into a new field solely for the purpose of capturing the current reloaded values for display purposes. If there has been no reload then the original config is used.	2024-01-12 15:11:21 -06:00
Dan Stough	d52e80b619	[OSS] feat: add experiments flag for v2 dns and skeleton interfaces (#20115 ) feat: add experiments flag for v2 dns and skeleton interfaces	2024-01-10 11:19:20 -05:00
Melissa Kam	5dc8eabcce	[CC-7041] Update and start the SCADA provider in HCP manager (#19976 ) * Update SCADA provider version Also update mocks for SCADA provider. * Create SCADA provider w/o HCP config, then update Adds a placeholder config option to allow us to initialize a SCADA provider without the HCP configuration. Also adds an update method to then add the HCP configuration. We need this to be able to eventually always register a SCADA listener at startup before the HCP config values are known. * Pass cloud configuration to HCP manager Save the entire cloud configuration and pass it to the HCP manager. * Update and start SCADA provider in HCP manager Move config updating and starting to the HCP manager. The HCP manager will eventually be responsible for all processes that contribute to linking to HCP.	2024-01-08 09:49:29 -06:00
cskh	15b40f36f3	Use safeio to write server metadata file (#20101 ) * Use safeio to write server metadata file * guard the conversion	2024-01-05 14:46:19 -05:00
Derek Menteer	8f4c43727d	[NET-5916] Fix locality-aware routing config and tests (CE) (#19483 ) Fix locality-aware routing config and tests	2023-11-02 14:05:06 -05:00
John Murret	f0cf8f2f40	NET-6294 - v1 Agentless proxycfg datasource errors after v2 changes (#19365 )	2023-10-27 14:06:38 -06:00
Derek Menteer	48c4a5b736	Add grpc keepalive configuration. (#19339 ) Prior to the introduction of this configuration, grpc keepalive messages were sent after 2 hours of inactivity on the stream. This posed issues in various scenarios where the server-side xds connection balancing was unaware that envoy instances were uncleanly killed / force-closed, since the connections would only be cleaned up after ~5 minutes of TCP timeouts occurred. Setting this config to a 30 second interval with a 20 second timeout ensures that at most, it should take up to 50 seconds for a dead xds connection to be closed.	2023-10-24 08:05:31 -05:00
R.B. Boyer	b9ab63c55d	server: when the v2 catalog experiment is enabled reject api and rpc requests that are for the v1 catalog (#19129 ) When the v2 catalog experiment is enabled the old v1 catalog apis will be forcibly disabled at both the API (json) layer and the RPC (msgpack) layer. This will also disable anti-entropy as it uses the v1 api. This includes all of /v1/catalog/, /v1/health/, most of /v1/agent/, /v1/config/, and most of /v1/internal/*.	2023-10-11 10:44:03 -05:00
Eric Haberkorn	170417ac97	Honor Default Traffic Permissions in V2 (#18886 ) wire up v2 default traffic permissions	2023-09-19 10:42:32 -04:00
Nitya Dhanushkodi	78b170ad50	xds controller: setup watches for and compute leaf cert references in ProxyStateTemplate, and wire up leaf cert manager dependency (#18756 ) * Refactors the leafcert package to not have a dependency on agent/consul and agent/cache to avoid import cycles. This way the xds controller can just import the leafcert package to use the leafcert manager. The leaf cert logic in the controller: * Sets up watches for leaf certs that are referenced in the ProxyStateTemplate (which generates the leaf certs too). * Gets the leaf cert from the leaf cert cache * Stores the leaf cert in the ProxyState that's pushed to xds * For the cert watches, this PR also uses a bimapper + a thin wrapper to map leaf cert events to related ProxyStateTemplates Since bimapper uses a resource.Reference or resource.ID to map between two resource types, I've created an internal type for a leaf certificate to use for the resource.Reference, since it's not a v2 resource. The wrapper allows mapping events to resources (as opposed to mapping resources to resources) The controller tests: Unit: Ensure that we resolve leaf cert references Lifecycle: Ensure that when the CA is updated, the leaf cert is as well Also adds a new spiffe id type, and adds workload identity and workload identity URI to leaf certs. This is so certs are generated with the new workload identity based SPIFFE id. * Pulls out some leaf cert test helpers into a helpers file so it can be used in the xds controller tests. * Wires up leaf cert manager dependency * Support getting token from proxytracker * Add workload identity spiffe id type to the authorize and sign functions --------- Co-authored-by: John Murret <john.murret@hashicorp.com>	2023-09-12 12:56:43 -07:00
John Murret	62062fd4fd	NET-5132 - Configure multiport routing for connect proxies in TProxy mode (#18606 ) * mesh-controller: handle L4 protocols for a proxy without upstreams * sidecar-controller: Support explicit destinations for L4 protocols and single ports. * This controller generates and saves ProxyStateTemplate for sidecar proxies. * It currently supports single-port L4 ports only. * It keeps a cache of all destinations to make it easier to compute and retrieve destinations. * It will update the status of the pbmesh.Upstreams resource if anything is invalid. * endpoints-controller: add workload identity to the service endpoints resource * small fixes * review comments * Address PR comments * sidecar-proxy controller: Add support for transparent proxy This currently does not support inferring destinations from intentions. * PR review comments * mesh-controller: handle L4 protocols for a proxy without upstreams * sidecar-controller: Support explicit destinations for L4 protocols and single ports. * This controller generates and saves ProxyStateTemplate for sidecar proxies. * It currently supports single-port L4 ports only. * It keeps a cache of all destinations to make it easier to compute and retrieve destinations. * It will update the status of the pbmesh.Upstreams resource if anything is invalid. * endpoints-controller: add workload identity to the service endpoints resource * small fixes * review comments * Make sure endpoint refs route to mesh port instead of an app port * Address PR comments * fixing copyright * tidy imports * sidecar-proxy controller: Add support for transparent proxy This currently does not support inferring destinations from intentions. * tidy imports * add copyright headers * Prefix sidecar proxy test files with source and destination. * Update controller_test.go * NET-5132 - Configure multiport routing for connect proxies in TProxy mode * formatting golden files * reverting golden files and adding changes in manually. build implicit destinations still has some issues. * fixing files that were incorrectly repeating the outbound listener * PR comments * extract AlpnProtocol naming convention to getAlpnProtocolFromPortName(portName) * removing address level filtering. * adding license to resources_test.go --------- Co-authored-by: Iryna Shustava <iryna@hashicorp.com> Co-authored-by: R.B. Boyer <rb@hashicorp.com> Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>	2023-09-12 01:17:56 +00:00
Semir Patel	576ffdf705	fix: emit consul version metric on a regular interval (#18724 )	2023-09-08 13:09:07 -05:00
Gerard Nguyen	56d6e54ac7	fix: NET-1521 show latest config in /v1/agent/self (#18681 ) * fix: NET-1521 show latest config in /v1/agent/self	2023-09-08 09:47:31 +10:00
Phil Porada	7ea986783d	Add TCP+TLS Healthchecks (#18381 ) * Begin adding TCPUseTLS * More TCP with TLS plumbing * Making forward progress * Keep on adding TCP+TLS support for healthchecks * Removed too many lines * Unit tests for TCP+TLS * Update tlsutil/config.go Co-authored-by: Samantha <hello@entropy.cat> * Working on the tcp+tls unit test * Updated the runtime integration tests * Progress * Revert this file back to HEAD * Remove debugging lines * Implement TLS enabled TCP socket server and make a successful TCP+TLS healthcheck on it * Update docs * Update agent/agent_test.go Co-authored-by: Samantha <hello@entropy.cat> * Update website/content/docs/ecs/configuration-reference.mdx Co-authored-by: Samantha <hello@entropy.cat> * Update website/content/docs/ecs/configuration-reference.mdx Co-authored-by: Samantha <hello@entropy.cat> * Update agent/checks/check.go Co-authored-by: Samantha <hello@entropy.cat> * Address comments * Remove extraneous bracket * Update agent/agent_test.go Co-authored-by: Samantha <hello@entropy.cat> * Update agent/agent_test.go Co-authored-by: Samantha <hello@entropy.cat> * Update website/content/docs/ecs/configuration-reference.mdx Co-authored-by: Samantha <hello@entropy.cat> * Update the mockTLSServer * Remove trailing newline * Address comments * Fix merge problem * Add changelog entry --------- Co-authored-by: Samantha <hello@entropy.cat>	2023-09-05 13:34:44 -07:00
Derek Menteer	b56fbc7a62	[NET-4958] Fix issue where envoy endpoints would fail to populate after snapshot restore (#18636 ) Fix issue where agentless endpoints would fail to populate after snapshot restore. Fixes an issue that was introduced in #17775. This issue happens because a long-lived pointer to the state store is held, which is unsafe to do. Snapshot restorations will swap out this state store, meaning that the proxycfg watches would break for agentless.	2023-09-01 10:18:10 -05:00
Ashwin Venkatesh	797e42dc24	Watch the ProxyTracker from xDS controller (#18611 )	2023-08-29 14:39:29 -07:00
John Murret	0e606504bc	NET-4944 - wire up controllers with proxy tracker (#18603 ) Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>	2023-08-29 09:15:34 -06:00
John Murret	051f250edb	NET-5338 - NET-5338 - Run a v2 mode xds server (#18579 ) * NET-5338 - NET-5338 - Run a v2 mode xds server * fix linting	2023-08-24 16:44:14 -06:00
Semir Patel	53e28a4963	OSS -> CE (community edition) changes (#18517 )	2023-08-22 09:46:03 -05:00
hashicorp-copywrite[bot]	5fb9df1640	[COMPLIANCE] License changes (#18443 ) * Adding explicit MPL license for sub-package This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository. * Adding explicit MPL license for sub-package This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository. * Updating the license from MPL to Business Source License Going forward, this project will be licensed under the Business Source License v1.1. Please see our blog post for more details at <Blog URL>, FAQ at www.hashicorp.com/licensing-faq, and details of the license at www.hashicorp.com/bsl. * add missing license headers * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 --------- Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-08-11 09:12:13 -04:00
Poonam Jadhav	5208ea90e4	NET-4657/add resource service client (#18053 ) ### Description Dan had already started on this [task](https://github.com/hashicorp/consul/pull/17849) which is needed to start building the HTTP APIs. This just needed some cleanup to get it ready for review. Overview: - Rename `internalResourceServiceClient` to `insecureResourceServiceClient` for name consistency - Configure a `secureResourceServiceClient` with auth enabled ### PR Checklist * [ ] ~updated test coverage~ * [ ] ~external facing docs updated~ * [x] appropriate backport labels added * [ ] ~not a security concern~	2023-07-14 14:09:02 -04:00
Vijay	2f20c77e4d	Displays Consul version of each nodes in UI nodes section (#17754 ) * update UINodes and UINodeInfo response with consul-version info added as NodeMeta, fetched from serf members * update test cases TestUINodes, TestUINodeInfo * added nil check for map * add consul-version in local agent node metadata * get consul version from serf member and add this as node meta in catalog register request * updated ui mock response to include consul versions as node meta * updated ui trans and added version as query param to node list route * updates in ui templates to display consul version with filter and sorts * updates in ui - model class, serializers,comparators,predicates for consul version feature * added change log for Consul Version Feature * updated to get version from consul service, if for some reason not available from serf * updated changelog text * updated dependent testcases * multiselection version filter * Update agent/consul/state/catalog.go comments updated Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> --------- Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>	2023-07-12 13:34:39 -06:00
Ashesh Vidyut	2af6bc434a	feature - [NET - 4005] - [Supportability] Reloadable Configuration - enable_debug (#17565 ) * # This is a combination of 9 commits. # This is the 1st commit message: init without tests # This is the commit message #2: change log # This is the commit message #3: fix tests # This is the commit message #4: fix tests # This is the commit message #5: added tests # This is the commit message #6: change log breaking change # This is the commit message #7: removed breaking change # This is the commit message #8: fix test # This is the commit message #9: keeping the test behaviour same * # This is a combination of 12 commits. # This is the 1st commit message: init without tests # This is the commit message #2: change log # This is the commit message #3: fix tests # This is the commit message #4: fix tests # This is the commit message #5: added tests # This is the commit message #6: change log breaking change # This is the commit message #7: removed breaking change # This is the commit message #8: fix test # This is the commit message #9: keeping the test behaviour same # This is the commit message #10: made enable debug atomic bool # This is the commit message #11: fix lint # This is the commit message #12: fix test true enable debug * parent `10f500e895` author absolutelightning <ashesh.vidyut@hashicorp.com> 1687352587 +0530 committer absolutelightning <ashesh.vidyut@hashicorp.com> 1687352592 +0530 init without tests change log fix tests fix tests added tests change log breaking change removed breaking change fix test keeping the test behaviour same made enable debug atomic bool fix lint fix test true enable debug using enable debug in agent as atomic bool test fixes fix tests fix tests added update on correct location fix tests fix reloadable config enable debug fix tests fix init and acl 403 * revert commit	2023-06-30 08:30:29 +05:30
Derek Menteer	04edace1de	Fix issue with streaming service health watches. (#17775 ) Fix issue with streaming service health watches. This commit fixes an issue where the health streams were unaware of service export changes. Whenever an exported-services config entry is modified, it is effectively an ACL change. The bug would be triggered by the following situation: - no services are exported - an upstream watch to service X is spawned - the streaming backend filters out data for service X (due to lack of exports) - service X is finally exported In the situation above, the streaming backend does not trigger a refresh of its data. This means that any events that were supposed to have been received prior to the export are NOT backfilled, and the watches never see service X spawning. We currently have decided to not trigger a stream refresh in this situation due to the potential for a thundering herd effect (touching exports would cause a re-fetch of all watches for that partition, potentially). Therefore, a local blocking-query approach was added by this commit for agentless. It's also worth noting that the streaming subscription is currently bypassed most of the time with agentful, because proxycfg has a `req.Source.Node != ""` which prevents the `streamingEnabled` check from passing. This means that while agents should technically have this same issue, they don't experience it with mesh health watches. Note that this is a temporary fix that solves the issue for proxycfg, but not service-discovery use cases.	2023-06-15 12:46:58 -05:00
R.B. Boyer	72f991d8d3	agent: remove agent cache dependency from service mesh leaf certificate management (#17075 ) * agent: remove agent cache dependency from service mesh leaf certificate management This extracts the leaf cert management from within the agent cache. This code was produced by the following process: 1. All tests in agent/cache, agent/cache-types, agent/auto-config, agent/consul/servercert were run at each stage. - The tests in agent matching .Leaf were run at each stage. - The tests in agent/leafcert were run at each stage after they existed. 2. The former leaf cert Fetch implementation was extracted into a new package behind a "fake RPC" endpoint to make it look almost like all other cache type internals. 3. The old cache type was shimmed to use the fake RPC endpoint and generally cleaned up. 4. I selectively duplicated all of Get/Notify/NotifyCallback/Prepopulate from the agent/cache.Cache implementation over into the new package. This was renamed as leafcert.Manager. - Code that was irrelevant to the leaf cert type was deleted (inlining blocking=true, refresh=false) 5. Everything that used the leaf cert cache type (including proxycfg stuff) was shifted to use the leafcert.Manager instead. 6. agent/cache-types tests were moved and gently replumbed to execute as-is against a leafcert.Manager. 7. Inspired by some of the locking changes from derek's branch I split the fat lock into N+1 locks. 8. The waiter chan struct{} was eventually replaced with a singleflight.Group around cache updates, which was likely the biggest net structural change. 9. The awkward two layers of logic produced as a byproduct of marrying the agent cache management code with the leaf cert type code were slowly coalesced and flattened to remove confusion. 10. The .Leaf tests from the agent package were copied and made to work directly against a leafcert.Manager to increase direct coverage. I have done a best effort attempt to port the previous leaf-cert cache type's tests over in spirit, as well as to take the e2e-ish tests in the agent package with Leaf in the test name and copy those into the agent/leafcert package to get more direct coverage, rather than coverage tangled up in the agent logic. There is no net-new test coverage, just coverage that was pushed around from elsewhere.	2023-06-13 10:54:45 -05:00
Ronald	8118aae5c1	Add writeAuditRPCEvent to agent_oss (#17607 ) * Add writeAuditRPCEvent to agent_oss * fix the other diffs * backport change log	2023-06-07 22:35:48 +00:00
cskh	cf4059f3ce	chore: fix the error message format (#17554 )	2023-06-02 13:37:44 +00:00
skpratt	fdda7adeaa	issue a warning if major FIPS assumptions are broken (#17524 )	2023-05-31 09:01:44 -05:00
Dan Bond	8dee353492	agent: don't write server metadata in dev mode (#17383 ) Signed-off-by: Dan Bond <danbond@protonmail.com>	2023-05-16 02:50:27 -07:00
Dan Bond	95f462d5f1	agent: prevent very old servers re-joining a cluster with stale data (#17171 ) * agent: configure server lastseen timestamp Signed-off-by: Dan Bond <danbond@protonmail.com> * use correct config Signed-off-by: Dan Bond <danbond@protonmail.com> * add comments Signed-off-by: Dan Bond <danbond@protonmail.com> * use default age in test golden data Signed-off-by: Dan Bond <danbond@protonmail.com> * add changelog Signed-off-by: Dan Bond <danbond@protonmail.com> * fix runtime test Signed-off-by: Dan Bond <danbond@protonmail.com> * agent: add server_metadata Signed-off-by: Dan Bond <danbond@protonmail.com> * update comments Signed-off-by: Dan Bond <danbond@protonmail.com> * correctly check if metadata file does not exist Signed-off-by: Dan Bond <danbond@protonmail.com> * follow instructions for adding new config Signed-off-by: Dan Bond <danbond@protonmail.com> * add comments Signed-off-by: Dan Bond <danbond@protonmail.com> * update comments Signed-off-by: Dan Bond <danbond@protonmail.com> * Update agent/agent.go Co-authored-by: Dan Upton <daniel@floppy.co> * agent/config: add validation for duration with min Signed-off-by: Dan Bond <danbond@protonmail.com> * docs: add new server_rejoin_age_max config definition Signed-off-by: Dan Bond <danbond@protonmail.com> * agent: add unit test for checking server last seen Signed-off-by: Dan Bond <danbond@protonmail.com> * agent: log continually for 60s before erroring Signed-off-by: Dan Bond <danbond@protonmail.com> * pr comments Signed-off-by: Dan Bond <danbond@protonmail.com> * remove unneeded todo * agent: fix error message Signed-off-by: Dan Bond <danbond@protonmail.com> --------- Signed-off-by: Dan Bond <danbond@protonmail.com> Co-authored-by: Dan Upton <daniel@floppy.co>	2023-05-15 04:05:47 -07:00
Freddy	e02ef16f02	Update HCP bootstrapping to support existing clusters (#16916 ) * Persist HCP management token from server config We want to move away from injecting an initial management token into Consul clusters linked to HCP. The reasoning is that by using a separate class of token we can have more flexibility in terms of allowing HCP's token to co-exist with the user's management token. Down the line we can also more easily adjust the permissions attached to HCP's token to limit its scope. With these changes, the cloud management token is like the initial management token in that it has the same global management policy and if it is created it effectively bootstraps the ACL system. * Update SDK and mock HCP server The HCP management token will now be sent in a special field rather than as Consul's "initial management" token configuration. This commit also updates the mock HCP server to more accurately reflect the behavior of the CCM backend. * Refactor HCP bootstrapping logic and add tests We want to allow users to link Consul clusters that already exist to HCP. Existing clusters need care when bootstrapped by HCP, since we do not want to do things like change ACL/TLS settings for a running cluster. Additional changes: * Deconstruct MaybeBootstrap so that it can be tested. The HCP Go SDK requires HTTPS to fetch a token from the Auth URL, even if the backend server is mocked. By pulling the hcp.Client creation out we can modify its TLS configuration in tests while keeping the secure behavior in production code. * Add light validation for data received/loaded. * Sanitize initial_management token from received config, since HCP will only ever use the CloudConfig.ManagementToken. * Add changelog entry	2023-04-27 22:27:39 +02:00
Michael Wilkerson	0dd4ea2033	* added Sameness Group to proto files (#16998 ) - added Sameness Group to config entries - added Sameness Group to subscriptions * generated proto files * added Sameness Group events to the state store - added test cases * Refactored health RPC Client - moved code that is common to rpcclient under rpcclient common.go. This will help set us up to support future RPC clients * Refactored proxycfg glue views - Moved views to rpcclient config entry. This will allow us to reuse this code for a config entry client * added config entry RPC Client - Copied most of the testing code from rpcclient/health * hooked up new rpcclient in agent * fixed documentation and comments for clarity	2023-04-14 09:24:46 -07:00
Poonam Jadhav	8255cc97f5	feat: add reporting config with reload (#16890 )	2023-04-11 15:04:02 -04:00
Ronald	94ec4eb2f4	copyright headers for agent folder (#16704 ) * copyright headers for agent folder * Ignore test data files * fix proto files and remove headers in agent/uiserver folder * ignore deep-copy files	2023-03-28 14:39:22 -04:00
Eric Haberkorn	57e2493415	allow setting locality on services and nodes (#16581 )	2023-03-10 09:36:15 -05:00
Eric Haberkorn	dbaf8bf49c	add agent locality and replicate it across peer streams (#16522 )	2023-03-07 14:05:23 -05:00
R.B. Boyer	9a485cdb49	proxycfg: ensure that an irrecoverable error in proxycfg closes the xds session and triggers a replacement proxycfg watcher (#16497 ) Receiving an "acl not found" error from an RPC in the agent cache and the streaming/event components will cause any request loops to cease under the assumption that they will never work again if the token was destroyed. This prevents log spam (#14144, #9738). Unfortunately, due to things like: - authz requests going to stale servers that may not have witnessed the token creation yet - authz requests in a secondary datacenter happening before the tokens get replicated to that datacenter - authz requests from a primary TO a secondary datacenter happening before the tokens get replicated to that datacenter The caller will get an "acl not found" before the token exists, rather than just after. The machinery added above in the linked PRs will kick in and prevent the request loop from looping around again once the tokens actually exist. For `consul-dataplane` usages, where xDS is served by the Consul servers rather than the clients, ultimately this is not a problem because in that scenario the `agent/proxycfg` machinery is on-demand and launched by a new xDS stream needing data for a specific service in the catalog. If the watching goroutines are terminated it ripples down and terminates the xDS stream, which CDP will eventually re-establish and restart everything. For Consul client usages, the `agent/proxycfg` machinery is ahead-of-time launched at service registration time (called "local" in some of the proxycfg machinery) so when the xDS stream comes in the data is already ready to go. If the watching goroutines terminate it should terminate the xDS stream, but there's no mechanism to re-spawn the watching goroutines. If the xDS stream reconnects it will see no `ConfigSnapshot` and will not get one again until the client agent is restarted, or the service is re-registered with something changed in it. This PR fixes a few things in the machinery: - there was an inadvertent deadlock in fetching snapshot from the proxycfg machinery by xDS, such that when the watching goroutine terminated the snapshots would never be fetched. This caused some of the xDS machinery to get indefinitely paused and not finish the teardown properly. - Every 30s we now attempt to re-insert all locally registered services into the proxycfg machinery. - When services are re-inserted into the proxycfg machinery we special case "dead" ones such that we unilaterally replace them rather than doing that conditionally.	2023-03-03 14:27:53 -06:00
Dan Upton	73b9b407ba	grpc: fix data race in balancer registration (#16229 ) Registering gRPC balancers is thread-unsafe because they are stored in a global map variable that is accessed without holding a lock. Therefore, it's expected that balancers are registered _once_ at the beginning of your program (e.g. in a package `init` function) and certainly not after you've started dialing connections, etc. > NOTE: this function must only be called during initialization time > (i.e. in an init() function), and is not thread-safe. While this is fine for us in production, it's challenging for tests that spin up multiple agents in-memory. We currently register a balancer per-agent which holds agent-specific state that cannot safely be shared. This commit introduces our own registry that _is_ thread-safe, and implements the Builder interface such that we can call gRPC's `Register` method once, on start-up. It uses the same pattern as our resolver registry where we use the dial target's host (aka "authority"), which is unique per-agent, to determine which builder to use.	2023-02-28 10:18:38 +00:00
cskh	8e5942f5ca	fix: add tls config to unix socket when https is used (#16301 ) * fix: add tls config to unix socket when https is used * unit test and changelog	2023-02-21 08:28:13 -05:00
Matt Keeler	085c0addc0	Protobuf Refactoring for Multi-Module Cleanliness (#16302 ) Protobuf Refactoring for Multi-Module Cleanliness This commit includes the following: - Moves all packages that were within proto/ to proto/private - Rewrites imports to account for the packages being moved - Adds in buf.work.yaml to enable buf workspaces - Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml - Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes) Why: In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage. There were some recent changes to have our own ratelimiting annotations. The two combined were not working when I was trying to use them together (attempting to rebase another branch). Buf workspaces should be the solution to the problem. Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root. This resulted in proto file name conflicts in the Go global protobuf type registry. The solution to that was to add in a private/ directory into the path within the proto/ directory. That then required rewriting all the imports. Is this safe? AFAICT yes. The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc). Other than imports, there were no changes to any generated code as a result of this.	2023-02-17 16:14:46 -05:00
Paul Banks	5397e9ee7f	Adding experimental support for a more efficient LogStore implementation (#16176 ) * Adding experimental support for a more efficient LogStore implementation * Adding changelog entry * Fix go mod tidy issues	2023-02-08 16:50:22 +00:00
cskh	25396d81c9	Apply agent partition to load services and agent api (#16024 ) * Apply agent partition to load services and agent api changelog	2023-01-20 12:59:26 -05:00
Dan Upton	7a55de375c	xds: don't attempt to load-balance sessions for local proxies (#15789 ) Previously, we'd begin a session with the xDS concurrency limiter regardless of whether the proxy was registered in the catalog or in the server's local agent state. This caused problems for users who run `consul connect envoy` directly against a server rather than a client agent, as the server's locally registered proxies wouldn't be included in the limiter's capacity. Now, the `ConfigSource` is responsible for beginning the session and we only do so for services in the catalog. Fixes: https://github.com/hashicorp/consul/issues/15753	2023-01-18 12:33:21 -06:00
Paul Glass	f5231b9157	Add new config_file_service_registration token (#15828 )	2023-01-10 10:24:02 -06:00

604 Commits (4ca65733847abe538cd210b5d24e7471138202d9)
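Commit 7f9ed032fd removes a data race by no longer swapping the agent's live config field on reload; instead it deep-copies the reloaded values into a separate field used only for display, and falls back to the original config when there has been no reload. The sketch below illustrates that shape. `Config`, `Agent`, the method names, and the use of `atomic.Pointer` (Go 1.19+) to guard the display field are assumptions, since the commit does not say how the new field is synchronized.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical, trimmed-down runtime config.
type Config struct {
	LogLevel string
	Tags     []string
}

// deepCopy returns an independent copy, so later mutation of one value
// (including its slice contents) does not affect the other.
func (c *Config) deepCopy() *Config {
	cp := *c
	cp.Tags = append([]string(nil), c.Tags...)
	return &cp
}

type Agent struct {
	// config is set once at construction and afterwards only read; it is
	// never replaced, so the rest of the system can read it without a lock.
	config *Config
	// displayConfig holds a copy of the most recently reloaded values and is
	// used only for reporting (e.g. by the v1/agent/self endpoint).
	displayConfig atomic.Pointer[Config]
}

// reload records the new values for display. Applying reloadable settings to
// live subsystems is out of scope for this sketch.
func (a *Agent) reload(newCfg *Config) {
	a.displayConfig.Store(newCfg.deepCopy())
}

// currentConfigForDisplay falls back to the original config if no reload has
// happened yet.
func (a *Agent) currentConfigForDisplay() *Config {
	if c := a.displayConfig.Load(); c != nil {
		return c
	}
	return a.config
}

func main() {
	a := &Agent{config: &Config{LogLevel: "INFO", Tags: []string{"a"}}}
	fmt.Println(a.currentConfigForDisplay().LogLevel)

	a.reload(&Config{LogLevel: "DEBUG", Tags: []string{"b"}})
	fmt.Println(a.currentConfigForDisplay().LogLevel, a.config.LogLevel)
}
```

Because the original `config` pointer is never replaced, code that reads it without a lock stays race-free, while the display path gets the latest reloaded values through an atomic load.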