consul

Author	SHA1	Message	Date
FFMMM	a0bba9171d	fix consul_autopilot_healthy metric emission (#11231) https://github.com/hashicorp/consul/issues/10730	2021-10-08 10:31:50 -07:00
Daniel Nephin	0acfc2c65b	agent: fix a data race in DNS tests The dnsConfig pulled from the atomic.Value is a pointer, so modifying it in place creates a data race. Use the exported ReloadConfig interface instead.	2021-07-14 18:58:16 -04:00
Daniel Nephin	970f5d78ec	agent: fix two data races in agent tests The LogOutput io.Writer used by TestAgent must allow concurrent reads and writes, and a bytes.Buffer does not allow this. The bytes.Buffer must be wrapped with a lock to make this safe.	2021-07-14 18:58:16 -04:00
Matt Keeler	da31e0449e	Move some things around to allow for license updating via config reload The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.	2021-05-25 09:57:50 -04:00
Daniel Nephin	1618912cf6	Fix some test flakes - return errors in TestAgent.Start so that the retry works correctly - remove duplicate logging, the error is returned already - add a missing t.Helper() to retry.Run - properly set a.Agent to nil so that subsequent retry attempts will actually try to start	2021-05-10 13:20:45 -04:00
Hans Hasselberg	53e9c134af	introduce certopts (#9606) * introduce cert opts * it should be using the same signer * lint and omit serial	2021-03-22 10:16:41 +01:00
Daniel Nephin	1d2d15b1e1	agent: add a test for streaming in the service health endpoint Co-authored-by: Paul Banks <banks@banksco.de>	2021-02-25 14:08:10 -05:00
Daniel Nephin	32d36d0dd4	config: replace calls to config.NewBuilder with config.Load This is another incremental change to reduce config loading to a single small interface. All calls to NewBuilder can be replaced with Load.	2021-01-27 17:34:43 -05:00
Daniel Nephin	97a577502d	config: improve the interface of Load This commit reduces the interface to Load() a bit, in preparation for unexporting NewBuilder and having everything call Load. The three arguments are reduced to a single argument by moving the other two into the options struct. The three return values are reduced to two by moving the RuntimeConfig and Warnings into a LoadResult struct.	2021-01-27 17:34:43 -05:00
Hans Hasselberg	444cdeb8fb	Add flags to support CA generation for Connect (#9585)	2021-01-27 08:52:15 +01:00
Paul Banks	e4db845246	Refactor uiserver to separate package, cleaner Reloading	2020-10-01 11:32:25 +01:00
Daniel Nephin	282fbdfa75	api: rename HTTPServer to HTTPHandlers Resolves a TODO about naming. This type is a set of handlers for an http.Server, it is not itself a Server. It provides http.Handler functions.	2020-09-18 17:38:23 -04:00
Daniel Nephin	0bb9c318b7	http: fix tests incorrectly using HTTPAddr to get the address of the https server. In #8234 I changed a few tests to use TestAgent.HTTPAddr() to find the addr used in the test. Due to the way HTTPAddr() was implemented these tests were passing, but I think the pass was incidental. HTTPAddr() was not matching any servers, and was instead returning the last server, which happened to be the one these tests wanted. This commit fixes the implementation of HTTPAddr to panic if no match was found. The tests which require an HTTPS server are changed to use a new firstAddr() to look up the correct address.	2020-09-04 15:29:17 -04:00
Daniel Nephin	6ca45e1a61	agent: add apiServers type for managing HTTP servers Remove Server field from HTTPServer. The field is no longer used.	2020-09-03 13:40:12 -04:00
Daniel Nephin	63bad36de7	testing: disable global metrics sink in tests This might be better handled by allowing configuration for the InMemSink interval and retain, and disabling the global. For now this is a smaller change to remove the goroutine leak caused by tests because go-metrics does not provide any way of shutting down the global goroutine.	2020-08-18 19:04:57 -04:00
Daniel Nephin	5d4df54296	agent: extract dependency creation from New With this change, Agent.New() accepts many of the dependencies instead of creating them in New. Accepting fully constructed dependencies from a constructor makes the type easier to test, and easier to change. There are still a number of dependencies created in Start() which can be addressed in a follow up.	2020-08-18 19:04:55 -04:00
Daniel Nephin	070e843113	testutil: Add t.Cleanup to TempDir TempDir registers a Cleanup so that the directory is always removed. To disable the cleanup, set the TEST_NOCLEANUP env var.	2020-08-14 13:19:10 -04:00
Daniel Nephin	3a4e62836b	testing: Remove TestAgent.Key and change TestAgent.DataDir TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of tests helps keep the shared part small and more focused. This required a second change (which I was planning on making anyway), which was to change the behaviour of DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.	2020-08-13 17:53:24 -04:00
Daniel Nephin	b1679508d4	testing: use t.Cleanup in TestAgent for returnPorts	2020-08-13 17:09:37 -04:00
Daniel Nephin	9919e5dfa5	agent: unmethod consulConfig To allow us to move newConsulConfig out of Agent.	2020-08-13 11:58:21 -04:00
Daniel Nephin	38980ebb4c	config: Make Source an interface This will allow us to accept config from auto-config without needing to go through a serialization cycle.	2020-08-10 12:46:28 -04:00
Daniel Nephin	51efba2c7d	testutil: NewLogBuffer - buffer logs until a test fails Replaces #7559 Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case. Attaching log output from background goroutines also cause data races. If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race. The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long. This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful. All of the logs are printed from the test goroutine, so they should be associated with the correct test. Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly. Related: https://github.com/golang/go/issues/38458 (may be fixed in go1.15) https://github.com/golang/go/issues/38382#issuecomment-612940030	2020-07-21 12:50:40 -04:00
Daniel Nephin	a5e45defb1	agent/http: un-embed the HTTPServer The embedded HTTPServer struct is not used by the large HTTPServer struct. It is used by tests and the agent. This change is a small first step in the process of removing that field. The eventual goal is to reduce the scope of HTTPServer making it easier to test, and split into separate packages.	2020-07-02 17:21:12 -04:00
Matt Keeler	d6e05482ab	Allow cancelling startup when performing auto-config (#8157) Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>	2020-06-19 15:16:00 -04:00
Matt Keeler	3dbbd2d37d	Implement Client Agent Auto Config There are a couple of things in here. First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism. This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.	2020-06-17 16:49:46 -04:00
Daniel Nephin	77101eee82	config: rename Flags to BuilderOpts Flags is an overloaded term in this context. It generally is used to refer to command line flags. This struct, however, is a data object used as input to the construction. It happens to be partially populated by command line flags, but otherwise has very little to do with them. Renaming this struct should make the actual responsibility of this struct more obvious, and remove the possibility that it is confused with command line flags. This change is in preparation for adding additional fields to BuilderOpts.	2020-06-16 12:51:19 -04:00
R.B. Boyer	ffb9c7d6f7	acl: remove the deprecated `acl_enforce_version_8` option (#7991) Fixes #7292	2020-05-29 16:16:03 -05:00
Daniel Nephin	e759daafdd	Rename NewTestAgentWithFields to StartTestAgent This function now only starts the agent. Using: git grep -l 'StartTestAgent(t, true,' | xargs sed -i -e 's/StartTestAgent(t, true,/StartTestAgent(t,/g'	2020-03-31 17:14:55 -04:00
Daniel Nephin	f9f6b14533	Convert the remaining calls to NewTestAgentWithFields After removing the t.Name() parameter with sed, convert the last few tests which use a custom name to call NewTestAgentWithFields instead.	2020-03-31 17:14:55 -04:00
Daniel Nephin	475659a132	Remove name from NewTestAgent Using: git grep -l 'NewTestAgent(t, t.Name(),' | xargs sed -i -e 's/NewTestAgent(t, t.Name(),/NewTestAgent(t,/g'	2020-03-31 16:13:44 -04:00
Daniel Nephin	ad7c78f134	Remove t.Name() from TestAgent.Name And re-add the name to the logger so that log messages from different agents in a single test can be identified.	2020-03-30 16:47:24 -04:00
Daniel Nephin	dd40a1535e	testing: reduce verbosity of output log Previously the log output included the test name twice and a long date format. The test output is already grouped by test, so adding the test name did not add any new information. The date and time are only useful to understand elapsed time, so using a short format should provide sufficient detail. Also fixed a bug in NewTestAgentWithFields where nil was returned instead of the test agent.	2020-03-30 13:23:13 -04:00
R.B. Boyer	6adad71125	wan federation via mesh gateways (#6884) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
gaoxinge	216eb29d6b	tests: convert windows style path to posix style path to avoid hcl parsing error (#6351)	2020-02-11 10:13:31 +01:00
Chris Piraino	401221de58	Allow users to configure either unstructured or JSON logging (#7130) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Matt Keeler	8f0ab0129e	Miscellaneous Fixes (#6896) Ensure we close the Sentinel Evaluator so as not to leak goroutines Fix a bunch of test logging so that various warnings when starting a test agent go to the test logger and not straight to stdout. Various canned ent meta types always return a valid pointer (no more nils). This allows us to blindly deref + assign in various places. Update ACL index tracking to ensure oss -> ent upgrades will work as expected. Update ent meta parsing to include function to disallow wildcarding.	2019-12-06 14:01:34 -05:00
Matt Keeler	deb91f3d3c	[Feature] API: Add a internal endpoint to query for ACL authori… (#6888) * Implement endpoint to query whether the given token is authorized for a set of operations * Updates to allow for remote ACL authorization via RPC This is only used when making an authorization request to a different datacenter.	2019-12-06 09:25:26 -05:00
Matt Keeler	923d8671a4	Add support for parameterizing the ACL config used with a TestA… (#6559) * Add support for parameterizing the ACL config used with a TestAgent Using tokens that are UUIDs will get rid of some warnings * Refactor to allow setting all tokens and change the template to ignore unset values.	2019-09-27 17:06:43 -04:00
R.B. Boyer	f9496dc627	sdk: add freelist tracking and ephemeral port range skipping to freeport This should cut down on test flakiness. Problems handled: - If you had enough parallel test cases running, the former circular approach to handling the port block could hand out the same port to multiple cases before they each had a chance to bind them, leading to one of the two tests to fail. - The freeport library would allocate out of the ephemeral port range. This has been corrected for Linux (which should cover CI). - The library now waits until a formerly-in-use port is verified to be free before putting it back into circulation.	2019-09-17 14:30:43 -05:00
R.B. Boyer	a86e63f81e	test: actually wait for the TestAgent to be fully shutdown (#6441)	2019-09-05 13:36:26 -05:00
Sarah Adams	001137e5e5	test: ensure all TestAgent constructions use a constructor (#6443) ensure all TestAgent constructions use a constructor to get start retries + test logs going to the right place Fixes #6435	2019-09-05 10:24:36 -07:00
Sarah Adams	74461406e0	remove funky panic/recover in agent tests (#6442)	2019-09-04 13:59:11 -07:00
Sarah Adams	4ed5515fca	refactor & add better retry logic to NewTestAgent (#6363) Fixes #6361	2019-09-03 15:05:51 -07:00
R.B. Boyer	7deaba63e1	test: ensure the node name is a valid dns name (#6424) The space in the node name was making every test emit a useless warning.	2019-08-29 16:52:13 -05:00
R.B. Boyer	b962fe38cd	test: send testagent logs through testing.Logf (#6411)	2019-08-27 12:21:30 -05:00
R.B. Boyer	91da908d2f	test: fix TestAgent.Start() to not segfault if the DNSServer cannot ListenAndServe (#6409) The embedded `Server` field on a `DNSServer` is only set inside of the `ListenAndServe` method. If that method fails for reasons like the address being in use and is not bindable, then the `Server` field will not be set and the overall `Agent.Start()` will fail. This will trigger the inner loop of `TestAgent.Start()` to invoke `ShutdownEndpoints` which will attempt to pretty print the DNS servers using fields on that inner `Server` field. Because it was never set, this causes a nil pointer dereference and crashes the test.	2019-08-27 10:45:05 -05:00
Mike Morris	65be58703c	connect: remove managed proxies (#6220) * connect: remove managed proxies implementation and all supporting config options and structs * connect: remove deprecated ProxyDestination * command: remove CONNECT_PROXY_TOKEN env var * agent: remove entire proxyprocess proxy manager * test: remove all managed proxy tests * test: remove irrelevant managed proxy note from TestService_ServerTLSConfig * test: update ContentHash to reflect managed proxy removal * test: remove deprecated ProxyDestination test * telemetry: remove managed proxy note * http: remove /v1/agent/connect/proxy endpoint * ci: remove deprecated test exclusion * website: update managed proxies deprecation page to note removal * website: remove managed proxy configuration API docs * website: remove managed proxy note from built-in proxy config * website: add note on removing proxy subdirectory of data_dir	2019-08-09 15:19:30 -04:00
Hans Hasselberg	33a7df3330	tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597)	2019-06-27 22:22:07 +02:00
R.B. Boyer	40336fd353	agent: fix several data races and bugs related to node-local alias checks (#5876) The observed bug was that a full restart of a consul datacenter (servers and clients) in conjunction with a restart of a connect-flavored application with bring-your-own-service-registration logic would very frequently cause the envoy sidecar service check to never reflect the aliased service. Over the course of investigation several bugs and unfortunate interactions were corrected: (1) local.CheckState objects were only shallow copied, but the key piece of data that gets read and updated is one of the things not copied (the underlying Check with a Status field). When the stock code was run with the race detector enabled this highly-relevant-to-the-test-scenario field was found to be racy. Changes: a) update the existing Clone method to include the Check field b) copy-on-write when those fields need to change rather than incrementally updating them in place. This made the observed behavior occur slightly less often. (2) If anything about how the runLocal method for node-local alias check logic was ever flawed, there was no fallback option. Those checks are purely edge-triggered and failure to properly notice a single edge transition would leave the alias check incorrect until the next flap of the aliased check. The change was to introduce a fallback timer to act as a control loop to double check the alias check matches the aliased check every minute (borrowing the duration from the non-local alias check logic body). This made the observed behavior eventually go away when it did occur. (3) Originally I thought there were two main actions involved in the data race: A. The act of adding the original check (from disk recovery) and its first health evaluation. B. The act of the HTTP API requests coming in and resetting the local state when re-registering the same services and checks. It took awhile for me to realize that there's a third action at work: C. The goroutines associated with the original check and the later checks. The actual sequence of actions that was causing the bad behavior was that the API actions result in the original check to be removed and re-added _without waiting for the original goroutine to terminate_. This means for brief windows of time during check definition edits there are two goroutines that can be sending updates for the alias check status. In extremely unlikely scenarios the original goroutine sees the aliased check start up in `critical` before being removed but does not get the notification about the nearly immediate update of that check to `passing`. This is interlaced with the new goroutine coming up, initializing its base case to `passing` from the current state and then listening for new notifications of edge triggers. If the original goroutine "finishes" its update, it then commits one more write into the local state of `critical` and exits leaving the alias check no longer reflecting the underlying check. The correction here is to enforce that the old goroutines must terminate before spawning the new one for alias checks.	2019-05-24 13:36:56 -05:00
Matt Keeler	f665695b6b	Ensure ServiceName is populated correctly for agent service checks Also update some snapshot agent docs * Enforce correct permissions when registering a check Previously we had attempted to enforce service:write for a check associated with a service instead of node:write on the agent but due to how we decoded the health check from the request it would never do it properly. This commit fixes that. * Update website/source/docs/commands/snapshot/agent.html.markdown.erb Co-Authored-By: mkeeler <mkeeler@users.noreply.github.com>	2019-04-30 19:00:57 -04:00

86 Commits (76bbeb3baf0972885995dcd8a09c7f0312548d0c)