This patch did give some better results, but break watches on
the services of a node.
It is possible to apply the same optimization for nodes than
to services (one index per instance), but it would complicate
further the patch.
Let's do it in another PR.
The root cause is actually that the agent's streaming HTTP API didn't flush until the first log line was found which commonly was pretty soon since the default level is INFO. In cases where there were no logs immediately due to level for instance, the client gets stuck in the HTTP code waiting on a response packet from the server before we enter the loop that checks the shutdown channel from the signal handler.
This fix flushes the initial status immediately on the streaming endpoint which lets the client code get into it's expected state where it's listening for shutdown or log lines.
This patch improves the watches for services on large cluster:
each service has now its own index, such watches on a specific service
are not modified by changes in the global catalog.
It should improve a lot the performance of tools such as consul-template
or libraries performing watches on very large clusters with many
services/watches.
- register endpoints with supported methods
- support OPTIONS requests, indicating supported methods
- extract method validation (error 405) from individual endpoints
- on 405 where multiple methods are allowed, create a single Allow
header with comma-separated values, not multiple Allow headers.
Because this code was doing pointer equality checks, it would work for
the case of a failed attempted RPC because the objects are from the
manager itself:
https://github.com/hashicorp/consul/blob/v1.0.3/agent/consul/rpc.go#L283-L302
But the pointer check would always fail for events coming in from the
Serf path because the server object is newly-created:
https://github.com/hashicorp/consul/blob/v1.0.3/agent/router/serf_adapter.go#L14-L40
This means that we didn't proactively shift RPC traffic away from a
failed server, we'd have to wait for an RPC to fail, which exposes
the error to the calling client.
By switching over to a name check vs. a pointer check we get the correct
behavior. We added a DEBUG log as well to help observe this behavior during
integrated testing.
Related to #3863 since the fix here needed the same logic duplicated, owing
to the complicated atomic stuff.
/cc @dadgar for a heads up in case this also affects Nomad.
Previously a change was made to make the file writing atomic,
but that wasn't enough to cover something like an OS crash so we
needed something here to handle the situation more gracefully.
Fixes#1221.
Docker/Openshift/Kubernetes mount the config file as a symbolic link and
IsDir returns true if the file is a symlink. Before calling IsDir, the
symlink should be resolved to determine if it points at a file or
directory.
Fixes#3753
Since commit 9685bdcd0b, service tags are added to the health checks.
Otherwise, when adding a service, tags are not added to its check.
In updateSyncState, we compare the checks of the local agent with the checks of the catalog.
It appears that the service tags are different (missing in one case), and so the check is synchronized.
That increase the ModifyIndex periodically when nothing changes.
Fixed it by adding serviceTags to the check.
Note that the issue appeared in version 0.8.2.
Looks related to #3259.
The lock isn't needed after we clean up the expire bin, and as seen
in #3700 we can get into a deadlock waiting to place the expire index
into the channel while holding this lock.
Fixes#3700
There were places where we still didn't have the script vs. args sorted
correctly so changed all the logging to be just based on check IDs and
also made everything uniform.
Also removed some annoying debug logging, and moved some of the large output
logging to TRACE level.
Closes#3602
* Refactors the HTTP listen path to create servers in the same spot.
* Adds HTTP/2 support to Consul's HTTPS server.
* Vendors Go HTTP/2 library and associated deps.
* config: refactor ReadPath(s) methods without side-effects
Return the sources instead of modifying the state.
* config: clean data dir before every test
* config: add tests for config-file and config-dir
* config: add -config-format option
Starting with Consul 1.0 all config files must have a '.json' or '.hcl'
extension to make it unambigous how the data should be parsed. Some
automation tools generate temporary files by appending a random string
to the generated file which obfuscates the extension and prevents the
file type detection.
This patch adds a -config-format option which can be used to override
the auto-detection behavior by forcing all config files or all files
within a config directory independent of their extension to be
interpreted as of this format.
Fixes#3620
* Relaxes Autopilot promotion logic.
When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates because it's at odds with the Autopilot
non-voter promotion logic.
That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.
Fixes#3611
* Gets rid of unnecessary extra not-a-voter check.
This patch adds a /v1/coordinate/node/:node endpoint to get the network
coordinates for a single node in the network.
Since Consul Enterprise supports network segments it is still possible
to receive mutiple entries for a single node - one per segment.
The Docker agent closes the connection during read after we have
read the body. This causes a "connection reset by peer" even though
the command was successful.
We ignore that error here since we got the correct status code
and a response body.
The anti-entropy tests relied on the side-effect of the StartSync()
method to perform a full sync instead of a partial sync. This lead to
multiple anti-entropy go routines being started unnecessary retry loops.
This change changes the behavior to perform synchronous full syncs when
necessary removing the need for all of the time.Sleep and most of the
retry loops.
The state of the service and health check records was spread out over
multiple maps guarded by a single lock. Access to the maps has to happen
in a coordinated effort and the tests often violated this which made
them brittle and racy.
This patch replaces the multiple maps with a single one for both checks
and services to make the code less fragile.
This is also necessary since moving the local state into its own package
creates circular dependencies for the tests. To avoid this the tests can
no longer access internal data structures which they should not be doing
in the first place.
The tests still don't compile but this is a ncessary step in that
direction.
The anti-entropy code manages background synchronizations of the local
state on a regular basis or on demand when either the state has changed
or a new consul server has been added.
This patch moves the anti-entropy code into its own package and
decouples it from the local state code since they are performing
two different functions.
To simplify code-review this revision does not make any optimizations,
renames or refactorings. This will happen in subsequent commits.
The `consul agent` command was ignoring extra command line arguments
which can lead to confusion when the user has for example forgotten to
add a dash in front of an argument or is not using an `=` when setting
boolean flags to `true`. `-bootstrap true` is not the same as
`-bootstrap=true`, for example.
Since all command line flags are known and we don't expect unparsed
arguments we can return an error. However, this may make it slightly
more difficult in the future if we ever wanted to have these kinds of
arguments.
Fixes#3397
DNS recursors can be added through go-sockaddr templates. Entries
are deduplicated while the order is maintained.
Originally proposed by @taylorchu
See #2932