Commit Graph

3803 Commits (7a814fce63d5bb9c6070bffd68ba0f78280832c3)

Author SHA1 Message Date
Sean Chittenden 4d4806ab02 Add CHANGELOG entry re: agent rebalancing 2016-03-23 22:36:12 -07:00
Sean Chittenden 828606232e Correct a bogus goimport rewrite for tests 2016-03-23 22:35:49 -07:00
Sean Chittenden da872fee63 Test ServerManager.refreshServerRebalanceTimer
Change the signature so it returns a value so that this can be tested externally with mock data.  See the sample table in TestServerManagerInternal_refreshServerRebalanceTimer() for the rate at which it will back off.  This function is mostly used to not cripple large clusters in the event of a partition.
2016-03-23 22:10:50 -07:00
Sean Chittenden a63d5ab963 Add a handful more unit tests to the public interface 2016-03-23 22:10:50 -07:00
Sean Chittenden 8e3c83a258 Rename GetNumServers to NumServers()
Matches the style of the rest of the repo
2016-03-23 22:10:50 -07:00
Sean Chittenden 18f7befba9 Rename NewServerManger to just New
Follow go style recommendations now that this has been refactored out of the consul package and doesn't need the qualifier in the name.
2016-03-23 22:10:50 -07:00
Sean Chittenden e932e9a435 Rename FindHealthyServer() to FindServer()
There is no guarantee the server coming back is healthy.  It's apt to be healthy by virtue of its place in the server list, but it's not guaranteed.
2016-03-23 22:10:50 -07:00
Sean Chittenden 49a5a1ab84 cycleServer is a pure function, save the result 2016-03-23 22:10:50 -07:00
Sean Chittenden 94f79d2c3d Missed unit test cruft 2016-03-23 22:10:50 -07:00
Sean Chittenden bc62de541c Update comments to reflect reality 2016-03-23 22:10:50 -07:00
Sean Chittenden d2d55f4bb0 Remove additional cruft from ServerManager's channels
No longer needed code.
2016-03-23 22:10:50 -07:00
Sean Chittenden d13e3c18c9 Emulate a TryLock using atomic.CompareAndSwap
Prevent possible queueing behind serverConfigLock in the event that a server fails on a busy host.
2016-03-23 22:10:50 -07:00
Sean Chittenden 295af01680 Make use of interfaces
Use an interface instead of serf.Serf as arg to NewServerManager.  Bonus points for improved testability.

Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden fdbb142c3f Simplify error handling
Rely on Serf for liveliness.  In the event of a failure, simply cycle the server to the end of the list.  If the server is unhealthy, Serf will reap the dead server.

Additional simplifications:

*) Only rebalance servers based on timers, not when a new server is readded to the cluster.
*) Back out the failure count in server_details.ServerDetails
2016-03-23 22:10:50 -07:00
Sean Chittenden c2c73bfeab Unbreak client tests by reverting to original test
Debugging code crept into the actual test and hung out for much longer than it should have.
2016-03-23 22:10:50 -07:00
Sean Chittenden 0c87463b7e Introduce asynchronous management of consul server lists
Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance.  Some events don't force a reshuffle, instead the extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.
2016-03-23 22:10:50 -07:00
Sean Chittenden bad6cb8897 Comment nits 2016-03-23 22:10:50 -07:00
Sean Chittenden bf8c860663 Update Serf to include `serf.NumNodes()` 2016-03-23 22:10:50 -07:00
Sean Chittenden 74bcbc63f8 Use saveServerConfig vs atomic.Value.Store(config) 2016-03-23 22:10:50 -07:00
Sean Chittenden 7f55931d02 Commit a handful of refactoring && copy/paste-o fixes 2016-03-23 22:10:50 -07:00
Sean Chittenden e53704b032 Mutate copies of serverCfg.servers, not original
Removing any ambiguity re: ownership of the mutated server lists is a win for maintenance and debugging.
2016-03-23 22:10:50 -07:00
Sean Chittenden e6c27325d9 rebalanceTimer may be nil during initialization
When first starting the server manager, it's possible that the rebalanceTimer in serverConfig will be nil, test accordingly.
2016-03-23 22:10:50 -07:00
Sean Chittenden a7091b0837 Properly retain a pointer to the rebalanceTimer 2016-03-23 22:10:50 -07:00
Sean Chittenden 00ff8e5307 Cosmetic and various other wordsmithing cleanups 2016-03-23 22:10:50 -07:00
Sean Chittenden b4db49a62e Document the various functions and their locking 2016-03-23 22:10:50 -07:00
Sean Chittenden 9eb6481d73 Use config convenience method to get config
'cause ELETTHECOMPILERSDOTHEWORK.  I don't need that cluttering up the subconscious with more complexity.
2016-03-23 22:10:50 -07:00
Sean Chittenden 579e536f58 Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:10:50 -07:00
Sean Chittenden c7c551dbe0 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden e48b910f87 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:10:50 -07:00
Sean Chittenden 01b637114c Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:10:50 -07:00
Sean Chittenden 117c65dc55 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:10:32 -07:00
Sean Chittenden 0eac826573 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:09:46 -07:00
Sean Chittenden b9e5588620 Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:05:29 -07:00
Sean Chittenden a482eaef70 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:05:05 -07:00
Sean Chittenden 9b8767aa67 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:03:20 -07:00
Sean Chittenden 075d1b628f Commit miss re: consuls variable rename 2016-03-23 16:24:29 -07:00
Sean Chittenden 2ca4cc58ce Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 16:16:22 -07:00
Sean Chittenden 0925b26250 Refactor consul.serverParts into server_details.ServerDetails
This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.
2016-03-23 16:15:47 -07:00
Sean Chittenden 5be956c310 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden b1e392405c Handle the case where there are no healthy servers
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden d4ca349e21 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 16:15:47 -07:00
Sean Chittenden 7b308d8d7e Add a flag to denote that a server is disabled
A server is not normally disabled, but in the event of an RPC error, we want to mark a server as down to allow for fast failover to a different server.  This value must be an int in order to support atomic operations.

Additionally, this is the preliminary work required to bring up a server in a disabled state.  RPC health checks in the future could mark the server as alive, thereby creating an organic "slow start" feature for Consul.
2016-03-23 16:14:59 -07:00
Sean Chittenden 6af781d9d5 Rename `lastServer` to `preferredServer`
Expanding the domain of lastServer beyond RPC() changes the meaning of this variable.  Rename accordingly to match the intent coming in a subsequent commit: a background thread will be in charge of rotating preferredServer.
2016-03-23 16:14:59 -07:00
Sean Chittenden fb0bfcc3cf Introduce GOTEST_FLAGS to conditionally add -v to go test
Trivial change that makes it possible for developers to set an environment variable and change the output of `go test` to be detailed (i.e. `GOTEST_FLAGS=-v`).
2016-03-23 16:14:11 -07:00
Sean Chittenden f6ffbf4e96 Warn if serf events have queued up past 80% of the limit
It is theoretically possible that the number of queued serf events can back up.  If this happens, emit a warning message if there are more than 200 events in queue.

Most notably, this can happen if `c.consulServerLock` is held for an "extended period of time".  The probability of anyone ever seeing this log message is hopefully low to nonexistent, but if it happens, the warning message indicating a large number of serf events fired while a lock was held is likely to be helpful (vs serf mysteriously blocking when attempting to add an event to a channel).
2016-03-23 16:14:11 -07:00
Sean Chittenden 54016f5276 Commit miss re: consuls variable rename 2016-03-23 16:13:49 -07:00
Sean Chittenden f9aa968bf4 Remove lastRPCTime
This mechanism isn't going to provide much value in the future.  Preemptively reduce the complexity of future work.
2016-03-23 16:13:49 -07:00
Sean Chittenden cc86eb0a1a Rename c.consuls to c.consulServers
Prep for breaking out maintenance of consuls into a new goroutine.
2016-03-23 16:10:27 -07:00
Sean Chittenden a92cda7bcd Fix whitespace alignment in a comment 2016-03-23 16:00:39 -07:00
Sean Chittenden 146c5b0a59 Use `rand.Int31n()` to get power of two optimization
In cases where i+1 is a power of two, skip one modulo operation.
2016-03-23 16:00:39 -07:00