Commit Graph

4665 Commits (31453c7dbdf811aff2361ecf4f38225a68f32062)

Author SHA1 Message Date
Sean Chittenden b684b15d94 Refactor out recocileServerList anon function
Add testing to reconcileServerList and test various server sizes.

Test that a percentage of nodes fail their Ping (50% in testing atm)
2016-03-29 02:45:38 -07:00
Sean Chittenden c56f039884 Teach fauxConnPool to fail a pct of the time
50% failure rate seems legit as a starting point w/ 100 servers.
2016-03-28 14:53:29 -07:00
Sean Chittenden 43aed6376a Call NotifyFailedServers to rotate the server list 2016-03-28 14:12:41 -07:00
Sean Chittenden 6648d31579 Add log line re: server manager backing off and sleeping
This is useful in situations where the RPC rotate duration is greater than 1µs.  WTB exponential backoff of logging so we don't spam forever.
2016-03-28 14:04:04 -07:00
Sean Chittenden 544a263a62 Remove old debugging lines of questionable future value 2016-03-28 14:02:53 -07:00
Sean Chittenden e5b5078415 Shuffle in place
Don't create a copy and save the copy, not necessary any more.
2016-03-28 14:02:27 -07:00
Sean Chittenden a81eb95acc Nuke unnecessary comment
See above function comments for details
2016-03-28 13:57:36 -07:00
Sean Chittenden e237904d80 Move FIXME comment to the right call site 2016-03-28 13:49:55 -07:00
Sean Chittenden 3dfd9e02b7 Rename the ConnPoolPinger interface to Pinger 2016-03-28 13:46:01 -07:00
Sean Chittenden c9afc16d96 Return error from PingConsulServer
In order to report why a Ping failed, change the signature of PingConsulServers to include an error message.
2016-03-28 13:38:58 -07:00
Sean Chittenden 28dc6451d9 Change the definition of the ServerDetails struct key
Use only the serf Name for now.  Leaving the plumbing for now.
2016-03-28 12:53:19 -07:00
Sean Chittenden 62c38ecb39 Correct the comment to match reality 2016-03-28 12:32:30 -07:00
Sean Chittenden 8ac35d84f0 Rename serverCfg to sc for consistency 2016-03-28 12:06:26 -07:00
Sean Chittenden 664ab238fd Add a quick length check
Verify that AddServer behaved as expected
2016-03-28 11:38:12 -07:00
Sean Chittenden 4589c36308 Switch the order of ServerDetails.String()
It's more natrual to have the network first.  I think I flipped the order accidentally.
2016-03-28 11:37:25 -07:00
Hrishikesh Barua a938cf00e6 Added help text for -dev option #1804 for zsh completion. 2016-03-28 19:32:02 +05:30
Hrishikesh Barua 958a4e43b7 Added help text for -dev option #1804 for zsh completion. 2016-03-28 19:07:55 +05:30
Sean Chittenden e8a2e56370 Move rebalance log statement from INFO to DEBUG 2016-03-27 01:32:04 -07:00
Sean Chittenden 060dd49a95 Chase the API bump re: refreshServerRebalanceTimer
If it works in prod, why shouldn't it work in the tests?
2016-03-27 00:04:52 -07:00
Sean Chittenden 3d00208351 Move initialization of the rebalanceTimer to New() 2016-03-27 00:03:48 -07:00
Sean Chittenden 5652c60971 Add a test for ConnPool.PingConsulServer
Spin up 5x servers, join and ping each server
2016-03-26 23:52:06 -07:00
Sean Chittenden efc1113406 Expose ServerManager.ResetRebalanceTimer
Move the rebalance timer from ServerManager.Start's stack to struct ServerManager.  This makes it possible to shuffle during tests without actually waiting >120s.
2016-03-26 23:41:01 -07:00
Sean Chittenden d275cac1a3 Logging improvements
Comment out noisly loggers for the time being.

Improve the final logging statement to be useful and hint what the next active server for the client is going to be.
2016-03-26 22:41:08 -07:00
Sean Chittenden 35ff2eb71a Standardize the log message based on the package
This log statement used to belong in the consul package but has since moved to the server manager package.
2016-03-26 22:29:00 -07:00
Sean Chittenden e327630523 Reduce the error level from Fatal when unit testing 2016-03-26 22:07:09 -07:00
Sean Chittenden 92c2e8e668 Start server rebalance task after init'ing Serf
Now that there is no longer an event loop driven directly by Serf, start the ServerManager task after Serf has been setup.  When testing and adjusting timers and timeouts to unreasonably low values, it's possible to tickle a race condition where Serf's NumNodes() would fail because Serf had not been initialized.
2016-03-26 22:04:41 -07:00
Sean Chittenden ee95c55d88 Catch up to a few renames 2016-03-26 19:32:11 -07:00
Sean Chittenden dc291e4027 Use empty string for addr in ServerDetails.String() 2016-03-26 19:30:04 -07:00
Sean Chittenden 970938c2dd Guard against a nil ServerDetails.Addr
It's not clear how or why this would ever be nil, but some of the unit tests produce a nil addr.  Be defensive.
2016-03-26 19:29:31 -07:00
Sean Chittenden ca5950a538 Proactively ping server before rotation
Before shuffling the server list, proactively ping the next server in the list to establish the connection and verify the remote endpoint is healthy.
2016-03-26 19:28:13 -07:00
Sean Chittenden 18270bbd04 Factor out the shuffle server 2016-03-26 19:19:04 -07:00
Sean Chittenden 90d626b5d9 Revise comments re: cycleServer
Improve the comments to discuss what happens presently.  Add a note to consider possibly calling to TestConsulServer proactively.
2016-03-26 18:53:13 -07:00
Sean Chittenden cf59a860e7 Comment why the interface is needed: cyclic import 2016-03-26 18:38:35 -07:00
Sean Chittenden 3ecd72f3b5 Add a struct key type for server_details 2016-03-26 17:58:12 -07:00
Sean Chittenden ae32a3ceae Merge pull request #1873 from hashicorp/f-rebalance-worker-0.7
Periodically rebalance the servers that agents talk to
2016-03-25 15:03:18 -07:00
Sean Chittenden 5893bd5a35 Add additional checks 2016-03-25 14:40:46 -07:00
Sean Chittenden d18e4d7455 Delete the right tag
"role" != "consul"
2016-03-25 14:31:48 -07:00
Sean Chittenden b1194e83cb Don't pass in sm, server manager is already in scope
Go closures are implicitly capturing lambdas.
2016-03-25 14:10:09 -07:00
Sean Chittenden a3a0eeeadd Trim residual complexity from server join notifications
Now that serf node join events are decoupled from rebalancing activities completely, remove the complixity of draining the channel and ensuring only one go routine was rebalancing the server list.

Now that we're no longer initializing a notification channel, we can remove the config load/save from `Start()`
2016-03-25 14:06:35 -07:00
Sean Chittenden a71fbe57e3 Only log in FindServers
In FindServer this is a useful warning hinting why its call failed. RPC returns error and leaves it to the higher level caller to do whatever it wants. As an operator, I'd have the detail necessary to know why the RPC call(s) failed.
2016-03-25 13:58:50 -07:00
Sean Chittenden 58246fcc0b Initialize the rebalancce to clientRPCMinReuseDuration
In an earlier version there was a channel to notify when a new server was added, however this has long since been removed.  Just default to the sane value of 2min before the first rebalance calc takes place.

Pointed out by: slackpad
2016-03-25 13:46:18 -07:00
Sean Chittenden 4fec6a9608 Guard against very small or negative rates
Pointed out by: slackpad
2016-03-25 13:31:55 -07:00
Sean Chittenden d9251e30c8 Use range vs for
Returning a new array vs mutating an array in place so we can use range now.
2016-03-25 13:08:08 -07:00
Sean Chittenden 9c18bb5f1c Comment updates 2016-03-25 13:06:59 -07:00
Sean Chittenden 0b3f6932df Only rotate server list with more than one server
Fantastic observation by slackpad.  This was left over from when there was a boolean for health in the server struct (vs current strategy where we use server position in the list and rely on serf to cleanup the stale members).

Pointed out by: slackpad
2016-03-25 12:54:36 -07:00
Sean Chittenden 24eb274860 Relocate saveServerConfig next to getServerConfig
Requested by: slackpad
2016-03-25 12:41:22 -07:00
Sean Chittenden b00db393e7 Clarify that ConsulClusterInfo is an interface over serf
An interface was used to break a cyclic import dependency.
2016-03-25 12:38:40 -07:00
Sean Chittenden b8bbdb9e7a Reword comment after moving code into new packages 2016-03-25 12:34:46 -07:00
Sean Chittenden 68183c4378 Change initialReblaanaceTimeout to a time.Duration
Pointed out by: @slackpad
2016-03-25 12:34:12 -07:00
Sean Chittenden 3433feb93b Negative check: test an invalid condition 2016-03-25 12:22:33 -07:00