Commit Graph

54 Commits (7db346886528fe463cf9bbfd2f6f70acc17666d3)

Author SHA1 Message Date
James Phillips 0b05dbeb21 Merge pull request #1235 from wuub/master
fix conflict between handleReload and antiEntropy critical sections
2015-09-17 07:28:39 -07:00
Wojciech Bederski c4537ed26f panic when unbalanced localState.Resume() is detected 2015-09-17 11:32:08 +02:00
Wojciech Bederski b014c0f91b make Pause()/Resume()/isPaused() behave more like a semaphore
see: https://github.com/hashicorp/consul/issues/1173

Reasoning: somewhere during Consul development Pause()/Resume() and
PauseSync()/ResumeSync() were added to protect larger changes to the
agent's localState. A few of the places that they try to protect are:

- (a *Agent) AddService(...)          # part of the method
- (c *Command) handleReload(...)      # almost the whole method
- (l *localState) antiEntropy(...)    # isPaused() prevents syncChanges()

The main problem is that in the middle of handleReload(...)'s
critical section it indirectly (via loadServices()) calls AddService(...).
AddService() in turn calls Pause() to protect itself against
syncChanges(). At the end of AddService() a deferred call to Resume() is
made.

With the current implementation, this releases the
isPaused() "lock" in the middle of handleReload(), allowing antiEntropy
to kick in while the configuration reload is still in progress.
Specifically, almost all services and probably all checks are unloaded
when syncChanges() is allowed to run.

This in turn can cause massive service/check de-/re-registration,
and since checks are by default registered in the critical state,
a majority of services on a node can be marked as failing.
It's made worse by automation, which often calls `consul reload` in close
succession on many nodes in the cluster.

This change basically turns Pause()/Resume() into P()/V() of
a garden-variety semaphore. Pause() can now be called multiple times,
and isPaused() is released only after all matching/deferred Resume()
calls have been made as well.

TODO/NOTE: as with many semaphore implementations, it might be reasonable
to panic() if l.paused ever becomes negative.
2015-09-11 18:28:06 +02:00
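
The counting behaviour described in the commit message above can be sketched in Go as follows. This is a minimal illustration, not the code from the commit: the type pauseCounter and the helpers addService and reload are hypothetical stand-ins, and only the method names Pause(), Resume() and isPaused() are taken from the message. The use of sync/atomic for the counter is likewise an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pauseCounter is a hypothetical stand-in for the pause bookkeeping in the
// agent's localState. Only the method names come from the commit message.
type pauseCounter struct {
	paused int32 // number of Pause() calls not yet matched by a Resume()
}

// Pause increments the counter, so nested callers can each pause safely.
func (p *pauseCounter) Pause() {
	atomic.AddInt32(&p.paused, 1)
}

// Resume decrements the counter and panics if it ever drops below zero,
// which would mean Resume() was called without a matching Pause().
func (p *pauseCounter) Resume() {
	if atomic.AddInt32(&p.paused, -1) < 0 {
		panic("unbalanced Resume() detected")
	}
}

// isPaused reports true while at least one Pause() is still outstanding.
func (p *pauseCounter) isPaused() bool {
	return atomic.LoadInt32(&p.paused) > 0
}

// addService mimics the inner critical section (hypothetical helper).
func addService(p *pauseCounter) {
	p.Pause()
	defer p.Resume()
	// ... mutate local state here ...
}

// reload mimics the outer critical section (hypothetical helper). It returns
// whether syncing is still paused after the nested call has resumed.
func reload(p *pauseCounter) bool {
	p.Pause()
	defer p.Resume()
	addService(p)
	return p.isPaused()
}

func main() {
	p := &pauseCounter{}
	fmt.Println("paused inside reload after nested call:", reload(p))
	fmt.Println("paused after reload returns:", p.isPaused())
}
```

With a plain boolean flag, the inner Resume() would clear the pause while reload is still running; with a counter, the pause holds until the outermost Resume() has run.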
Shawn Cook 66fd8fb2a0 Rename EnableTagOverride and update formatting 2015-09-11 08:35:29 -07:00
Shawn Cook d7ce0b3c6b Remove debug lines 2015-09-11 08:32:59 -07:00
Shawn Cook 96785edd9a Add EnableTagDrift logic to command/agent/local.go 2015-08-18 14:03:48 -07:00
Shawn Cook 6a835939b8 EnableTagDrift in NodeService struct 2015-08-18 10:34:55 -07:00
Ryan Uber 739d1fdf03 Merge pull request #891 from hashicorp/f-token
ACL tokens for service/check registration
2015-05-05 22:17:31 -07:00
Ryan Uber 2b62f2f172 agent: use an additional parameter for passing tokens 2015-05-04 17:48:05 -07:00
Ryan Uber 35f5a65fb7 agent: more tests 2015-04-28 13:06:02 -07:00
Ryan Uber 442933650e agent: safer read methods for tokens 2015-04-28 11:53:53 -07:00
Ryan Uber 1264f7edf3 agent: fix deadlock reading tokens from state 2015-04-27 22:26:03 -07:00
Ryan Uber bebb5d9641 agent: add service/check token methods to reduce invasiveness 2015-04-27 22:01:01 -07:00
Ryan Uber bfb27d18cd agent: initial pass threading through tokens for services/checks 2015-04-27 18:33:46 -07:00
artushin cc07734d6e remove config 2015-04-24 09:51:40 -05:00
artushin 7b4720a957 use existing randomStagger 2015-04-23 17:08:17 -05:00
artushin 8decf5d394 adding check_update_stagger 2015-04-23 16:27:42 -05:00
Ryan Uber 60a6da213f agent: handle nil node services in anti-entropy 2015-04-10 11:15:31 -07:00
Ryan Uber 7e170b047e agent: fix anti-entropy check sync 2015-04-09 10:40:05 -07:00
Ryan Uber a60f4adf95 agent: anti-entropy sync services/checks if they don't exist in the catalog 2015-04-08 12:21:01 -07:00
foostan 2df98c1824 Validate ServiceID/CheckID when deleting in deleteService() in local.go 2015-01-27 18:11:57 +09:00
Ryan Uber 46d5dcfc17 agent: comments for new anti-entropy functionality 2015-01-20 21:48:46 -08:00
Ryan Uber a4039aaa4d agent: simplify anti-entropy of services with multiple checks, add tests 2015-01-20 21:48:46 -08:00
Ryan Uber 0c31e5851c agent: only send service with check sync if it is out of sync 2015-01-20 21:48:46 -08:00
Ryan Uber 949ddefbc8 agent: refactor syncChecks 2015-01-20 21:48:46 -08:00
Ryan Uber 674be58e55 agent: support multiple checks per service 2015-01-20 21:48:42 -08:00
Veres Lajos 3b1068387a typofixes - https://github.com/vlajos/misspell_fixer 2014-12-04 23:25:06 +00:00
Armon Dadgar 5887242db2 agent: Handle service ACLs when doing anti-entropy 2014-12-01 11:43:01 -08:00
Ryan Uber cfca160cd5 formatting 2014-10-15 14:56:15 -07:00
Ryan Uber aa6ffc90f0 agent: remove special case of consul service, adjust tests 2014-10-15 14:52:00 -07:00
Armon Dadgar 8c9ab7ba58 agent: Cleanup handling of defer checks 2014-06-10 10:42:55 -07:00
Armon Dadgar a88c36bdc1 agent: Prevent anti-entropy from doing early sync of check output 2014-06-09 16:00:25 -07:00
Armon Dadgar 8a0b86df10 agent: leave inSync until the defer runs 2014-06-09 12:57:50 -07:00
Armon Dadgar 500bb3931b agent: Defer sync based on CheckUpdateInterval 2014-06-09 12:46:29 -07:00
Armon Dadgar a5f05fa902 agent: Ensure we don't retry too often 2014-06-06 14:38:01 -07:00
Armon Dadgar cc51bf6926 agent: Adding debug log messages 2014-04-23 12:21:47 -07:00
Armon Dadgar 903789aee4 agent: Adding random stagger to anti-entropy. Fixes #72. 2014-04-23 12:21:34 -07:00
Armon Dadgar 018482dc4c Store check output in dedicated field. Fixes #59. 2014-04-21 16:20:22 -07:00
Armon Dadgar d7d30f5cf5 agent: Simplify the local state sync 2014-04-14 12:57:54 -07:00
Armon Dadgar 3cf1a64f87 agent: Handle API changes 2014-03-05 15:03:23 -08:00
Armon Dadgar 4f3adcfdda agent: simplify a select block 2014-02-19 12:39:03 -08:00
Armon Dadgar e12e5f7f68 agent: adding ability to reload services and checks 2014-02-07 12:19:56 -08:00
Armon Dadgar c1637b4978 agent: Adding server up callback to make state sync faster 2014-02-07 12:11:34 -08:00
Armon Dadgar f8bd1a1ac3 agent: Adding support to edge trigger consul server coming up for state sync 2014-02-07 12:03:31 -08:00
Armon Dadgar 01b1104175 agent: adding ability to pause syncing 2014-02-07 11:58:24 -08:00
Armon Dadgar c58c53f448 agent: RPC changes and blocking query support 2014-02-05 14:36:13 -08:00
Armon Dadgar 66b232f53e agent: syncCheck provides the relevant check to prevent a race condition 2014-01-30 13:17:34 -08:00
Armon Dadgar 410a0de0c8 Separate localState from Agent 2014-01-21 11:52:25 -08:00
Armon Dadgar 7aa278e2ef Allow setting the health check notes 2014-01-20 17:19:20 -10:00
Armon Dadgar a6e4235b96 Adding tests for checks and services endpoints 2014-01-20 15:06:44 -10:00