From 3af280213cb8b0eb9870a823ab5f7b9eb9d7a697 Mon Sep 17 00:00:00 2001 From: Armon Dadgar Date: Mon, 19 Jan 2015 16:43:24 -1000 Subject: [PATCH] website: Document a distributed semaphore --- .../source/docs/guides/index.html.markdown | 2 + .../docs/guides/semaphore.html.markdown | 133 ++++++++++++++++++ website/source/layouts/docs.erb | 4 + 3 files changed, 139 insertions(+) create mode 100644 website/source/docs/guides/semaphore.html.markdown diff --git a/website/source/docs/guides/index.html.markdown b/website/source/docs/guides/index.html.markdown index af804320b0..6e618b0483 100644 --- a/website/source/docs/guides/index.html.markdown +++ b/website/source/docs/guides/index.html.markdown @@ -27,3 +27,5 @@ The following guides are available: * [Multiple Datacenters](/docs/guides/datacenters.html) - Configuring Consul to support multiple datacenters. * [Outage Recovery](/docs/guides/outage.html) - This guide covers recovering a cluster that has become unavailable due to server failures. + +* [Semaphore](/docs/guides/semaphore.html) - This guide covers using the Key/Value store to implement a semaphore. diff --git a/website/source/docs/guides/semaphore.html.markdown b/website/source/docs/guides/semaphore.html.markdown new file mode 100644 index 0000000000..bb56cffc6f --- /dev/null +++ b/website/source/docs/guides/semaphore.html.markdown @@ -0,0 +1,133 @@ +--- +layout: "docs" +page_title: "Semaphore" +sidebar_current: "docs-guides-semaphore" +description: |- + This guide demonstrates how to implement a distributed semaphore using the Consul Key/Value store. +--- + +# Semaphore + +The goal of this guide is to cover how to build a client-side semaphore using Consul. +This is useful when you want to coordinate many services while restricting access to +certain resources. + +If you only need mutual exclusion or leader election, [this guide](/docs/guides/leader-election.html) +provides a simpler algorithm that can be used instead. 

There are a number of ways that a semaphore can be built, so our goal is not to
cover all the possible methods. Instead, we will focus on using Consul's support for
[sessions](/docs/internals/sessions.html), which allow us to build a system that can
gracefully handle failures.

Note that JSON output in this guide has been pretty-printed for easier
reading. Actual values returned from the API will not be formatted.

## Contending Nodes

The primary flow is for nodes that are attempting to acquire a slot in the semaphore.
All nodes that are participating should agree on a given prefix being used to coordinate,
a single lock key, and a limit of slot holders. A good choice is simply:

```text
service/<service name>/lock/
```

We will refer to this as just `<prefix>` for simplicity.

The first step is to create a session. This is done using the [/v1/session/create endpoint][session-api]:

[session-api]: http://www.consul.io/docs/agent/http.html#_v1_session_create

```text
curl -X PUT -d '{"Name": "dbservice"}' \
  http://localhost:8500/v1/session/create
```

This will return a JSON object containing the session ID:

```text
{
  "ID": "4ca8e74b-6350-7587-addf-a18084928f3c"
}
```

By default, the session makes use of only the gossip failure detector. Additional checks
can be specified if desired.

Next, we create a contender entry. Each contender makes an entry that is tied
to a session. This is done so that if a contender is holding a slot and fails,
the failure can be detected by the other contenders. Optionally, an opaque value
can be associated with the contender via a `<body>`.

Create the contender key by doing an `acquire` on `<prefix>/<session>` with a `PUT`.
This is something like:

```text
curl -X PUT -d <body> http://localhost:8500/v1/kv/<prefix>/<session>?acquire=<session>
```

Where `<session>` is the ID returned by the call to `/v1/session/create`.

This will either return `true` or `false`. If `true` is returned, the contender
entry has been created.
If `false` is returned, the contender entry was not created; most likely this
indicates a session invalidation.

The next step is to use a single key to coordinate which holders are currently
reserving a slot. A good choice is simply `<prefix>/.lock`. We will refer to this
special coordinating key as `<lock>`. The current state of the semaphore is read by
doing a `GET` on the entire `<prefix>`:

```text
curl http://localhost:8500/v1/kv/<prefix>?recurse
```

Within the list of the entries, we should find the `<lock>`. That entry should hold
both the slot limit and the current holders. A simple JSON body like the following works:

```text
{
    "Limit": 3,
    "Holders": {
        "4ca8e74b-6350-7587-addf-a18084928f3c": true,
        "adf4238a-882b-9ddc-4a9d-5b6758e4159e": true
    }
}
```

When the `<lock>` is read, we can verify that the remote `Limit` agrees with the local value. This
is used to detect a potential conflict. The next step is to determine which of the current
slot holders are still alive. As part of the results of the `GET`, we have all the contender
entries. By scanning those entries, we create a set of all the `Session` values. Any of the
`Holders` that are not in that set are pruned. In effect, we are creating a set of live contenders
based on the list results, and doing a set difference with the `Holders` to detect and prune
any potentially failed holders.

If the number of holders (after pruning) is less than the limit, a contender attempts acquisition
by adding its own session to the `Holders` and doing a Check-And-Set update of the `<lock>`. This
performs an optimistic update.

This is done by:

```text
curl -X PUT -d <body> http://localhost:8500/v1/kv/<prefix>/.lock?cas=<index>
```

Where `<index>` is the `ModifyIndex` of the `<lock>` entry that was read earlier.

If this succeeds with `true`, the contender now holds a slot in the semaphore. If this fails
with `false`, then likely there was a race with another contender to acquire the slot.
Both code paths now go into an idle waiting state. In this state, we watch for changes
on `<prefix>`.
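The pruning and acquisition check above are plain set logic, so they can be expressed without any HTTP calls. A sketch, assuming the lock body has already been decoded from JSON; the function names are illustrative, not part of Consul's API:

```python
def prune_holders(lock, contender_sessions):
    """Drop any holder whose session has no live contender entry.

    `lock` is the decoded lock body ({"Limit": N, "Holders": {...}});
    `contender_sessions` is the set of Session values seen in the
    recursive GET of the prefix."""
    live = set(contender_sessions)
    return {
        "Limit": lock["Limit"],
        "Holders": {s: True for s in lock["Holders"] if s in live},
    }

def try_claim(lock, session):
    """Return the new lock body to Check-And-Set, or None if full."""
    if len(lock["Holders"]) >= lock["Limit"]:
        return None  # semaphore is full; enter the idle waiting state
    return {
        "Limit": lock["Limit"],
        "Holders": {**lock["Holders"], session: True},
    }
```

The returned body, when not `None`, is what would be sent in the `?cas=` request; a `false` CAS result simply means another contender won the race and the cycle repeats.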
This is because a slot may be released, a node may fail, and so on.
Slot holders must also watch for changes since the slot may be released by an operator,
or automatically released due to a false positive in the failure detector.

Watching for changes is done with a blocking query against `<prefix>`. If a contender
holds a slot, then on any change the `<lock>` should be re-checked to ensure the slot is
still held. If no slot is held, then the same acquisition logic is triggered to check
and potentially re-attempt acquisition. This allows a contender to steal the slot from
a failed contender or one that has voluntarily released its slot.

If a slot holder ever wishes to release its slot voluntarily, this should be done with a
Check-And-Set operation against `<lock>` to remove its session from the `Holders`. Once
that is done, the contender entry at `<prefix>/<session>` should be deleted. Finally, the
session should be destroyed.

diff --git a/website/source/layouts/docs.erb b/website/source/layouts/docs.erb
index d6764e7929..d6c5e2aaa7 100644
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -187,6 +187,10 @@ > Outage Recovery + + > + Semaphore + >
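The three release steps in the guide (CAS the lock body, delete the contender entry, destroy the session) can be sketched the same way. The helper names below are assumptions for the example, and the CAS index would come from the `ModifyIndex` of the previously read lock entry:

```python
def release_slot(lock, session):
    """Return a new lock body with this session's slot removed,
    suitable for the Check-And-Set update."""
    holders = {s: v for s, v in lock["Holders"].items() if s != session}
    return {"Limit": lock["Limit"], "Holders": holders}

def cleanup_urls(consul, prefix, lock_index, session):
    """URLs for the three release steps, in order: CAS the lock body
    (PUT), delete the contender entry (DELETE), destroy the session
    (PUT). Real Consul endpoints, illustrative parameter names."""
    return [
        f"{consul}/v1/kv/{prefix}/.lock?cas={lock_index}",
        f"{consul}/v1/kv/{prefix}/{session}",
        f"{consul}/v1/session/destroy/{session}",
    ]
```

Performing the steps in this order means a failure partway through still converges: a dangling contender entry or session is pruned by the other contenders on their next read.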