Updates memberlist to pick up Lifeguard research findings. (#3287)

See https://www.hashicorp.com/blog/making-gossip-more-robust-with-lifeguard/.
7 years ago · 31a7701891
parent e9c733eefb
commit 31a7701891
3 changed files with 9 additions and 80 deletions
--- a/vendor/github.com/hashicorp/memberlist/README.md
+++ b/vendor/github.com/hashicorp/memberlist/README.md
@ -65,82 +65,11 @@ For complete documentation, see the associated [Godoc](http://godoc.org/github.c
 ## Protocol
-memberlist is based on ["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf),
+memberlist is based on ["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf). However, we extend the protocol in a number of ways:
-with a few minor adaptations, mostly to increase propagation speed and
+
 * Several extensions are made to increase propagation speed and
 convergence rate.
 * Another set of extensions, that we call Lifeguard, are made to make memberlist more robust in the presence of slow message processing (due to factors such as CPU starvation, and network delay or loss).
-A high level overview of the memberlist protocol (based on SWIM) is
+For details on all of these extensions, please read our paper "[Lifeguard : SWIM-ing with Situational Awareness](https://arxiv.org/abs/1707.00788)", along with the memberlist source.  We welcome any questions related
 described below, but for details please read the full
 [SWIM paper](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf)
 followed by the memberlist source. We welcome any questions related
 to the protocol on our issue tracker.
 ### Protocol Description
 memberlist begins by joining an existing cluster or starting a new
 cluster. If starting a new cluster, additional nodes are expected to join
 it. New nodes in an existing cluster must be given the address of at
 least one existing member in order to join the cluster. The new member
 does a full state sync with the existing member over TCP and begins gossiping its
 existence to the cluster.
 Gossip is done over UDP with a configurable but fixed fanout and interval.
 This ensures that network usage is constant with regards to number of nodes, as opposed to
 exponential growth that can occur with traditional heartbeat mechanisms.
 Complete state exchanges with a random node are done periodically over
 TCP, but much less often than gossip messages. This increases the likelihood
 that the membership list converges properly since the full state is exchanged
 and merged. The interval between full state exchanges is configurable or can
 be disabled entirely.
 Failure detection is done by periodic random probing using a configurable interval.
 If the node fails to ack within a reasonable time (typically some multiple
 of RTT), then an indirect probe as well as a direct TCP probe are attempted. An
 indirect probe asks a configurable number of random nodes to probe the same node,
 in case there are network issues causing our own node to fail the probe. The direct
 TCP probe is used to help identify the common situation where networking is
 misconfigured to allow TCP but not UDP. Without the TCP probe, a UDP-isolated node
 would think all other nodes were suspect and could cause churn in the cluster when
 it attempts a TCP-based state exchange with another node. It is not desirable to
 operate with only TCP connectivity because convergence will be much slower, but it
 is enabled so that memberlist can detect this situation and alert operators.
 If both our probe, the indirect probes, and the direct TCP probe fail within a
 configurable time, then the node is marked "suspicious" and this knowledge is
 gossiped to the cluster. A suspicious node is still considered a member of
 cluster. If the suspect member of the cluster does not dispute the suspicion
 within a configurable period of time, the node is finally considered dead,
 and this state is then gossiped to the cluster.
 This is a brief and incomplete description of the protocol. For a better idea,
 please read the
 [SWIM paper](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf)
 in its entirety, along with the memberlist source code.
 ### Changes from SWIM
 As mentioned earlier, the memberlist protocol is based on SWIM but includes
 minor changes, mostly to increase propagation speed and convergence rates.
 The changes from SWIM are noted here:
 * memberlist does a full state sync over TCP periodically. SWIM only propagates
  changes over gossip. While both eventually reach convergence, the full state
  sync increases the likelihood that nodes are fully converged more quickly,
  at the expense of more bandwidth usage. This feature can be totally disabled
  if you wish.
 * memberlist has a dedicated gossip layer separate from the failure detection
  protocol. SWIM only piggybacks gossip messages on top of probe/ack messages.
  memberlist also piggybacks gossip messages on top of probe/ack messages, but
  also will periodically send out dedicated gossip messages on their own. This
  feature lets you have a higher gossip rate (for example once per 200ms)
  and a slower failure detection rate (such as once per second), resulting
  in overall faster convergence rates and data propagation speeds. This feature
  can be totally disabed as well, if you wish.
 * memberlist stores around the state of dead nodes for a set amount of time,
  so that when full syncs are requested, the requester also receives information
  about dead nodes. Because SWIM doesn't do full syncs, SWIM deletes dead node
  state immediately upon learning that the node is dead. This change again helps
  the cluster converge more quickly.
--- a/vendor/github.com/hashicorp/memberlist/config.go
+++ b/vendor/github.com/hashicorp/memberlist/config.go
@ -235,7 +235,7 @@ func DefaultLANConfig() *Config {
 		TCPTimeout:              10 * time.Second,       // Timeout after 10 seconds
 		IndirectChecks:          3,                      // Use 3 nodes for the indirect ping
 		RetransmitMult:          4,                      // Retransmit a message 4 * log(N+1) nodes
-		SuspicionMult:           5,                      // Suspect a node for 5 * log(N+1) * Interval
+		SuspicionMult:           4,                      // Suspect a node for 4 * log(N+1) * Interval
 		SuspicionMaxTimeoutMult: 6,                      // For 10k nodes this will give a max timeout of 120 seconds
 		PushPullInterval:        30 * time.Second,       // Low frequency
 		ProbeTimeout:            500 * time.Millisecond, // Reasonable RTT time for LAN
--- a/vendor/vendor.json
+++ b/vendor/vendor.json
@ -720,10 +720,10 @@
 			"revisionTime": "2015-06-09T07:04:31Z"
 		},
 		{
-			"checksumSHA1": "ooNF6amPOxyz9OULy2hfbtNlIw4=",
+			"checksumSHA1": "zcZtXfxrusJpcaPeGJOBnPG1xjs=",
 			"path": "github.com/hashicorp/memberlist",
-			"revision": "924edbc42f05a6f7ccd6d4184b8147f2c20ae5f3",
+			"revision": "99594a4f171a77cb7cff01f143ccf608ac577c47",
-			"revisionTime": "2017-07-07T07:05:14Z"
+			"revisionTime": "2017-07-17T19:31:21Z"
 		},
 		{
 			"checksumSHA1": "qnlqWJYV81ENr61SZk9c65R1mDo=",