---
layout: "intro"
page_title: "Consul vs. Nagios, Sensu"
sidebar_current: "vs-other-nagios-sensu"
---
# Consul vs. Nagios, Sensu

Nagios and Sensu are both tools built for monitoring. They are used
to quickly notify operators when an issue occurs.

Nagios uses a group of central servers that are configured to perform
checks on remote hosts. This design makes it difficult to scale Nagios,
as large fleets quickly reach the limit of vertical scaling, and Nagios
does not easily scale horizontally either. Nagios is also notoriously
difficult to use with modern DevOps and configuration management tools,
as local configurations must be updated when remote servers are added
or removed.

Sensu has a much more modern design, relying on local agents to run
checks and push results to an AMQP broker. A number of servers ingest
and handle the results of the health checks from the broker. This model
is more scalable than Nagios, as it allows for much more horizontal scaling
and a weaker coupling between the servers and agents. However, the central
broker has scaling limits and acts as a single point of failure in the system.

Consul provides the same health checking abilities as both Nagios and Sensu,
is friendly to modern DevOps, and avoids the scaling issues inherent in the
other systems. Consul runs all checks locally, like Sensu, avoiding placing
a burden on the central servers. The status of checks is maintained by the
Consul servers, which are fault tolerant and have no single point of failure.
Lastly, Consul can scale to vastly more checks because it relies on edge-triggered
updates. This means an update is triggered only when a check transitions from
"passing" to "failing" or vice versa.
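For a concrete picture of a locally run check, here is a minimal Go sketch of
how such a check might be registered with the local agent over its HTTP API
(assuming the agent's `/v1/agent/check/register` endpoint and its default
`localhost:8500` address; the check ID, name, and script path are hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// A script check that the local agent runs every 30 seconds.
	// The ID, name, and script path are placeholders.
	check := []byte(`{
		"ID": "mem-util",
		"Name": "Memory utilization",
		"Script": "/usr/local/bin/check_mem.py",
		"Interval": "30s"
	}`)

	// Register the check with this node's agent; the agent, not a
	// central server, is responsible for executing it.
	req, err := http.NewRequest("PUT",
		"http://localhost:8500/v1/agent/check/register",
		bytes.NewBuffer(check))
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("register status:", resp.Status)
}
```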
In a large fleet, the majority of checks are passing, and even the
minority that are failing tend to persist in that state. By capturing only
state changes, Consul reduces the amount of network and compute resources
used by the health checks, allowing the system to be much more scalable.
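To make the edge-triggered model concrete, here is a small self-contained Go
sketch of the idea, not Consul's actual implementation: a stream of check
results is forwarded only when the status differs from the last observed one,
so a fleet in a steady state generates no update traffic at all.

```go
package main

import "fmt"

// Status is the result of a single run of a health check.
type Status string

const (
	Passing Status = "passing"
	Failing Status = "failing"
)

// edgeTriggered forwards a stream of check results, emitting an
// update only when the status differs from the previous result.
func edgeTriggered(results <-chan Status, updates chan<- Status) {
	var last Status
	for s := range results {
		if s != last {
			updates <- s // state transition: send an update
			last = s
		}
		// Steady state: no update is sent.
	}
	close(updates)
}

func main() {
	results := make(chan Status)
	updates := make(chan Status)
	go edgeTriggered(results, updates)

	go func() {
		// Ten check runs with one transient failure yield only
		// three updates: passing, failing, passing.
		for _, s := range []Status{Passing, Passing, Passing, Failing,
			Passing, Passing, Passing, Passing, Passing, Passing} {
			results <- s
		}
		close(results)
	}()

	for u := range updates {
		fmt.Println("update:", u)
	}
}
```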
An astute reader may notice that if a Consul agent dies, then no edge-triggered
updates will occur. From the perspective of other nodes, all checks will appear
to be in a steady state. However, Consul guards against this as well. The
[gossip protocol](/docs/internals/gossip.html) used between clients and servers
integrates a distributed failure detector. This means that if a Consul agent fails,
the failure will be detected, and thus all the checks being run by that node can be
assumed failed. This failure detector distributes the work among the entire cluster
and, critically, enables the edge-triggered architecture to work.
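As a toy sketch of that guard, again not Consul's internals: when the failure
detector reports a node as down, every check that node was running is
conservatively marked as failed. The node and check names below are
hypothetical.

```go
package main

import "fmt"

// checksByNode maps a node to the IDs of the checks its agent runs.
// These names are purely illustrative.
var checksByNode = map[string][]string{
	"web-01": {"mem-util", "service:web"},
	"db-01":  {"mem-util", "service:postgres"},
}

// onNodeFailed is the kind of callback a gossip-based failure
// detector would invoke when a peer stops responding to probes.
func onNodeFailed(node string, status map[string]string) {
	// The dead agent can no longer send edge-triggered updates, so
	// every check it was responsible for is assumed to be failing.
	for _, check := range checksByNode[node] {
		status[node+"/"+check] = "failing"
	}
}

func main() {
	status := make(map[string]string)
	onNodeFailed("web-01", status)
	for key, state := range status {
		fmt.Println(key, "=>", state)
	}
}
```

Note that the failed node never has to send a message for its checks to be
marked failed, which is what allows the steady state to remain silent.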