website: health checks page

pull/36/head
Mitchell Hashimoto 11 years ago
parent 11d09948cb
commit af786b0ab5


---
page_title: "Registering Health Checks"
sidebar_current: "gettingstarted-checks"
---

# Health Checks

We've now seen how simple it is to run Consul, add nodes and services, and
query those nodes and services. In this section we will continue by adding
health checks to both nodes and services, a critical component of service
discovery that prevents using services that are unhealthy.

This page will build upon the previous page and assumes you have a
two-node cluster running.
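
Before adding any checks, it's worth confirming that both agents are still
joined. As covered on the previous page, running `consul members` on either
node should list both agents as alive:

```
$ consul members
```
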
## Defining Checks

Similarly to a service, a check can be registered either by providing a
[check definition](/docs/agent/checks.html), or by making the appropriate
calls to the [HTTP API](/docs/agent/http.html).

We will use a check definition because, just like services, definitions
are the most common way to set up checks.

Create two definition files in the Consul configuration directory of
the second node. The first file will add a host-level check, and the second
will modify the web service definition to add a service-level check:

```
$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' | sudo tee /etc/consul/ping.json
$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' >/etc/consul.d/ping.json
$ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
"check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' | sudo tee /etc/consul/web.json
"check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' >/etc/consul.d/web.json
```

The first definition adds a host-level check named "ping". This check runs
on a 30 second interval, invoking `ping -c1 google.com`. If the command
exits with a non-zero exit code, then the node will be flagged unhealthy.
The second definition modifies the `web` service and adds a check that uses
curl every 10 seconds to verify that the web server is running.
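
Because the status of a script check is driven by the command's exit code,
you can dry-run a check by hand before registering it. A quick sketch,
assuming the node has outbound connectivity (`$?` is plain shell for the
last exit code, nothing Consul-specific):

```
$ ping -c1 google.com >/dev/null
$ echo $?
0
```
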
Restart the second agent, or send a `SIGHUP` to it. We should now see the
following log lines:

```
==> Starting Consul agent...
...
[WARN] Check 'service:web' is now critical
```

The first few log lines indicate that the agent has synced the new
definitions. The last line indicates that the check we added for
the `web` service is critical. This is because we're not actually running
a web server and the curl test is failing!
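
If you would rather reload than restart, sending the agent a `SIGHUP` is
enough for it to pick up the new definition files. A minimal sketch, assuming
`pidof` is available and only one Consul process is running on the node:

```
$ kill -HUP "$(pidof consul)"
```
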
## Checking Health Status

Now that we've added some simple checks, we can use the HTTP API to inspect
them. First, we can look for any failing checks. You can run this curl
on either node:

```
$ curl http://localhost:8500/v1/health/state/critical
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
```

We can see that there is only a single check in the `critical` state, which is
our `web` service check.
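
The health API can also be scoped to a single service and filtered down to
passing instances. Since the `web` check is critical, the query below should
come back empty (a sketch; the `?passing` filter and exact output may vary
by Consul version):

```
$ curl http://localhost:8500/v1/health/service/web?passing
[]
```
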
Additionally, we can attempt to query the web service using DNS. Consul
will not return any results, since the service is unhealthy:
```
dig @127.0.0.1 -p 8600 web.service.consul
; <<>> DiG 9.8.1-P1 <<>> @127.0.0.1 -p 8600 web.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35753
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
...
;; QUESTION SECTION:
;web.service.consul. IN A
```
The DNS interface uses the health information and avoids routing to nodes that
are failing their health checks. This is all managed for us automatically.
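
To see the healthy path, you can put anything that answers HTTP on port 80 of
the second node so the curl check starts passing; within an interval or two
the DNS query should return results again. A sketch, assuming Python 2 is
installed and you are able to bind port 80:

```
$ sudo python -m SimpleHTTPServer 80 &
$ dig @127.0.0.1 -p 8600 web.service.consul
```
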
This section should have shown that checks can be easily added. Check definitions
can be updated by changing configuration files and sending a `SIGHUP` to the agent.
Alternatively, the HTTP API can be used to add, remove, and modify checks dynamically.
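
For example, the same "ping" check could be managed through the agent's check
endpoints rather than a definition file. A hedged sketch (field names mirror
the check definition; newer Consul releases replace `Script` with `Args`):

```
$ curl -X PUT -d '{"Name": "ping", "Script": "ping -c1 google.com >/dev/null", "Interval": "30s"}' \
    http://localhost:8500/v1/agent/check/register
$ curl -X PUT http://localhost:8500/v1/agent/check/deregister/ping
```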

To leave the cluster, you can either gracefully quit an agent (using
`Ctrl-C`) or forcefully kill one of the agents. Gracefully leaving allows
the node to transition into the _left_ state, otherwise other nodes
will detect it as having _failed_. The difference is covered
in more detail [here](/intro/getting-started/agent.html#toc_3).
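
A graceful leave can also be triggered explicitly from the command line on
the node that is departing:

```
$ consul leave
```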
