website: health checks page

pull/36/head
Mitchell Hashimoto 11 years ago
parent 11d09948cb
commit af786b0ab5


---
page_title: "Registering Health Checks"
sidebar_current: "gettingstarted-checks"
---

# Health Checks

We've now seen how simple it is to run Consul, add nodes and services, and
query those nodes and services. In this section we will continue by adding
health checks to both nodes and services, a critical component of service
discovery that prevents using services that are unhealthy.

This page will build upon the previous page and assumes you have a
two-node cluster running.
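
Before adding any checks, it's worth confirming that both agents are still
joined. As covered on the previous page, running `consul members` on either
node should list both agents as alive:

```
$ consul members
```
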
## Defining Checks

Similarly to a service, a check can be registered either by providing a
[check definition](/docs/agent/checks.html), or by making the appropriate
calls to the [HTTP API](/docs/agent/http.html).

We will use a check definition because, just like services, definitions
are the most common way to set up checks.

Create two definition files in the Consul configuration directory of
the second node. The first file will add a host-level check, and the second
will modify the web service definition to add a service-level check:

```
$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' | sudo tee /etc/consul/ping.json
$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' >/etc/consul.d/ping.json
$ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
"check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' | sudo tee /etc/consul/web.json
"check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' >/etc/consul.d/web.json
```

The first definition adds a host-level check named "ping". This check runs
on a 30 second interval, invoking `ping -c1 google.com`. If the command
exits with a non-zero exit code, then the node will be flagged unhealthy.
The second definition modifies the `web` service and adds a check that uses
curl every 10 seconds to verify that the web server is running.
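
Because the status of a script check is driven by the command's exit code,
you can dry-run a check by hand before registering it. A quick sketch,
assuming the node has outbound connectivity (`$?` is plain shell for the
last exit code, nothing Consul-specific):

```
$ ping -c1 google.com >/dev/null
$ echo $?
0
```
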
Restart the second agent, or send a `SIGHUP` to it. We should now see the
following log lines:

```
==> Starting Consul agent...
...
[WARN] Check 'service:web' is now critical
```

The first few log lines indicate that the agent has synced the new
definitions. The last line indicates that the check we added for
the `web` service is critical. This is because we're not actually running
a web server and the curl test is failing!
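
If you would rather reload than restart, sending the agent a `SIGHUP` is
enough for it to pick up the new definition files. A minimal sketch, assuming
`pidof` is available and only one Consul process is running on the node:

```
$ kill -HUP "$(pidof consul)"
```
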
## Checking Health Status

Now that we've added some simple checks, we can use the HTTP API to inspect
them. First, we can look for any failing checks. You can run this curl
on either node:

```
$ curl http://localhost:8500/v1/health/state/critical
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
```

We can see that there is only a single check in the `critical` state, which is
our `web` service check.
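
The health API can also be scoped to a single service and filtered down to
passing instances. Since the `web` check is critical, the query below should
come back empty (a sketch; the `?passing` filter and exact output may vary
by Consul version):

```
$ curl http://localhost:8500/v1/health/service/web?passing
[]
```
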
Additionally, we can attempt to query the web service using DNS. Consul
will not return any results, since the service is unhealthy:
```
dig @127.0.0.1 -p 8600 web.service.consul
; <<>> DiG 9.8.1-P1 <<>> @127.0.0.1 -p 8600 web.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35753
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
...
;; QUESTION SECTION:
;web.service.consul. IN A
```
The DNS interface uses the health information and avoids routing to nodes that
are failing their health checks. This is all managed for us automatically.
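
To see the healthy path, you can put anything that answers HTTP on port 80 of
the second node so the curl check starts passing; within an interval or two
the DNS query should return results again. A sketch, assuming Python 2 is
installed and you are able to bind port 80:

```
$ sudo python -m SimpleHTTPServer 80 &
$ dig @127.0.0.1 -p 8600 web.service.consul
```
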
This section should have shown that checks can be easily added. Check definitions
can be updated by changing configuration files and sending a `SIGHUP` to the agent.
Alternatively, the HTTP API can be used to add, remove, and modify checks dynamically.
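
For example, the same "ping" check could be managed through the agent's check
endpoints rather than a definition file. A hedged sketch (field names mirror
the check definition; newer Consul releases replace `Script` with `Args`):

```
$ curl -X PUT -d '{"Name": "ping", "Script": "ping -c1 google.com >/dev/null", "Interval": "30s"}' \
    http://localhost:8500/v1/agent/check/register
$ curl -X PUT http://localhost:8500/v1/agent/check/deregister/ping
```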

To leave the cluster, you can either gracefully quit an agent (using
`Ctrl-C`) or forcefully kill one of the agents. Gracefully leaving allows
the node to transition into the _left_ state, otherwise other nodes
will detect it as having _failed_. The difference is covered
in more detail [here](/intro/getting-started/agent.html#toc_3).
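
A graceful leave can also be triggered explicitly from the command line on
the node that is departing:

```
$ consul leave
```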
