One of the primary roles of the agent is management of system- and application-level health checks. A health check is considered to be application-level if it is associated with a service. A check is defined in a configuration file or added at runtime over the HTTP interface.
---
# Checks
One of the primary roles of the agent is management of system-level and application-level health
checks. A health check is considered to be application-level if it is associated with a
service. If not associated with a service, the check monitors the health of the entire node.

A check is defined in a configuration file or added at runtime over the HTTP interface. Checks
created via the HTTP interface persist with that node.
There are three different kinds of checks:
* Script + Interval - These checks depend on invoking an external application
that performs the health check, exits with an appropriate exit code, and potentially
generates some output. A script is paired with an invocation interval (e.g.
every 30 seconds). This is similar to the Nagios plugin system.
* HTTP + Interval - These checks make an HTTP `GET` request to the specified URL every interval (e.g.
every 30 seconds). The status of the check depends on the HTTP response code:
any `2xx` code is considered passing, `429 Too Many Requests` is a warning, and anything else is a failure.
This type of check should be preferred over a script that uses `curl` or another external process
to check a simple HTTP operation.
* Time to Live (TTL) - These checks retain their last known state for a given TTL.
The state of the check must be updated periodically over the HTTP interface. If an
external system fails to update the status within the TTL, the check is
set to the critical state. This mechanism, conceptually similar to a dead man's switch,
relies on the application to directly report its health. For example, a healthy app
can periodically `PUT` a status update to the HTTP endpoint; if the app fails, the TTL will
expire and the health check will enter a critical state.
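As a rough illustration of the HTTP and TTL status rules above, the following JavaScript (Node.js) sketch maps an HTTP response code to a check status and models a TTL check as a dead man's switch. The function `httpCheckStatus`, the class `TtlCheck`, the status strings, and the choice to start a TTL check in the critical state before its first update are assumptions made for this example; it is not how the agent itself is written.

```javascript
// Hypothetical helpers illustrating the check status rules described above.
// Status strings ("passing", "warning", "critical") are assumed for this sketch.

// HTTP + Interval: map an HTTP response code to a check status.
function httpCheckStatus(statusCode) {
  if (statusCode >= 200 && statusCode <= 299) {
    return "passing"; // any 2xx code
  }
  if (statusCode === 429) {
    return "warning"; // 429 Too Many Requests
  }
  return "critical"; // anything else is a failure
}

// TTL: a dead man's switch. The check keeps its last reported state
// until the TTL elapses without an update, then becomes critical.
class TtlCheck {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.state = "critical"; // assumption: critical until the first update
    this.notes = "";
    this.lastUpdateMs = null;
  }

  // Called when the application reports its health (e.g. via an HTTP PUT).
  update(state, nowMs, notes = "") {
    this.state = state;
    this.notes = notes;
    this.lastUpdateMs = nowMs;
  }

  // Status as seen at time nowMs (milliseconds).
  status(nowMs) {
    if (this.lastUpdateMs === null || nowMs - this.lastUpdateMs > this.ttlMs) {
      return "critical"; // never updated, or the TTL expired
    }
    return this.state;
  }
}

console.log(httpCheckStatus(204)); // passing
console.log(httpCheckStatus(429)); // warning
console.log(httpCheckStatus(503)); // critical

const check = new TtlCheck(15000); // 15 second TTL
check.update("passing", 0);
console.log(check.status(10000)); // passing
console.log(check.status(20000)); // critical
```

Passing timestamps in explicitly, rather than reading the clock inside `status`, keeps the TTL logic deterministic and easy to test.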
## Check Definition
A script check:
```javascript
{
  ...
}
```
An HTTP check:
```javascript
{
  ...
}
```
A TTL check:
```javascript
{
  ...
}
```
Each type of definition must include a `name` and may optionally
provide `id` and `notes` fields. If no `id` is provided, it defaults to the `name`.
All checks must have a unique ID per node, so if names might conflict,
explicit unique IDs should be provided.
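The `id` defaulting and uniqueness rules can be sketched in JavaScript as follows. `normalizeChecks` and the sample check names are hypothetical; rejecting a duplicate ID with an error is a choice made for this sketch, and the agent may handle a collision differently.

```javascript
// Hypothetical sketch of the rules above: `name` is required,
// `id` defaults to `name`, and IDs must be unique per node.
function normalizeChecks(definitions) {
  const seen = new Set();
  return definitions.map((def) => {
    if (typeof def.name !== "string" || def.name === "") {
      throw new Error("check definition must include a name");
    }
    const id = def.id ?? def.name; // default the ID to the name
    if (seen.has(id)) {
      throw new Error(`duplicate check ID on this node: ${id}`);
    }
    seen.add(id);
    return { ...def, id };
  });
}

const checks = normalizeChecks([
  { name: "mem-util" },                  // id defaults to "mem-util"
  { id: "api-http", name: "HTTP API" },  // explicit id
]);
console.log(checks.map((c) => c.id).join(", ")); // mem-util, api-http
```

Two checks that share a `name` (for example two checks both named `ping`) only avoid a collision if each is given its own explicit `id`.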
The `notes` field is opaque to Consul but can be used to provide a human-readable
description of the current state of the check. With a script check, the field is
set to any output generated by the script. Similarly, an external process updating
a TTL check via the HTTP interface can set the `notes` value.
To configure a check, either provide it as a `-config-file` option to the
agent or place it inside the `-config-dir` of the agent. The file must
end in the `.json` extension to be loaded by Consul. Check definitions can
also be updated by sending a `SIGHUP` to the agent. Alternatively, the
check can be registered dynamically using the [HTTP API](/docs/agent/http.html).
## Check Scripts
A check script is generally free to do anything to determine the status
of the check. The only limitation is that the exit code must follow this
convention:
* Exit code 0 - Check is passing
* Exit code 1 - Check is warning
## Service-bound checks
Health checks may optionally be bound to a specific service. This ensures
that the status of the health check will only affect the health status of the
given service instead of the entire node. Service-bound health checks may be
provided by adding a `service_id` field to a check configuration:
```javascript
{
  ...
}
```
In the above configuration, if the web-app health check begins failing, it will
only affect the availability of the web-app service. All other services
provided by the node will remain unaffected.
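To make this scoping concrete, here is a hypothetical JavaScript sketch of how a service's health could be derived from the checks on a node: checks without a `service_id` apply to every service, while a service-bound check applies only to its own service. `serviceHealth`, the status strings, and the rule that the worst status among relevant checks wins are assumptions for illustration.

```javascript
// Hypothetical sketch: which checks affect a given service's health.
// Node-level checks (no service_id) affect every service on the node;
// service-bound checks affect only the service named by service_id.
// Assumption: the worst status among the relevant checks wins.
const SEVERITY = { passing: 0, warning: 1, critical: 2 };

function serviceHealth(checks, serviceId) {
  const relevant = checks.filter(
    (c) => c.service_id === undefined || c.service_id === serviceId
  );
  let worst = "passing"; // no relevant checks: treated as passing here
  for (const c of relevant) {
    if (SEVERITY[c.status] > SEVERITY[worst]) {
      worst = c.status;
    }
  }
  return worst;
}

const nodeChecks = [
  { id: "mem", status: "passing" },                                  // node-level
  { id: "web-app-http", service_id: "web-app", status: "critical" }, // bound to web-app
];
console.log(serviceHealth(nodeChecks, "web-app")); // critical
console.log(serviceHealth(nodeChecks, "db"));      // passing
```

A failing node-level check, by contrast, would mark every service on the node as critical in this model.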
## Multiple Check Definitions
Multiple check definitions can be defined using the `checks` (plural)