Exporter for machine metrics



Node exporter

Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.

The Windows exporter is recommended for Windows users. To expose NVIDIA GPU metrics, prometheus-dcgm can be used.

Installation and Usage

If you are new to Prometheus and node_exporter, there is a simple step-by-step guide.

The node_exporter listens on HTTP port 9100 by default. See the --help output for more options.
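As a quick check that the exporter is up, the metrics endpoint can be fetched and parsed. The following is a minimal Python sketch, not part of node_exporter. It assumes the exporter runs on localhost with the default port 9100 and the default /metrics path. The helper names fetch_metrics and parse_samples are made up for illustration, and the parser only handles simple sample lines without timestamps.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5.0):
    """Return the raw text exposition served by a running exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse simple 'name{labels} value' lines into (series, value) pairs.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value)))
    return samples


# Hand-written sample input so the sketch runs without a live exporter.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""

for series, value in parse_samples(SAMPLE):
    print(series, value)
```

Against a live exporter, parse_samples(fetch_metrics()) would return the same kind of list for the real metrics.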

Ansible

For automated installs with Ansible, there is the Cloud Alchemy role.

RHEL/CentOS/Fedora

There is a community-supplied COPR repository which closely follows upstream releases.

Docker

The node_exporter is designed to monitor the host system. It's not recommended to deploy it as a Docker container because it requires access to the host system.

For situations where Docker deployment is needed, some extra flags must be used to allow the node_exporter access to the host namespaces.

Be aware that any non-root mount points you want to monitor will need to be bind-mounted into the container.

If you start a container for host monitoring, specify the --path.rootfs argument. This argument must match the path used in the bind-mount of the host root. The node_exporter will use path.rootfs as a prefix to access the host filesystem.

docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter:latest \
  --path.rootfs=/host
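To illustrate the prefixing described above: with --path.rootfs=/host, a host path such as /var/lib is reached inside the container as /host/var/lib. A rough Python sketch of that mapping follows. The helper rootfs_path is hypothetical and only mirrors the idea; it is not node_exporter's actual implementation.

```python
import posixpath


def rootfs_path(rootfs, host_path):
    """Map an absolute host path to its location under the rootfs prefix."""
    # Strip the leading slash so join() does not discard the prefix.
    return posixpath.join(rootfs, host_path.lstrip("/"))


print(rootfs_path("/host", "/var/lib"))  # path as seen inside the container
print(rootfs_path("/", "/var/lib"))      # no prefix when running on the host
```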

For Docker compose, similar flag changes are needed.

---
version: '3.8'

services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

On some systems, the timex collector requires an additional Docker flag, --cap-add=SYS_TIME, in order to access the required syscalls.

Collectors

There is varying support for collectors on each operating system. The tables below list all existing collectors and the supported systems.

Collectors are enabled by providing a --collector.<name> flag. Collectors that are enabled by default can be disabled by providing a --no-collector.<name> flag. To enable only some specific collector(s), use --collector.disable-defaults --collector.<name> ....
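When the command line is generated by a deployment script, the three flag forms above can be assembled programmatically. This is a small Python sketch under that assumption; the function collector_flags and its parameter names are invented for illustration, and only the flag spellings come from this README.

```python
def collector_flags(enable=(), disable=(), only=False):
    """Build node_exporter collector flags.

    enable:  collector names to turn on (--collector.<name>)
    disable: default collectors to turn off (--no-collector.<name>)
    only:    if True, disable all defaults first so just `enable` runs
    """
    flags = []
    if only:
        flags.append("--collector.disable-defaults")
    flags += [f"--collector.{name}" for name in enable]
    flags += [f"--no-collector.{name}" for name in disable]
    return flags


# Run only the cpu and meminfo collectors.
cmd = ["./node_exporter"] + collector_flags(enable=["cpu", "meminfo"], only=True)
print(" ".join(cmd))
```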

Enabled by default

Name	Description	OS
arp	Exposes ARP statistics from `/proc/net/arp`.	Linux
bcache	Exposes bcache statistics from `/sys/fs/bcache/`.	Linux
bonding	Exposes the number of configured and active slaves of Linux bonding interfaces.	Linux
btrfs	Exposes btrfs statistics.	Linux
boottime	Exposes system boot time derived from the `kern.boottime` sysctl.	Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris
conntrack	Shows conntrack statistics (does nothing if no `/proc/sys/net/netfilter/` present).	Linux
cpu	Exposes CPU statistics.	Darwin, Dragonfly, FreeBSD, Linux, Solaris, OpenBSD
cpufreq	Exposes CPU frequency statistics.	Linux, Solaris
diskstats	Exposes disk I/O statistics.	Darwin, Linux, OpenBSD
edac	Exposes error detection and correction statistics.	Linux
entropy	Exposes available entropy.	Linux
exec	Exposes execution statistics.	Dragonfly, FreeBSD
fibrechannel	Exposes fibre channel information and statistics from `/sys/class/fc_host/`.	Linux
filefd	Exposes file descriptor statistics from `/proc/sys/fs/file-nr`.	Linux
filesystem	Exposes filesystem statistics, such as disk space used.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon	Exposes hardware monitoring and sensor data from `/sys/class/hwmon/`.	Linux
infiniband	Exposes network statistics specific to InfiniBand and Intel OmniPath configurations.	Linux
ipvs	Exposes IPVS status from `/proc/net/ip_vs` and stats from `/proc/net/ip_vs_stats`.	Linux
loadavg	Exposes load average.	Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm	Exposes statistics about devices in `/proc/mdstat` (does nothing if no `/proc/mdstat` present).	Linux
meminfo	Exposes memory statistics.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netclass	Exposes network interface info from `/sys/class/net/`.	Linux
netdev	Exposes network interface statistics such as bytes transferred.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netstat	Exposes network statistics from `/proc/net/netstat`. This is the same information as `netstat -s`.	Linux
nfs	Exposes NFS client statistics from `/proc/net/rpc/nfs`. This is the same information as `nfsstat -c`.	Linux
nfsd	Exposes NFS kernel server statistics from `/proc/net/rpc/nfsd`. This is the same information as `nfsstat -s`.	Linux
nvme	Exposes NVMe info from `/sys/class/nvme/`.	Linux
powersupplyclass	Exposes power supply statistics from `/sys/class/power_supply`.	Linux
pressure	Exposes pressure stall statistics from `/proc/pressure/`.	Linux (kernel 4.20+ and/or CONFIG_PSI)
rapl	Exposes various statistics from `/sys/class/powercap`.	Linux
schedstat	Exposes task scheduler statistics from `/proc/schedstat`.	Linux
sockstat	Exposes various statistics from `/proc/net/sockstat`.	Linux
softnet	Exposes statistics from `/proc/net/softnet_stat`.	Linux
stat	Exposes various statistics from `/proc/stat`. This includes boot time, forks and interrupts.	Linux
textfile	Exposes statistics read from local disk. The `--collector.textfile.directory` flag must be set.	any
thermal_zone	Exposes thermal zone & cooling device statistics from `/sys/class/thermal`.	Linux
time	Exposes the current system time.	any
timex	Exposes selected adjtimex(2) system call stats.	Linux
udp_queues	Exposes UDP total lengths of the rx_queue and tx_queue from `/proc/net/udp` and `/proc/net/udp6`.	Linux
uname	Exposes system information as provided by the uname system call.	Darwin, FreeBSD, Linux, OpenBSD
vmstat	Exposes statistics from `/proc/vmstat`.	Linux
xfs	Exposes XFS runtime statistics.	Linux (kernel 4.4+)
zfs	Exposes ZFS performance statistics.	Linux, Solaris

Disabled by default

node_exporter also implements a number of collectors that are disabled by default. Reasons for this vary by collector, and may include:

High cardinality
Prolonged runtime that exceeds the Prometheus scrape_interval or scrape_timeout
Significant resource demands on the host

You can enable additional collectors as desired by adding them to your init system's or service supervisor's startup configuration for node_exporter, but caution is advised. Enable at most one at a time, testing first on a non-production system, then by hand on a single production node. When enabling additional collectors, carefully monitor the change by observing the scrape_duration_seconds metric to ensure that collection completes and does not time out. In addition, monitor the scrape_samples_post_metric_relabeling metric to see the changes in cardinality.

The perf collector may not work out of the box on some Linux systems due to kernel configuration and security settings. To allow access, set the following sysctl parameter, where X is one of the values listed below:

sysctl -w kernel.perf_event_paranoid=X

2 allow only user-space measurements (default since Linux 4.6).
1 allow both kernel and user measurements (default before Linux 4.6).
0 allow access to CPU-specific data but not raw tracepoint samples.
-1 no restrictions.

Depending on the configured value, different metrics will be available. For most cases, 0 will provide the most complete set. For more information, see man 2 perf_event_open.
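Before enabling the perf collector, it can help to check what the host currently allows. The Python sketch below reads the setting from /proc/sys/kernel/perf_event_paranoid (the procfs view of the sysctl above) and maps it to the meanings listed above. The names LEVELS, describe_paranoid and read_paranoid are illustrative only, and values outside the list (some distributions define higher levels) are reported as not covered.

```python
from pathlib import Path

# Meanings as listed above for kernel.perf_event_paranoid.
LEVELS = {
    2: "only user-space measurements",
    1: "kernel and user measurements",
    0: "CPU-specific data, but not raw tracepoint samples",
    -1: "no restrictions",
}


def describe_paranoid(level):
    """Return a short description for a perf_event_paranoid value."""
    return LEVELS.get(level, "value not covered by the list above")


def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current setting (Linux only)."""
    return int(Path(path).read_text().strip())


print(describe_paranoid(0))
```

On a Linux host, describe_paranoid(read_paranoid()) reports the level currently in effect.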

By default, the perf collector will only collect metrics of the CPUs that node_exporter is running on (i.e. runtime.NumCPU). If this is insufficient (e.g. if you run node_exporter with its CPU affinity set to specific CPUs), you can specify a list of alternate CPUs by using the --collector.perf.cpus flag. For example, to collect metrics on CPUs 2-6, you would specify: --collector.perf --collector.perf.cpus=2-6. The CPU configuration is zero indexed and can also take a stride value; e.g. --collector.perf --collector.perf.cpus=1-10:5 would collect on CPUs 1, 5, and 10.

The perf collector is also able to collect tracepoint counts when using the --collector.perf.tracepoint flag. Tracepoints can be found using perf list or from debugfs. An example usage of this would be --collector.perf.tracepoint="sched:sched_process_exec".

Name	Description	OS
buddyinfo	Exposes statistics of memory fragments as reported by `/proc/buddyinfo`.	Linux
devstat	Exposes device statistics.	Dragonfly, FreeBSD
drbd	Exposes Distributed Replicated Block Device statistics (to version 8.4).	Linux
ethtool	Exposes network interface and network driver statistics equivalent to `ethtool -S`.	Linux
interrupts	Exposes detailed interrupts statistics.	Linux, OpenBSD
ksmd	Exposes kernel and system statistics from `/sys/kernel/mm/ksm`.	Linux
logind	Exposes session counts from logind.	Linux
meminfo_numa	Exposes memory statistics from `/proc/meminfo_numa`.	Linux
mountstats	Exposes filesystem statistics from `/proc/self/mountstats`. Exposes detailed NFS client statistics.	Linux
network_route	Exposes the routing table as metrics.	Linux
ntp	Exposes local NTP daemon health to check time.	any
perf	Exposes perf based metrics (Warning: Metrics are dependent on kernel configuration and settings).	Linux
processes	Exposes aggregate process statistics from `/proc`.	Linux
qdisc	Exposes queuing discipline statistics.	Linux
runit	Exposes service status from runit.	any
supervisord	Exposes service status from supervisord.	any
systemd	Exposes service and system status from systemd.	Linux
tcpstat	Exposes TCP connection status information from `/proc/net/tcp` and `/proc/net/tcp6`. (Warning: the current version has potential performance issues in high load situations.)	Linux
wifi	Exposes WiFi device and station statistics.	Linux
zoneinfo	Exposes NUMA memory zone metrics.	Linux

Textfile Collector

The textfile collector is similar to the Pushgateway, in that it allows exporting of statistics from batch jobs. It can also be used to export static metrics, such as what role a machine has. The Pushgateway should be used for service-level metrics. The textfile module is for metrics that are tied to a machine.

To use it, set the --collector.textfile.directory flag on the node_exporter command line. The collector will parse all files in that directory matching the glob *.prom using the text format. Note: Timestamps are not supported.

To atomically push completion time for a cron job:

echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom

To statically set roles for a machine using labels:

echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
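The same write-then-rename pattern can be used from a batch job written in Python. The sketch below is one possible way to do it, not an official helper: write_prom_atomically is a made-up name, and a throwaway temporary directory stands in for the directory passed to --collector.textfile.directory.

```python
import os
import tempfile
import time


def write_prom_atomically(directory, name, lines):
    """Write `lines` to <directory>/<name>.prom via a temp file and rename."""
    target = os.path.join(directory, name + ".prom")
    # Create the temp file in the same directory so the rename stays on one
    # filesystem; the .tmp suffix keeps it from matching the *.prom glob.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(lines) + "\n")
        os.chmod(tmp, 0o644)     # mkstemp creates 0600; let the exporter read it
        os.replace(tmp, target)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)           # do not leave partial files behind
        raise
    return target


demo_dir = tempfile.mkdtemp()  # stand-in for the textfile directory
path = write_prom_atomically(
    demo_dir,
    "my_batch_job",
    [f"my_batch_job_completion_time {int(time.time())}"],
)
with open(path) as f:
    print(f.read(), end="")
```

Because the collector only ever sees the file after the rename, it never parses a half-written file.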

Filtering enabled collectors

The node_exporter will expose all metrics from enabled collectors by default. This is the recommended way to collect metrics to avoid errors when comparing metrics of different families.

For advanced use, the node_exporter can be passed an optional list of collectors to filter metrics. The collect[] parameter may be used multiple times. In the Prometheus configuration, you can use this syntax under the scrape config.

  params:
    collect[]:
      - foo
      - bar

This can be useful for having different Prometheus servers collect specific metrics from nodes.
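Outside Prometheus, the same filter can be expressed directly as a query string by repeating collect[] once per collector. The Python sketch below builds such a URL; it assumes the default port 9100 and the default /metrics path, and filtered_metrics_url is a name chosen for illustration.

```python
from urllib.parse import urlencode


def filtered_metrics_url(base, collectors):
    """Return a metrics URL that repeats collect[] once per collector."""
    query = urlencode([("collect[]", name) for name in collectors])
    return f"{base}?{query}" if query else base


# The brackets are percent-encoded as %5B%5D in the resulting URL.
print(filtered_metrics_url("http://localhost:9100/metrics", ["cpu", "meminfo"]))
```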

Development building and running

Prerequisites:

Go compiler
RHEL/CentOS: glibc-static package.

Building:

git clone https://github.com/prometheus/node_exporter.git
cd node_exporter
make
./node_exporter <flags>

To see all available configuration flags:

./node_exporter -h

Running tests

make test

TLS endpoint

** EXPERIMENTAL **

The exporter supports TLS via a new web configuration file.

./node_exporter --web.config=web-config.yml

See the exporter-toolkit https package for more details.