The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields. See: https://www.kernel.org/doc/Documentation/iostats.txt
* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics
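A minimal sketch of how a parser can tolerate the additional fields, assuming only that each line starts with major number, minor number and device name. The helper name `parseDiskstatsLine` and the sample lines are hypothetical, not the collector's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseDiskstatsLine is a hypothetical helper. It splits one /proc/diskstats
// line into the device name and its statistics columns, keeping however many
// columns the running kernel provides instead of assuming a fixed count.
func parseDiskstatsLine(line string) (string, []string, error) {
	fields := strings.Fields(line)
	// The first three columns are major number, minor number and device name.
	if len(fields) < 4 {
		return "", nil, fmt.Errorf("unexpected diskstats line: %q", line)
	}
	return fields[2], fields[3:], nil
}

func main() {
	// Illustrative sample lines, not taken from a real host.
	older := "8 0 sda 10 0 80 5 20 0 160 9 0 12 14"
	newer := older + " 3 0 24 1"
	for _, l := range []string{older, newer} {
		dev, stats, err := parseDiskstatsLine(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %d stat fields\n", dev, len(stats))
	}
}
```

Metrics for columns that an older kernel does not report can then simply be skipped.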
Signed-off-by: Paul Gier <pgier@redhat.com>
* State that wifi collector is disabled by default
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add the 'processes' collector to the Readme
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
LaunchDaemons are the correct way to create services that are restart proof.
The readme now mentions only a single destination for the plist file.
Signed-off-by: Dávid Balakirev <dave00ster@gmail.com>
This is mostly required to fix a bug with histograms on 32bit platforms.
(Which might or might not be used in node_exporter. Just in case...)
Signed-off-by: beorn7 <beorn@soundcloud.com>
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available
This is related to #966 and handles this error:
Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband
collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
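One way to handle such a value, sketched under the assumption that an "N/A" counter should be skipped rather than fail the whole collector. The helper name `parseCounter` is hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCounter is a hypothetical helper. It reports ok=false when the counter
// file holds an "N/A ..." placeholder, so the caller can skip the metric
// instead of surfacing the strconv.ParseUint error.
func parseCounter(raw string) (value uint64, ok bool, err error) {
	raw = strings.TrimSpace(raw)
	if strings.HasPrefix(raw, "N/A") {
		return 0, false, nil
	}
	value, err = strconv.ParseUint(raw, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return value, true, nil
}

func main() {
	for _, raw := range []string{"1234\n", "N/A (no PMA)\n"} {
		v, ok, err := parseCounter(raw)
		fmt.Println(v, ok, err)
	}
}
```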
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
* strip rootfs prefix for run in docker
* Use `/` as default value of path.rootfs, and parse mounts from `/proc/1/mounts`.
* No need to mount `/proc` and `/sys` because we share the host's PID
namespace, which allows processes within the container to see all of the
processes on the system.
Closes: #66
Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.
SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.
For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348
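A rough sketch of skipping offline CPUs, assuming the per-CPU read surfaces ENODEV as a `syscall.Errno`. The `read` callback stands in for the real sysctl call, and `collectOnline` and `fakeRead` are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// collectOnline calls read for every CPU index and skips the ones that
// report ENODEV (offline). Any other error aborts the collection.
func collectOnline(ncpu int, read func(cpu int) ([]uint64, error)) (map[int][]uint64, error) {
	out := make(map[int][]uint64)
	for cpu := 0; cpu < ncpu; cpu++ {
		times, err := read(cpu)
		if errors.Is(err, syscall.ENODEV) {
			continue // offline CPU, for example a disabled SMT sibling
		}
		if err != nil {
			return nil, err
		}
		out[cpu] = times
	}
	return out, nil
}

// fakeRead simulates a host where every odd-numbered CPU is offline.
func fakeRead(cpu int) ([]uint64, error) {
	if cpu%2 == 1 {
		return nil, syscall.ENODEV
	}
	return []uint64{1, 2, 3, 4, 5}, nil
}

func main() {
	times, err := collectOnline(4, fakeRead)
	fmt.Println(len(times), err)
}
```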
Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
When starting Docker containers a whole bunch of netns (network
namespace) mounts are created that the node exporter can't make any
sense of (and can't read either).
This ignores all nsfs filesystems.
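As a sketch only, ignoring a filesystem type can be expressed as an anchored regular expression over the type column. The `mount` struct, the pattern and `keepMounts` are illustrative assumptions, not the collector's real definitions:

```go
package main

import (
	"fmt"
	"regexp"
)

// mount is a simplified, hypothetical view of one mount table entry.
type mount struct {
	device     string
	mountPoint string
	fsType     string
}

// ignoredFSTypes is an example pattern that matches only the nsfs type.
var ignoredFSTypes = regexp.MustCompile(`^(nsfs)$`)

// keepMounts returns the entries whose filesystem type is not ignored.
func keepMounts(mounts []mount) []mount {
	var kept []mount
	for _, m := range mounts {
		if ignoredFSTypes.MatchString(m.fsType) {
			continue
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	mounts := []mount{
		{"nsfs", "/run/docker/netns/default", "nsfs"},
		{"/dev/sda1", "/", "ext4"},
	}
	for _, m := range keepMounts(mounts) {
		fmt.Println(m.mountPoint, m.fsType)
	}
}
```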
Fixes #875
Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
* Update build
* Only use CGO when building non-Linux.
* Update build to Go 1.11
* Use tab indenting consistently.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Change systemd unit filtering
Get all units from systemd and filter in Go.
* Improves compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.
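Filtering in Go could look roughly like the sketch below, assuming an include and an exclude pattern over unit names. The patterns and the names `filterUnits` and `filterWithExamplePatterns` are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// filterUnits keeps the unit names that match include and do not match
// exclude. The debug line mirrors the idea of printing when a unit passes.
func filterUnits(names []string, include, exclude *regexp.Regexp) []string {
	var kept []string
	for _, name := range names {
		if include.MatchString(name) && !exclude.MatchString(name) {
			fmt.Printf("debug: %s passes the filter\n", name)
			kept = append(kept, name)
		}
	}
	return kept
}

// filterWithExamplePatterns applies example patterns: include everything,
// exclude scope units. These defaults are assumptions for illustration.
func filterWithExamplePatterns(names []string) []string {
	include := regexp.MustCompile(`^(?:.+)$`)
	exclude := regexp.MustCompile(`^(?:.+\.scope)$`)
	return filterUnits(names, include, exclude)
}

func main() {
	units := []string{"sshd.service", "session-1.scope", "cron.service"}
	fmt.Println(filterWithExamplePatterns(units))
}
```

Because the full unit list is fetched first, the filter does not rely on any server-side matching support in systemd.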
Signed-off-by: Ben Kochie <superq@gmail.com>
This removes the cgo import from timex collector, as it was only used
to define two constants. Those are part of the Linux kernel<->userspace
interface, thus there is no need to depend on libc to source them:
https://github.com/torvalds/linux/blob/v4.18/include/uapi/linux/timex.h
Signed-off-by: Luca Bruno <luca.bruno@coreos.com>
* textfile smartmon.sh
Added functions to also parse megaraid disks.
Added parsing to also detect the grown_defects counters.
* textfile storcli.py
Reworked the example file to export lots more information about
megaraid attached controllers, VDs and PDs.
Signed-off-by: Christopher Blum <christopher.blum@profitbricks.com>
* Correctly cast Darwin memory info
* Cast stats to float64 before doing math on them to avoid integer
wrapping.
* Remove invalid `_total` suffix from gauge values.
* Handle counters in `meminfo.go`.
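The wrapping problem can be illustrated with a small sketch. The `uint32` types and the function names are assumptions chosen for the example, not the actual Darwin struct fields:

```go
package main

import "fmt"

// bytesWrapped multiplies in the integer type, so the product wraps once it
// exceeds the range of uint32.
func bytesWrapped(pages, pageSize uint32) uint32 {
	return pages * pageSize
}

// bytesFloat converts each operand to float64 first, so the product keeps
// its full magnitude.
func bytesFloat(pages, pageSize uint32) float64 {
	return float64(pages) * float64(pageSize)
}

func main() {
	// Illustrative values: two million pages of 4096 bytes each.
	var pages, pageSize uint32 = 2000000, 4096
	fmt.Println("integer math:", bytesWrapped(pages, pageSize))
	fmt.Println("float64 math:", bytesFloat(pages, pageSize))
}
```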
Signed-off-by: Ben Kochie <superq@gmail.com>
Fix typo in the unit description of metric `*read_time_seconds_total`: it said milliseconds instead of seconds.
Signed-off-by: Marco Tulio R Braga <marco.tulio@mtulio.eng.br>
Add metrics that expose more information about MD RAID devices and
disks:
- the RAID level in use
- the RAID set that a disk belongs to
This allows for things like alerting on unusually high I/O
utilisation for a disk compared to other disks in the same RAID set,
which usually means the disk is failing, and for comparing
write/read latency across RAID sets.
Output looks like:
node_md_disk_info{disk_device="/dev/dm-0", md_device="md1", md_set="A"} 1
node_md_disk_info{disk_device="/dev/dm-3", md_device="md1", md_set="B"} 1
node_md_disk_info{disk_device="/dev/dm-2", md_device="md1", md_set="A"} 1
node_md_disk_info{disk_device="/dev/dm-1", md_device="md1", md_set="B"} 1
node_md_disk_info{disk_device="/dev/dm-4", md_device="md1", md_set="A"} 1
node_md_disk_info{disk_device="/dev/dm-5", md_device="md1", md_set="B"} 1
node_md_info{md_device="md1", md_name="foo", raid_level="10", md_metadata_version="1.2"} 1
The `node_md_info` metric, which gives additional information about the
RAID array, is intentionally separate to avoid adding all of those
labels to each disk. If you need to query using the labels contained in
`node_md_info`, you can do that using PromQL:
https://www.robustperception.io/how-to-have-labels-for-machine-roles/
I looked at adding the array UUID, but there's no sysfs entry for it and
I'm not sure there's a strong use case for it.
This patch to add a sysfs entry for the UUID was apparently not
accepted:
https://www.spinics.net/lists/raid/msg40667.html
Add these metrics as a textfile script rather than adding them to the Go
'md' module as they're perhaps less commonly useful. If lots of people
find them useful, we can later rewrite this in Go.
Signed-off-by: Matt Bostock <mbostock@cloudflare.com>
* If NRestarts or NRefused are not available, don't ignore the unit itself
* Don't report systemd metrics (NRestarts/NRefused) that are not available
Signed-off-by: James Hartig <james@getadmiral.com>
PIDs can vanish (exit) from /proc/ between gathering the list of PIDs
and getting all of their stats.
* Ignore file not found errors.
* Explicitly count the PIDs we find.
* Clean up some error style issues.
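A minimal sketch of the approach, assuming stats are read from `<root>/<pid>/stat`. The helper name `countProcs` and the PID list in `main` are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// countProcs reads the stat file of every listed PID under root and counts
// the ones that could be read. A PID whose file no longer exists has exited
// since the list was gathered and is skipped silently.
func countProcs(root string, pids []string) (int, error) {
	count := 0
	for _, pid := range pids {
		_, err := os.ReadFile(filepath.Join(root, pid, "stat"))
		if errors.Is(err, os.ErrNotExist) {
			continue // the process vanished between listing and reading
		}
		if err != nil {
			return count, fmt.Errorf("reading stat for pid %s: %w", pid, err)
		}
		count++ // explicitly count only the PIDs that were actually found
	}
	return count, nil
}

func main() {
	n, err := countProcs("/proc", []string{"1", "999999999"})
	fmt.Println(n, err)
}
```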
Signed-off-by: Ben Kochie <superq@gmail.com>
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric
* Use Prometheus best practices for uptime metric.
* Use "start time" rather than "uptime".
* Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.
Signed-off-by: Ben Kochie <superq@gmail.com>
* vendor: Update prometheus/procfs
Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>
* mountstats: Use new NFS protocol field
In https://github.com/prometheus/procfs/pull/100, the NFSTransportStats
struct was expanded by a field called protocol that specifies the NFS
protocol in use, either "tcp" or "udp". This commit adds the protocol as
a label to all NFS metrics exported via the mountstats collector.
Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>
* Update fixtures for UDP mount
Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>
It is quite common to put /var/lib/docker itself on a separate partition
and that should be monitored as well.
Signed-off-by: Johannes Wienke <languitar@semipol.de>
While the statfs(2) approach is reliable for normally mounted filesystems, the
flags returned can be inconsistent when a filesystem has been remounted read-only
after encountering an error. The returned flags do accurately represent the
internal state of the filesystem, but they do not reflect whether the VFS layer
will accept writes. Instead, it makes sense to parse the current VFS mount
state from the options field in /proc/mounts since it takes precedence.
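A sketch of reading the mount state from the options field, assuming the usual /proc/mounts column order (device, mount point, type, options, dump, pass). The helper name `isReadOnly` and the sample lines are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// isReadOnly reports whether the comma-separated options column of one
// /proc/mounts line contains the "ro" option.
func isReadOnly(mountsLine string) (bool, error) {
	fields := strings.Fields(mountsLine)
	if len(fields) < 4 {
		return false, fmt.Errorf("malformed mounts line: %q", mountsLine)
	}
	for _, opt := range strings.Split(fields[3], ",") {
		if opt == "ro" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Illustrative sample lines, not taken from a real host.
	lines := []string{
		"/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0",
		"/dev/sdb1 /data ext4 ro,relatime 0 0",
	}
	for _, l := range lines {
		ro, err := isReadOnly(l)
		fmt.Println(ro, err)
	}
}
```

Matching whole options rather than substrings keeps `errors=remount-ro` from being mistaken for a read-only mount.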
Signed-off-by: Brandon Gilmore <bgilmore@valvesoftware.com>