Allow filtering APR entries based on device. Useful for ignoring
entries for network namespaces (containers).
Signed-off-by: Ben Kochie <superq@gmail.com>
This adds a new Linux metric, node_softirqs_total, which corresponds
to the 'softirq' line in /proc/stat. This metric is disabled by
default and it can be enabled with '--collector.stat.softirq'.
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
TCP timeouts count is a useful signal to show
abnormal network performance and is another
signal to aid debugging. This metric can be
used to generate proactive alerts for host
network namespace workloads.
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
The new `lnstat` collector produces a high number of metrics, per-cpu,
and results in approximately double the number of metrics previously
scraped. For example, a typical server with 64 cores produces 3832
lnstat metrics compared to 4147 metrics for the remaining collectors.
Therefore disable the `lnstat` collector by default.
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Sanitizing the metric names can lead to duplicate metric names:
```
caller=level.go:63 level=error caller="error gathering metrics: [from Gatherer #2] collected metric \"node_ethtool_giant_hdr\" { label:<name:\"device\" value:\"ens192\" > untyped:<value:0" msg=" > } was collected before with the same name and label values"
```
Generate a map from the sanitized metric names to the metric names from
ethtool. In case of duplicate sanitized metric names drop both metrics,
because it is unknown which one to take.
Fixes: https://github.com/prometheus/node_exporter/issues/2185
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Use SysctlTimeval from the golang.org/x/sys/unix package to
simplify the implementation of the boottime collector for the BSDs and
allows to build it without cgo.
Tested on macOS 11.6, FreeBSD 13 and OpenBSD 7.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Add a DMI collector to expose the Desktop Management Interface (DMI)
info from `/sys/class/dmi/id/`. This will expose information about the
BIOS, mainboard, chassis, and product.
Closes: https://github.com/prometheus/node_exporter/issues/303
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Use `time.NewTimer()` and explicit `Stop()` to avoid memory bloat / GC problems with `time.After()` in the Linux filesystem collector timeout handling.
Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
The ethtool_cmd struct from the linux kernel contains information about the speeds and features supported by a
network device. This includes speeds and duplex but also features like autonegotiate and 802.3x pause frames.
Closes#1444
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
* collector: Unwrap glob textfile directories
* collector: Store full path in mtime's file label
The point is to avoid duplicated gauges from files with the same name in
different directories.
This introduces support for exporting from multiple directories matching
given pattern (e.g. `/home/*/metrics/`).
Signed-off-by: Kiril Vladimirov <kiril@vladimiroff.org>
Expose GPU metrics using `sysfs/drm`.
`amdgpu` is the only driver which exposes this information through DRM.
Signed-off-by: Siavash Safi <siavash.safi@gmail.com>
Use the same flag pattern as netdev to make filtering methods the same.
* Move SanitizeMetricName to helper.go
Signed-off-by: Ben Kochie <superq@gmail.com>
* Refactor diskstats_linux to use procfs.
* Add `node_disk_info` metric.
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
Co-authored-by: W. Andrew Denton <git@flying-snail.net>
Currently Node Exporter has a metric called `node_uname_info` which of
course exposes uname info. While this is nice, it does not help if you
are running different OSes which could have similar uname info.
Therefore parse `/etc/os-release` or `/usr/lib/os-release` and expose a
`node_os_info` metric which provide information regarding the OS
release/version of the node. Also expose the major.minor part of the OS
release version as `node_os_version`.
Since the os-release files will not change often, cache the parsed
content and only refresh the cache if the modification time changes.
This `os` collector will read files outside of `/proc` and `/sys`, but
the os-release file is widely used and the format is standardized:
https://www.freedesktop.org/software/systemd/man/os-release.html
Bug: https://github.com/prometheus/node_exporter/issues/1574
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
In high scale virtualized / cloud environments there are typically
no guest VMs. Add a boolean flag to allow disabling the Linux guest
CPU metrics.
Signed-off-by: Ben Kochie <superq@gmail.com>
Add a `node_ethtool_info` metric to all ethtool devices to expose driver
information with following labels:
* bus_info
* driver
* expansion_rom_version
* firmware_version
* version
This metric is useful to monitor the firmware version to be up-to-date.
Note: The version label might be malformed due to bug #39 in ethtool:
https://github.com/safchain/ethtool/issues/39
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
OpenMetrics and the Prometheus exposition format require the metric name
to consist only of alphanumericals and "_", ":" and they must not start
with digits. The metric names from the ethtool stats might contain
spaces, brackets, and dots. Converting them directly to metric names
will produce invalid metric names.
Therefore sanitize the metric names and convert them to lower case.
Fixes: https://github.com/prometheus/node_exporter/issues/2083
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
This adds a new flag --collector.ethtool.metrics-include to the ethtool
collector. Only metrics matching this regexp will be collected.
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
Other network related collectors allow to filter out unwanted devices.
Add this support to the new ethtool collector as well.
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Update procfs library to include ignored fields ParseInt handling.
Wrap error returns so that the user can know more about what failed.
Returns from getAllocatedThreads() are errors anyway.
Fixes: https://github.com/prometheus/node_exporter/issues/2110
Signed-off-by: Ben Kochie <superq@gmail.com>
The Linux CPU idle stat can also jump backwards slightly in some cases.
Allow the jump back up to 3 seconds before we attempt to reset the CPU
counter cache.
Fixes: https://github.com/prometheus/node_exporter/issues/1903
Signed-off-by: Ben Kochie <superq@gmail.com>