* Refactor netclass_rtnl collector
Merge the netclass_rtnl collector into the netclass collector.
* Disabled by default
* Followup to #2492
Signed-off-by: Ben Kochie <superq@gmail.com>
We don't need to fully sanitize the hwmon label values to metric/label
name strings.
* Just make sure they're valid UTF-8.
* Always included the label metric to avoid group_left failures.
Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
The textfile collector will now provide a unified metric description
(that will look like "Metric read from file/a.prom, file/b.prom")
for metrics collected accross several text-files that don't already
have a description.
Also change the error handling in the textfile collector tests to
ContinueOnError to better mirror the real-life use-case.
Signed-off-by: Guillaume Espanel <guillaume.espanel.ext@ovhcloud.com>
Signed-off-by: Guillaume Espanel <guillaume.espanel.ext@ovhcloud.com>
Since netdev metrics are now read from netlink instead of `/proc/net/dev`, we
can't easily spoof them for the end-to-end tests by reading a fixture file in
place of `/proc/net/dev`.
Therefore, we only get metrics for `lo` and ignore those that would return
unpredictable values (i.e. the byte and packet counters).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Set the `--path.udev.data` flag to point to the udev fixture, and update the
output fixture with
```console
$ ./end-to-end-test.sh -u
```
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Now that we read some data from `/run/udev/data`, add the corresponding
fixtures and update the expected test results accordingly.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Fix up handling of CPU info collector on non-x86_64 systems due to
fixtures containing `/proc/cpuinfo` from x86_64.
* Update e2e 64k page test fixture from an arm64 system.
* Enable ARM testing in CircleCI.
Fixes: https://github.com/prometheus/node_exporter/issues/1959
Signed-off-by: Ben Kochie <superq@gmail.com>
* Correctly name collector file.
* Fix cgroup summary type as gauge.
* Use a boolean metric rather than a label for enabled.
Signed-off-by: Ben Kochie <superq@gmail.com>
Allow filtering APR entries based on device. Useful for ignoring
entries for network namespaces (containers).
Signed-off-by: Ben Kochie <superq@gmail.com>
This adds a new Linux metric, node_softirqs_total, which corresponds
to the 'softirq' line in /proc/stat. This metric is disabled by
default and it can be enabled with '--collector.stat.softirq'.
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
TCP timeouts count is a useful signal to show
abnormal network performance and is another
signal to aid debugging. This metric can be
used to generate proactive alerts for host
network namespace workloads.
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Sanitizing the metric names can lead to duplicate metric names:
```
caller=level.go:63 level=error caller="error gathering metrics: [from Gatherer #2] collected metric \"node_ethtool_giant_hdr\" { label:<name:\"device\" value:\"ens192\" > untyped:<value:0" msg=" > } was collected before with the same name and label values"
```
Generate a map from the sanitized metric names to the metric names from
ethtool. In case of duplicate sanitized metric names drop both metrics,
because it is unknown which one to take.
Fixes: https://github.com/prometheus/node_exporter/issues/2185
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Add a DMI collector to expose the Desktop Management Interface (DMI)
info from `/sys/class/dmi/id/`. This will expose information about the
BIOS, mainboard, chassis, and product.
Closes: https://github.com/prometheus/node_exporter/issues/303
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
The ethtool_cmd struct from the linux kernel contains information about the speeds and features supported by a
network device. This includes speeds and duplex but also features like autonegotiate and 802.3x pause frames.
Closes#1444
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
* collector: Unwrap glob textfile directories
* collector: Store full path in mtime's file label
The point is to avoid duplicated gauges from files with the same name in
different directories.
This introduces support for exporting from multiple directories matching
given pattern (e.g. `/home/*/metrics/`).
Signed-off-by: Kiril Vladimirov <kiril@vladimiroff.org>
* Refactor diskstats_linux to use procfs.
* Add `node_disk_info` metric.
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
Co-authored-by: W. Andrew Denton <git@flying-snail.net>
Currently Node Exporter has a metric called `node_uname_info` which of
course exposes uname info. While this is nice, it does not help if you
are running different OSes which could have similar uname info.
Therefore parse `/etc/os-release` or `/usr/lib/os-release` and expose a
`node_os_info` metric which provide information regarding the OS
release/version of the node. Also expose the major.minor part of the OS
release version as `node_os_version`.
Since the os-release files will not change often, cache the parsed
content and only refresh the cache if the modification time changes.
This `os` collector will read files outside of `/proc` and `/sys`, but
the os-release file is widely used and the format is standardized:
https://www.freedesktop.org/software/systemd/man/os-release.html
Bug: https://github.com/prometheus/node_exporter/issues/1574
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Add a `node_ethtool_info` metric to all ethtool devices to expose driver
information with following labels:
* bus_info
* driver
* expansion_rom_version
* firmware_version
* version
This metric is useful to monitor the firmware version to be up-to-date.
Note: The version label might be malformed due to bug #39 in ethtool:
https://github.com/safchain/ethtool/issues/39
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Add a collector for NVMes to expose the firmware versions. This requires
procfs >= 0.7.0.
Fixes#1891
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Some devices (ex virtual) don't have a speed and report `-1` as the
speed value. Add a flag to allow ignoring speed on these devices.
Fixes: https://github.com/prometheus/node_exporter/issues/1967
Signed-off-by: Ben Kochie <superq@gmail.com>
Avoid panic on invalid UTF-8 from /sys/class/power_supply by
sanitizing strings parsed from the kernel.
* Add a broken string to the test fixtures.
Fixes: https://github.com/prometheus/node_exporter/issues/1979
Signed-off-by: Ben Kochie <superq@gmail.com>
* Update Build
- Update CircleCI orb.
- Update CIrcleCI Machine image.
- Use golang-builder 1.15.
* Update Go modules.
* Fixup fixtures for XFS bug.
NOTE: We have improved some of the flag naming conventions (PR #1743). The old names are
deprecated and will be removed in 2.0. They will continue to work for backwards
compatibility.
* [CHANGE] Improve filter flag names #1743
* [CHANGE] Add btrfs and powersupplyclass to list of exporters enabled by default #1897
* [FEATURE] Add fibre channel collector #1786
* [FEATURE] Expose cpu bugs and flags as info metrics. #1788
* [FEATURE] Add network_route collector #1811
* [FEATURE] Add zoneinfo collector #1922
* [ENHANCEMENT] Add more InfiniBand counters #1694
* [ENHANCEMENT] Add flag to aggr ipvs metrics to avoid high cardinality metrics #1709
* [ENHANCEMENT] Adding backlog/current queue length to qdisc collector #1732
* [ENHANCEMENT] Include TCP OutRsts in netstat metrics #1733
* [ENHANCEMENT] Add pool size to entropy collector #1753
* [ENHANCEMENT] Remove CGO dependencies for OpenBSD amd64 #1774
* [ENHANCEMENT] bcache: add writeback_rate_debug stats #1658
* [ENHANCEMENT] Add check state for mdadm arrays via node_md_state metric #1810
* [ENHANCEMENT] Expose XFS inode statistics #1870
* [ENHANCEMENT] Expose zfs zpool state #1878
* [ENHANCEMENT] Added an ability to pass collector.supervisord.url via SUPERVISORD_URL environment variable #1947
* [BUGFIX] filesystem_freebsd: Fix label values #1728
* [BUGFIX] Fix various procfs parsing errors #1735
* [BUGFIX] Handle no data from powersupplyclass #1747
* [BUGFIX] udp_queues_linux.go: change upd to udp in two error strings #1769
* [BUGFIX] Fix node_scrape_collector_success behaviour #1816
* [BUGFIX] Fix NodeRAIDDegraded to not use a string rule expressions #1827
* [BUGFIX] Fix node_md_disks state label from fail to failed #1862
* [BUGFIX] Handle EPERM for syscall in timex collector #1938
* [BUGFIX] bcache: fix typo in a metric name #1943
* [BUGFIX] Fix XFS read/write stats (https://github.com/prometheus/procfs/pull/343)
Signed-off-by: Ben Kochie <superq@gmail.com>