node_exporter

Author	SHA1	Message	Date
Adrian Berger	cc49133321	Add multi-cluster support for Nodes dashboard (#2945) Signed-off-by: Adrian Berger <adria.berger94@gmail.com>	2024-03-08 14:41:36 +01:00
Taylor Sly	9f9473859b	Fix description for NodeDiskIOSaturation alert (#2929) NodeDiskIOSaturation description should say 30m per the "for" clause Signed-off-by: Taylor Sly <slyt@users.noreply.github.com>	2024-02-16 08:58:22 +01:00
Anton Lugovoi	81fc05c45f	Make filesystem space prediction window configurable (#2844) Signed-off-by: fitz123 <alugovoi@ordercapital.com>	2023-11-13 02:10:56 +01:00
Ayoub NASR	7333465abf	Add NodeBondingDegraded alert (#2843) Signed-off-by: Ayoub Nasr <ayoub.nasr@scality.com>	2023-11-13 00:36:30 +01:00
Vitaly Zhuravlev	e8d7f4e8b3	Revert alerts pending durtions Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly	3e250a95a0	Update NodeSystemSaturation severity Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	b7dfb32bfc	Set severity to NodeCPUHighUsage to info Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	6bdc1d9c98	Add thresholds for memory, disk and system alerts Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	77ae769179	Add thresholds for memory alerts Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	2111e70ac7	Add comma after 'mounted on' Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	e48e7909f4	Extend alert description Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	da32f8de17	Decrease NodeSystemdServiceFailed severity to warning Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	580c497261	Add NodeSystemSaturation and NodeMemoryMajorPagesFaults Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	e15e7d6a7b	Fix NodeMemoryHighUtilization alert Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	c3ec6e8af1	Add diskDevice selector Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	962de6c921	Add %(nodeExporterSelector)s to Network and conntrack alerts Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	94fc82e418	Add NodeDiskIOSaturation alert Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	614030bb80	Set 'at' everywhere as preposition for instance Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev	3d8075da7d	Decrease NodeNetwork*Errs pending period Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev	74794182a7	Add failed systemd service alert Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev	fd2d62af63	Add CPU and memory alerts Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev	0e0399d41e	Decrease NodeFilesystem pending time to 15m 30m is too long and there is a risk of running out of disk space/inodes completely if something is filling up disk very fast (like log file). Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev	fc967aa992	Add mountpoint to NodeFilesystem alerts This helps to identify alerting filesystem. Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>	2023-06-29 23:26:51 +08:00
Will Bollock	0a17e17718	docs (node/mixin): fix annotation for Skew alert (#2671) This updates the annotation for the NodeClockSkewDetected mixin alert to match the new threshold set. Original discussion was in this PR: https://github.com/prometheus/node_exporter/pull/1480 I spent an embarrassingly large amount of time trying to figure out how the heck that alert would mean 300s of clock skew. Turns out the annotation was just left the same after the threshold change. Signed-off-by: Will Bollock <wbollock@linode.com>	2023-05-11 10:33:10 +02:00
Ryan J. Geyer	5e552bac02	Replace mistaken ) with }, resulting in parsable promql Signed-off-by: Ryan J. Geyer <me@ryangeyer.com>	2022-12-13 13:30:42 +01:00
Jan Fajerski	87b8e3790d	docs/node-mixin: add fsMointpointSelector to alerts and dashboards (#2446) * docs/node-mixin: add fsMountpointSelector This adds the option to add a `mountpoint` selector to filesystem related alerts. The default is `mountpoint!=""`. * docs/node-mixins: add fsMountpointSelector to dashboards Signed-off-by: Jan Fajerski <jfajersk@redhat.com>	2022-10-20 13:06:31 +02:00
Vitaly Zhuravlev	7519830a8a	Change io time units to %util When appying rate() to seconds we have 'seconds per second' or fractions of the second, so actually it actually can be from 0 to 1. Also update intervalFactor to 1 for better rates. Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>	2022-07-26 11:09:43 +02:00
Vitaly Zhuravlev	469600f4bf	Update units of network ad disk graphs https://prometheus.io/docs/prometheus/latest/querying/functions/#rate rate() calculates per-second average rate, therefore Bps units should be used for disks. In networking bandwidth throughput is usually measured in bits/s so units are changed accordingly. Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>	2022-07-26 11:09:43 +02:00
Paweł Krupa (paulfantom)	8571536327	docs/node-mixin: add missing selectors Signed-off-by: Paweł Krupa (paulfantom) <pawel@krupa.net.pl>	2022-07-19 16:44:16 +02:00
Sven Kieske	d64766f43d	fix the following markdownlint issues (#2362) fix the following markdownlint errors (and some more): [..]mixins/node-exporter/README.md:13: MD031 Fenced code blocks should be surrounded by blank lines [..]mixins/node-exporter/README.md:21: MD031 Fenced code blocks should be surrounded by blank lines [..]mixins/node-exporter/README.md:27: MD031 Fenced code blocks should be surrounded by blank lines [..]mixins/node-exporter/README.md:33: MD031 Fenced code blocks should be surrounded by blank lines [..]mixins/node-exporter/README.md:41: MD034 Bare URL used A detailed description of the rules is available at https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md Signed-off-by: Sven Kieske <s.kieske@mittwald.de>	2022-06-28 05:50:06 +02:00
Björn Rabenstein	e5128e83f2	Merge pull request #2364 from grafana/vzhuravlev/fs_table mixin: Change disk graph to disk table	2022-06-08 20:46:47 +02:00
Jan Fajerski	cec414df78	node-mixins/config: Switch fsAvailable warning and critical thresholds Problem: In `0b50eb7294` the usage of the threshold variables was adjusted. The values had been switched as well resulting in reversed thresholds after the commit above. Warnings now have a smaller threshold than critical alerts. Solution: Adjust thresholds to reflect that warnings should be alerted on before critical alerts. Issues: https://github.com/prometheus/node_exporter/pull/2352 Signed-off-by: Jan Fajerski <jfajersk@redhat.com>	2022-06-07 12:10:48 +02:00
Björn Rabenstein	b5a2ad46e3	Merge pull request #2351 from grafana/vzhuravlev/macos Add darwin dashboard	2022-05-03 12:59:29 +02:00
Vitaly Zhuravlev	eef827006a	Change disk graph to disk table Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>	2022-04-27 19:15:50 +04:00
Daniel Lenar	0b50eb7294	Reverse fsSpaceAvailableCriticalThreshold and fsSpaceAvailableWarningThreshold Currently critical alert for space available alerts on warning and warning alert for space available alerts on critical. Signed-off-by: Daniel Lenar <dlenar@vailsys.com>	2022-04-21 11:34:54 -05:00
Gabriel Amaral Antunes	410e069471	Add darwin dashboard to mixin Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>	2022-04-20 15:18:43 +04:00
Vitaly Zhuravlev	8823605f12	Fix NodeFileDescriptorLimit alerts Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>	2022-04-07 16:25:17 +04:00
Severyn Lisovskyi	7b86b7cb29	[node-mixin] change current datasource to grafana's default Signed-off-by: Severyn Lisovskyi <993215+sev3ryn@users.noreply.github.com>	2022-02-02 14:45:26 +01:00
Julian Wiedmann	3e6f4ce627	mixin: exclude iowait and steal from CPU Utilisation (#2194) 'iowait' and 'steal' indicate specific idle/wait states, which shouldn't be counted into CPU Utilisation. Also see https://github.com/prometheus-operator/kube-prometheus/pull/796 and https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/667. Per the iostat man page: %idle Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request. %iowait Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request. %steal Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>	2021-11-04 11:03:27 +01:00
Ben Kochie	421fc429f3	Replace deprecated linter (#2176) Upstream is replacing `golint` with `revive`. * Cleanup unused mixin go files. Signed-off-by: Ben Kochie <superq@gmail.com>	2021-10-27 11:01:15 +02:00
ngc104	4bc1c02000	fix bug in #2130 (#2170) Signed-off-by: Yves Mettier <yves.mettier@orange.com> Co-authored-by: Yves Mettier <yves.mettier@orange.com>	2021-10-21 12:07:38 +02:00
Tom Wilkie	9bc184d236	Datasource template variable should be labelled 'Data Source' Signed-off-by: Tom Wilkie <tom@grafana.com>	2021-10-20 17:10:14 +01:00
Ben Kochie	5a38949451	Fix up mixin tests (#2167) Use new Go install format, cleanup working dir setup. Signed-off-by: Ben Kochie <superq@gmail.com>	2021-10-14 11:06:01 +02:00
Julien Pivotto	68a6c78c0d	Update go to 1.17 (#2159) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-10-03 13:35:24 +02:00
Ben Kochie	aeef1edd62	mixin: Add fallback for MemAvailable (#2130) Add a fallback to Buffers+Cached+MemFree+Slab for older Linux kernels where the MemAvailable metric is not available for memory utilization. Signed-off-by: Ben Kochie <superq@gmail.com>	2021-09-28 10:22:06 +02:00
Johannes 'fish' Ziemke	6f1286b314	mixin: Drop mode label for num cpu metric Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>	2021-09-03 12:13:35 +02:00
Johannes 'fish' Ziemke	fa9926c4eb	mixin: Cheaper calculation for instance:node_num_cpu:sum Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>	2021-09-03 11:34:25 +02:00
paulfantom	832909dd25	docs/node-mixin/alerts: make NodeFilesystemAlmostOutOfSpace fire earlier Signed-off-by: paulfantom <pawel@krupa.net.pl>	2021-08-16 16:35:58 +02:00
Johannes 'fish' Ziemke	7fc5c6045a	Read config from $ Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>	2021-07-27 16:32:05 +02:00
ArthurSens	3731f93fd7	Refactor USE method mixin dashboards with grafonnet-lib, add multi-cluster support. Aiming for cleaner code and following standards used on younger mixins. Signed-off-by: ArthurSens <arthursens2005@gmail.com>	2021-07-27 16:32:05 +02:00

114 Commits (641cf2c6b13adb8ae13b353cbdb76d3332ba8a1d)