node_exporter

Commit Graph

Author	SHA1	Message	Date
Karsten Weiss	a8d7d1101a	cpu: Support processor-less (memory-only) NUMA nodes (#734) * cpu: Support processor-less (memory-only) NUMA nodes Processor-less (memory-only) NUMA nodes exist e.g. in systems that use Intel Optane drives for RAM expansion using Intel Memory Drive Technology (IMDT). IMDT RAM expansion supports two modes: * "Unify Remote Memory domains": present a processor-less (memory-only) NUMA domain, which is the default * "Expand local memory domains": to expand each processor’s memory domain with a portion of the memory made available by Optane and IMDT This commit fixes a crash in the first case (when "cpulist" is empty). Here's an example of such a system: $ numastat -m|head -n5 Per-node system memory usage (in MBs): Node 0 Node 1 Node 2 Total --------------- --------------- --------------- --------------- MemTotal 118239.56 130816.00 464384.00 713439.56 $ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done 0: 0-7,16-23 1: 8-15,24-31 2: $ /opt/vsmp/bin/vsmpversion -vvv Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59) System configuration: Boards: 3 1 x Proc. + I/O + Memory 2 x NVM devices (Intel SSDPED1K375GAQ) Processors: 2, Cores: 16, Threads: 32 Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01 Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562 1 x 249088MB [262036/ 678/12270] 1 x 232192MB [357707/125369/ 146] 82:00.0#1 1 x 232192MB [357707/125369/ 146] 83:00.0#1 * cpu: rename some variables (pkg => node) * cpu: Use %v not %q in log.Debugf() format strings	7 years ago
Matt Layher	f6f9c8d6cc	Add and use sysReadFile in hwmon collector (#728 )	7 years ago
Tobias Klauser	d73f1e60c4	Simplify Utsname string conversion (#716 ) * Update golang.org/x/sys/unix This allows to use simplified string conversion of Utsname members. * Simplify Utsname string conversion Use Utsname from golang.org/x/sys/unix which contains byte array instead of int8/uint8 array members. This allows to simplify the string conversions of these members.	7 years ago
Ben Kochie	ea250d73f4	Fix off by one in Linux interrupts collector (#721 ) * Fix off by one in Linux interrupts collector * Fix off by one in CPU column handler. * Add test. * Enable interrupts in end-to-end test.	7 years ago
Matt Layher	296b62acb7	netstat: return nothing when /proc/net/snmp6 not found	7 years ago
Derek Marcotte	0eecaa9547	Correct buffer_bytes > INT_MAX on BSD/amd64. (#712 ) * Correct buffer_bytes > INT_MAX on BSD/amd64. The sysctl vfs.bufspace returns either an int or a long, depending on the value. Large values of vfs.bufspace will result in error messages like: couldn't get meminfo: cannot allocate memory This will detect the returned data type, and cast appropriately. * Added explicit length checks per feedback. * Flatten Value() to make it easier to read. * Simplify per feedback. * Fix style. * Doc updates.	7 years ago
Matt Layher	f9ad88fc03	xfs: expose correct fields, fix metric names	7 years ago
Siavash Safi	f3a7022602	Add `collect[]` parameter (#699) * Add `collect[]` parameter * Add TODO comment about staticcheck ignored * Restore promhttp.HandlerOpts * Log a warning and return HTTP error instead of failing * Check collector existence and status, cleanups * Fix warnings and error messages * Don't panic, return error if collector registration failed * Update README	7 years ago
Ben Kochie	deadfef4c9	Update vendoring (#685 ) * Update vendor github.com/coreos/go-systemd/dbus@v15 * Update vendor github.com/ema/qdisc * Update vendor github.com/godbus/dbus * Update vendor github.com/golang/protobuf/proto * Update vendor github.com/lufia/iostat * Update vendor github.com/matttproud/golang_protobuf_extensions/pbutil@v1.0.0 * Update vendor github.com/prometheus/client_golang/... * Update vendor github.com/prometheus/common/... * Update vendor github.com/prometheus/procfs/... * Update vendor github.com/sirupsen/logrus@v1.0.3 Adds vendor golang.org/x/crypto * Update vendor golang.org/x/net/... * Update vendor golang.org/x/sys/... * Update end to end output.	7 years ago
Brett Vickers	b62c7bc0ad	Updated vendored ntp package (#681 ) The github.com/beevik/ntp package was recently updated with some API changes that broke node_exporter. This commit fetches the latest version of the ntp package and brings node_exporter in line with the latest API.	7 years ago
Calle Pettersson	859a825bb8	Replace --collectors.enabled with per-collector flags (#640 ) * Move NodeCollector into package collector * Refactor collector enabling * Update README with new collector enabled flags * Fix out-of-date inline flag reference syntax * Use new flags in end-to-end tests * Add flag to disable all default collectors * Track if a flag has been set explicitly * Add --collectors.disable-defaults to README * Revert disable-defaults flag * Shorten flags * Fixup timex collector registration * Fix end-to-end tests * Change procfs and sysfs path flags * Fix review comments	7 years ago
Sami Kerola	3762191e66	Add timex collector (#664 ) This collector is based on adjtimex(2) system call. The collector returns three values, status if time is synchronised, offset to remote reference, and local clock frequency adjustment. Values are taken from kernel time keeping data structures to avoid getting involved how the synchronisation is implemented. By that I mean one should not care if time is update using ntpd, systemd.timesyncd, ptpd, and so on. Since all time sync implementation will always end up telling to kernel what is the status with time one can simply omit the software in between, and look results of the syncing. As a positive side effect this makes collector very quick and conceptually specific, this does not monitor availability of NTP server, or network in between, or dns resolution, and other unrelated but necessary things. Minimum set of values to keep eye on are the following three: The node_timex_sync_status tells if local clock is in sync with a remote clock. Value is set to zero when synchronisation to a reliable server is lost, or a time sync software is misconfigured. The node_timex_offset_seconds tells how much local clock is off when compared to reference. In case of multiple time references this value is outcome of RFC 5905 adjustment algorithm. Ideally offset should be close to zero, and it depends about use case how large value is acceptable. For example a typical web server is probably fine if offset is about 0.1 or less, but that would not be good enough for mobile phone base station operator. The node_timex_freq tells amount of adjustment to local clock tick frequency. For example if offset is one second and growing the local clock will need instruction to tick quicker. Number value itself is not very important, and occasional small adjustments are fine. When frequency is unusually in stable one can assume quality of time stamps will not be accurate to very far in sub second range. 
Obviously explaining why local clock frequency behaves like a passenger in roller coaster is different matter. Explanations can vary from system load, to environmental issues such as a machine being physically too hot. Rest of the measurements can help when debugging. If you run a clock server do probably want to collect and keep track of everything. Pull-request: https://github.com/prometheus/node_exporter/pull/664	7 years ago
Leonid Evdokimov	c169b4b1c5	Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check (#655 ) * Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check 1. Checking local clock against remote NTP daemon is bad idea, local ntpd acting as a client should do it better and avoid excessive load on remote NTP server so the collector is refactored to query local NTP server. 2. Checking local clock against remote one does not check local ntpd itself. Local ntpd may be down or out of sync due to network issues, but clock will be OK. 3. Checking NTP server using sanity of it's response is tricky and depends on ntpd implementation, that's why common `node_ntp_sanity` variable is exported. * `govendor add golang.org/x/net/ipv4`, it is dependency of github.com/beevik/ntp * Update github.com/beevik/ntp to include boring SNTP fix * Use variable name from RFC5905 * ntp: move code to make export of raw metrics more explicit * Move NTP math to `github.com/beevik/ntp` * Make `golint` happy * Add some brief docs explaining `ntp` #655 and `timex` #664 modules * ntp: drop XXX comment that got its decision * ntp: add `_seconds` suffix to relevant metrics * Better `node_ntp_leap` comment * s/node_ntp_reftime/node_ntp_reference_timestamp_seconds/ as requested by @discordianfish * Extract subsystem name to const as suggested by @SuperQ	7 years ago
Karsten Weiss	b0d5c00832	cpu: Metric 'package_throttles_total' is per package. (#657 ) * cpu: Metric 'package_throttles_total' is per package. 'package_throttles_total' is per package, not per cpu. This also reduces the total number of cpu time series a lot (esp for multi core cpus). * cpu: Better handling of a cpulist edge-case. * cpu: Extract the package number from the directory name. Do not rely on the range index. * cpu: Add package_throttle_count for node0 cpu1 This file must be ignored by the cpu collector.	7 years ago
Matthias Rampke	e1f129c729	Use int64 throughout the ZFS collector. This avoids issues with integer overflows on 32-bit architectures. The Prometheus data format is float64, so regardless of the architecture we should handle large numbers. Fixes #629.	7 years ago
Ben Kochie	8839640cd1	Ignore wifi collector permission errors (#646) Ignore the permission denied error when the wifi collector has no permission to read metrics.	7 years ago
Calle Pettersson	dfe07eaae8	Switch to kingpin flags (#639 ) * Switch to kingpin flags * Fix logrus vendoring * Fix flags in main tests * Fix vendoring versions	7 years ago
Ben Kochie	46c31d8a7e	Enable IPVS collector by default (#623 ) * Silence error output when no IPVS present. * Enable by default. * Update end-to-end fixture. * Update README.	7 years ago
Tobias Schmidt	515b5a933d	Fix build tags of loadavg collector The collector is only implemented for a subset of all operating systems supported by go. Compilation will fail if attempted for another OS target.	7 years ago
Tobias Schmidt	016d79535d	Fix build tags of meminfo collector The meminfo collector only supports darwin, dragonfly, freebsd and linux and must not be included in other architectures.	7 years ago
Andrea De Pasquale	1369763067	Change raid0 status line regexp for mdadm collector (#619 )	7 years ago
Aleksey Zhukov	7a914e58f2	Add parsing /proc/net/snmp6 file for netstat-linux (#615) * Add parsing /proc/net/snmp6 file * add /proc/net/snmp6 fixture * fix e2e test * gofmt * remove unused variable * safe checks * add tests * change help format	7 years ago
Matt Layher	6e82fd1c56	Add XFS block mapping and block map B-tree stats (#575 )	7 years ago
ideaship	8d90276283	Add bcache collector (#597 ) * Add bcache collector for Linux This collector gathers metrics related to the Linux block cache (bcache) from sysfs. * Removed commented out code * Use project comment style * Add _sectors to metric name to indicate unit * Really use project comment style * Rename bcache.go to bcache_linux.go * Keep collector namespace clean Rename: - metric -> bcacheMetric - periodStatsToMetrics -> bcachePeriodStatsToMetric * Shorten slice initialization * Change label names to backing_device, cache_device * Remove five minute metrics (keep only total) * Include units in additional metric names * Enable bcache collector by default * Provide metrics in seconds, not nanoseconds * remove metrics with label "all" * Add fixtures, update end-to-end for bcache collector * Move fixtures/sys into tar.gz This changeset moves the collector/fixtures/sys directory into collector/fixtures/sys.tar.gz and tweaks the Makefile to unpack the tarball before tests are run. The reason for this change is that Windows does not allow colons in a path (colons are present in some of the bcache fixture files), nor can it (out of the box) deal with pathnames longer than 260 characters (which we would be increasingly likely to hit if we tried to replace colons with longer codes that are guaranteed not the turn up in regular file names). * Add ttar: plain text archive, replacement for tar This changeset adds ttar, a plain text replacement for tar, and uses it for the sysfs fixture archive. The syntax is loosely based on tar(1). Using a plain text archive makes it possible to review changes without downloading and extracting the archive. Also, when working on the repo, git diff and git log become useful again, allowing a committer to verify and track changes over time. The code is written in bash, because bash is available out of the box on all major flavors of Linux and on macOS. 
The feature set used is restricted to bash version 3.2 because that is what Apple is still shipping. The program also works on Windows if bash is installed. Obviously, it does not solve the Windows limitations (path length limited to 260 characters, no symbolic links) that prompted the move to an archive format in the first place.	7 years ago
kadota kyohei	a077024f51	add diskstats on Darwin (#593 ) * Add diskstats collector for Darwin * Update year in the header * Update README.md * Add github.com/lufia/iostat to vendored packages * Change stats to follow naming guidelines * Add a entry of github.com/lufia/iostat into vendor.json * Remove /proc/diskstats from description	7 years ago
Rene Treffer	56bf8d4b2d	Add link to kernel documentation for sysfs/cpufreq files	8 years ago
Rene Treffer	bcc3cd92b8	Fix cpufreq statistics by converting kHz to Hz	8 years ago
Ben Kochie	182810056f	Fix Linux cpu errors (#606 ) Make the Linux cpu collector soft-error on missing `cpufreq` and `thermal_throttle` features.	8 years ago
Rene Treffer	2e9f1913b8	Move stat_linux to cpu_linux and add cpufreq stats (#548 )	8 years ago
Emanuele Rocca	047003b6bb	Add qdisc collector for Linux (#580 ) * Add qdisc collector for Linux This collector gathers basic queueing discipline metrics via netlink, similarly to what `tc -s qdisc show` does. * qdisc collector: nl-specific code moved, names fixed - netlink-specific parts moved to github.com/ema/qdisc - avoid using shortened names - counters renamed into XXX_total * Get rid of parseMessage error checking leftover * Add github.com/ema/qdisc to vendored packages * Update help texts and comments * Add qdisc collector to README file * qdisc collector end-to-end testing * Update qdisc dependency to latest version Update github.com/ema/qdisc dependency to revision 2c7e72d, which includes unit testing. * qdisc collector: rename "iface" label into "device"	8 years ago
Karsten Weiss	b2f4fd5776	Remove unused devstatCollector struct member 'bytes_total'. This also fixes this golint issue: devstat_freebsd.go:40:2: don't use underscores in Go names; struct field bytes_total should be bytesTotal	8 years ago
Jonas Große Sundrup	e6d031788f	Correct typo (#582 )	8 years ago
Karsten Weiss	bca09abf1c	golint: Fix NewStatCollector() doc string.	8 years ago
Karsten Weiss	b3e7420a27	cpu_darwin.go: s/cpu_ticks/cpuTicks/g	8 years ago
Karsten Weiss	b05c7d8dab	cpu_darwin.go: Fix doc strings.	8 years ago
Karsten Weiss	fff03c6c0c	Fix NewTCPStatCollector doc string.	8 years ago
Karsten Weiss	6720cfdbfe	golint: Fix comment on exported function NewDevstatCollector.	8 years ago
Karsten Weiss	b73af72853	Explicitly check for the rc 3 in call to getloadavg(). Reorder logic.	8 years ago
Karsten Weiss	af358ec800	golint fixes: if block ends with a return statement, so drop this else and outdent its block.	8 years ago
Karsten Weiss	732f839810	sysctl_bsd.go: golint fixes. Typo fix.	8 years ago
Robert Clark	58f50b31f2	Multiply port data XMIT/RCV metrics by 4 (#579 ) According to Mellanox, it is standard practice that the port_xmit_data and port_rcv_data files are split into 4 lanes. To get the actual transmit and receive values for each port, the metric needs to be multiplied by 4. Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>	8 years ago
Kai S	59f9b8c5c1	Handle nonexisting bonding_masters file (#569 ) * silently ignore nonexisting bonding_masters file Add an empty fixtures dir without a bonding_masters file to test. * Moved the check to the Update() method Dropped the empty test dir.	8 years ago
Matt Layher	1feb091b36	Initial XFS collector	8 years ago
Derek Marcotte	5b557bf973	Fix metric name per review.	8 years ago
Derek Marcotte	db8ec9c6b4	Add exec_boot_time for freebsd, dragonfly Adds new sysctl type, bsdSysctlTypeStructTimeval to enable parsing of timevals from raw memory.	8 years ago
Daniele Sluijters	bb9d4ade0b	uname_linux: Build for 32bit MIPS too Since Go 1.8 32bit MIPS Big/Little Endian are supported assuming the target runs Linux and the kernel either emulates an FPU or can access the CPU one. This allows the node_collector to build for mips and mipsle opening up the possibility of running it on things like home routers (DD-|Open|ASUS-)Wrt firmware usually has the necessary bits in place.	8 years ago
Brian Brazil	f291d2d6dd	Get full resolution for node_time (#555 )	8 years ago
Karsten Weiss	d9703ff7c6	edac: Fix typo in csrow label of node_edac_csrow_uncorrectable_errors_total metric.	8 years ago
Derek Marcotte	83cecfa696	Fixes broken build on Dragonfly. Undefined err: `84eaa8fecd/collector/devstat_dragonfly.go (L145)`	8 years ago
Karsten Weiss	45ca8db352	Support the 'guest_nice' cpu mode of /proc/stat. 'guest_nice' is available since Linux 2.6.33.	8 years ago
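
Note on a8d7d1101a (processor-less NUMA nodes): the commit message says the crash occurred when a node's "cpulist" file is empty. The Go sketch below illustrates one way to parse such a file so that an empty list is a valid result rather than an error. It is not the code from the commit; the helper name parseCPUList and its behavior are assumptions made for illustration, and it requires Go 1.18 or later for strings.Cut.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList is a hypothetical helper (not taken from node_exporter).
// It expands a sysfs cpulist string such as "0-7,16-23" into CPU numbers.
// An empty string, as seen on a memory-only NUMA node, yields an empty
// result and no error.
func parseCPUList(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil // memory-only node: no CPUs, not a failure
	}
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid cpulist entry %q: %w", part, err)
		}
		last := first
		if isRange {
			last, err = strconv.Atoi(hi)
			if err != nil {
				return nil, fmt.Errorf("invalid cpulist entry %q: %w", part, err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("invalid cpulist range %q", part)
		}
		for c := first; c <= last; c++ {
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

func main() {
	// The three cpulist values shown in the commit message for nodes 0..2.
	for i, s := range []string{"0-7,16-23", "8-15,24-31", ""} {
		cpus, err := parseCPUList(s)
		fmt.Printf("node%d: %d cpus, err=%v\n", i, len(cpus), err)
	}
}
```

With the values quoted in the commit message, nodes 0 and 1 each expand to 16 CPUs and node 2 expands to none, so a caller can skip CPU-related metrics for that node instead of failing.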


540 Commits (01ec8c5c5c3636a52d6f9e7c6f6cebaeb9f07af4)