From 75f112c824045ad292d15e4ff8954913582674d6 Mon Sep 17 00:00:00 2001
From: hc-github-team-consul-core
Date: Mon, 12 Jun 2023 21:29:40 -0400
Subject: [PATCH] Backport of Fix two WAL metrics in docs/agent/telemetry.mdx into release/1.15.x (#17682)

* backport of commit 0191cb110379a90201587822341034c0d2e1ac90

* backport of commit 6c245e7960c4647f9c2e5169c445648b8a2842b9

---------

Co-authored-by: josh
---
 .changelog/17593.txt                     | 3 +++
 website/content/docs/agent/telemetry.mdx | 6 ++----
 2 files changed, 5 insertions(+), 4 deletions(-)
 create mode 100644 .changelog/17593.txt

diff --git a/.changelog/17593.txt b/.changelog/17593.txt
new file mode 100644
index 0000000000..1f84e75f57
--- /dev/null
+++ b/.changelog/17593.txt
@@ -0,0 +1,3 @@
+```release-note:bug
+docs: fix list of telemetry metrics
+```
diff --git a/website/content/docs/agent/telemetry.mdx b/website/content/docs/agent/telemetry.mdx
index 114702ee73..1fb806c099 100644
--- a/website/content/docs/agent/telemetry.mdx
+++ b/website/content/docs/agent/telemetry.mdx
@@ -459,10 +459,8 @@ These metrics are used to monitor the health of the Consul servers.
 | `consul.raft.leader.dispatchNumLogs` | Measures the number of logs committed to disk in a batch. | logs | gauge |
 | `consul.raft.logstore.verifier.checkpoints_written` | Counts the number of checkpoint entries written to the LogStore. | checkpoints | counter |
 | `consul.raft.logstore.verifier.dropped_reports` | Counts how many times the verifier routine was still busy when the next checksum came in and so verification for a range was skipped. If you see this happen, consider increasing the interval between checkpoints with [`raft_logstore.verification.interval`](/consul/docs/agent/config/config-files#raft_logstore_verification) | reports dropped | counter |
-| `consul.raft.logstore.verifier.ranges_verified` | Counts the number of log ranges for which a verification report has been completed. Refer to [Monitor Raft metrics and logs for WAL
-](/consul/docs/agent/wal-logstore/monitoring) for more information. | log ranges verifications | counter |
-| `consul.raft.logstore.verifier.read_checksum_failures` | Counts the number of times a range of logs between two check points contained at least one disk corruption. Refer to [Monitor Raft metrics and logs for WAL
-](/consul/docs/agent/wal-logstore/monitoring) for more information. | disk corruptions | counter |
+| `consul.raft.logstore.verifier.ranges_verified` | Counts the number of log ranges for which a verification report has been completed. Refer to [Monitor Raft metrics and logs for WAL](/consul/docs/agent/wal-logstore/monitoring) for more information. | log ranges verifications | counter |
+| `consul.raft.logstore.verifier.read_checksum_failures` | Counts the number of times a range of logs between two check points contained at least one disk corruption. Refer to [Monitor Raft metrics and logs for WAL](/consul/docs/agent/wal-logstore/monitoring) for more information. | disk corruptions | counter |
 | `consul.raft.logstore.verifier.write_checksum_failures` | Counts the number of times a follower has a different checksum to the leader at the point where it writes to the log. This could be caused by either a disk-corruption on the leader (unlikely) or some other corruption of the log entries in-flight. | in-flight corruptions | counter |
 | `consul.raft.leader.lastContact` | Measures the time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease.The lease timeout is 500 ms times the [`raft_multiplier` configuration](/consul/docs/agent/config/config-files#raft_multiplier), so this telemetry value should not be getting close to that configured value, otherwise the Raft timing is marginal and might need to be tuned, or more powerful servers might be needed. See the [Server Performance](/consul/docs/install/performance) guide for more details. | ms | timer |
 | `consul.raft.leader.oldestLogAge` | The number of milliseconds since the _oldest_ log in the leader's log store was written. This can be important for replication health where write rate is high and the snapshot is large as followers may be unable to recover from a restart if restoring takes longer than the minimum value for the current leader. Compare this with `consul.raft.fsm.lastRestoreDuration` and `consul.raft.rpc.installSnapshot` to monitor. In normal usage this gauge value will grow linearly over time until a snapshot completes on the leader and the log is truncated. Note: this metric won't be emitted until the leader writes a snapshot. After an upgrade to Consul 1.10.0 it won't be emitted until the oldest log was written after the upgrade. | ms | gauge |