Previously, prometheus_notifications_errors_total was incremented by
one whenever a batch of alerts was affected by an error during sending
to a specific alertmanager. However, the corresponding metric
prometheus_notifications_sent_total, counting all alerts that were
sent (including those where the send ended in error), is incremented
by the batch size, i.e. the number of alerts.
Therefore, the ratio used in the mixin for the
PrometheusErrorSendingAlertsToSomeAlertmanagers alert is inconsistent.
This commit changes the increment of
prometheus_notifications_errors_total to the number of alerts that
were sent in the attempt that ended in an error. It also adjusts the
metric's help string accordingly and makes the wording of the alert in
the mixin more precise.
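In code terms the change amounts to something like this (a hedged sketch with stand-in names, assuming client_golang counter vectors keyed by the Alertmanager URL):

```go
package notifier

import "github.com/prometheus/client_golang/prometheus"

// recordSendResult is a sketch, not the real code: on error, the errors
// counter is now incremented by the number of alerts in the failed attempt,
// matching the batch-sized increment of the sent counter.
func recordSendResult(sent, errs *prometheus.CounterVec, url string, batchSize int, err error) {
	sent.WithLabelValues(url).Add(float64(batchSize)) // all alerts sent, errors included.
	if err != nil {
		// Previously: errs.WithLabelValues(url).Inc() -- one per failed batch.
		errs.WithLabelValues(url).Add(float64(batchSize)) // now: one per failed alert.
	}
}
```

With both counters incremented per alert, the error ratio used in the mixin is consistent again.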
Signed-off-by: beorn7 <beorn@grafana.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward-ported the util/logging Deduper by implementing a small custom slog.Handler that dedupes log calls before chaining them to the underlying real slog.Logger (a sketch follows this list)
- moved some of the json file logging functionality to use the prom/common package
- refactored some of the new json file logging for scraping
- changed the promql.QueryLogger interface to swap out logging methods for the relevant slog sugar wrappers
- updated lots of tests that used or replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` conditional checks in functions where no logger was provided -- the old code using the go-kit/log.Logger interface had several places that hit nil references when calling methods like `With()` to add keyvals on the new *slog.Logger type
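The deduping handler can be pictured roughly like this (a minimal sketch, not the actual util/logging code; keying on the record message and the `dedupeState` helper are simplifications assumed here):

```go
package logging

import (
	"context"
	"log/slog"
	"sync"
	"time"
)

// dedupeState is shared by all handlers derived via WithAttrs/WithGroup so
// they suppress duplicates against the same window.
type dedupeState struct {
	mtx  sync.Mutex
	seen map[string]time.Time
}

// Deduper drops records whose message was already logged within `repeat`,
// and chains everything else to the next handler.
type Deduper struct {
	next   slog.Handler
	repeat time.Duration
	state  *dedupeState
}

func Dedupe(next *slog.Logger, repeat time.Duration) *slog.Logger {
	return slog.New(&Deduper{
		next:   next.Handler(),
		repeat: repeat,
		state:  &dedupeState{seen: map[string]time.Time{}},
	})
}

func (d *Deduper) Enabled(ctx context.Context, l slog.Level) bool {
	return d.next.Enabled(ctx, l)
}

func (d *Deduper) Handle(ctx context.Context, r slog.Record) error {
	d.state.mtx.Lock()
	last, ok := d.state.seen[r.Message]
	if ok && r.Time.Sub(last) < d.repeat {
		d.state.mtx.Unlock()
		return nil // duplicate within the window: drop it.
	}
	d.state.seen[r.Message] = r.Time
	d.state.mtx.Unlock()
	return d.next.Handle(ctx, r) // not a duplicate: chain to the real handler.
}

func (d *Deduper) WithAttrs(attrs []slog.Attr) slog.Handler {
	return &Deduper{next: d.next.WithAttrs(attrs), repeat: d.repeat, state: d.state}
}

func (d *Deduper) WithGroup(name string) slog.Handler {
	return &Deduper{next: d.next.WithGroup(name), repeat: d.repeat, state: d.state}
}
```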
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* [ENHANCEMENT] Alerts: remove metrics for removed Alertmanagers
So they don't continue to report stale values.
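A minimal sketch of the idea, assuming client_golang metric vectors keyed by an Alertmanager URL label (the helper name and shape are illustrative, not the exact code):

```go
package notifier

import "github.com/prometheus/client_golang/prometheus"

// cleanStaleMetrics deletes the per-Alertmanager series for endpoints that
// discovery has dropped, so they stop exporting their last (stale) values.
func cleanStaleMetrics(removedURLs []string, vecs ...*prometheus.CounterVec) {
	for _, url := range removedURLs {
		for _, v := range vecs {
			v.DeleteLabelValues(url) // no-op if the series does not exist.
		}
	}
}
```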
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This test keeps timing out on our arm64 CI server; it uses a very short timeout, and 5ms doesn't seem to be enough.
Bump it 10x.
Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
* Add draining of queued notifications to `notifier.Manager` (see the sketch after this list)
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Update docs
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Address PR feedback
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Add more logging
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Address offline feedback: remove timeout
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Ensure stopping takes priority over further processing, make tests more robust
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Make channel unbuffered
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Update docs
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Fix race in test
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Remove unnecessary context
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Make Stop safe to call multiple times
Signed-off-by: Charles Korn <charles.korn@grafana.com>
---------
Signed-off-by: Charles Korn <charles.korn@grafana.com>
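Taken together, the commits above amount to roughly the following pattern (a toy sketch with assumed names, not the real `notifier.Manager`): an idempotent `Stop` via `sync.Once`, a run loop that checks for stop first so stopping takes priority over further processing, and a final drain of the queue before returning.

```go
package notifier

import (
	"fmt"
	"sync"
)

type Manager struct {
	mtx      sync.Mutex
	queue    []string      // pending alerts (strings stand in for real alerts).
	more     chan struct{} // signals that something was queued.
	stopCh   chan struct{}
	stopOnce sync.Once
}

func NewManager() *Manager {
	return &Manager{more: make(chan struct{}, 1), stopCh: make(chan struct{})}
}

func (m *Manager) Send(alert string) {
	m.mtx.Lock()
	m.queue = append(m.queue, alert)
	m.mtx.Unlock()
	select { // non-blocking: one pending signal is enough.
	case m.more <- struct{}{}:
	default:
	}
}

// Stop is safe to call multiple times: only the first call closes the channel.
func (m *Manager) Stop() {
	m.stopOnce.Do(func() { close(m.stopCh) })
}

func (m *Manager) Run() {
	for {
		// Check stop first, so stopping takes priority over further processing.
		select {
		case <-m.stopCh:
			m.sendAll() // drain whatever is still queued before returning.
			return
		default:
		}
		select {
		case <-m.stopCh:
			m.sendAll()
			return
		case <-m.more:
			m.sendAll()
		}
	}
}

func (m *Manager) sendAll() {
	m.mtx.Lock()
	pending := m.queue
	m.queue = nil
	m.mtx.Unlock()
	for _, a := range pending {
		fmt.Println("sending:", a) // stand-in for the real Alertmanager POST.
	}
}
```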
This is done to prevent the latter operation from blocking or starving the former: previously, the `tsets` channel was consumed by the same goroutine that consumes and feeds the buffered `n.more` channel, so the `tsets` channel was less likely to be ready, as it's unbuffered and only fed every `SDManager.updatert` seconds.
See https://github.com/prometheus/prometheus/issues/13676 and https://github.com/prometheus/prometheus/issues/8768
The synchronization with the sendLoop goroutine is managed through the n.mtx mutex.
This uses a similar approach to the scrape manager's efbd6e41c5/scrape/manager.go (L115-L117)
The old TestHangingNotifier was replaced by the new one to more closely reflect reality.
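Sketched with assumed names (the real method and field names may differ), the split looks like:

```go
package notifier

import (
	"sync"

	"github.com/prometheus/prometheus/discovery/targetgroup"
)

type Manager struct {
	mtx  sync.Mutex
	more chan struct{}
	// ... queue, Alertmanager sets, etc.
}

// Run splits the two responsibilities into separate goroutines, so a full
// notification queue can no longer starve target-set updates.
func (n *Manager) Run(tsets <-chan map[string][]*targetgroup.Group) {
	go func() { // dedicated goroutine for SD updates.
		for ts := range tsets {
			n.mtx.Lock() // synchronized with sendLoop through n.mtx.
			n.reload(ts)
			n.mtx.Unlock()
		}
	}()
	n.sendLoop() // consumes n.more and posts alerts to the Alertmanagers.
}

func (n *Manager) reload(tsets map[string][]*targetgroup.Group) { /* rebuild AM sets */ }
func (n *Manager) sendLoop()                                    { /* drain queue, send */ }
```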
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Add a test to show "targets groups update" starvation when the notifications queue is full and an Alertmanager is down.
The existing `TestHangingNotifier` that was added in https://github.com/prometheus/prometheus/pull/10948 doesn't really reflect reality, as the SD changes are manually fed into `syncCh` in a continuous way, whereas in reality updates are only resent every `updatert`.
The test added here sets up an SD manager and links it to the notifier; the SD changes are triggered by that manager, as happens in reality.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Ethan Hunter <ehunter@hudson-trading.com>
When all alerts were dropped after alert relabeling, the `sendAll()`
function didn't release the lock properly, which created a deadlock with
the Alertmanager target discovery.
In addition, the commit detects early when there is no Alertmanager
endpoint to notify, to avoid unnecessary work.
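A simplified sketch of the fixed flow (field and type names are stand-ins, not the actual notifier code):

```go
package notifier

import "sync"

type manager struct {
	mtx           sync.RWMutex
	alertmanagers map[string][]string // sketch: set name -> endpoint URLs.
}

func (n *manager) sendAll(alerts ...string) bool {
	if len(alerts) == 0 {
		return true
	}

	n.mtx.RLock()
	amSets := n.alertmanagers
	n.mtx.RUnlock() // released on every path, even if relabeling drops all alerts.

	// Detect early that there is no endpoint to notify and skip the work.
	if len(amSets) == 0 {
		return true
	}

	// ... relabel the alerts, marshal them, POST to every endpoint ...
	return true
}
```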
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Avoids possible false sharing between loops.
Plausibly there is no problem in the current code, but it's easy enough to write it more safely.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Addresses: #12536
This commit adds support for configuring sigv4 to an
`alertmanager_config`. Based heavily on the sigv4 work in the remote
write client.
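In configuration terms this looks roughly as follows (a hedged example; the field set mirrors the remote-write `sigv4` block, so consult the configuration docs for the authoritative schema):

```yaml
alerting:
  alertmanagers:
    - sigv4:
        region: us-east-1
        # Omit access_key/secret_key to fall back to the default AWS
        # credential chain (environment, shared profile, instance role).
        access_key: <access key>
        secret_key: <secret key>
      static_configs:
        - targets: ["alertmanager.example.com:9093"]
```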
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Use a label builder instead of a slice when creating labels for the
target alertmanagers. This can be passed directly to
`relabel.ProcessBuilder`, skipping a copy.
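Roughly (a sketch simplified from the real target-building code; the function name is invented here):

```go
package notifier

import (
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/discovery/targetgroup"
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/model/relabel"
)

// alertmanagerLabels populates a labels.Builder and hands it straight to
// relabel.ProcessBuilder, skipping the intermediate slice and copy.
func alertmanagerLabels(tg *targetgroup.Group, addr string, cfgs []*relabel.Config) (labels.Labels, bool) {
	lb := labels.NewBuilder(labels.EmptyLabels())
	for name, value := range tg.Labels {
		lb.Set(string(name), string(value))
	}
	lb.Set(model.AddressLabel, addr)

	if keep := relabel.ProcessBuilder(lb, cfgs...); !keep {
		return labels.EmptyLabels(), false // target dropped by relabeling.
	}
	return lb.Labels(), true
}
```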
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
It took a `Labels` argument whose memory could be re-used, but in practice
this hardly ever brought a benefit, especially after converting
`relabel.Process` to `relabel.ProcessBuilder`.
Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.
Lastly, `Builder.Labels()` now estimates the final size based on the
labels added and deleted.
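A generic Go illustration of the pitfall (not Prometheus code): an empty-but-non-nil slice fails a nil check, so "reuse the buffer if non-nil" logic keeps reallocating when handed a zero-capacity buffer.

```go
package main

import "fmt"

func main() {
	var a []string  // nil: len 0, cap 0.
	b := []string{} // empty, but NOT nil.

	fmt.Println(a == nil, b == nil) // true false

	// A nil check therefore doesn't tell you whether there is capacity to
	// re-use; appending to b reallocates just like appending to a would.
	fmt.Println(cap(a), cap(b)) // 0 0
}
```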
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: Add benchmark
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: re-use Builder across relabels
Saves memory allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* labels.Builder: allow re-use of result slice
This reduces memory allocations where the caller has a suitable slice available.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: re-use source values slice
To reduce memory allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Unwind one change causing test failures
Restore original behaviour in PopulateLabels, where we must not overwrite the input set.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* relabel: simplify values optimisation
Use a stack-based array for up to 16 source labels, which covers the
vast majority of cases (see the sketch after this list).
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* lint
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
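The stack-based array from the last optimisation commit looks roughly like this (a sketch close to, but not exactly, the relabel code; the helper name is invented):

```go
package relabel

import (
	"strings"

	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/model/labels"
)

// sourceValue gathers the source-label values into a stack-allocated array
// unless a rule names more than 16 source labels, avoiding a heap
// allocation per relabel in the common case.
func sourceValue(lb *labels.Builder, sourceLabels []model.LabelName, sep string) string {
	var va [16]string
	values := va[:0]
	if len(sourceLabels) > cap(values) {
		values = make([]string, 0, len(sourceLabels)) // rare: spill to the heap.
	}
	for _, ln := range sourceLabels {
		values = append(values, lb.Get(string(ln)))
	}
	return strings.Join(values, sep)
}
```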
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir, as sketched below
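For example (a hedged sketch of the migrated pattern, not a specific Prometheus function):

```go
package main

import (
	"fmt"
	"os"
)

// listFiles shows the migrated pattern: os.ReadDir (replacing ioutil.ReadDir)
// returns []fs.DirEntry, which is cheaper than []os.FileInfo because it does
// not stat every entry; call Info() only when the full FileInfo is needed.
func listFiles(dir string) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if e.IsDir() {
			continue
		}
		info, err := e.Info() // fs.FileInfo on demand.
		if err != nil {
			return err
		}
		fmt.Println(e.Name(), info.Size())
	}
	return nil
}
```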
Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
This creates a new `model` directory and moves all data-model related
packages over there:
exemplar labels relabel rulefmt textparse timestamp value
All the others are more or less utilities and have been moved to `util`:
gate logging modtimevfs pool runtime
Signed-off-by: beorn7 <beorn@grafana.com>
We are re-enabling HTTP/2. There have been a few bugfixes upstream
in Go, and we have also enabled ReadIdleTimeout.
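The wiring looks roughly like this (a sketch; the real plumbing lives in the prometheus/common HTTP client configuration):

```go
package main

import (
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

// newTransport re-enables HTTP/2 on a transport and sets ReadIdleTimeout so
// the client PINGs idle connections and drops dead ones instead of hanging
// on them, which was one of the problems that led to HTTP/2 being disabled.
func newTransport() (*http.Transport, error) {
	rt := &http.Transport{ /* TLS config, proxies, ... */ }
	h2, err := http2.ConfigureTransports(rt)
	if err != nil {
		return nil, err
	}
	h2.ReadIdleTimeout = time.Minute // PING after 1m idle; close on missed PONG.
	return rt, nil
}
```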
Fix #7588
Fix #9068
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix: Use json.Unmarshal() instead of json.Decoder
See https://ahmet.im/blog/golang-json-decoder-pitfalls/
json.Decoder is for JSON streams, not single JSON objects / bodies.
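The fix follows this pattern (a hedged sketch, not the exact call sites; modern `io.ReadAll` stands in for the `ioutil.ReadAll` of the time):

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
)

// decodeBody reads the whole body and uses json.Unmarshal, which rejects
// trailing garbage, instead of json.Decoder, which happily stops after the
// first valid JSON value in a stream.
func decodeBody(resp *http.Response, v interface{}) error {
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	return json.Unmarshal(body, v)
}
```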
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert modifications to targetgroup parsing
Signed-off-by: Julius Volz <julius.volz@gmail.com>