Commit Graph

141 Commits (1b184ac1428d622fb093b4b1b74a654053a0f223)

Author SHA1 Message Date
beorn7 e01c5cefac notifier: fix increment of metric prometheus_notifications_errors_total
Previously, prometheus_notifications_errors_total was incremented by
one whenever a batch of alerts was affected by an error during sending
to a specific alertmanager. However, the corresponding metric
prometheus_notifications_sent_total, counting all alerts that were
sent (including those where the sent ended in error), is incremented
by the batch size, i.e. the number of alerts.

Therefore, the ratio used in the mixin for the
PrometheusErrorSendingAlertsToSomeAlertmanagers alert is inconsistent.

This commit changes the increment of
prometheus_notifications_errors_total to the number of alerts that
were sent in the attempt that ended in an error. It also adjusts the
metrics help string accordingly and makes the wording in the alert in
the mixin more precise.

Signed-off-by: beorn7 <beorn@grafana.com>
2024-11-26 15:50:02 +01:00
SuperQ bc1bc5c118
Update sigv4 library
Migrate use of prometheus/common/sigv4 to prometheus/sigv4.

Related to: https://github.com/prometheus/common/issues/709

Signed-off-by: SuperQ <superq@gmail.com>
2024-11-08 22:35:44 +01:00
Alan Protasio c78d5b94af
Disallowing configure AM with the v1 api (#13883)
* Stop supporting Alertmanager v1

* Disallowing configure AM with the v1 api

Signed-off-by: alanprot <alanprot@gmail.com>

* Update config/config_test.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>

* Update config/config.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>

* Addressing coments

Signed-off-by: alanprot <alanprot@gmail.com>

* Update notifier/notifier.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>

* Update config/config_test.go

Co-authored-by: Jan Fajerski <jan--f@users.noreply.github.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Co-authored-by: Jan Fajerski <jan--f@users.noreply.github.com>
2024-10-18 15:23:14 +02:00
Ayoub Mrini d9a54284f5
Merge pull request #14987 from machine424/yafix
fix(notifier): avoid dropping known alertmanagers after each ApplyConfig
2024-10-10 21:14:35 +02:00
TJ Hoplock 6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
machine424 83ee57343a
fix(notifier): stop dropping known alertmanagers on each ApplyConfig and waiting on SD to update them.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-10-03 11:24:01 +02:00
machine424 3dc623d30b
test(notifier): add reproducer
Signed-off-by: machine424 <ayoubmrini424@gmail.com>

Co-authored-by: tommy0 <tommy0@mail.ru>
2024-10-01 10:16:43 +02:00
Bryan Boreham 5710ddf24f
[ENHANCEMENT] Alerts: remove metrics for removed Alertmanagers (#13909)
* [ENHANCEMENT] Alerts: remove metrics for removed Alertmanagers

So they don't continue to report stale values.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-26 15:32:18 +01:00
Nathan Baulch 50cd453c8f
chore: Fix typos (#14868)
* Fix typos

---------

Signed-off-by: Nathan Baulch <nathan.baulch@gmail.com>
2024-09-10 22:32:03 +02:00
Julien 481d718539
Merge pull request #14661 from prymitive/TestHangingNotifier
tests: increase TestHangingNotifier timeout
2024-08-20 14:37:16 +02:00
Arve Knudsen 250aa5031d Remove empty line
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-19 10:50:27 +02:00
Arve Knudsen 3a78e76282 Upgrade golangci-lint to v1.60.1
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-18 12:13:25 +02:00
Lukasz Mierzwa 7694c89497 Increase TestHangingNotifier timeout
This test keeps timing out on our arm64 CI server, it does use a very slow timeout and that 5ms doesn't seem to be enough.
But it 10x.

Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
2024-08-12 14:53:08 +01:00
Charles Korn 2dd07fbb1b
notifier: optionally drain queued notifications before shutting down (#14290)
* Add draining of queued notifications to `notifier.Manager`

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Update docs

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Address PR feedback

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Add more logging

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Address offline feedback: remove timeout

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Ensure stopping takes priority over further processing, make tests more robust

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Make channel unbuffered

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Update docs

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Fix race in test

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Remove unnecessary context

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Make Stop safe to call multiple times

Signed-off-by: Charles Korn <charles.korn@grafana.com>

---------

Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-06-26 11:32:04 +01:00
Arve Knudsen d902116b41 Fix various linting errors
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-06-24 16:11:53 -07:00
machine424 70beda092a fix(notifier): take alertmanagerSet.mtx before checking alertmanagerSet.ams in sendAll
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-06-19 09:43:52 +02:00
machine424 690de487e2 chore(notifier): Split 'Run()' into two goroutines: one to receive target updates and trigger reloads and the other one to send notifications.
This is done to prevent the latter operation from blocking/starving the former, as previously, the `tsets` channel was consumed by the same goroutine that consumes and feeds the buffered `n.more` channel, the `tsets` channel was less likely to be ready as it's unbuffered and only fed every `SDManager.updatert` seconds.

See https://github.com/prometheus/prometheus/issues/13676 and https://github.com/prometheus/prometheus/issues/8768

The synchronization with the sendLoop goroutine is managed through the n.mtx mutex.

This uses a similar approach than scrape manager's efbd6e41c5/scrape/manager.go (L115-L117)

The old TestHangingNotifier was replaced by the new one to more closely reflect reality.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-06-19 09:43:52 +02:00
machine424 94d28cd6cf chore(notifier): add a reproducer for https://github.com/prometheus/prometheus/issues/13676
to show "targets groups update" starvation when the notifications queue is full and an Alertmanager
is down.

The existing `TestHangingNotifier` that was added in https://github.com/prometheus/prometheus/pull/10948 doesn't really reflect the reality as the SD changes are manually fed into `syncCh` in a continuous way, whereas in reality, updates are only resent every `updatert`.

The test added here sets up an SD manager and links it to the notifier. The SD changes will be triggered by that manager as it's done in reality.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

Co-authored-by: Ethan Hunter <ehunter@hudson-trading.com>
2024-06-19 09:43:52 +02:00
Oleksandr Redko f10c3454e9 Enable perfsprint linter and fix up code
Signed-off-by: Oleksandr Redko <oleksandr.red+github@gmail.com>
2024-05-15 17:51:05 +03:00
Bryan Boreham e1dd8e72df
Merge branch 'main' into merge-2.51.2-into-main
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-04-10 15:05:52 +01:00
Matthieu MOREL d496687c8e golangci-lint: enable usestdlibvars linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-08 19:26:23 +00:00
Simon Pasquier 8bd6ae1b20 Notifier: fix deadlock when zero alerts
When all alerts were dropped after alert relabeling, the `sendAll()`
function didn't release the lock properly which created a deadlock with
the Alertmanager target discovery.

In addition, the commit detects early when there are no Alertmanager
endpoint to notify to avoid unnecessary work.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-03-29 15:44:05 +01:00
Bryan Boreham 8c4e4b72a8 Notifier: pass parameters to goroutine explicitly
Avoids possible false sharing between loops.

Plausibly there is no problem in the current code, but it's easy enough to write it more safely.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-08 09:20:36 +00:00
Bryan Boreham 57c799132b Notifier: don't reuse payload after relabeling
Also clarify why these variables are being cleared.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-08 09:16:43 +00:00
darshanime 61ce18950f Fix pipeline golangci-lint error
Signed-off-by: darshanime <deathbullet@gmail.com>
2024-03-05 00:45:07 +05:30
Julien 88622cfa2c
Merge pull request #12551 from nabokihms/alertmanager-relabeling-config
Route different alerts to different alertmanagers
2024-03-04 16:45:00 +01:00
Matthieu MOREL 9c4782f1cc
golangci-lint: enable testifylint linter (#13254)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-07 11:35:01 +00:00
TJ Hoplock 26b78da281 ci: use go1.21.0 fmt to make ci happy
https://github.com/prometheus/prometheus/actions/runs/6044443719/job/16403043771?pr=12774

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2023-08-31 22:01:53 -04:00
TJ Hoplock 51d1d2cd96 feat: add AWS sigv4 support to alertmanager endpoints
Addresses: #12536

This commit adds support for configuring sigv4 to an
`alertmanager_config`. Based heavily on the sigv4 work in the remote
write client.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2023-08-31 21:47:25 -04:00
m.nabokikh 9d8463339d Fixes according to the code review
Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>
2023-07-23 00:37:30 +02:00
m.nabokikh 39d008f94f Route different alerts to different alertmanagers
Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>
2023-07-12 16:11:25 +02:00
Bryan Boreham 3711339a7d Alerts: more efficient relabel on Send
Re-use `labels.Builder` and use `relabel.ProcessBuilder` to skip a
conversion step.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-05-15 08:44:06 -04:00
Bryan Boreham c3f267d862 Alerts: more efficient generation of target labels
Use a label builder instead of a slice when creating labels for the
target alertmanagers. This can be passed directly to
`relabel.ProcessBuilder`, skipping a copy.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-05-15 08:43:59 -04:00
Bryan Boreham b987afa7ef labels: simplify call to get Labels from Builder
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.

Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.

Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-22 17:05:20 +00:00
Bryan Boreham b3ca791bfd Update package notifier for new labels.Labels type
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham 9619d3fd3b notifier: remove unused code
None of the actions on `lb` have any effect because its result is not
read.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-04 17:35:36 +00:00
Bryan Boreham ac02cfcb79 notifier: in tests use labels.FromStrings
Replacing code which assumes the internal structure of `Labels`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-09 13:34:49 +02:00
Cosrider bef6556ca5
delete redundant alias (#11180)
Signed-off-by: Cosrider <cosrider7@gmail.com>

Signed-off-by: Cosrider <cosrider7@gmail.com>
2022-08-31 15:50:38 +02:00
Bryan Boreham 8b863c42dd
Optimise relabeling by re-using memory (#11147)
* model/relabel: Add benchmark

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* model/relabel: re-use Builder across relabels

Saves memory allocations.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* labels.Builder: allow re-use of result slice

This reduces memory allocations where the caller has a suitable slice available.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* model/relabel: re-use source values slice

To reduce memory allocations.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Unwind one change causing test failures

Restore original behaviour in PopulateLabels, where we must not overwrite the input set.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* relabel: simplify values optimisation

Use a stack-based array for up to 16 source labels, which will be the
vast majority of cases.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* lint

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-08-19 15:27:52 +05:30
Julien Pivotto 2479fb42f0
Improve notifier queue test to reduce flakiness (#10984)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-07-05 15:27:26 +02:00
Julien Pivotto 02f3297719
Split notifier select in 2 to ensure newer targets are used. (#10948)
* Split notifier select in 2 to ensure newer targets are used.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-07-01 14:23:23 +02:00
Matthieu MOREL 80fbe1de96
refactor (package model): move from github.com/pkg/errors to 'errors' and 'fmt' packages (#10748)
Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>
2022-06-16 10:38:27 +02:00
Matthieu MOREL e2ede285a2
refactor: move from io/ioutil to io and os packages (#10528)
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir

Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
2022-04-27 11:24:36 +02:00
beorn7 c954cd9d1d Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related
packages over there:
  exemplar labels relabel rulefmt textparse timestamp value

All the others are more or less utilities and have been moved to `util`:
  gate logging modetimevfs pool runtime

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-09 08:03:10 +01:00
Mateusz Gozdek 1a6c2283a3 Format Go source files using 'gofumpt -w -s -extra'
Part of #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
DrAuYueng 69e309d202
Expose TargetsFromGroup/AlertmanagerFromGroup func and reuse this for (#9343)
static/file sd config check in promtool

Signed-off-by: DrAuYueng <ouyang1204@gmail.com>
2021-10-28 02:01:28 +02:00
Julien Pivotto 63b3e4e5ec
Enable HTTP2 again (#9398)
We are re-enabling HTTP 2 again. There has been a few bugfixes upstream
in go, and we have also enabled ReadIdleTimeout.

Fix #7588
Fix #9068

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-09-26 23:16:12 +02:00
Paweł Szulik f5563bfe95
tests: Move from t.Errorf and others. (Part 2) (#9309)
* Refactor util tests.

Signed-off-by: Paweł Szulik <paul.szulik@gmail.com>
2021-09-13 21:19:20 +02:00
Julius Volz 179b2155d1
Fix: Use json.Unmarshal() instead of json.Decoder (#9033)
* Fix: Use json.Unmarshal() instead of json.Decoder

See https://ahmet.im/blog/golang-json-decoder-pitfalls/

json.Decoder is for JSON streams, not single JSON objects / bodies.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Revert modifications to targetgroup parsing

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2021-07-02 09:38:14 +01:00
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00