This adds a new metric exposing per target scrape sample_limit value. Metrics are only exposed if extra-scrape-metrics feature flag is enabled.
scrape_sample_limit will make it easy to monitor and alert on targets getting close to configured sample_limit, which is important given than exceeding sample_limit results in the entire scrape results being rejected.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Add a new built-in metric `scrape_timeout_seconds` to allow monitoring
of the ratio of scrape duration to the scrape timeout. Hide behind a
feature flag to avoid additional cardinality by default.
Signed-off-by: SuperQ <superq@gmail.com>
* PromQL: Fix start and end keywords masking label and metric names
This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* Add in additional tests for metrics and/or labels called start/end.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* *: Cut 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* VERSION: bump to 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Remove experimental wording on size-based retention
Followup of #9004
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix PR reference in changelog
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zone IDs at most once per refresh (#9142)
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zones at most once per SD load
Closes#9142.
Signed-off-by: George Brighton <george@gebn.co.uk>
* Incorporate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Integrate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Add a compatibility note for macOS users.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* *: Cut v2.29.0-rc.1
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* *: cut v2.29.0-rc.2
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* bump codemirror-promql to 0.17.0
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* *: cut v2.29.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* tsdb: align atomically accessed int64 (#9192)
This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUGFixed#9190
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Release 2.29.1 (#9193)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
* optimize Linode SD by polling for event changes during refresh
Most accounts are fairly "static", in the sense that they're not cycling
through instances constantly. So rather than do a full refresh every
interval and potentially make several behind-the-scenes paginated API
calls, this will now poll the `/account/events/` endpoint every minute
with a list of events that we care about. If a matching event is found,
we then do a full refresh.
Co-authored-by: William Smith <wsmith@linode.com>
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: William Smith <wsmith@linode.com>
* promtool: backfill: allow configuring block duration
When backfilling large amounts of data across long periods of time, it
may in certain circumstances be useful to use a longer block duration to
increase the efficiency and speed of the backfilling process. This patch
adds a flag --block-duration-power to allow a user to choose the power N
where the block duration is 2^(N+1)h.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* promtool: use sub-tests in backfill testing
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* backfill: add messages to tests for clarity
When someone new breaks a test, seeing "expected: false, got: true" is
really not useful. A nice message helps here.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* backfill: test long block durations
A test that uses a long block duration to write bigger blocks is added.
The check to make sure all blocks are the default duration is removed.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* Added MaxSamplesPerSend
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed order of require
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added docs
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* writes -> writesReceived
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Improved send loop
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Cut v2.28.0-rc.0 (#8954)
* Cut v2.28.0-rc.0
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Changelog fixup
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Address review comments
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Downgrade some features to enhancements
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Adjust release date to today
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Migrate HTTP SD docs from docs repo (#8972)
See discussion in https://github.com/prometheus/docs/pull/1975
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Cut Prometheus v2.28.0 (#8973)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Added walreplay API endpoint
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added starting page to react-ui
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Documented the new endpoint
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed typos
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Removed logo
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed isResponding to isUnexpected
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed width of progress bar
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed width of progress bar
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added DB stats object
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Updated starting page to work with new fields
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 2)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 3)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (and also implementing a method this time) (pt. 4)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (and also implementing a method this time) (pt. 5)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed const to let
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 6)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Remove SetStats method
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added comma
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed api
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed to triple equals
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed data response types
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Don't return pointer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed version
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed interface issue
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed pointer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed copying lock value error
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Write exemplars to the WAL and send them over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Update example for exemplars, print data in a more obvious format.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add metrics for remote write of exemplars.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix incorrect slices passed to send in remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* We need to unregister the new metrics.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Order of exemplar append vs write exemplar to WAL needs to change.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Condense sample/exemplar delivery tests to parameterized sub-tests
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename test methods for clarity now that they also handle exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename counter variable. Fix instances where metrics were not updated correctly
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Add exemplars to LoadWAL benchmark
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* last exemplars timestamp metric needs to convert value to seconds with
ms precision
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Process exemplar records in a separate go routine when loading the WAL.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Regenerate types proto with comments, update protoc version again.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Put remote write of exemplars behind a feature flag.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address some of Ganesh's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Move exemplar remote write feature flag to a config file field.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address Bartek's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address more reivew comments from Ganesh.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add exemplar total label length check.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address a few last review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
* scrape: add label limits per scrape
Add three new limits to the scrape configuration to provide some
mechanism to defend against unbound number of labels and excessive
label lengths. If any of these limits are broken by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.
The `label_limit` configuration will provide a mechanism to bound the
number of labels per-scrape of a certain sample to a user defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.
The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceed the predefined limits.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: add metrics and alert to label limits
Add three gauge, one for each label limit to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: apply label limits to __name__ label
Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very very long.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: remove label limits gauges and refactor
Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>