Commit Graph

420 Commits (09899ad56d42cea999e6cd985b0d729d28501f38)

Author SHA1 Message Date
Bryan Boreham 48287b15d4
Merge pull request #13914 from bboreham/scrape-reload-metric
[REFACTOR] Scraping: rename variables for clarity
2024-12-02 17:09:25 +00:00
TJ Hoplock aa8e067f13 Merge release-3.0 into main 2024-11-27 17:51:51 -05:00
TJ Hoplock 3e24e84172 fix!: stop unbounded memory usage from query log
Resolves: #15433

When I converted prometheus to use slog in #14906, I update both the
`QueryLogger` interface, as well as how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and it's implementation of the `With()`
method updates the logger the logger in place with the new keyvals added
onto the underlying slog.Logger, which means they get inherited onto
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.

This commit does a few things:

- It was referenced in feedback in #14906 that it would've been better
  to not change the `QueryLogger` interface if possible, this PR
proposes changes that bring it closer to alignment with the pre-3.0
`QueryLogger` interface contract
- reverts `promql.Engine.exec()`'s usage of the query logger to the
  pattern of building up an array of args to pass at once to the end log
call. Avoiding the repetitious calls to `.With()` are what resolve the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the update `QueryLogger`
  methods in the contract.
- updates tests accordingly
- cleans up unused methods

Builds and passes tests successfully. Tested locally and confirmed I
could no longer reproduce the issue/it resolved the issue.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-11-23 14:20:37 -05:00
Arve Knudsen 89bbb885e5
Upgrade to golangci-lint v1.62.0 (#15424)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-11-20 17:22:20 +01:00
Bryan Boreham 26886d6d95 Merge branch 'release-3.0' into merge-release-into-main 2024-11-18 14:43:56 +00:00
Bryan Boreham 6380229c71 [REFACTOR] Scraping: rename variables for clarity
Instead of shadowing the outer variables, use different names.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-07 17:29:11 +00:00
Bryan Boreham 3f6a133f26 [Test] Scraping: check reload metrics
Need to extend `newTestScrapeMetrics`` to get at the Registry.
`gatherLabels` function could go upstream to `client_golang`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-07 17:29:04 +00:00
alexgreenbank 6b09f094e1 scrape: stop erroring on empty scrapes
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
2024-11-07 11:30:03 +00:00
Arthur Silva Sens 96c2fe04e7
Merge pull request #15328 from prometheus/deadcode
chore: Remove dead code
2024-11-06 13:20:53 -03:00
Matthieu MOREL af1a19fc78 enable errorf rule from perfsprint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-11-06 16:50:36 +01:00
Simon Pasquier 37f3f3f2db Fix scrape failure logs
Before this change, logs would show like:

```
{...,"target":"http://localhost:8080/metrics","!BADKEY":"Get ..."}
```

After this change

```
{...,"msg":"Get ...","job_name":...,"target":...}
```

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-11-06 15:38:47 +01:00
Arthur Silva Sens 88818c9cb3
chore: Remove dead code
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-11-04 13:00:22 -03:00
Vanshika cccbe72514
TSDB: Fix some edge cases when OOO is enabled (#14710)
Fix some edge cases when OOO is enabled

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
2024-10-23 17:34:28 +02:00
George Krajcsovits aa81210c8b
NHCB scrape: refactor state handling and speed up scrape test (#15193)
* NHCB: scrape use state field and not booleans

From comment https://github.com/prometheus/prometheus/pull/14978#discussion_r1800898724

Also make compareLabels read only and move storeLabels to the first
processed classic histogram series.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* Speed up TestConvertClassicHistogramsToNHCB 3x

Reduce the startup time and timeouts

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* lint fix

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

---------

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-22 17:49:25 +01:00
George Krajcsovits 877fd2a60e
Update scrape/scrape.go
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2024-10-21 16:01:34 +02:00
György Krajcsovits 4283ae73dc Rename convert_classic_histograms to convert_classic_histograms_to_nhcb
On reviewer request.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-21 13:22:58 +02:00
György Krajcsovits a23aed5634 More followup to #15164
Scrape test for NHCB modified.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-21 11:10:50 +02:00
György Krajcsovits 70742a64aa Follow up #15178
Renaming

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-21 11:03:47 +02:00
György Krajcsovits 8c1b5a6251 Merge branch 'main' into nhcb-scrape-impl 2024-10-21 11:00:41 +02:00
Alex Greenbank 421a3c22ea
scrape: provide a fallback format (#15136)
scrape: Remove implicit fallback to the Prometheus text format

Remove implicit fallback to the Prometheus text format in case of invalid/missing Content-Type and fail the scrape instead. Add ability to specify a `fallback_scrape_protocol` in the scrape config.

---------

Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: Alex Greenbank <alex.greenbank@grafana.com>
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
2024-10-18 17:12:31 +02:00
Bartlomiej Plotka efc43d0714
s/scrape_classic_histograms/always_scrape_classic_histograms (3.0 breaking change) (#15178)
This is for readability, especially when we can converting to nhcb option.

See discussion https://cloud-native.slack.com/archives/C077Z4V13AM/p1729155873397889

Signed-off-by: bwplotka <bwplotka@gmail.com>
2024-10-18 08:32:15 +01:00
György Krajcsovits 5ee698de2c Apply review comments
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-17 12:55:45 +02:00
György Krajcsovits 04b827dd77 Merge branch 'main' into nhcb-scrape-impl 2024-10-17 12:34:44 +02:00
Yi 2cabd1b707
config: remove expand-external-labels flag in release 3.0 (#14657)
remove expand-external-labels feature flag

and enabled env arg expansion for external labels by default.

Signed-off-by: jyz0309 <45495947@qq.com>
2024-10-17 10:25:05 +02:00
György Krajcsovits c13585665d Merge branch 'main' into nhcb-scrape-impl
# Conflicts:
#	promql/promqltest/test.go
#	util/convertnhcb/convertnhcb.go
2024-10-14 14:26:11 +02:00
Björn Rabenstein 8b545bab2f
Merge pull request #15026 from huochexizhan/main
fix: fix slice init length
2024-10-10 12:55:28 +02:00
huochexizhan ff16fa1827 fix: fix slice init length
Signed-off-by: huochexizhan <huochexizhan@outlook.com>
2024-10-10 16:14:40 +08:00
György Krajcsovits 8dfa733596 Fix labels handling with dedupelabels tag
Use scratch builder.
Use hash compare instead of compare by label.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-09 12:29:59 +02:00
György Krajcsovits 6b15791ec6 Follow the follow-up
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-09 11:56:54 +02:00
György Krajcsovits 4e911a3a05 Follow-up merge from main
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-09 11:54:49 +02:00
György Krajcsovits a2805e7934 Merge branch 'main' into nhcb-scrape-impl 2024-10-09 11:53:26 +02:00
György Krajcsovits 7fccf1e6be Disable CT handling and enable exemplar handling
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-09 11:08:08 +02:00
TJ Hoplock 6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
György Krajcsovits ed2e7dc258 Merge branch 'main' into nhcb-scrape-impl
# Conflicts:
#	scrape/scrape.go
2024-10-07 11:23:44 +02:00
Matthieu MOREL ab64966e9d
fix: use "ErrorContains" or "EqualError" instead of "Contains(t, err.Error()" and "Equal(t, err.Error()" (#15094)
* fix: use "ErrorContains" or "EqualError" instead of "Contains(t, err.Error()" and "Equal(t, err.Error()"

---------

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-10-06 16:35:29 +00:00
Manik Rana f1c57a95ed
change: No longer ingest OM _created as timeseries if feature-flag 'enable-ct-zero-ingestion' is enabled; fixed OM text CT conversion bug (#14738)
* chore: revert TypeRequiresCT to private

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: init NewOpenMetricsParser with skipCT true

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* refac: allow opt-in to OM CT ingestion

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: lint

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: use textparse interface to set om options

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: set skipOMSeries in test

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: gofumpt

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* wip: add tests for OM CR parse

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: merge ct tests

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* tests: add cases for OM text

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: check correct test cases

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: use both scrape protocols in config

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: fix inputs and output tests for OM

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: cleanup

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* refac: rename skipOMSeries to skipOMCTSeries

Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Manik Rana <Manikrana54@gmail.com>

* fix: finish refac

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* refac: move setup code outside test

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* tests: verify _created lines create new metric in certain cases

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: post merge fixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: lint

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* manager: Fixed CT OMText conversion bug; Refactored tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* chore: lint

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: gofumpt

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: imports

Signed-off-by: Manik Rana <manikrana54@gmail.com>

---------

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <Manikrana54@gmail.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
2024-10-02 11:52:03 +01:00
Björn Rabenstein f74722841b
Merge pull request #14160 from alex-kattathra-johnson/issue-13959
Remove no-default-scrape-port featureFlag
2024-09-26 18:45:56 +02:00
George Krajcsovits 79a6238e19
scrape/scrape_test.go: reduce the time it takes to reload the manager (#14447)
* scrape/scrape_test.go: reduce the time it takes to reload the manager

TestNativeHistogramMaxSchemaSet took over 3x5s to complete because
there's a minimum reload interval.

I've made the testcases run in parallel and reduced the reload interval
to 10ms. Now the test runs in around 0.1-0.2 seconds.

Ran test 10000 times to check if it's flaky.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

---------

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-09-26 18:35:15 +02:00
Arthur Silva Sens 6bd9b1a7cc
Histogram CT Zero ingestion
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-09-26 11:29:22 -03:00
Alex Johnson be0f10054e Remove no-default-scrape-port featureFlag
Signed-off-by: Alex Johnson <alex.kattathra.johnson@gmail.com>
2024-09-25 10:13:19 -05:00
György Krajcsovits 71fd2d93a9 Merge branch 'main' into nhcb-scrape-impl
# Conflicts:
#	config/config.go
#	scrape/scrape.go
2024-09-25 13:43:57 +02:00
Jeanette Tan 97ba2fc39d use caps for NHCB
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:38:30 +02:00
Jeanette Tan 90c266845b fix lint
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:38:27 +02:00
Jeanette Tan 050b5fc257 refine test cases according to spec
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:38:22 +02:00
Jeanette Tan de9de320a4 start to cover all test cases for scrape
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:38:18 +02:00
Jeanette Tan f35c6649e4 don't blindly convert series with the classic histogram name suffixes if they are not actually histograms based on metadata
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:38:15 +02:00
Jeanette Tan 8b3ae15ad5 expand tests for classic and exponential native histograms
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:38:10 +02:00
Jeanette Tan e3899187da expand tests for protobuf and fix problems
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:38:08 +02:00
Jeanette Tan cd498964e6 expand tests and support conversion to nhcb in the middle of scrape
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:38:04 +02:00
Jeanette Tan 172d4f2405 insert nhcb parser as intermediate layer
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-09-25 13:37:37 +02:00