mirror of https://github.com/prometheus/prometheus
46 Commits (b2a8657d589331a598c75b3e51b860d0437f20b7)
Author | SHA1 | Message | Date |
---|---|---|---|
Bartlomiej Plotka |
9198952f7c
|
[PRW 2.0] Merging `remote-write-2.0` feature branch to main (PRW 2.0 support + metadata in WAL) (#14395)
* Remote Write 1.1: e2e benchmarks (#13102) * Remote Write e2e benchmarks Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Prometheus ports automatically assigned Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * make dashboard editable + more modular to different job label values Signed-off-by: Callum Styan <callumstyan@gmail.com> * Dashboard improvements * memory stats * diffs look at counter increases Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * run script: absolute path for config templates Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * grafana dashboard improvements * show actual values of metrics * add memory stats and diff Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * dashboard changes Signed-off-by: Callum Styan <callumstyan@gmail.com> --------- Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Callum Styan <callumstyan@gmail.com> * replace snappy encoding library Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add new proto types Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add decode function for new write request proto Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add lookup table struct that is used to build the symbol table in new write request format Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Implement code paths for new proto format Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * update example server to include handler for new format Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Add new test client Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * tests and new -> original proto mapping util Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add new proto support on receiver end Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Fix test Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * no-brainer copypaste but more performance write support Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove some comented code Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix mocks and fixture Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add basic reduce remote write handler benchmark Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * refactor out common code between write methods Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix: queue manager to include float histograms in new requests Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add sender-side tests and fix failing ones Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * refactor queue manager code to remove some duplication Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix build Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Improve sender benchmarks and some allocations Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Use github.com/golang/snappy Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * cleanup: remove hardcoded fake url for testing Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Add 1.1 version handling code Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Remove config, update proto Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * gofmt Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix NewWriteClient and change new flags wording Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fields rewording in handler Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remote write handler to checks version header Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix typo in log Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * lint Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Add minmized remote write proto format Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add functions for translating between new proto formats symbol table and actual prometheus labels Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add functionality for new minimized remote write request format Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix minor things Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Make LabelSymbols a fixed32 Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove unused proto type Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * update tests Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix build for stringlabels tag Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Use two uint32 to encode (offset,leng) Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * manually optimize varint marshaling Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Use unsafe []byte->string cast to reuse buffer Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix writeRequestMinimizedFixture Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove all code from previous interning approach the 'minimized' version is now the only v1.1 version Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * minimally-tested exemplar support for rw 1.1 Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * refactor new version flag to make it easier to pick a specific format instead of having multiple flags, plus add new formats for testing Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * use exp slices for backwards compat. to go 1.20 plus add copyright header to test file Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix label ranging Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Add bytes slice (instead of slice of 32bit vars) format for testing Co-authored-by: Nicolás Pazos <npazosmendez@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * test additional len and lenbytes formats Co-authored-by: Nicolás Pazos <npazosmendez@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove mistaken package lock changes Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove formats we've decided not to use Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove more format types we probably won't use Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * More cleanup Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * use require instead of assert in custom marshal test Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * cleanup; remove some unused functions Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * more cleanup, mostly linting fixes Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove package-lock.json change again Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * more cleanup, address review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix test panic Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix minor lint issue + use labels Range function since it looks like the tests fail to do `range labels.Labels` on CI Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * new interning format based on []string indeces Co-authored-by: bwplotka <bwplotka@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove all new rw formats but the []string one also adapt tests to the new format Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * cleanup rwSymbolTable Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * add some TODOs for later Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * don't reserve field 3 for new proto and add TODO Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix custom marshaling Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * lint Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * additional merge fixes Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * lint fixes Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * fix server example Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * revert package-lock.json changes Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * update example prometheus version Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * define separate proto types for remote write 2.0 Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * lint Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * rename new proto types and move to separate pkg Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * update prometheus version for example Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * make proto Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * make Metadata not nullable Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * remove old MinSample proto message Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * change enum names to fit buf build recommend enum naming and lint rules Signed-off-by: Callum Styan <callumstyan@gmail.com> * remote: Added test for classic histogram grouping when sending rw; Fixed queue manager test delay. (#13421) Signed-off-by: bwplotka <bwplotka@gmail.com> * Remote write v2: metadata support in every write request (#13394) * Approach bundling metadata along with samples and exemplars Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add first test; rebase with main Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Alternative approach: bundle metadata in TimeSeries protobuf Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * update go mod to match main branch Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix after rebase Signed-off-by: Callum Styan <callumstyan@gmail.com> * we're not going to modify the 1.X format anymore Signed-off-by: Callum Styan <callumstyan@gmail.com> * Modify AppendMetadata based on the fact that we be putting metadata into timeseries Signed-off-by: Callum Styan <callumstyan@gmail.com> * Rename enums for remote write versions to something that makes more sense + remove the added `sendMetadata` flag. Signed-off-by: Callum Styan <callumstyan@gmail.com> * rename flag that enables writing of metadata records to the WAL Signed-off-by: Callum Styan <callumstyan@gmail.com> * additional clean up Signed-off-by: Callum Styan <callumstyan@gmail.com> * lint Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix usage of require.Len Signed-off-by: Callum Styan <callumstyan@gmail.com> * some clean up from review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> * more review fixes Signed-off-by: Callum Styan <callumstyan@gmail.com> --------- Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Paschalis Tsilias <paschalist0@gmail.com> * remote write 2.0: sync with `main` branch (#13510) * consoles: exclude iowait and steal from CPU Utilisation 'iowait' and 'steal' indicate specific idle/wait states, which shouldn't be counted into CPU Utilisation. Also see https://github.com/prometheus-operator/kube-prometheus/pull/796 and https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/667. Per the iostat man page: %idle Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request. %iowait Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request. %steal Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> * tsdb: shrink txRing with smaller integers 4 billion active transactions ought to be enough for anyone. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * tsdb: create isolation transaction slice on demand When Prometheus restarts it creates every series read in from the WAL, but many of those series will be finished, and never receive any more samples. By defering allocation of the txRing slice to when it is first needed, we save 32 bytes per stale series. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * add cluster variable to Overview dashboard Signed-off-by: Erik Sommer <ersotech@posteo.de> * promql: simplify Native Histogram arithmetics Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com> * Cut 2.49.0-rc.0 (#13270) * Cut 2.49.0-rc.0 Signed-off-by: bwplotka <bwplotka@gmail.com> * Removed the duplicate. Signed-off-by: bwplotka <bwplotka@gmail.com> --------- Signed-off-by: bwplotka <bwplotka@gmail.com> * Add unit protobuf parser Signed-off-by: Arianna Vespri <arianna.vespri@yahoo.it> * Go on adding protobuf parsing for unit Signed-off-by: Arianna Vespri <arianna.vespri@yahoo.it> * ui: create a reproduction for https://github.com/prometheus/prometheus/issues/13292 Signed-off-by: machine424 <ayoubmrini424@gmail.com> * Get conditional right Signed-off-by: Arianna Vespri <arianna.vespri@yahoo.it> * Get VM Scale Set NIC (#13283) Calling `*armnetwork.InterfacesClient.Get()` doesn't work for Scale Set VM NIC, because these use a different Resource ID format. Use `*armnetwork.InterfacesClient.GetVirtualMachineScaleSetNetworkInterface()` instead. This needs both the scale set name and the instance ID, so add an `InstanceID` field to the `virtualMachine` struct. `InstanceID` is empty for a VM that isn't a ScaleSetVM. Signed-off-by: Daniel Nicholls <daniel.nicholls@resdiary.com> * Cut v2.49.0-rc.1 Signed-off-by: bwplotka <bwplotka@gmail.com> * Delete debugging lines, amend error message for unit Signed-off-by: Arianna Vespri <arianna.vespri@yahoo.it> * Correct order in error message Signed-off-by: Arianna Vespri <arianna.vespri@yahoo.it> * Consider storage.ErrTooOldSample as non-retryable Signed-off-by: Daniel Kerbel <nmdanny@gmail.com> * scrape_test.go: Increase scrape interval in TestScrapeLoopCache to reduce potential flakiness Signed-off-by: machine424 <ayoubmrini424@gmail.com> * Avoid creating string for suffix, consider counters without _total suffix Signed-off-by: Arianna Vespri <arianna.vespri@yahoo.it> * build(deps): bump github.com/prometheus/client_golang Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.17.0 to 1.18.0. - [Release notes](https://github.com/prometheus/client_golang/releases) - [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md) - [Commits](https://github.com/prometheus/client_golang/compare/v1.17.0...v1.18.0) --- updated-dependencies: - dependency-name: github.com/prometheus/client_golang dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * build(deps): bump actions/setup-node from 3.8.1 to 4.0.1 Bumps [actions/setup-node](https://github.com/actions/setup-node) from 3.8.1 to 4.0.1. - [Release notes](https://github.com/actions/setup-node/releases) - [Commits]( |
|
Arthur Silva Sens |
7aacef9b42
|
bugfix: Decouple native histogram ingestions and protobuf parsing
Up until this point, if a scrape was done with the protobuf format Prometheus would always try to ingest native histograms even with the feature flag disabled. This causes problems with other feature-flags that depend on the protobuf format, like 'created-timestamp-zero-ingestion'. This commit decouples native histogram parsing from ingestion, making sure ingestion only happens when the 'native-histogram' feature-flag is enabled. Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com> |
|
Matthieu MOREL |
6f595c6762
|
golangci-lint: enable whitespace linter (#13905)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com> |
|
David Ashpole |
c755fa9935
|
support unregistering scrape manager metrics
Signed-off-by: David Ashpole <dashpole@google.com> |
|
Bryan Boreham |
75fc8a1535
|
Merge pull request #13167 from bboreham/simplify-TargetsActive
scrape: simplify TargetsActive function |
|
Arthur Silva Sens |
5082655392
|
Append Created Timestamps (#12733)
* Append created timestamps. Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com> * Log when created timestamps are ignored Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com> * Proposed changes to Append CT PR. Changes: * Changed textparse Parser interface for consistency and robustness. * Changed CT interface to be more explicit and handle validation. * Simplified test, change scrapeManager to allow testability. * Added TODOs. Signed-off-by: bwplotka <bwplotka@gmail.com> * Updates. Signed-off-by: bwplotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: bwplotka <bwplotka@gmail.com> * Refactor head_appender test Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com> * Fix linter issues Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com> * Use model.Sample in head appender test Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com> --------- Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com> Signed-off-by: bwplotka <bwplotka@gmail.com> Co-authored-by: bwplotka <bwplotka@gmail.com> |
|
Bryan Boreham | 9051100aba |
Scraping: share buffer pool across all scrapes
Previously we had one per scrapePool, and one of those per configured scraping job. Each pool holds a few unused buffers, so sharing one across all scrapePools reduces total heap memory. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> |
|
Bryan Boreham | f095c33da1 |
scrape: simplify TargetsActive function
Since everything was serialized on a single mutex, it's exactly the same if we process targets in sequence without starting goroutines. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> |
|
Matthieu MOREL | fe057fc60d |
use Go standard errors package
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com> |
|
Oleksandr Redko | fa90ca46e5 |
ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com> |
|
Paulin Todev |
5752050b42
|
Scrape metrics can now be registered with a non-default registry.
* A registerer is passed to the scrape Manager, and all scrape metrics register with it. * For now the registry which we pass to the scrape Manager is still the global one. Signed-off-by: Paulin Todev <paulin.todev@gmail.com> |
|
Bartlomiej Plotka |
624b973ebf
|
Added ability to specify scrape protocols to accept during HTTP content type negotiation. (#12738)
* Added ability to specify scrape protocols to accept during HTTP content type negotiation. This is done via new option in GlobalConfig and ScrapeConfig: "scrape_protocol" Signed-off-by: bwplotka <bwplotka@gmail.com> * Fixed readability and log message. Signed-off-by: bwplotka <bwplotka@gmail.com> --------- Signed-off-by: bwplotka <bwplotka@gmail.com> |
|
Bryan Boreham |
1e3fef6ab0
|
scraping: limit detail on dropped targets, to save memory (#12647)
It's possible (quite common on Kubernetes) to have a service discovery return thousands of targets then drop most of them in relabel rules. The main place this data is used is to display in the web UI, where you don't want thousands of lines of display. The new limit is `keep_dropped_targets`, which defaults to 0 for backwards-compatibility. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> |
|
Julius Volz | cb045c0e4b |
Fix wording from "jitterSeed" -> "offsetSeed" for server-wide scrape offsets
In digital communication, "jitter" usually refers to how much a signal deviates from true periodicity, see https://en.wikipedia.org/wiki/Jitter. The way we are using the "jitterSeed" in Prometheus does not affect the true periodicity at all, but just introduces a constant phase shift (or offset) within the period. So it would be more correct and less confusing to call the "jitterSeed" an "offsetSeed" instead. Signed-off-by: Julius Volz <julius.volz@gmail.com> |
|
beorn7 | 5b53aa1108 |
style: Replace `else if` cascades with `switch`
Wiser coders than myself have come to the conclusion that a `switch` statement is almost always superior to a statement that includes any `else if`. The exceptions that I have found in our codebase are just these two: * The `if else` is followed by an additional statement before the next condition (separated by a `;`). * The whole thing is within a `for` loop and `break` statements are used. In this case, using `switch` would require tagging the `for` loop, which probably tips the balance. Why are `switch` statements more readable? For one, fewer curly braces. But more importantly, the conditions all have the same alignment, so the whole thing follows the natural flow of going down a list of conditions. With `else if`, in contrast, all conditions but the first are "hidden" behind `} else if `, harder to spot and (for no good reason) presented differently from the first condition. I'm sure the aforemention wise coders can list even more reasons. In any case, I like it so much that I have found myself recommending it in code reviews. I would like to make it a habit in our code base, without making it a hard requirement that we would test on the CI. But for that, there has to be a role model, so this commit eliminates all `if else` occurrences, unless it is autogenerated code or fits one of the exceptions above. Signed-off-by: beorn7 <beorn@grafana.com> |
|
Julien Pivotto | 599b70a05d |
Add include scrape configs
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu> |
|
Łukasz Mierzwa |
e1b7082008
|
Show individual scrape pools on /targets page (#11142)
* Add API endpoints for getting scrape pool names This adds api/v1/scrape_pools endpoint that returns the list of *names* of all the scrape pools configured. Having it allows to find out what scrape pools are defined without having to list and parse all targets. The second change is adding scrapePool query parameter support in api/v1/targets endpoint, that allows to filter returned targets by only finding ones for passed scrape pool name. Both changes allow to query for a specific scrape pool data, rather than getting all the targets for all possible scrape pools. The problem with api/v1/targets endpoint is that it returns huge amount of data if you configure a lot of scrape pools. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com> * Add a scrape pool selector on /targets page Current targets page lists all possible targets. This works great if you only have a few scrape pools configured, but for systems with a lot of scrape pools and targets this slow things down a lot. Not only does the /targets page load very slowly in such case (waiting for huge API response) but it also take a long time to render, due to huge number of elements. This change adds a dropdown selector so it's possible to select only intersting scrape pool to view. There's also scrapePool query param that will open selected pool automatically. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com> Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com> |
|
Ganesh Vernekar |
3cbf87b83d
|
Enable protobuf negotiation only when histograms are enabled
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> |
|
Paschalis Tsilias |
5a8e202f94
|
Append metadata to the WAL in the scrape loop (#10312)
* Append metadata to the WAL Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove extra whitespace; Reword some docstrings and comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use RLock() for hasNewMetadata check Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use single byte for metric type in RefMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Update proposed WAL format for single-byte type metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Address first round of review comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Amend description of metadata in wal.md Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Correct key used to retrieve metadata from cache When we're setting metadata entries in the scrapeCace, we're using the p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and use it as the cache key. When checking for cache entries though, we used p.Series() as the key, which included the metric name _with_ its labels. That meant that we were never actually hitting the cache. We're fixing this by utiling the __name__ internal label for correctly getting the cache entries after they've been set by setHelp(), setType() or setUnit(). Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Put feature behind a feature flag Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reorder WAL format document Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix CR comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Extract logic about changing metadata in an anonymous function Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Implement new proposed WAL format and amend relevant tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use 'const' for metadata field names Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Apply metadata to head memSeries in Commit, not in AppendMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add docstring and rename extracted helper in scrape.go Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix review comments around TestMetadata* tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Rebase with merged TSDB changes; fix duplicate definitions after rebase Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove leftover changes on db_test.go Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Rename feature flag Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Simplify updateMetadata helper function Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove extra newline Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> |
|
Levi Harrison |
d61459d826
|
`no-default-scrape-port` feature flag (#9523)
* Add `no-default-scrape-port` flag Signed-off-by: Levi Harrison <git@leviharrison.dev> |
|
Alban Hurtaud |
41630b8e88
|
Add hidden flag to configure discovery loop interval (#10634)
* Add hidden flag to configure discovery loop interval Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com> |
|
Goutham Veeramachaneni |
2381d7be57
|
Send target and metadata cache in context (again) (#10636)
* Send target and metadata cache in context (again) The previous attempt was rolled back in #10590 due to memory issues. `sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info in the previous attempt and it was hard to pin-point where the context was being retained causing the memory increase. I've experimented a bunch in #10627 to figure out that this approach doesn't cause memory increase. Beyond that, just using this info in _any_ other context is causing a memory increase. The change fixed a bunch of long-standing in the OTel Collector that the community was waiting on and release is blocked on a few downstream distrubutions of OTel Collector waiting on a fix. I propose to merge this change in while I investigate what is happening. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Gate the change behind a manager option Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> |
|
Robert Fratto |
44a5e705be
|
discovery: Expose custom HTTP client options to discoverers (#10462)
* discovery: expose HTTP client options to discoverers Signed-off-by: Robert Fratto <robertfratto@gmail.com> * discovery/http: use HTTP client options for created client Signed-off-by: Robert Fratto <robertfratto@gmail.com> * scrape: use a list of HTTP client options instead of just dial context Signed-off-by: Robert Fratto <robertfratto@gmail.com> * discovery: rephrase comment Signed-off-by: Robert Fratto <robertfratto@gmail.com> |
|
Robert Fratto |
f0ec619eec
|
scrape: allow providing a custom Dialer for scraping (#10415)
* scrape: allow providing a custom Dialer for scraping This commit extends config.ScrapeConfig with an optional field to override how HTTP connections to targets are created. This field is not set directly in Prometheus, and is only added for the convenience of downstream importers. Closes #9706 Signed-off-by: Robert Fratto <robertfratto@gmail.com> * scrape: move custom dial function to scrape.Options Signed-off-by: Robert Fratto <robertfratto@gmail.com> |
|
beorn7 | c954cd9d1d |
Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com> |
|
SuperQ |
31f4108758
|
Add scrape_timeout_seconds metric
Add a new built-in metric `scrape_timeout_seconds` to allow monitoring of the ratio of scrape duration to the scrape timeout. Hide behind a feature flag to avoid additional cardinality by default. Signed-off-by: SuperQ <superq@gmail.com> |
|
austin ce |
5bdfba1d20
|
Extract and export GetFQDN()
Signed-off-by: austin ce <austin.cawley@gmail.com> |
|
Naka Masato |
a1c1313b3c
|
fix typo in comment for scrape manager (#9094)
Signed-off-by: Masato Naka <masatonaka1989@gmail.com> |
|
Levi Harrison | b5f6f8fb36 |
Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev> |
|
Julien Pivotto |
4e5b1722b3
|
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> |
|
Bartlomiej Plotka | 34426766d8 |
Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All todos I added will be fixed in follow up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only a unnecessary glue with incompatible structs. No logic was changed, only different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> |
|
gotjosh |
8b49c9285d
|
scrape: Add metrics to track bytes and entries in the metadata cache (#6675)
Signed-off-by: gotjosh <josue@grafana.com> |
|
yeya24 | b7bb278e95 |
make targets active parallel (#5740)
Signed-off-by: yeya24 <yb532204897@gmail.com> |
|
Tariq Ibrahim | 8fdfa8abea |
refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com> |
|
Tom Wilkie | 807fd33ecc |
Review feedback.
- Update read path to use labels.Labels. - Fix the tests. - Remove pack. - Remove unused function. - Fix race in tests. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> |
|
Simon Pasquier | 23069b87dc |
scrape: fallback to hostname if lookup fails (#5366)
Signed-off-by: Simon Pasquier <spasquie@redhat.com> |
|
xjewer | 0d1a69353e |
scrape: Add global jitter for HA server (#5181)
* scrape: Add global jitter for HA server Covers issue in https://github.com/prometheus/prometheus/pull/4926#issuecomment-449039848 where the HA setup become a problem for targets unable to be scraped simultaneously. The new jitter per server relies on the hostname and external labels which necessarily to be uniq. As before, scrape offset will be calculated with regard the absolute time, so even restart/reload doesn't change scrape time per scrape target + prometheus instance. Use fqdn if possible, otherwise fall back to the hostname. It adds extra random seed to calculate server hash to be distinguish on machines with the same hostname, but different DC. Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com> |
|
Simon Pasquier |
12708acd15
|
scrape: catch errors when creating HTTP clients (#5182)
* scrape: catch errors when creating HTTP clients This change makes sure that no scrape pool is created with a nil HTTP client. Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Address Tariq's comment Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Address Brian's comment Signed-off-by: Simon Pasquier <spasquie@redhat.com> |
|
Wei Guo | 996fd958ac |
fix deadlock in scrape manager (#4894)
Scrape manager will fall in deadlock when we reload configs frequently. |
|
Krasi Georgiev |
47a673c3a0
|
process scrape loops reloading in parallel (#4526)
The scrape manage receiver's channel now just saves the target sets and another backgorund runner updates the scrape loops every 5 seconds. This is so that the scrape manager doesn't block the receiving channel when it does the long background reloading of the scrape loops. Active and dropped targets are now saved in each scrape pool instead of the scrape manager. This is mainly to avoid races when getting the targets via the web api. When reloading the scrape loops now happens in parallel to speed up the final disared state and this also speeds up the prometheus's shutting down. Also updated some funcs signatures in the web package for consistency. Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com> |
|
Fabian Reinartz | ad4c33c1ff |
scrape,api: provide per-target metric metadata
This adds a per-target cache of scraped metadata. The metadata is only available for the lifecycle of the attached target. An API endpoint allows to select metadata by metric name and a label selection of targets. Signed-off-by: Fabian Reinartz <freinartz@google.com> |
|
Krasi Georgiev | ddd46de6f4 |
Races/3994 (#4005)
Fix race by properly locking access to scrape pools. Use separate mutex for information needed by UI so that UI isn't blocked when targets are being updated. |
|
ferhat elmas | ffa673f7d8 |
General simplifications (#3887)
Another try as in #1516 |
|
Conor Broderick | 99006d3baf | Added dropped targets API to targets endpoint (#3870) | |
Krasi Georgiev | 6ce84dbcb1 | rename ScrapeManager struct to Manager to remove stutter | |
Krasi Georgiev | b75428ec19 |
rename package retrieve to scrape
no fucnctinal changes just renaming retrieval to scrape |