* Send target and metadata cache in context (again)
The previous attempt was rolled back in #10590 due to memory issues.
`sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info
in the previous attempt and it was hard to pin-point where the context
was being retained causing the memory increase.
I've experimented a bunch in #10627 to figure out that this approach doesn't
cause memory increase. Beyond that, just using this info in _any_ other context
is causing a memory increase.
The change fixed a bunch of long-standing in the OTel Collector that the
community was waiting on and release is blocked on a few downstream distrubutions
of OTel Collector waiting on a fix. I propose to merge this change in while
I investigate what is happening.
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* Gate the change behind a manager option
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* discovery: expose HTTP client options to discoverers
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* discovery/http: use HTTP client options for created client
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* scrape: use a list of HTTP client options instead of just dial context
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* discovery: rephrase comment
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* scrape: allow providing a custom Dialer for scraping
This commit extends config.ScrapeConfig with an optional field to
override how HTTP connections to targets are created. This field is not
set directly in Prometheus, and is only added for the convenience of
downstream importers.
Closes#9706
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* scrape: move custom dial function to scrape.Options
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
This creates a new `model` directory and moves all data-model related
packages over there:
exemplar labels relabel rulefmt textparse timestamp value
All the others are more or less utilities and have been moved to `util`:
gate logging modetimevfs pool runtime
Signed-off-by: beorn7 <beorn@grafana.com>
Add a new built-in metric `scrape_timeout_seconds` to allow monitoring
of the ratio of scrape duration to the scrape timeout. Hide behind a
feature flag to avoid additional cardinality by default.
Signed-off-by: SuperQ <superq@gmail.com>
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.
* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.
No logic was changed, only different source of abstractions, so no need for benchmarks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
* scrape: Add global jitter for HA server
Covers issue in https://github.com/prometheus/prometheus/pull/4926#issuecomment-449039848
where the HA setup become a problem for targets unable to be scraped simultaneously.
The new jitter per server relies on the hostname and external labels which necessarily to be uniq.
As before, scrape offset will be calculated with regard the absolute time, so even
restart/reload doesn't change scrape time per scrape target + prometheus instance.
Use fqdn if possible, otherwise fall back to the hostname. It adds extra random seed
to calculate server hash to be distinguish on machines with the same hostname, but
different DC.
Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>
* scrape: catch errors when creating HTTP clients
This change makes sure that no scrape pool is created with a nil HTTP
client.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Tariq's comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Brian's comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
The scrape manage receiver's channel now just saves the target sets
and another backgorund runner updates the scrape loops every 5 seconds.
This is so that the scrape manager doesn't block the receiving channel
when it does the long background reloading of the scrape loops.
Active and dropped targets are now saved in each scrape pool instead of
the scrape manager. This is mainly to avoid races when getting the
targets via the web api.
When reloading the scrape loops now happens in parallel to speed up the
final disared state and this also speeds up the prometheus's shutting
down.
Also updated some funcs signatures in the web package for consistency.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
This adds a per-target cache of scraped metadata. The metadata is only
available for the lifecycle of the attached target. An API endpoint allows
to select metadata by metric name and a label selection of targets.
Signed-off-by: Fabian Reinartz <freinartz@google.com>
Fix race by properly locking access to scrape pools. Use separate mutex for information needed by UI so that UI isn't blocked when targets are being updated.
read bearer token on every request
removed unuseful scrape manager startup log
new tests -TestScrapeManagerReloadNoChange( scrape pool is not reloaded
when the config hasn't changed), TestMissingBearerAuthFile ,
TestBearerAuthFileRoundTripper
* refactor: move targetGroup struct and CheckOverflow() to their own package
* refactor: move auth and security related structs to a utility package, fix import error in utility package
* refactor: Azure SD, remove SD struct from config
* refactor: DNS SD, remove SD struct from config into dns package
* refactor: ec2 SD, move SD struct from config into the ec2 package
* refactor: file SD, move SD struct from config to file discovery package
* refactor: gce, move SD struct from config to gce discovery package
* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil
* refactor: consul, move SD struct from config into consul discovery package
* refactor: marathon, move SD struct from config into marathon discovery package
* refactor: triton, move SD struct from config to triton discovery package, fix test
* refactor: zookeeper, move SD structs from config to zookeeper discovery package
* refactor: openstack, remove SD struct from config, move into openstack discovery package
* refactor: kubernetes, move SD struct from config into kubernetes discovery package
* refactor: notifier, use targetgroup package instead of config
* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup
* refactor: retrieval, use targetgroup package instead of config.TargetGroup
* refactor: storage, use config util package
* refactor: discovery manager, use targetgroup package instead of config.TargetGroup
* refactor: use HTTPClient and TLS config from configUtil instead of config
* refactor: tests, use targetgroup package instead of config.TargetGroup
* refactor: fix tagetgroup.Group pointers that were removed by mistake
* refactor: openstack, kubernetes: drop prefixes
* refactor: remove import aliases forced due to vscode bug
* refactor: move main SD struct out of config into discovery/config
* refactor: rename configUtil to config_util
* refactor: rename yamlUtil to yaml_config
* refactor: kubernetes, remove prefixes
* refactor: move the TargetGroup package to discovery/
* refactor: fix order of imports