prometheus

Commit Graph

Author	SHA1	Message	Date
Callum Styan	67838643ee	Add config option for remote job name (#6043 ) * Track remote write queues via a map so we don't care about index. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Support a job name for remote write/read so we can differentiate between them using the name. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Remote write/read has Name to not confuse the meaning of the field with scrape job names. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Split queue/client label into remote_name and url labels. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allow for duplicate remote write/read configs. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Ensure we restart remote write queues if the hash of their config has not changed, but the remote name has changed. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Include name in remote read/write config hashes, simplify duplicates check, update test accordingly. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-12-12 12:47:23 -08:00
Julien Pivotto	4397916cb2	Add honor_timestamps (#5304 ) Fixes #5302 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2019-03-15 10:04:15 +00:00
Callum Styan	83c46fd549	update Consul vendor code so that catalog.ServiceMultipleTags can be (#5151 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-03-12 10:31:27 +00:00
Simon Pasquier	027d2ece14	config: resolve more file paths (#5284 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-03-12 10:24:15 +00:00
Marcel D. Juhnke	c7d83b2b6a	discovery: add support for Managed Identity authentication in Azure SD (#4590 ) Signed-off-by: Marcel Juhnke <marrat@marrat.de>	2018-12-19 10:03:33 +00:00
Simon Pasquier	ff08c40091	discovery/openstack: support tls_config Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-09-25 14:31:32 +02:00
Tariq Ibrahim	f708fd5c99	Adding support for multiple azure environments (#4569 ) Signed-off-by: Tariq Ibrahim <tariq.ibrahim@microsoft.com>	2018-09-04 17:55:40 +02:00
Philippe Laflamme	2aba238f31	Use common HTTPClientConfig for marathon_sd configuration (#4009 ) This adds support for basic authentication which closes #3090 The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`. DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this. Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.	2018-04-05 09:08:18 +01:00
Kristiyan Nikolov	be85ba3842	discovery/ec2: Support filtering instances in discovery (#4011 )	2018-03-31 07:51:11 +01:00
Corentin Chary	60dafd425c	consul: improve consul service discovery (#3814 ) * consul: improve consul service discovery Related to #3711
- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allows filtering by node-meta, and returns a `map[string]string` or `service`->`tags`). Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important because on large clusters `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more so when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of this kind of behavior from consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog on a large cluster would basically change all the time. No need to discover targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.
Benchmarks
----------

```yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 60s
    static_configs:
      - targets: ["127.0.0.1:9090"]

  - job_name: "observability-by-tag"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        tag: marathon-user-observability  # Used in After
        refresh_interval: 30s  # Used in After+delay
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ^(.,)?marathon-user-observability(,.)?$
        action: keep

  - job_name: "observability-by-name"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - observability-cerebro
          - observability-portal-web

  - job_name: "fake-fake-fake"
    scrape_interval: "15s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
| /service-discovery size | 5K | 85MiB | 27k | 27k |
| `go_memstats_heap_objects` | 100k | 1M | 120k | 110k |
| `go_memstats_heap_alloc_bytes` | 24MB | 150MB | 28MB | 27MB |
| `rate(go_memstats_alloc_bytes_total[5m])` | 0.2MB/s | 28MB/s | 2MB/s | 0.3MB/s |
| `rate(process_cpu_seconds_total[5m])` | 0.1% | 15% | 2% | 0.01% |
| `process_open_fds` | 16 | 1236 | 22 | 22 |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])` | ~0 | 1 | 1 | 0.03 |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])` | 0.1 | 80 | 0.5 | 0.5 |
| `prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}` | N/A | 200ms | 0.2ms | 0.2ms |
| Network bandwidth | ~10kbps | ~2.8Mbps | ~1.6Mbps | ~10kbps |

Filtering by tag using relabel_configs uses 100kiB and 23kiB/s per service per job and quite a lot of CPU. It also sends an additional 1Mbps of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.
* consul: tweak `refresh_interval` behavior. `refresh_interval` now does what is advertised in the documentation: there won't be more than one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also makes sure we don't wait another 30s if we already waited 29s in the blocking call, by subtracting the number of elapsed seconds. Hopefully this will do what people expect it to do and will be safer for existing consul infrastructures.	2018-03-23 14:48:43 +00:00
Tobias Schmidt	7098c56474	Add remote read filter option For special remote read endpoints which have only data for specific queries, it is desired to limit the number of queries sent to the configured remote read endpoint to reduce latency and performance overhead.	2017-11-13 23:30:01 +01:00
Thibault Chataigner	bf4a279a91	Remote storage reads based on oldest timestamp in primary storage (#3129 ) Currently all read queries are simply pushed to remote read clients. This is fine, except for remote storage, for which it is inefficient and makes queries slower even if remote read is unnecessary. So we need instead to compare the oldest timestamp in primary/local storage with the query range lower boundary. If the oldest timestamp is older than the mint parameter, then there is no need for remote read. This is an optional behavior per remote read client. Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>	2017-10-18 12:08:14 +01:00
Fuente, Pablo Andres	902fafb8e7	Fixing tests for Windows Fixing the config/config_test, the discovery/file/file_test and the promql/promql_test tests for Windows. For most of the tests, the fix involved correct handling of path separators. In the case of the promql tests, the issue was related to the removal of the temporary directories used by the storage. The issue is that the RemoveAll() call returns an error when it tries to remove a directory which is not empty, which seems to be true due to some kind of process that is still running after closing the storage. To fix it I added some retries to the removal of the temporary directories. Adding tags file from Universal Ctags to .gitignore	2017-07-09 01:59:30 -03:00
Roman Vynar	dbe2eb2afc	Hide consul token on UI. (#2797 )	2017-06-01 22:14:23 +01:00
Conor Broderick	6766123f93	Replace regex with Secret type and remarshal config to hide secrets (#2775 )	2017-05-29 12:46:23 +01:00
Brian Akins	27d66628a1	Allow limiting Kubernetes service discovery to certain namespaces Allow namespace discovery to be more easily extended in the future by using a struct rather than just a list. Rename fields for kubernetes namespace discovery	2017-04-27 07:41:36 -04:00
yklausz	75880b594f	Adding consul capability to connect via tls	2017-03-17 22:37:18 +01:00
Julius Volz	e9476b35d5	Re-add multiple remote writers Each remote write endpoint gets its own set of relabeling rules. This is based on the (yet-to-be-merged) https://github.com/prometheus/prometheus/pull/2419, which removes legacy remote write implementations.	2017-02-20 13:23:12 +01:00
Fabian Reinartz	7eb849e6a8	Merge pull request #2307 from joyent/triton_discovery Add Joyent Triton discovery	2017-01-18 05:08:11 +01:00
Richard Kiene	f3d9692d09	Add Joyent Triton discovery	2017-01-17 20:34:32 +00:00
Björn Rabenstein	ad40d0abbc	Merge pull request #2288 from prometheus/limit-scrape Add ability to limit scrape samples, and related metrics	2017-01-08 01:34:06 +01:00
Brian Brazil	30448286c7	Add sample_limit to scrape config. This imposes a hard limit on the number of samples ingested from the target. This is counted after metric relabelling, to allow dropping of problematic metrics. This is intended as a very blunt tool to prevent overload due to misbehaving targets that suddenly jump in sample count (e.g. adding a label containing email addresses). Add metric to track how often this happens. Fixes #2137	2016-12-16 15:10:09 +00:00
Tristan Colgate-McFarlane	4d9134e6d8	Add labeldrop and labelkeep actions. (#2279 ) Introduce two new relabel actions. labeldrop, and labelkeep. These can be used to filter the set of labels by matching regex - labeldrop: drops all labels that match the regex - labelkeep: drops all labels that do not match the regex	2016-12-14 10:17:42 +00:00
Fabian Reinartz	183c5749b9	config: add Alertmanager configuration	2016-11-23 18:23:37 +01:00
Fabian Reinartz	ec66082749	Merge branch 'ec2_sd_profile_support' of https://github.com/Ticketmaster/prometheus into Ticketmaster-ec2_sd_profile_support	2016-11-21 11:49:23 +01:00
Kraig Amador	bec6870ed4	ec2_sd_configs: Support profiles for configuring the ec2 service	2016-11-03 08:38:02 -07:00
beorn7	b2f28a9e82	Merge branch 'release-1.2'	2016-11-03 14:42:15 +01:00
Brian Brazil	d1ece12c70	Handle null Regex in the config as the empty regex. (#2150 )	2016-11-03 13:34:15 +00:00
bekbulatov	c689b35858	Merge branch 'master' into marathon_tls	2016-10-24 10:37:32 +01:00
Matti Savolainen	f867c1fd58	formatting and text fixes, adjust regexp	2016-10-19 13:31:55 +03:00
Matti Savolainen	56907ba6e3	Add interpolation to good test config. Fix regex	2016-10-19 01:19:19 +03:00
bekbulatov	ac702f66eb	Resolve merge conflicts	2016-10-18 14:14:24 +01:00
Fabian Reinartz	1b6dfa32a9	config: rename role 'endpoint' to 'endpoints'	2016-10-17 11:53:49 +02:00
Frederic Branczyk	2e18c81a00	config: adapt unit tests	2016-10-17 10:32:10 +02:00
bekbulatov	01b53c1180	Add tls support	2016-10-07 13:40:22 +01:00
Brian Brazil	77605649a9	Add support for remote write relabelling. Switch back to a single remote writer, as we were only ever meant to have one and the relabel semantics are clearer that way.	2016-10-05 07:43:19 +01:00
Donatas Abraitis	1aa8898b66	Allow number to be the first letter as well for `job_name`	2016-09-16 14:06:47 +03:00
Fabian Reinartz	7221228843	discovery/kubernetes: select between discovery role This adds `role` field to the Kubernetes SD config, which indicates which type of Kubernetes SD should be run. This no longer allows discovering pods and nodes with the same SD configuration for example.	2016-07-05 14:22:12 +02:00
Fabian Reinartz	0f21bd31ca	config: deprecate `target_groups` for `static_configs` This change deprecates the `target_groups` option in favor of `static_configs`. The old configuration is still accepted but prints a warning. Configuration loading errors if both options are set.	2016-06-08 15:55:25 +02:00
Ali Reza	c81b4e8a87	change config names to files for consistency	2016-05-30 07:47:58 +07:00
Seth Miller	0988e3b937	Add support for Azure discovery This change adds the ability to do target discovery with Microsoft's Azure platform.	2016-04-06 22:47:02 -05:00
Julius Volz	829a029dda	Update two more __meta_dns_srv_name references. Although they are only in examples/tests and don't affect anything, they could be confusing (the label has been renamed in the rest of the code a while ago).	2016-02-14 22:20:39 +01:00
Julien Dehee	061fe2f364	Support AirBnB's Smartstack Nerve client for SD nerve's registration format differs from serverset. With this commit there is now a dedicated treecache file in util, and two separate files for serverset and nerve. Reference: https://github.com/airbnb/nerve	2016-01-18 14:07:28 +01:00
Fabian Reinartz	4d1c9296d5	Add new defaults for relabel configurations	2015-11-16 13:16:13 +01:00
Jimmi Dyson	87940ec213	Kubernetes SD: Rename `masters` to `api_servers` in config	2015-10-24 14:41:14 +01:00
Jimmi Dyson	7ff5cc66ea	Kubernetes SD authentication options cleanup	2015-10-23 16:47:52 +01:00
Matt Jibson	dcb4856d72	Add SD for Amazon EC2 instances	2015-10-06 18:36:17 -04:00
Julius Volz	dac26cef71	Rename global "labels" config option to "external_labels".	2015-09-29 20:54:20 +02:00
Matt Jibson	0e99fa6c46	Allow labelmap action	2015-09-21 15:41:19 -04:00
Jimmi Dyson	a1574aa2b3	Move TLS options to scrape config Fixes #1013, fixes #989	2015-09-09 09:52:21 +01:00
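
Several of the commits listed above add scrape-config options: `sample_limit` (30448286c7), the `labeldrop` and `labelkeep` relabel actions (4d9134e6d8), and `honor_timestamps` (4397916cb2). A minimal sketch of how they might appear together in one scrape job follows. The job name, target address, limit value, and label pattern are hypothetical and chosen only for illustration; they do not come from the commits themselves.

```yaml
scrape_configs:
  - job_name: "example-job"          # hypothetical job name
    honor_timestamps: true           # option added in 4397916cb2
    sample_limit: 10000              # hypothetical value; counted after metric relabelling
    static_configs:
      - targets: ["127.0.0.1:9090"]  # hypothetical target
    metric_relabel_configs:
      # labeldrop removes every label whose name matches the regex;
      # labelkeep would instead remove every label that does not match.
      - action: labeldrop
        regex: "tmp_.*"              # hypothetical label-name pattern
```

Because the sample limit is counted after metric relabelling, as the `sample_limit` commit message notes, relabel rules can be used to drop problematic metrics before the limit applies.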


72 Commits (f174ae1f0913dd6f5183cbfb2a1fff3c1f294ddc)