prometheus

Commit Graph

Author	SHA1	Message	Date
Simon Pasquier	1cd29f782c	discovery/consul: close idle connections on stop Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Romain Baugue	b41be4ef52	Discovery consul service meta (#4280) * Upgrade Consul client * Add ServiceMeta to the labels in ConsulSD Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>	6 years ago
Julius Volz	5cf0113762	Add "omitempty" to some SD config YAML field tags (#4338) Especially for Kubernetes SD, this fixes a bug where the rendered configuration says "api_server: null", which when read back is not interpreted as an un-set API server (thus the default is not applied). Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Adam Shannon	809881d7f5	support reading basic_auth password_file for HTTP basic auth (#4077) Issue: https://github.com/prometheus/prometheus/issues/4076 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	7 years ago
sev3ryn	cc917aee7f	fix of endless loop while doing Consul service discovery. (#4044) Reloading Prometheus configs doesn't make loop end. It produced a goroutine leak	7 years ago
Manos Fokas	25f929b772	Yaml UnmarshalStrict implementation. (#4033) * Updated yaml vendor package. * remove checkOverflow duplicate in rulefmt * remove duplicated HTTPClientConfig.Validate() * Added yaml static check.	7 years ago
Corentin Chary	60dafd425c	consul: improve consul service discovery (#3814)

* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allows filtering by node-meta, and returns a `map[string]string` or `service`->`tags`). Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important because on large clusters `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog on a large cluster would basically change all the time. No need to discover targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.,)?marathon-user-observability(,.)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|1236|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|0.03|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|80|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses 100kiB and 23kiB/s per service per job and quite a lot of CPU. It also sends an additional 1Mbps of traffic to consul. Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation: there won't be more than one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also makes sure we don't wait another 30s if we already waited 29s in the blocking call, by subtracting the number of elapsed seconds. Hopefully this will do what people expect it to do and will be safer for existing consul infrastructures.	7 years ago
zemek	8a01a0fbed	Set consul server default to localhost:8500 (#3703)	7 years ago
Shubheksha Jalan	0471e64ad1	Use shared types from the `common` repo (#3674) * refactor: use shared types from common repo, remove util/config * vendor: add common/config * fix nit	7 years ago
Callum Styan	97464236c7	comments with TargetProvider should read Discoverer instead (#3667)	7 years ago
Shubheksha Jalan	ec94df49d4	Refactor SD configuration to remove `config` dependency (#3629)

* refactor: move targetGroup struct and CheckOverflow() to their own package
* refactor: move auth and security related structs to a utility package, fix import error in utility package
* refactor: Azure SD, remove SD struct from config
* refactor: DNS SD, remove SD struct from config into dns package
* refactor: ec2 SD, move SD struct from config into the ec2 package
* refactor: file SD, move SD struct from config to file discovery package
* refactor: gce, move SD struct from config to gce discovery package
* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil
* refactor: consul, move SD struct from config into consul discovery package
* refactor: marathon, move SD struct from config into marathon discovery package
* refactor: triton, move SD struct from config to triton discovery package, fix test
* refactor: zookeeper, move SD structs from config to zookeeper discovery package
* refactor: openstack, remove SD struct from config, move into openstack discovery package
* refactor: kubernetes, move SD struct from config into kubernetes discovery package
* refactor: notifier, use targetgroup package instead of config
* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup
* refactor: retrieval, use targetgroup package instead of config.TargetGroup
* refactor: storage, use config util package
* refactor: discovery manager, use targetgroup package instead of config.TargetGroup
* refactor: use HTTPClient and TLS config from configUtil instead of config
* refactor: tests, use targetgroup package instead of config.TargetGroup
* refactor: fix targetgroup.Group pointers that were removed by mistake
* refactor: openstack, kubernetes: drop prefixes
* refactor: remove import aliases forced due to vscode bug
* refactor: move main SD struct out of config into discovery/config
* refactor: rename configUtil to config_util
* refactor: rename yamlUtil to yaml_config
* refactor: kubernetes, remove prefixes
* refactor: move the TargetGroup package to discovery/
* refactor: fix order of imports	7 years ago
Callum Styan	7776527390	bump consul HTTP client timeout by 5s so it doesn't match up exactly with the consul SD watch timeout	7 years ago
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	7 years ago
Callum Styan	45f9f3c539	use a timeout in the HTTP client used for consul sd (#3303)	7 years ago
Marc Sluiter	6a633eece1	Added go-conntrack for monitoring http connections (#3241) Added metrics for in- and outgoing traffic with go-conntrack.	7 years ago
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	7 years ago
Joe Martin	aba41c7d0f	add support for consul's node metadata	7 years ago
Roman Vynar	dbe2eb2afc	Hide consul token on UI. (#2797)	8 years ago
Chris Goller	42de0ae013	Use log.Logger interface for all discovery services	8 years ago
Conor Broderick	6766123f93	Replace regex with Secret type and remarshal config to hide secrets (#2775)	8 years ago
yklausz	75880b594f	Adding consul capability to connect via tls	8 years ago
Tobias Schmidt	58cd39aacd	Follow golang naming conventions in discovery packages	8 years ago
Fabian Reinartz	35da23fd82	consul: start service watch as goroutine	8 years ago
Fabian Reinartz	d19d1bcad3	discovery: move into top-level package	8 years ago
Fabian Reinartz	b4d7ce1370	discovery: respect context cancellation everywhere This also removes closing of the target group channel everywhere as the context cancels across all stages and we don't care about draining all events once that happened.	8 years ago
Fabian Reinartz	bc7bd7202c	discovery: terminate senders before closing channel Fixes #2200	8 years ago
Dominik Schulz	0c69227616	Add Consul-SD metrics (#2097) * Add Consul-SD metrics * Remove unnecessary metric and add labels to summary. * Do not stutter	8 years ago
Roman Vynar	db63a4bd2a	Do not fail Consul discovery on Prometheus startup when Consul is down.	8 years ago
Fabian Reinartz	a2589e7815	retrieval: correctly handle IPv6 addresses This updates all service discoveries to correctly build the __address__ label for IPv6 addresses.	8 years ago
Fabian Reinartz	a15237a0b8	retrieval: correctly handle IPv6 addresses This updates all service discoveries to correctly build the __address__ label for IPv6 addresses.	8 years ago
Nicholas Capo	84334a8410	discovery: use consul service address if available	9 years ago
Fabian Reinartz	086f7caceb	discovery: extract Consul shouldWatch logic	9 years ago
Fabian Reinartz	e805e68c01	discovery: sanitize Consul service discovery This commit simplifies the SD's structure and ensures that all channel sends are checked against a canceled context.	9 years ago
Fabian Reinartz	5837e6a97f	discovery: move consul SD into own package	9 years ago
Fabian Reinartz	5b30bdb610	Change TargetProvider interface. This commit changes the TargetProvider interface to use a context.Context and send lists of TargetGroups, rather than single ones.	9 years ago
Fabian Reinartz	29a69eecb8	Do not panic in Consul SD creation	9 years ago
Julius Volz	d88aea7e6f	Fix SD mechanism source prefix handling. The prefixed target provider changed a pointerized target group that was reused in the wrapped target provider, causing an ever-increasing chain of source prefixes in target groups from the Consul target provider. We now make this bug generally impossible by switching the target group channel from pointer to value type and thus ensuring that target groups are copied before being passed on to other parts of the system. I tried to not let the depointerization leak too far outside of the channel handling (both upstream and downstream) because I tried that initially and caused some nasty bugs, which I want to minimize. Fixes https://github.com/prometheus/prometheus/issues/1083	9 years ago
Fabian Reinartz	e3b6ec9784	Switch to common/log	9 years ago
Fabian Reinartz	1ce89a4a0b	Fix nil panic on consul error	9 years ago
Julius Volz	995d3b831d	Fix most golint warnings. This is with `golint -min_confidence=0.5`. I left several lint warnings untouched because they were either incorrect or I felt it was better not to change them at the moment.	9 years ago
Fabian Reinartz	438e232c9b	Fix grouping of import blocks	9 years ago
Fabian Reinartz	306e8468a0	Switch from client_golang/model to common/model	9 years ago
Sharif Nassar	6cb519fe82	Add Consul ServiceID to the discovery meta labels.	9 years ago
Robbie Trencheny	48e461f7db	Pass through current agent Consul datacenter name Instead of only filling __meta_consul_dc when datacenter is set in consul_sd_config, this change fills the label based on what the agent reports its current datacenter is if datacenter isn't manually set; otherwise it uses whatever datacenter was set to.	9 years ago
Fabian Reinartz	d0a90964c1	Fix license header	9 years ago
Fabian Reinartz	3c6dd161d7	Scrape all services on empty services list.	9 years ago
Fabian Reinartz	4e84b86510	Improve target discovery pipeline Replace the TargetProvider Stop method with done channels that ensure properly broadcasted shutdown of the whole pipeline.	9 years ago
Fabian Reinartz	0138d37458	Improve unique target group sources. Include position of same SD mechanisms within the same scrape configuration. Move unique prefixing out of SD implementations and target manager into its own interface.	9 years ago
Florian Pfitzer	1fa0b0f253	fix consul port label	9 years ago
beorn7	645f6772e5	Add Consul Address, ServicePort, and ServiceAddress to the meta labels. In setups where the ServiceAddress is the relevant address for scraping, users can relabel the `__address__` label to ServiceAddress + ":" + ServicePort. This needs to be documented, of course. Will do once this is LGTM'd.	9 years ago

25 Commits (0bb810d1263c935e1de343978cb5a19ab0c71b07)