prometheus

Commit Graph

Author	SHA1	Message	Date
Frederic Branczyk	f7e1a94b03	Merge pull request #4329 from nailgun/4327-ingress-discovery-issue discovery/kubernetes/ingress: fix TLS discovery	6 years ago
Krasi Georgiev	0b93fd6d5e	fix the zookeeper race (#4355) Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	6 years ago
Simon Pasquier	dacb6c530a	discovery/file: fix logging (#4178) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	7 years ago
Paul Gier	5c70213f9f	config: set target group source index during unmarshalling (#4245) * config: set target group source index during unmarshalling Fixes issue #4214 where the scrape pool is unnecessarily reloaded for a config reload where the config hasn't changed. Previously, the discovery manager changed the static config after loading which caused the in-memory config to differ from a freshly reloaded config. Signed-off-by: Paul Gier <pgier@redhat.com> * [issue #4214] Test that static targets are not modified by discovery manager Signed-off-by: Paul Gier <pgier@redhat.com>	7 years ago
Simon Pasquier	8cd59da857	kubernetes_sd: fix namespace filtering (#4273) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	7 years ago
Callum Styan	03578d5df8	add example usage of SD adapter for converting unsupported SD type to filesd (#3720) Signed-off-by: Callum Styan <callumstyan@gmail.com>	7 years ago
Adam Shannon	a22e1736b9	discovery/marathon: include url in fetchApps error (#4171) This was previously part of a larger PR, but that was closed. https://github.com/prometheus/prometheus/issues/4048#issuecomment-389899997 This change could include auth information in the URL. That's been fixed in upstream go, but not until Go 1.11. See: https://github.com/golang/go/issues/24572 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	7 years ago
Damien Lespiau	e64037053d	Expose controller kind and name to labelling rules Relabelling rules can use this information to attach the name of the controller that has created a pod. In turn, this can be used to slice metrics by workload at query time, ie. "Give me all metrics that have been created by the $name Deployment" Signed-off-by: Damien Lespiau <damien@weave.works>	7 years ago
Nathan Graves	5b27996cb3	Include GCE labels during service discovery. Updated vendor files for Google API. (#4150) Signed-off-by: Nathan Graves <nathan.graves@kofile.us>	7 years ago
Elif T. Kuş	57dcdfb15f	Rewrote tests with testutil for several test files (#4086) * promql: Rewrote tests with testutil for functions_test Signed-off-by: Elif T. Kuş <elifkus@gmail.com> * pkg/relabel: Rewrote tests with testutil for relabel_test Signed-off-by: Elif T. Kuş <elifkus@gmail.com> * discovery/consul: Rewrote tests with testutil for consul_test Signed-off-by: Elif T. Kuş <elifkus@gmail.com> * scrape: Rewrote tests with testutil for manager_test Signed-off-by: Elif T. Kuş <elifkus@gmail.com>	7 years ago
Yecheng Fu	2be543e65a	Simplify some code and comments. Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	7 years ago
Yecheng Fu	46683dd67d	Simplify code. - Unified `send` function. - Pass InformerSynced functions to `cache.WaitForCacheSync`. - Use `Role\w+` constants instead of literal string. Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	7 years ago
Yecheng Fu	3a253f796c	Fix grammar in comments and add missing `expectedMaxItems` to let it break fast. Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	7 years ago
Yecheng Fu	d73b0d3141	Move hasSynced interface and its implementations to *_test.go files. Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	7 years ago
Yecheng Fu	8ceb8f2ae8	Refactor Kubernetes Discovery Part 2: Refactoring - Doing the initial listing and syncing to the scrape manager and only then registering event handlers may lose events that happen during listing and syncing (if it lasts a long time). We should register event handlers at the very beginning and, before processing, just wait until the informers have synced (sync in informer will list all objects and call OnUpdate event handler). - Use a queue so that we don't block event callbacks, and an object will be processed only once if added multiple times before being processed. - Fix bug in `serviceUpdate` in endpoints.go, we should build endpoints when `exists && err == nil`. Add `^TestEndpointsDiscoveryWithService` tests to test this feature. Testing: - Use `k8s.io/client-go` testing framework and fake implementations which are more robust and reliable for testing. - `Test\w+DiscoveryBeforeRun` are used to test objects created before discoverer runs - `Test\w+DiscoveryAdd\w+` are used to test adding objects - `Test\w+DiscoveryDelete\w+` are used to test deleting objects - `Test\w+DiscoveryUpdate\w+` are used to test updating objects - `TestEndpointsDiscoveryWithService\w+` are used to test endpoints events triggered by services - `cache.DeletedFinalStateUnknown` related code is removed, because we don't care about deleted objects in the store; we only need the name to send a special `targetgroup.Group` to the scrape manager Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	7 years ago
Adam Shannon	809881d7f5	support reading basic_auth password_file for HTTP basic auth (#4077) Issue: https://github.com/prometheus/prometheus/issues/4076 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	7 years ago
Rohit Gupta	30c3e02864	Fixes #4090. Marathon service discovery for 5XX http response (#4091) Signed-off-by: rohit01 <hello@rohit.io>	7 years ago
sev3ryn	cc917aee7f	fix of endless loop while doing Consul service discovery. (#4044) Reloading Prometheus configs doesn't make loop end. It produced a goroutine leak	7 years ago
Philippe Laflamme	2aba238f31	Use common HTTPClientConfig for marathon_sd configuration (#4009) This adds support for basic authentication which closes #3090 The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`. DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this. Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.	7 years ago
Manos Fokas	25f929b772	Yaml UnmarshalStrict implementation. (#4033) * Updated yaml vendor package. * remove checkOverflow duplicate in rulefmt * remove duplicated HTTPClientConfig.Validate() * Added yaml static check.	7 years ago
albatross0	0245fd55bf	Add a machine type label to GCE SD (#4032)	7 years ago
Kristiyan Nikolov	be85ba3842	discovery/ec2: Support filtering instances in discovery (#4011)	7 years ago
Corentin Chary	60dafd425c	consul: improve consul service discovery (#3814) * consul: improve consul service discovery Related to #3711 - Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allows filtering by node-meta, and returns a `map[string]string` of `service`->`tags`). Tags and node-meta are also used in `/catalog/service` requests. - Do not require a call to the catalog if services are specified by name. This is important because on a large cluster `/catalog/services` changes all the time. - Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation). - Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog on a large cluster would basically change all the time. No need to discover targets in 1sec if we scrape them every minute. - Added plenty of unit tests. 
Benchmarks
----------

```yaml
scrape_configs:
- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|1236|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|0.03|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|80|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses 100kiB and 23kiB/s per service per job and quite a lot of CPU. Also sends an additional 1Mbps of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery. * consul: tweak `refresh_interval` behavior `refresh_interval` now does what is advertised in the documentation: there won't be more than one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also makes sure we don't wait another 30s if we already waited 29s in the blocking call, by subtracting the number of elapsed seconds. Hopefully this will do what people expect and will be safer for existing consul infrastructures.	7 years ago
Ben Kochie	0d9fe18f5e	Fix nil context staticcheck error.	7 years ago
Aaron Kirkbride	c47fbcb626	Fix moved fsnotify dependency (#3995)	7 years ago
Jeeyoung Kim	5b962c5748	Revert "Feature: Allow getting credentials via EC2 role (#3343)" (#3985) This reverts commit `808f79f00a`.	7 years ago
Matt Palmer	042090a6d3	[dns_sd] Send an EDNS0 query by default (#3586) Based on https://groups.google.com/d/topic/prometheus-users/02kezHbuea4/discussion Does not attempt to handle a situation where the server does not understand EDNS0, however that is an unlikely case, and the behaviour of such ancient systems is hard to predict in advance, so if it does come up, it will need to be handled on a case-by-case basis.	7 years ago
Yecheng Fu	56ed29fbf7	Map target infos of endpoints to prometheus meta labels. (#3770)	7 years ago
Marek Siarkowicz	86011047ca	Validate required fields in sd configuration (#3911)	7 years ago
Krasi Georgiev	6b0e9ef183	Validate json parse for TargetGroup Unmarshal (#3614) Using DisallowUnknownFields in golang 1.10 to forbid unknown fields in targetGroups. Added the license header for the targetGroup test	7 years ago
Krasi Georgiev	4fa7e719f4	race in Triton SD Test (#3885)	7 years ago
ferhat elmas	ffa673f7d8	General simplifications (#3887) Another try as in #1516	7 years ago
Pedro Araújo	575f665944	Add OS type meta label to Azure SD (#3863) There is currently no way to differentiate Windows instances from Linux ones. This is needed when you have a mix of node_exporters / wmi_exporters for OS-level metrics and you want to have them in separate scrape jobs. This change allows you to do just that. Example:

```
- job_name: 'node'
  azure_sd_configs:
    - <azure_sd_config>
  relabel_configs:
    - source_labels: [__meta_azure_machine_os_type]
      regex: Linux
      action: keep
```

The way the vendor'd AzureSDK provides to get the OsType is a bit awkward - as far as I can tell, this information can only be gotten from the startup disk. Newer versions of the SDK appear to improve this a bit (by having OS information in the InstanceView), but the current way still works.	7 years ago
Simon Pasquier	2072bbc824	Send update when pod's IP address is empty When the pod gets evicted, its IP address becomes empty and it needs to be removed from the targets.	7 years ago
Krasi Georgiev	b75428ec19	rename package retrieve to scrape. No functional changes, just renaming retrieval to scrape	7 years ago
pasquier-s	bde64cf5a6	Fix Kubernetes endpoints SD for empty subsets (#3660) * Fix Kubernetes endpoints SD for empty subsets When an endpoints object has no associated pods (replica scaled to zero for instance), the endpoints SD should return a target group with no targets so that the SD manager propagates this information to the scrape manager. Fixes #3659 * Don't send nil target groups from the Kubernetes SD This is to be consistent with the endpoints SD part.	7 years ago
Krasi Georgiev	818dda72db	updated the sd tests	7 years ago
Krasi Georgiev	acc4197098	remove discovery race for the context field	7 years ago
Frederic Branczyk	73e829137b	discovery: Cleanup ticker	7 years ago
Ganesh Vernekar	66b0aa3b45	Fixed race condition in map iteration and map write in Discovery (#3735) (#3738) * Fixed concurrent map iteration and map write in Discovery (#3735) * discovery: Changed Lock to RLock in Collect	7 years ago
Krasi Georgiev	fe926e7829	update the discovery tests. The discovery test is now only testing update and get groups. It doesn't do an e2e test but just a unit test of setting and receiving target groups	7 years ago
Callum Styan	7dc05538f7	docs: SD implementations do not have to only send new/changed target groups (#3713)	7 years ago
Frederic Branczyk	cfa0253ed8	discovery: Schedule updates to throttle	7 years ago
zemek	8a01a0fbed	Set consul server default to localhost:8500 (#3703)	7 years ago
Julius Volz	09e460a647	discovery: Rename file SD mtime metric (#3723) - "timestamp" -> "mtime" to be in line with node exporter and clearer. - add unit suffix	7 years ago
Krasi Georgiev	ec26751fd2	use mutexes for the discovery manager instead of a loop as this was a stupid idea	7 years ago
Krasi Georgiev	767faa44b6	fixed the tests Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	7 years ago
Krasi Georgiev	d12e6f29fc	discovery manager ApplyConfig now takes a direct ServiceDiscoveryConfig so that it can be used for the notify manager. Reimplement the service discovery for the notify manager Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	7 years ago
Krasi Georgiev	790cf30fcb	remove unneeded check	7 years ago
Krasi Georgiev	38938ba493	comment nits	7 years ago

161 Commits (fc2a9c986b64a3354c94777261c6e90ad472dd29)