prometheus

Commit Graph

Author	SHA1	Message	Date
Julius Volz	5cf0113762	Add "omitempty" to some SD config YAML field tags (#4338) Especially for Kubernetes SD, this fixes a bug where the rendered configuration says "api_server: null", which when read back is not interpreted as an un-set API server (thus the default is not applied). Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-07-03 13:43:41 +02:00
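The effect of a missing `omitempty` can be reproduced with the standard library alone. This sketch swaps in `encoding/json`, whose `omitempty` treats nil pointers the same way as the YAML tags touched by the commit above; the struct and field names are illustrative, not Prometheus's actual config types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withoutOmit has an optional pointer field without omitempty: a nil pointer
// is rendered as an explicit null, which reads back as "set to null".
type withoutOmit struct {
	APIServer *string `json:"api_server"`
}

// withOmit drops the field when the pointer is nil, so a later read-back
// sees it as unset and a default can be applied.
type withOmit struct {
	APIServer *string `json:"api_server,omitempty"`
}

// render marshals v and panics on error, which cannot happen for these types.
func render(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(render(withoutOmit{})) // {"api_server":null}
	fmt.Println(render(withOmit{}))    // {}
}
```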
Simon Pasquier	6eab4bbca1	kubernetes_sd: fix namespace filtering (#4273) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-06-15 09:08:14 +01:00
Paul Gier	d24d2acd11	config: set target group source index during unmarshalling (#4245) * config: set target group source index during unmarshalling Fixes issue #4214 where the scrape pool is unnecessarily reloaded for a config reload where the config hasn't changed. Previously, the discovery manager changed the static config after loading which caused the in-memory config to differ from a freshly reloaded config. Signed-off-by: Paul Gier <pgier@redhat.com> * [issue #4214] Test that static targets are not modified by discovery manager Signed-off-by: Paul Gier <pgier@redhat.com>	2018-06-13 16:34:59 +01:00
Simon Pasquier	0e5e7f75cd	discovery/file: fix logging (#4178) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-06-12 12:45:59 +01:00
Callum Styan	03578d5df8	add example usage of SD adapter for converting unsupported SD type to filesd (#3720) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2018-05-30 13:14:34 +01:00
Adam Shannon	a22e1736b9	discovery/marathon: include url in fetchApps error (#4171) This was previously part of a larger PR, but that was closed. https://github.com/prometheus/prometheus/issues/4048#issuecomment-389899997 This change could include auth information in the URL. That's been fixed in upstream go, but not until Go 1.11. See: https://github.com/golang/go/issues/24572 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	2018-05-18 10:20:14 +01:00
Damien Lespiau	e64037053d	Expose controller kind and name to labelling rules Relabelling rules can use this information to attach the name of the controller that has created a pod. In turn, this can be used to slice metrics by workload at query time, i.e. "Give me all metrics that have been created by the $name Deployment" Signed-off-by: Damien Lespiau <damien@weave.works>	2018-05-09 11:51:37 +02:00
Nathan Graves	5b27996cb3	Include GCE labels during service discovery. Updated vendor files for Google API. (#4150) Signed-off-by: Nathan Graves <nathan.graves@kofile.us>	2018-05-08 17:37:47 +01:00
beorn7	a4e4bec3fe	Merge branch 'release-2.2'	2018-04-30 14:38:29 +02:00
Elif T. Kuş	57dcdfb15f	Rewrote tests with testutil for several test files (#4086) * promql: Rewrote tests with testutil for functions_test Signed-off-by: Elif T. Kuş <elifkus@gmail.com> * pkg/relabel: Rewrote tests with testutil for relabel_test Signed-off-by: Elif T. Kuş <elifkus@gmail.com> * discovery/consul: Rewrote tests with testutil for consul_test Signed-off-by: Elif T. Kuş <elifkus@gmail.com> * scrape: Rewrote tests with testutil for manager_test Signed-off-by: Elif T. Kuş <elifkus@gmail.com>	2018-04-27 13:11:16 +01:00
Yecheng Fu	2be543e65a	Simplify some code and comments. Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	2018-04-25 19:29:34 +02:00
Yecheng Fu	46683dd67d	Simplify code. - Unified `send` function. - Pass InformerSynced functions to `cache.WaitForCacheSync`. - Use `Role\w+` constants instead of literal string. Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	2018-04-25 19:29:21 +02:00
Yecheng Fu	3a253f796c	Fix grammar in comments and add missing `expectedMaxItems` to let it break fast. Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	2018-04-25 19:29:03 +02:00
Yecheng Fu	d73b0d3141	Move hasSynced interface and its implementations to *_test.go files. Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	2018-04-25 19:28:49 +02:00
Yecheng Fu	8ceb8f2ae8	Refactor Kubernetes Discovery Part 2: Refactoring - Doing the initial listing and syncing to the scrape manager before registering event handlers may lose events that happen during listing and syncing (if it takes a long time). We should register event handlers at the very beginning, before processing, and just wait until the informers have synced (syncing in the informer lists all objects and calls the OnUpdate event handler). - Use a queue so that we don't block event callbacks, and an object is processed only once if it is added multiple times before being processed. - Fix bug in `serviceUpdate` in endpoints.go: we should build endpoints when `exists && err == nil`. Add `^TestEndpointsDiscoveryWithService` tests to test this feature. Testing: - Use the `k8s.io/client-go` testing framework and fake implementations, which are more robust and reliable for testing. - `Test\w+DiscoveryBeforeRun` are used to test objects created before the discoverer runs - `Test\w+DiscoveryAdd\w+` are used to test adding objects - `Test\w+DiscoveryDelete\w+` are used to test deleting objects - `Test\w+DiscoveryUpdate\w+` are used to test updating objects - `TestEndpointsDiscoveryWithService\w+` are used to test endpoints events triggered by services - `cache.DeletedFinalStateUnknown` related code is removed, because we don't care about deleted objects in the store; we only need the name to send a special `targetgroup.Group` to the scrape manager Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>	2018-04-25 19:28:34 +02:00
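The queue behaviour described in the commit above (an object added several times before it is handled is processed only once) can be sketched as a FIFO plus a set of pending keys. `dedupQueue` is a hypothetical, single-goroutine illustration; client-go's `workqueue` package offers a concurrent queue with the same deduplicating property.

```go
package main

import "fmt"

// dedupQueue is a minimal FIFO that ignores a key already waiting to be
// processed. It is not safe for concurrent use.
type dedupQueue struct {
	items   []string
	pending map[string]bool
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting in the queue.
func (q *dedupQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.items = append(q.items, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.items) == 0 {
		return "", false
	}
	key, q.items = q.items[0], q.items[1:]
	delete(q.pending, key)
	return key, true
}

func main() {
	q := newDedupQueue()
	for _, key := range []string{"default/pod-a", "default/pod-b", "default/pod-a"} {
		q.Add(key)
	}
	// Prints default/pod-a, then default/pod-b: the duplicate add was dropped.
	for key, ok := q.Get(); ok; key, ok = q.Get() {
		fmt.Println(key)
	}
}
```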
Adam Shannon	809881d7f5	support reading basic_auth password_file for HTTP basic auth (#4077) Issue: https://github.com/prometheus/prometheus/issues/4076 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	2018-04-25 18:19:06 +01:00
Rohit Gupta	30c3e02864	Fixes #4090. Marathon service discovery for 5XX HTTP response (#4091) Signed-off-by: rohit01 <hello@rohit.io>	2018-04-17 09:28:06 +01:00
sev3ryn	cc917aee7f	fix endless loop while doing Consul service discovery (#4044) Reloading the Prometheus config didn't make the loop end, which produced a goroutine leak.	2018-04-05 10:41:09 +01:00
Philippe Laflamme	2aba238f31	Use common HTTPClientConfig for marathon_sd configuration (#4009) This adds support for basic authentication which closes #3090 The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`. DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this. Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.	2018-04-05 09:08:18 +01:00
Manos Fokas	25f929b772	Yaml UnmarshalStrict implementation. (#4033) * Updated yaml vendor package. * remove checkOverflow duplicate in rulefmt * remove duplicated HTTPClientConfig.Validate() * Added yaml static check.	2018-04-04 09:07:39 +01:00
albatross0	0245fd55bf	Add a machine type label to GCE SD (#4032)	2018-03-31 09:20:19 +01:00
Kristiyan Nikolov	be85ba3842	discovery/ec2: Support filtering instances in discovery (#4011)	2018-03-31 07:51:11 +01:00
Corentin Chary	60dafd425c	consul: improve consul service discovery (#3814) * consul: improve consul service discovery Related to #3711
- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allows filtering by node-meta, and returns a `map[string]string` of `service`->`tags`). Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important because on a large cluster `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog on a large cluster would basically change all the time. No need to discover targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 60s
    static_configs:
      - targets: ["127.0.0.1:9090"]

  - job_name: "observability-by-tag"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        tag: marathon-user-observability  # Used in After
        refresh_interval: 30s  # Used in After+delay
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ^(.*,)?marathon-user-observability(,.*)?$
        action: keep

  - job_name: "observability-by-name"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - observability-cerebro
          - observability-portal-web

  - job_name: "fake-fake-fake"
    scrape_interval: "15s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|1236|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|0.03|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|80|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses 100kiB and 23kiB/s per service per job and quite a lot of CPU. It also sends an additional 1Mbps of traffic to consul. Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior `refresh_interval` now does what is advertised in the documentation: there won't be more than one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also makes sure we don't wait another 30s if we already waited 29s in the blocking call, by subtracting the number of elapsed seconds. Hopefully this will do what people expect it does and will be safer for existing consul infrastructures.	2018-03-23 14:48:43 +00:00
Ben Kochie	0d9fe18f5e	Fix nil context staticcheck error.	2018-03-22 07:59:39 +00:00
Aaron Kirkbride	c47fbcb626	Fix moved fsnotify dependency (#3995)	2018-03-21 15:46:31 +00:00
Jeeyoung Kim	5b962c5748	Revert "Feature: Allow getting credentials via EC2 role (#3343)" (#3985) This reverts commit `808f79f00a`.	2018-03-20 12:34:54 +00:00
Matt Palmer	042090a6d3	[dns_sd] Send an EDNS0 query by default (#3586) Based on https://groups.google.com/d/topic/prometheus-users/02kezHbuea4/discussion Does not attempt to handle a situation where the server does not understand EDNS0, however that is an unlikely case, and the behaviour of such ancient systems is hard to predict in advance, so if it does come up, it will need to be handled on a case-by-case basis.	2018-03-09 10:21:58 +00:00
Yecheng Fu	56ed29fbf7	Map target infos of endpoints to prometheus meta labels. (#3770)	2018-03-09 10:07:00 +00:00
Marek Siarkowicz	86011047ca	Validate required fields in sd configuration (#3911)	2018-03-05 19:27:54 +00:00
Krasi Georgiev	6b0e9ef183	Validate json parse for TargetGroup Unmarshal (#3614) Use DisallowUnknownFields (new in Go 1.10) to forbid unknown fields in targetGroups. Added the license header for the targetGroup test.	2018-02-27 12:33:27 +00:00
Krasi Georgiev	4fa7e719f4	race in Triton SD Test (#3885)	2018-02-26 10:03:50 +00:00
ferhat elmas	ffa673f7d8	General simplifications (#3887) Another try as in #1516	2018-02-26 07:58:10 +00:00
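`json.Decoder.DisallowUnknownFields`, added in Go 1.10, makes decoding fail on keys that do not map to a struct field instead of silently dropping them. The `targetGroup` type and `parseGroups` helper below are simplified assumptions for illustration, not the real target group types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// targetGroup is a simplified stand-in for a file_sd target group.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

// parseGroups decodes target groups and rejects unknown fields (for example
// a misspelled "label"), which json.Unmarshal would otherwise ignore.
func parseGroups(data string) ([]targetGroup, error) {
	dec := json.NewDecoder(strings.NewReader(data))
	dec.DisallowUnknownFields()
	var groups []targetGroup
	if err := dec.Decode(&groups); err != nil {
		return nil, err
	}
	return groups, nil
}

func main() {
	_, err := parseGroups(`[{"targets": ["a:9100"], "label": {"env": "prod"}}]`)
	fmt.Println(err) // json: unknown field "label"
}
```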
Pedro Araújo	575f665944	Add OS type meta label to Azure SD (#3863) There is currently no way to differentiate Windows instances from Linux ones. This is needed when you have a mix of node_exporters / wmi_exporters for OS-level metrics and you want to have them in separate scrape jobs. This change allows you to do just that. Example:
```
- job_name: 'node'
  azure_sd_configs:
    - <azure_sd_config>
  relabel_configs:
    - source_labels: [__meta_azure_machine_os_type]
      regex: Linux
      action: keep
```
The way the vendored Azure SDK provides to get the OsType is a bit awkward - as far as I can tell, this information can only be gotten from the startup disk. Newer versions of the SDK appear to improve this a bit (by having OS information in the InstanceView), but the current way still works.	2018-02-19 15:40:57 +00:00
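Relabel `keep` rules join the `source_labels` values with `;` (the default separator) and match the result against a regex anchored at both ends. The hypothetical `keep` function below mimics that for the example in the commit above; it is an illustration, not Prometheus's `relabel` package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keep mimics a relabel rule with action "keep": it joins the values of
// sourceLabels with ";" and keeps the target only if the regex, anchored at
// both ends, matches. Missing labels contribute an empty string.
func keep(labels map[string]string, sourceLabels []string, regex string) bool {
	re := regexp.MustCompile("^(?:" + regex + ")$")
	values := make([]string, len(sourceLabels))
	for i, name := range sourceLabels {
		values[i] = labels[name]
	}
	return re.MatchString(strings.Join(values, ";"))
}

func main() {
	src := []string{"__meta_azure_machine_os_type"}
	linux := map[string]string{"__meta_azure_machine_os_type": "Linux"}
	windows := map[string]string{"__meta_azure_machine_os_type": "Windows"}
	fmt.Println(keep(linux, src, "Linux"), keep(windows, src, "Linux")) // true false
}
```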
Simon Pasquier	2072bbc824	Send update when pod's IP address is empty When the pod gets evicted, its IP address becomes empty and it needs to be removed from the targets.	2018-02-14 14:23:52 +01:00
Krasi Georgiev	b75428ec19	rename package retrieval to scrape No functional changes, just renaming retrieval to scrape.	2018-02-01 09:55:07 +00:00
Frederic Branczyk	d3ae1ac40e	Merge pull request #3741 from krasi-georgiev/discovery-race read/write race for the context field in the discovery package	2018-01-30 18:17:09 +01:00
pasquier-s	bde64cf5a6	Fix Kubernetes endpoints SD for empty subsets (#3660) * Fix Kubernetes endpoints SD for empty subsets When an endpoints object has no associated pods (replica scaled to zero for instance), the endpoints SD should return a target group with no targets so that the SD manager propagates this information to the scrape manager. Fixes #3659 * Don't send nil target groups from the Kubernetes SD This is to be consistent with the endpoints SD part.	2018-01-30 15:00:33 +00:00
Krasi Georgiev	818dda72db	updated the sd tests	2018-01-29 15:19:15 +00:00
Krasi Georgiev	acc4197098	remove discovery race for the context field	2018-01-29 15:18:07 +00:00
Frederic Branczyk	47538cf6ce	Merge pull request #3747 from prometheus/sched-update-throttle Update throttle & tsdb update	2018-01-29 16:05:05 +01:00
Frederic Branczyk	73e829137b	discovery: Cleanup ticker	2018-01-29 13:51:04 +01:00
Ganesh Vernekar	66b0aa3b45	Fixed race condition in map iteration and map write in Discovery (#3735) (#3738) * Fixed concurrent map iteration and map write in Discovery (#3735) * discovery: Changed Lock to RLock in Collect	2018-01-28 22:24:31 +05:30
Krasi Georgiev	fe926e7829	update the discovery tests The discovery test now only tests updating and getting groups. It isn't an e2e test, just a unit test of setting and receiving target groups.	2018-01-27 12:03:06 +00:00
Callum Styan	7dc05538f7	docs: SD implementations do not have to only send new/changed target groups (#3713)	2018-01-26 22:03:11 +00:00
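The fix above takes the write lock where the map is mutated and a read lock in `Collect`, which only iterates. A minimal sketch of that pattern, with a hypothetical `registry` type standing in for the discovery state:

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for state shared between a goroutine
// that writes target groups and a metrics collector that iterates them.
type registry struct {
	mtx    sync.RWMutex
	groups map[string]int // source -> number of targets
}

// set takes the write lock because it mutates the map.
func (r *registry) set(source string, n int) {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	r.groups[source] = n
}

// collect only reads the map, so a read lock suffices: several collectors
// may run at once, but never concurrently with set.
func (r *registry) collect() int {
	r.mtx.RLock()
	defer r.mtx.RUnlock()
	total := 0
	for _, n := range r.groups {
		total += n
	}
	return total
}

func main() {
	r := &registry{groups: map[string]int{}}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.set(fmt.Sprintf("src-%d", i), i)
			_ = r.collect()
		}(i)
	}
	wg.Wait()
	fmt.Println(r.collect()) // 45
}
```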
Frederic Branczyk	cfa0253ed8	discovery: Schedule updates to throttle	2018-01-26 16:24:44 +01:00
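Throttling updates can be read as coalescing: incoming updates only mark state as dirty, and a periodic tick sends at most one notification. The sketch below assumes that interpretation of the commit above and uses hypothetical names; ticks are passed in as a channel so the example is deterministic, whereas a real loop would select on a `time.Ticker`.

```go
package main

import "fmt"

// throttle forwards at most one notification per tick: any number of
// updates received between two ticks collapse into a single send on out.
// It returns once updates is closed.
func throttle(updates <-chan struct{}, ticks <-chan struct{}, out chan<- struct{}) {
	pending := false
	for {
		select {
		case _, ok := <-updates:
			if !ok {
				return
			}
			pending = true
		case <-ticks:
			if pending {
				out <- struct{}{}
				pending = false
			}
		}
	}
}

func main() {
	updates := make(chan struct{})
	ticks := make(chan struct{})
	out := make(chan struct{}, 10)
	go throttle(updates, ticks, out)

	for i := 0; i < 3; i++ {
		updates <- struct{}{} // three updates arrive between ticks
	}
	ticks <- struct{}{} // first tick: one coalesced notification
	ticks <- struct{}{} // second tick: nothing pending, nothing sent
	close(updates)
	fmt.Println(len(out)) // 1
}
```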
zemek	8a01a0fbed	Set consul server default to localhost:8500 (#3703)	2018-01-24 12:14:32 +00:00
Julius Volz	09e460a647	discovery: Rename file SD mtime metric (#3723) - "timestamp" -> "mtime" to be in line with node exporter and clearer. - add unit suffix	2018-01-22 14:02:24 +01:00
Krasi Georgiev	ec26751fd2	use mutexes for the discovery manager instead of a loop as this was a stupid idea	2018-01-17 18:12:58 +00:00
Krasi Georgiev	767faa44b6	fixed the tests Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2018-01-15 13:39:47 +00:00
Krasi Georgiev	d12e6f29fc	discovery manager ApplyConfig now takes a direct ServiceDiscoveryConfig so that it can be used for the notify manager reimplement the service discovery for the notify manager Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2018-01-15 13:39:44 +00:00
Goutham Veeramachaneni	b20a1b1b1b	Merge pull request #3654 from krasi-georgiev/discovery-handle-discoverer-updates discovery - handle Discoverers that send only target Group updates.	2018-01-15 18:53:22 +05:30
Krasi Georgiev	790cf30fcb	remove unneeded check	2018-01-15 11:52:20 +00:00
Krasi Georgiev	38938ba493	comment nits	2018-01-15 11:47:36 +00:00
Krasi Georgiev	febebcd49a	more comments for the future ME, and reverted the Discovery manager execution changes as these were correct in the first place	2018-01-12 22:07:21 +00:00
Krasi Georgiev	78ba5e62a6	a few more useful comments	2018-01-12 13:58:23 +00:00
Krasi Georgiev	cabce21b70	delete empty target sets to avoid memory leaks	2018-01-12 13:10:59 +00:00
Krasi Georgiev	abfd9f1920	nits	2018-01-12 12:19:52 +00:00
Shubheksha Jalan	0471e64ad1	Use shared types from the `common` repo (#3674) * refactor: use shared types from common repo, remove util/config * vendor: add common/config * fix nit	2018-01-11 16:10:25 +01:00
Krasi Georgiev	546c29af5b	return early for nil target groups	2018-01-09 16:34:23 +00:00
Callum Styan	97464236c7	comments with TargetProvider should read Discoverer instead (#3667)	2018-01-08 23:59:18 +00:00
Krasi Georgiev	77bf6bece0	discovery-manager comment update	2018-01-04 21:57:28 +00:00
Krasi Georgiev	135ea0f793	discovery manager - doesn't need sorting of the target groups so move it into the discovery manager tests as we only need it there. discovery manager - refactor the discovery tests. Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2018-01-04 21:41:54 +00:00
Krasi Georgiev	638818a974	some Discoverers send nil targetgroup so need to check for it when updating a group	2018-01-04 13:57:34 +00:00
Krasi Georgiev	7e28397a2c	discovery - handle Discoverers that send only target Group updates rather than all Targets on every update. Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2018-01-04 13:28:37 +00:00
Shubheksha Jalan	ec94df49d4	Refactor SD configuration to remove `config` dependency (#3629) * refactor: move targetGroup struct and CheckOverflow() to their own package * refactor: move auth and security related structs to a utility package, fix import error in utility package * refactor: Azure SD, remove SD struct from config * refactor: DNS SD, remove SD struct from config into dns package * refactor: ec2 SD, move SD struct from config into the ec2 package * refactor: file SD, move SD struct from config to file discovery package * refactor: gce, move SD struct from config to gce discovery package * refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil * refactor: consul, move SD struct from config into consul discovery package * refactor: marathon, move SD struct from config into marathon discovery package * refactor: triton, move SD struct from config to triton discovery package, fix test * refactor: zookeeper, move SD structs from config to zookeeper discovery package * refactor: openstack, remove SD struct from config, move into openstack discovery package * refactor: kubernetes, move SD struct from config into kubernetes discovery package * refactor: notifier, use targetgroup package instead of config * refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup * refactor: retrieval, use targetgroup package instead of config.TargetGroup * refactor: storage, use config util package * refactor: discovery manager, use targetgroup package instead of config.TargetGroup * refactor: use HTTPClient and TLS config from configUtil instead of config * refactor: tests, use targetgroup package instead of config.TargetGroup * refactor: fix targetgroup.Group pointers that were removed by mistake * refactor: openstack, kubernetes: drop prefixes * refactor: remove import aliases forced due to vscode bug * refactor: move main SD struct out of config into discovery/config * refactor: rename configUtil to config_util * refactor: rename yamlUtil to yaml_config * refactor: kubernetes, remove prefixes * refactor: move the TargetGroup package to discovery/ * refactor: fix order of imports	2017-12-29 21:01:34 +01:00
Callum Styan	d76d5de66f	refactor to make timestamp collector work for multiple file_sd's	2017-12-23 10:13:11 +00:00
KalivarapuReshma	a00fc883c3	Add metric for timestamp of the files file_sd is using.	2017-12-23 10:13:11 +00:00
pasquier-s	78625f85a7	Fix race condition on file SD (#3468) The file discovery should only stop the watcher if it has been created otherwise it may trigger a segmentation fault.	2017-12-21 10:07:43 +00:00
Krasi Georgiev	587dec9eb9	rebased and resolved conflicts with the new Discovery GUI page Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2017-12-18 20:10:03 +00:00
Krasi Georgiev	80182a5d82	use poolKey as the pool map key to avoid multi dimensional maps	2017-12-18 17:23:47 +00:00
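Using a comparable struct as the map key, as the commit above does with `poolKey`, replaces a map of maps with one flat map. The field names and the `pools` type below are assumptions for illustration.

```go
package main

import "fmt"

// poolKey is a hypothetical composite key (target set name plus provider
// name). Any comparable struct can key a Go map.
type poolKey struct {
	setName  string
	provider string
}

// pools maps each (set, provider) pair to its targets in one flat map, where
// a nested map[string]map[string][]string would need inner-map bookkeeping.
type pools map[poolKey][]string

// add appends a target; no inner map has to be created first.
func (p pools) add(set, provider, target string) {
	k := poolKey{setName: set, provider: provider}
	p[k] = append(p[k], target)
}

func main() {
	p := pools{}
	p.add("job1", "static/0", "a:9090")
	p.add("job1", "static/0", "b:9090")
	p.add("job2", "consul/0", "c:9090")
	fmt.Println(len(p), p[poolKey{"job1", "static/0"}]) // 2 [a:9090 b:9090]

	// Deleting one entry leaves no empty inner map behind.
	delete(p, poolKey{"job2", "consul/0"})
	fmt.Println(len(p)) // 1
}
```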
Krasi Georgiev	1ec76d1950	rearrange the context variables and logic, split the groupsMerge function into set and get, other small nits	2017-12-18 17:23:47 +00:00
Krasi Georgiev	f2df712166	updated README	2017-12-18 17:22:50 +00:00
Krasi Georgiev	aca8f85699	fixed the tests	2017-12-18 17:22:50 +00:00
Krasi Georgiev	fe6c544532	some renaming and comment fixes. Remove some select state that is most likely obsolete and hopefully doesn't break anything :) Merging targets sorts by Discoverer name so we can have consistent tests for the maps.	2017-12-18 17:22:50 +00:00
Krasi Georgiev	f5c2c5ff8f	break up the start provider func so that unit tests can run against it.	2017-12-18 17:22:50 +00:00
Krasi Georgiev	c5cb0d2910	simplify naming and API.	2017-12-18 17:22:50 +00:00
Krasi Georgiev	9c61f0e8a0	scrape pool doesn't rely on context as Stop() needs to be blocking to prevent Scrape loops trying to write to a closed TSDB storage.	2017-12-18 17:22:49 +00:00
Krasi Georgiev	e405e2f1ea	refactored discovery	2017-12-18 17:22:49 +00:00
Brian Brazil	81db4716c1	Mention SD moratorium in README (#3573)	2017-12-11 15:38:23 +00:00
Will Howard	6a80fc24cf	Parse the normalized container.PortMappings presented by the Marathon 1.5.x API Fixes #3465	2017-12-06 11:23:12 -05:00
Brian Brazil	d7b3df5ae1	Fix staticcheck errors	2017-12-02 14:52:13 +00:00
Krasi Georgiev	29506e0bca	one meaningless write to the config file to trigger another fsnotify (#3492)	2017-12-01 17:32:27 +00:00
Tom Wilkie	099c50ce93	Avoid empty pod UID in test.	2017-11-24 15:02:42 +00:00
Tom Wilkie	9811e90d65	Fix tests.	2017-11-24 12:24:13 +00:00
Tom Wilkie	06dc1e8797	Include Pod UID in the discovery metadata.	2017-11-20 21:09:47 +00:00
Tobias Schmidt	91be55ebf0	Merge pull request #3458 from grandbora/test-race Fix race in test	2017-11-13 17:57:21 +01:00
Bora Tunca	493fd6bd1f	Fix race in test	2017-11-13 11:47:59 -05:00
Krasi Georgiev	1005ef0a70	Fix flaky file discovery tests - sync the channel draining goroutine	2017-11-13 12:12:01 +00:00
Bora Tunca	3cc01a3088	Add more discovery tests for updating target groups (#3426) * Adds a test covering the case where a target provider sends updated versions of the same target groups and the system should reconcile to the latest version of each of the target groups * Refactors how input data is represented in the tests. It used to be literal declarations of necessary structs, now it is parsing yaml. Yaml declarations are half as long as the former. And these can be put in a fixture file * Adds a tiny bit of refactoring on test timeouts	2017-11-12 03:39:08 +01:00
Krasi Georgiev	c8a735ceb6	Fix flaky file discovery tests (#3438) * flaky test caused by invalid fsnotify updates before the test files are written to disk causing the fd service to send empty `group[]` struct * `close(filesReady)` needs to be before the file closing so that fsnotify triggers a new loop of the discovery service. * nits * use filepath.Join for the path url to be cross platform * stupid mistake revert	2017-11-11 17:20:39 +01:00
Bora Tunca	e63219ae6a	Add discovery test (#3417)	2017-11-06 17:33:52 +00:00
Bora Tunca	09be10a553	Add test to prove redundant calls to identical target providers (#3404)	2017-11-06 16:14:15 +00:00
beorn7	348ea482ea	Merge branch 'beorn7/release'	2017-11-04 12:32:49 +01:00
Dominik Schulz	a731a43302	Guard against tags being nil in EC2 discovery Fixes #3001	2017-11-03 13:23:01 +01:00
Callum Styan	7776527390	bump consul HTTP client timeout by 5s so it doesn't match up exactly with the consul SD watch timeout	2017-10-28 16:42:42 -07:00
Jason Anderson	808f79f00a	Feature: Allow getting credentials via EC2 role (#3343) * Allow getting credentials via EC2 role This is subtly different than the existing `role_arn` solution, which allows Prometheus to assume an IAM role given some set of credentials already in-scope. With EC2 roles, one specifies the role at instance launch time (via an instance profile.) The instance then exposes temporary credentials via its metadata. The AWS Go SDK exposes a credential provider that polls the [instance metadata endpoint][1] already, so we can simply use that and it will take care of renewing the credentials when they expire. Without this, if this is being used inside EC2, it is difficult to cleanly allow the use of STS credentials. One has to set up a proxy role that can assume the role you really want, and launch the EC2 instance with the proxy role. This isn't very clean, and also doesn't seem to be [supported very well][2]. [1]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html [2]: https://github.com/aws/aws-cli/issues/1390 * Automatically try to detect EC2 role credentials The `Available()` function exposed on ec2metadata returns a simple true/false if the ec2 metadata is available. This is the best way to know if we're actually running in EC2 (which is the only valid use-case for this credential provider.) This allows this to "just work" if you are using EC2 instance roles.	2017-10-25 14:15:39 +01:00
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	2017-10-24 21:21:42 -07:00
Julius Volz	c3d6abc8e6	Fix some lint errors (#3334) I left the promql ones and some others untouched as I remember that @fabxc prefers them that way.	2017-10-23 14:57:30 +01:00
Callum Styan	45f9f3c539	use a timeout in the HTTP client used for consul sd (#3303)	2017-10-20 16:56:30 +01:00
Alexander Kazarin	2c163f32a5	fix for issue 2976 (#3313) fix for null pointer exception in ZookeeperLogger	2017-10-18 17:02:20 +01:00
pasquier-s	88e4815bb7	Get OpenStack variables from env as fallback (#3293) This change enables the OpenStack service discovery to read the authentication parameters from the OS_* environment variables when the identity endpoint URL is not defined in the Prometheus configuration file.	2017-10-16 18:01:50 +01:00
Marc Sluiter	6a633eece1	Added go-conntrack for monitoring http connections (#3241) Added metrics for in- and outgoing traffic with go-conntrack.	2017-10-06 11:22:19 +01:00
Fabian Reinartz	2d0b8e8b94	Merge branch 'master' into dev-2.0	2017-10-05 13:09:18 +02:00
Goutham Veeramachaneni	3f0267c548	Merge branch 'dev-2.0' into go-kit/log Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-09-15 23:15:27 +05:30
beorn7	84211bd2df	Forward-merge bug fixes and cherry-picks from 'release-1.7'	2017-09-15 13:44:22 +02:00
Matt Palmer	3369422327	Improve DNS response handling to prevent "stuck" records [Fixes #2799] (#3138) The problem reported in #2799 was that in the event that all records for a name were removed, the target group was never updated to be the "empty" set. Essentially, whatever Prometheus last saw as a non-empty list of targets would stay that way forever (or at least until Prometheus restarted...). This came about because of a fairly naive interpretation of what a valid-looking DNS response actually looked like -- essentially, the only valid DNS responses were ones that had a non-empty record list. That's fine as long as your config always lists only target names which have non-empty record sets; if your environment happens to legitimately have empty record sets sometimes, all hell breaks loose (otherwise-cleanly shutdown systems trigger up==0 alerts, for instance). This patch is a refactoring of the DNS lookup behaviour that maintains existing behaviour with regard to search paths, but correctly handles empty and non-existent record sets. RFC1034 s4.3.1 says there are three ways a recursive DNS server can respond: 1. Here is your answer (possibly an empty answer, because of the way DNS considers all records for a name, regardless of type, when deciding whether the name exists). 2. There is no spoon (the name you asked for definitely does not exist). 3. I am a teapot (something has gone terribly wrong). Situations 1 and 2 are fine and dandy; whatever the answer is (empty or otherwise) is the list of targets. If something has gone wrong, then we shouldn't go updating the target list because we don't really know what the target list should be. Multiple DNS servers to query is a straightforward augmentation; if you get an error, then try the next server in the list, until you get an answer or run out of servers to ask. Only if all the servers return errors should you return an error to the calling code. Where things get complicated is the search path.
In order to be able to confidently say, "this name does not exist anywhere, you can remove all the targets for this name because it's definitely GORN", at least one server for all the possible names need to return either successful-but-empty responses, or NXDOMAIN. If any name errors out, then -- since that one might have been the one where the records came from -- you need to say "maintain the status quo until we get a known-good response". It is possible, though unlikely, that a poorly-configured DNS setup (say, one which had a domain in its search path for which all configured recursive resolvers respond with REFUSED) could result in the same "stuck" records problem we're solving here, but the DNS configuration should be fixed in that case, and there's nothing we can do in Prometheus itself to fix the problem. I've tested this patch on a local scratch instance in all the various ways I can think of: 1. Adding records (targets get scraped) 2. Adding records of a different type 3. Remove records of the requested type, leaving other type records intact (targets don't get scraped) 4. Remove all records for the name (targets don't get scraped) 5. Shutdown the resolver (targets still get scraped) There's no automated test suite additions, because there isn't a test suite for DNS discovery, and I was stretching my Go skills to the limit to make this happen; mock objects are beyond me.	2017-09-15 12:26:10 +02:00
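The search-path rules from the commit message above can be written as a small decision function. `outcome`, `result` and `resolve` are hypothetical names, and each `result` is assumed to already summarize all servers for that name; this is a sketch of the rules as described, not the `dns` package implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// outcome classifies the lookup of one candidate name, following the three
// RFC 1034 cases quoted in the commit message.
type outcome int

const (
	answer   outcome = iota // success, possibly with no records
	nxdomain                // the name definitely does not exist
	failure                 // something went wrong (SERVFAIL, REFUSED, timeout, ...)
)

// result is the hypothetical outcome for one search-path candidate after all
// configured servers have been tried.
type result struct {
	outcome outcome
	records []string
}

// errKeepTargets tells the caller to leave its current targets untouched.
var errKeepTargets = errors.New("lookup failed for a search-path candidate; keeping existing targets")

// resolve walks the candidates in order and returns the first non-empty
// answer. An empty target list is returned only when every candidate either
// answered with no records or returned NXDOMAIN; if any candidate failed,
// it might have been the one holding the records, so an error is returned.
func resolve(candidates []result) ([]string, error) {
	failed := false
	for _, r := range candidates {
		switch r.outcome {
		case answer:
			if len(r.records) > 0 {
				return r.records, nil
			}
		case failure:
			failed = true
		}
	}
	if failed {
		return nil, errKeepTargets
	}
	return []string{}, nil
}

func main() {
	// A later candidate has records: use them.
	fmt.Println(resolve([]result{{outcome: nxdomain}, {outcome: answer, records: []string{"10.0.0.1"}}}))
	// Every candidate is NXDOMAIN or empty: the target list really is empty.
	fmt.Println(resolve([]result{{outcome: nxdomain}, {outcome: answer}}))
	// One candidate failed: keep the status quo.
	_, err := resolve([]result{{outcome: failure}, {outcome: nxdomain}})
	fmt.Println(err)
}
```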
Goutham Veeramachaneni	f5aed810f9	logging: Port to common/promlog Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-09-15 12:40:50 +05:30
Matt Bostock	e758260986	Marathon SD: Set port index label The changes [1][] to Marathon service discovery to support multiple ports mean that Prometheus now attempts to scrape all ports belonging to a Marathon service. You can use port definition or port mapping labels to filter out which ports to scrape but that requires service owners to update their Marathon configuration. To allow for a smoother migration path, add a `__meta_marathon_port_index` label, whose value is set to the port's sequential index integer. For example, PORT0 has the value `0`, PORT1 has the value `1`, and so on. This allows you to support scraping both the first available port (the previous behaviour) in addition to ports with a `metrics` label. For example, here's the relabel configuration we might use with this patch:

    - action: keep
      source_labels: ['__meta_marathon_port_definition_label_metrics', '__meta_marathon_port_mapping_label_metrics', '__meta_marathon_port_index']
      # Keep if port mapping or definition has a 'metrics' label with any
      # non-empty value, or if no 'metrics' port label exists but this is the
      # service's first available port
      regex: ([^;]+;;[^;]+|;[^;]+;[^;]+|;;0)

This assumes that the Marathon API returns the ports in sorted order (matching PORT0, PORT1, etc), which it appears that it does. [1]: https://github.com/prometheus/prometheus/pull/2506	2017-09-11 13:40:51 +01:00
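Relabel regexes are anchored at both ends and the `source_labels` values are joined with `;`, so the rule in the commit above can be checked directly with Go's `regexp`. The joined inputs below are made-up label combinations (definition label, mapping label, port index).

```go
package main

import (
	"fmt"
	"regexp"
)

// keepRE is the regex from the commit message wrapped in ^(?:...)$, the
// anchoring Prometheus applies to relabel regexes. Its input is the three
// source label values joined with ";": definition label, mapping label, index.
var keepRE = regexp.MustCompile(`^(?:([^;]+;;[^;]+|;[^;]+;[^;]+|;;0))$`)

func main() {
	for _, joined := range []string{
		"true;;2", // definition label set: keep
		";true;1", // mapping label set: keep
		";;0",     // no metrics label, first port: keep
		";;1",     // no metrics label, other port: drop
	} {
		fmt.Println(joined, keepRE.MatchString(joined))
	}
}
```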
Fabian Reinartz	e746282772	Merge branch 'master' into dev-2.0	2017-09-11 10:55:19 +02:00
Jamie Moore	7a135e0a1b	Add the ability to assume a role for ec2 discovery	2017-09-10 00:36:43 +10:00
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	2017-09-08 22:01:51 +05:30
Johannes 'fish' Ziemke	75aec7d970	k8s: Use versioned struct for ingress discovery	2017-09-06 12:47:03 +02:00
Fabian Reinartz	87918f3097	Merge branch 'master' into dev-2.0	2017-09-04 14:09:21 +02:00
Johannes 'fish' Ziemke	70f3d1e9f9	k8s: Support discovery of ingresses (#3111 ) * k8s: Support discovery of ingresses * Move additional labels below allocation This makes it more obvious why the additional elements are allocated. Also fix allocation for node where we only set a single label. * k8s: Remove port from ingress discovery * k8s: Add comment to ingress discovery example	2017-09-04 13:10:44 +02:00
Tobias Schmidt	29fff1eca4	Merge pull request #2966 from alkalinecoffee/consul-node-metadata Add support for consul's node metadata	2017-09-02 18:43:25 +02:00
Tobias Schmidt	d0a02703a2	Merge pull request #3105 from sak0/dev discovery openstack: support discovery hypervisors, add rule option.	2017-08-31 14:08:16 +02:00
CuiHaozhi	b1c18bf29b	discovery openstack: support discovery hosts, add rule option. Signed-off-by: CuiHaozhi <cuihz@wise2c.com>	2017-08-29 10:14:00 -04:00
Colstuwjx	2b49df2c61	Fix target group foreach nil bug, directly return err.	2017-08-22 08:37:39 +08:00
CuiHaozhi	31b6f8b04c	discovery openstack: handle instances without ip Signed-off-by: CuiHaozhi <cuihz@wise2c.com>	2017-08-11 12:36:12 -04:00
Fabian Reinartz	25f3e1c424	Merge branch 'master' into mergemaster	2017-08-10 17:04:25 +02:00
Fabian Reinartz	ac511ecf30	Merge pull request #2970 from Gouthamve/docs/sd-interface Add docs about SD interface	2017-08-01 22:44:28 +02:00
Goutham Veeramachaneni	ab96e79bc8	Add docs about SD interface Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-08-01 13:53:50 +05:30
Fabian Reinartz	40db026381	Merge pull request #2957 from prometheus/sd-doc Tweaks to SD README from review	2017-07-28 08:51:50 +02:00
Joe Martin	aba41c7d0f	add support for consul's node metadata	2017-07-18 16:46:16 -04:00
J. Taylor O'Connor	5a19ffb315	A few spelling corrections. (#2960 )	2017-07-17 22:13:50 +01:00
Brian Brazil	84be97bd98	Tweaks to SD README from review	2017-07-17 14:20:54 +01:00
Brian Brazil	2a9ca394dd	Document how/when to write service discovery (#2943 )	2017-07-14 15:22:09 +01:00
Fabian Reinartz	dba7586671	Merge branch 'master' into dev-2.0	2017-07-11 17:22:14 +02:00
Fuente, Pablo Andres	902fafb8e7	Fixing tests for Windows Fixing the config/config_test, the discovery/file/file_test and the promql/promql_test tests for Windows. For most of the tests, the fix involved correct handling of path separators. In the case of the promql tests, the issue was related to the removal of the temporary directories used by the storage. The issue is that the RemoveAll() call returns an error when it tries to remove a directory which is not empty, which seems to be true due to some kind of process that is still running after closing the storage. To fix it I added some retries to the removal of the temporary directories. Adding tags file from Universal Ctags to .gitignore	2017-07-09 01:59:30 -03:00
Matt Bostock	ab4d64959f	Marathon SD: Set port index label The changes [1][] to Marathon service discovery to support multiple ports mean that Prometheus now attempts to scrape all ports belonging to a Marathon service. You can use port definition or port mapping labels to filter out which ports to scrape but that requires service owners to update their Marathon configuration. To allow for a smoother migration path, add a `__meta_marathon_port_index` label, whose value is set to the port's sequential index integer. For example, PORT0 has the value `0`, PORT1 has the value `1`, and so on. This allows you to support scraping both the first available port (the previous behaviour) in addition to ports with a `metrics` label. For example, here's the relabel configuration we might use with this patch: - action: keep source_labels: ['__meta_marathon_port_definition_label_metrics', '__meta_marathon_port_mapping_label_metrics', '__meta_marathon_port_index'] # Keep if port mapping or definition has a 'metrics' label with any # non-empty value, or if no 'metrics' port label exists but this is the # service's first available port regex: ([^;]+;;[^;]+|;[^;]+;[^;]+|;;0) This assumes that the Marathon API returns the ports in sorted order (matching PORT0, PORT1, etc), which it appears that it does. [1]: https://github.com/prometheus/prometheus/pull/2506	2017-06-23 09:52:52 +01:00
Goutham Veeramachaneni	507790a357	Rework logging to use explicitly passed logger Mostly cleaned up the global logger use. Still some uses in discovery package. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 15:52:44 +05:30
Christian Groschupp	8f781e411c	Openstack Service Discovery (#2701 ) * Add openstack service discovery. * Add gophercloud code for openstack service discovery. * first changes for juliusv comments. * add gophercloud code for floatingip. * Add tests to openstack sd. * Add testify suite vendor files. * add copyright and make changes for code climate. * Fixed typos in provider openstack. * Renamed tenant to project in openstack sd. * Change type of password to Secret in openstack sd.	2017-06-01 23:49:02 +02:00
Roman Vynar	dbe2eb2afc	Hide consul token on UI. (#2797 )	2017-06-01 22:14:23 +01:00
Chris Goller	42de0ae013	Use log.Logger interface for all discovery services	2017-06-01 11:25:55 -05:00
Tobias Schmidt	287ec6e6cc	Fix outdated target_group naming in error message The target_groups config has been renamed to static_configs, the error message for overflow attributes should reflect that.	2017-05-31 11:01:13 +02:00
Conor Broderick	6766123f93	Replace regex with Secret type and remarshal config to hide secrets (#2775 )	2017-05-29 12:46:23 +01:00
Fabian Reinartz	11aa049b05	Merge branch 'release-1.6' into merge16	2017-05-11 15:00:51 +02:00
Fabian Reinartz	ddbbd2b712	Merge branch 'release-1.5' into cut162	2017-05-11 14:29:49 +02:00
Fabian Reinartz	2ff8855ae6	discovery/k8s: update client library	2017-05-11 13:53:12 +02:00
Fabian Reinartz	aaaec6431e	Merge pull request #2642 from bakins/kubernetes-namespaces Allow limiting Kubernetes service discovery to certain namespaces	2017-05-04 07:36:21 +02:00
Stephan Erb	0b9fca983b	Fix reload of ZooKeeper service discovery config (#2669 ) Rationale: * When the config is reloaded and the provider context is canceled, we need to exit the current ZK `TargetProvider.Run` method as a new provider will be instantiated. * In case `Stop` is called on the `ZookeeperTreeCache`, the update/events channel may not be closed as it is shared by multiple caches and would thus be double closed. * Stopping all `zookeeperTreeCacheNode`s on teardown ensures all associated watcher go-routines will be closed eagerly rather than implicitly on connection close events.	2017-05-02 18:21:37 -05:00
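The channel-ownership point in the second bullet can be illustrated with a small Go sketch. `treeCache`, `newTreeCache`, and the single synthetic update per cache are hypothetical simplifications, not the Prometheus ZooKeeper code: the caches only send on the shared channel, and the code that created the channel closes it once, after every cache has stopped.

```go
package main

import (
	"fmt"
	"sync"
)

// treeCache is a hypothetical stand-in for one ZooKeeper tree cache.
// Several caches send on the same events channel, which they do not own.
type treeCache struct {
	events chan<- string
	stop   chan struct{}
	wg     sync.WaitGroup
}

func newTreeCache(path string, events chan<- string) *treeCache {
	c := &treeCache{events: events, stop: make(chan struct{})}
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		// Send one update, unless the cache is stopped first.
		select {
		case c.events <- "update " + path:
		case <-c.stop:
			return
		}
		<-c.stop // stay alive until Stop is called
	}()
	return c
}

// Stop terminates this cache's goroutine and waits for it to exit. It
// deliberately leaves events open: if every cache closed the shared
// channel, the second close would panic.
func (c *treeCache) Stop() {
	close(c.stop)
	c.wg.Wait()
}

func main() {
	events := make(chan string)
	caches := []*treeCache{newTreeCache("/a", events), newTreeCache("/b", events)}
	received := 0
	for range caches {
		<-events
		received++
	}
	for _, c := range caches {
		c.Stop()
	}
	close(events) // closed exactly once, by the owner of the channel
	fmt.Println("received", received, "events")
}
```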
Brian Akins	27d66628a1	Allow limiting Kubernetes service discovery to certain namespaces Allow namespace discovery to be more easily extended in the future by using a struct rather than just a list. Rename fields for kubernetes namespace discovery	2017-04-27 07:41:36 -04:00
Goutham Veeramachaneni	0f48d07f95	Fix Map Race by Moving Locking closer to the Write (#2476 )	2017-04-07 08:55:01 +02:00
Richard Kiene	ec692f6161	Add triton zone brand metadata	2017-04-06 21:35:42 +00:00
Julius Volz	525da88c35	Merge pull request #2479 from YKlausz/consul-tls Adding consul capability to connect via tls	2017-03-20 11:40:18 +01:00
Robson Roberto Souza Peixoto	cc3e859d9e	Add support for multiple ports in Marathon (#2506 ) - create a target for every port - add meta labels for Marathon labels in portMappings and portDefinitions	2017-03-18 22:10:44 +02:00
yklausz	75880b594f	Adding consul capability to connect via tls	2017-03-17 22:37:18 +01:00
Tobias Schmidt	7bde44e98e	Remove testing.T usage in goroutines The staticcheck linter warns about testing.T usage in goroutines. Moving the t.Fatal* calls to the main thread showed immediately that this is a good practice, as one of the test setups didn't work.	2017-03-16 23:40:46 -03:00
Tobias Schmidt	58cd39aacd	Follow golang naming conventions in discovery packages	2017-03-16 23:40:46 -03:00
Robert Neumayer	feb7670929	Add tests for consul service discovery (#2490 ) * Add tests for consul service discovery * Add license header * Address comments * inline variables * check for extra error * Fix error formatting	2017-03-15 09:33:53 +01:00
Michael Kraus	690b49e503	Fix marathon tests	2017-03-06 11:36:55 +01:00
Michael Kraus	31252cc1b5	Clarify explicit use of authorization header	2017-03-06 11:36:36 +01:00
Michael Kraus	47bdcf0f67	Allow the use of bearer_token or bearer_token_file for MarathonSD	2017-03-02 09:44:20 +01:00
James Hartig	865f28bb15	discovery: Instead of looping over conf.Search, use NameList()	2017-02-13 15:48:51 -05:00
Alex Somesan	b22eb65d0f	Cleaner separation between ServiceAccount and custom authentication in K8S SD (#2348 ) * Canonical usage of cluster service-account in K8S SD * Early validation for opt-in custom auth in K8S SD * Fix typo in condition	2017-01-19 10:52:52 +01:00
Richard Kiene	f3d9692d09	Add Joyent Triton discovery	2017-01-17 20:34:32 +00:00
Fabian Reinartz	35da23fd82	consul: start service watch as goroutine	2016-11-27 11:01:16 +01:00
Fabian Reinartz	200bbe1bad	config: extract SD and HTTPClient configurations	2016-11-23 18:23:37 +01:00
Fabian Reinartz	d7f4f8b879	discovery: move TargetSet into discovery package	2016-11-23 09:14:44 +01:00
Fabian Reinartz	d19d1bcad3	discovery: move into top-level package	2016-11-22 12:56:33 +01:00


760 Commits (0a7729469d90fa2d2fa74b156052d2be5c89480e)