prometheus

Commit Graph

Author	SHA1	Message	Date
beorn7	3610331eeb	Retrieval: Do not buffer the samples if no sample limit configured Also, simplify and streamline the code a bit.	2017-01-07 18:18:54 +01:00
André Carvalho	c43dfaba1c	Add max concurrent and current queries engine metrics (#2326) * Add max concurrent and current queries engine metrics. This commit adds two metrics to the promql/engine: the number of max concurrent queries, as configured by the flag, and the number of current queries being served+blocked in the engine.	2017-01-07 14:41:25 +00:00
beorn7	767c0709b1	Retrieval: Avoid copying Target. retrieval.Target contains a mutex. It was copied in the Targets() call, which can potentially wreak a lot of havoc. It might even have caused the issues reported as #2266 and #2262.	2017-01-06 18:43:41 +01:00
Brian Brazil	f9e581907a	Make index queue bigger. (#2322) When a large Prometheus starts up fresh, it can take many minutes to warm up and clear out the index queue. A larger queue means less blocking and bigger batches, and cuts startup time by ~50%.	2017-01-05 17:57:42 +00:00
Fabian Reinartz	c9f4aea8e2	Merge pull request #2305 from alicebob/favicon Add a favicon to the web GUI	2017-01-04 10:15:27 +01:00
Martin Lehmann	78fae3155f	Make relative links in README.md absolute (#2316) The relative links don't work in other pages that render the README (for example https://hub.docker.com/r/prom/prometheus/). As they are (hopefully) not due to change any time soon, I think using absolute links is better.	2017-01-03 20:07:33 +00:00
Julius Volz	90dd216646	Merge pull request #2306 from EdSchouten/sorted-alerts Use lexicographic order to sort alerts by name.	2016-12-31 13:12:30 +01:00
Mitsuhiro Tanda	7e369b9318	expose max memory chunks metrics (#2303) * expose max memory chunks metrics	2016-12-27 18:34:07 +00:00
Ed Schouten	b3a39ccd8a	Use lexicographic order to sort alerts by name. Right now the /alerts page of Prometheus sorts alerts by severity (firing, pending, inactive). Once multiple alerts have the same severity, their order seems to correlate to how they are placed in the configuration files, but not always. Looking at the code, we make use of sort.Sort(), which is documented not to provide a stable sort. The Less() function also only takes the alert state into account. This change extends the Less() function to provide a lexicographic order on both the alert state and the name. This means I can finally find the alerts I'm looking for without using my browser's search feature.	2016-12-27 14:28:44 +01:00
Harmen	135d32ea22	make assets	2016-12-27 13:59:20 +01:00
Harmen	dfa4f79bcd	add favicon	2016-12-27 13:58:51 +01:00
Brian Brazil	93b70ee4ea	Evict chunk descs of all unloaded chunks during maintenance. (#2297) Keeping these around has two problems: 1) Each desc takes 64 bytes, so 10 of them take 640B. This is a lot of overhead on a 1024-byte chunk. 2) It can take well over a week to reach a point where this, and thus Prometheus memory usage as a whole, enters steady state. This makes RAM estimation very hard for users, and makes it difficult to investigate things like memory fragmentation. Instead we'll wipe them during each memory series maintenance cycle, and if a query pulls them in they'll hang around as cache until the next cycle.	2016-12-22 13:49:03 +00:00
Brian Brazil	bed4635802	Use irate consistently in console template examples. (#2296) I must have forgotten my 'g' when switching these.	2016-12-21 13:19:23 +00:00
Fabian Reinartz	d6d03a966f	Merge pull request #2295 from prometheus/fast-path-remote Don't clone the metric if there's no remote writes.	2016-12-21 12:36:41 +01:00
Brian Brazil	1b8a474612	Don't clone the metric if there's no remote writes. The metric clone can't be further optimised, and is a non-trivial memory allocation cost so fast path it if there's no remote writes configured.	2016-12-21 11:34:48 +00:00
Brian Brazil	6c07453ec1	Only clone the metric in the one place relabelling needs it. (#2292) This cuts ~17% off memory allocations related to ingesting data in a basic setup.	2016-12-21 10:00:33 +00:00
Brian Brazil	2e3b42ad6c	Correctly handle the end time being 0 in the URL. (#2290)	2016-12-18 19:30:52 +00:00
Brian Brazil	f421ce0636	Remove label from prometheus_target_skipped_scrapes_total (#2289) This avoids it not being initialised, and breaking out by interval wasn't particularly useful. Fixes #2269	2016-12-16 18:00:52 +00:00
Brian Brazil	30448286c7	Add sample_limit to scrape config. This imposes a hard limit on the number of samples ingested from the target. This is counted after metric relabelling, to allow dropping of problematic metrics. This is intended as a very blunt tool to prevent overload due to misbehaving targets that suddenly jump in sample count (e.g. adding a label containing email addresses). Add metric to track how often this happens. Fixes #2137	2016-12-16 15:10:09 +00:00
Björn Rabenstein	f3f798fbcf	Merge pull request #2283 from tcolgate/ignoredots ignore dotfiles in data directory	2016-12-15 13:32:03 +01:00
Tristan Colgate	30be8e0b8a	ignore dotfiles in data directory	2016-12-15 11:48:23 +00:00
Tristan Colgate-McFarlane	4d9134e6d8	Add labeldrop and labelkeep actions. (#2279) Introduce two new relabel actions, labeldrop and labelkeep. These can be used to filter the set of labels by matching a regex: - labeldrop: drops all labels that match the regex - labelkeep: drops all labels that do not match the regex	2016-12-14 10:17:42 +00:00
Björn Rabenstein	45570e5972	Merge pull request #2277 from prometheus/beorn7/storage2 storage: Sanity-check number of loaded chunk descs	2016-12-14 02:59:10 +01:00
beorn7	253be23c00	storage: Sanity-check number of loaded chunk descs Two cases: - An unarchived metric must have at least one chunk desc loaded upon unarchival. Otherwise, the file is gone or has size 0, which is an inconsistency (because the series is still indexed in the archive index). Hence, quarantining is triggered. - If loading the chunk descs of a series with a known chunkDescsOffset (i.e. != -1), the number of chunks loaded must be equal to chunkDescsOffset. If not, there is a data corruption. An error is returned, which leads to quarantining. In any case, there is a guard added to not access the 1st element of an empty chunkDescs slice. (That's what triggered the crashes in issue 2249.) A time series with unknown chunkDescsOffset and no chunks in memory and no chunks on disk either could trigger that case. I would assume such a "null series" doesn't exist, but it's not entirely unthinkable and unreasonable to happen (perhaps in future uses of the storage). (Create a series, and then something tries to preload chunks before the first sample is added.)	2016-12-13 23:19:39 +01:00
Björn Rabenstein	5f0c0e43cf	Merge pull request #2276 from prometheus/beorn7/storage storage: Catch data corruption that leads to division by zero	2016-12-13 23:13:39 +01:00
Björn Rabenstein	a4c8292232	Merge pull request #2278 from prometheus/beorn7/style storage: Fix linter issue	2016-12-13 23:13:05 +01:00
beorn7	837c029b16	storage: Fix linter issue Go style tries to avoid indented `else` blocks.	2016-12-13 19:05:30 +01:00
Brian Brazil	c8de1484d5	Add scrape_samples_post_metric_relabeling This reports the number of samples post any keep/drop from metric relabelling.	2016-12-13 17:32:11 +00:00
Brian Brazil	06b9df65ec	Refactor and add unittests to scrape result handling.	2016-12-13 16:49:17 +00:00
Björn Rabenstein	568fd8a8cb	Merge pull request #2155 from prometheus/beorn7/vendoring2 Update vendoring for Azure	2016-12-13 17:10:59 +01:00
beorn7	4719482f5f	storage: Make tests go-vet and golint clean	2016-12-13 17:07:27 +01:00
beorn7	485ac8dff7	storage: Verify validity of byte length when unmarshalling (double)delta chunks This makes sure a division-by-zero crash cannot happen in the Len() method. Fixes #2773	2016-12-13 17:07:27 +01:00
Brian Brazil	b5ded43594	Allow buffering of scraped samples before sending them to storage.	2016-12-13 15:01:35 +00:00
beorn7	906c3a2237	Update vendoring for Azure Also, actually record the vendored version in vendor.json.	2016-12-13 14:21:16 +01:00
tattsun	e714079cf2	storage: fix error message (#2270) * storage: add error message	2016-12-09 22:36:27 +00:00
Fabian Reinartz	9ecea36ef9	Merge pull request #2259 from prometheus/federationerr web: don't return federation errors over HTTP	2016-12-06 16:18:03 +01:00
Fabian Reinartz	cef2e04aa3	web: add error counter for federation responses	2016-12-06 16:09:50 +01:00
Fabian Reinartz	0ea0a19848	Merge pull request #2240 from agaoglu/read-timeout Set read-timeout for http.Server	2016-12-06 16:01:45 +01:00
Fabian Reinartz	9d68e81b32	web: don't return federation errors over HTTP We stream federation responses, so after the first byte is written, the status header is fixed. We cannot return an HTTP error for an intermediate error but should just abort and log instead.	2016-12-06 15:52:50 +01:00
Erdem Agaoglu	054f8ebbfb	Increase default max-connections	2016-12-06 17:45:19 +03:00
Erdem Agaoglu	2260079c12	Vendor x/net/netutil	2016-12-06 12:52:29 +03:00
Erdem Agaoglu	e487477a17	LimitListener to limit max number of connections This also drops tcp keep-alive in ListenAndServe but it's no longer necessary since we now close idle connections long before that.	2016-12-06 12:45:59 +03:00
Fabian Reinartz	893390e0c6	Merge pull request #2248 from msiebuhr/cwd-in-status web: Display current working directory on status-page	2016-12-05 21:41:37 +01:00
Morten Siebuhr	c5b17263a6	web: Display current working directory on status-page	2016-12-05 19:46:41 +01:00
Björn Rabenstein	a932c1a4b6	Merge pull request #1794 from cmluciano/cml/persistenceerror Clarify error message when Prometheus data dir finds unexpected files	2016-12-05 18:40:51 +01:00
Christopher M. Luciano	148b006e25	Clarify error message when Prometheus data dir finds unexpected files	2016-12-05 10:51:57 -05:00
Fabian Reinartz	0459dcd2e2	Merge pull request #2234 from brancz/targets-api web/api: add targets endpoint	2016-12-05 14:14:04 +01:00
Frederic Branczyk	33b583d50e	web/api: add targets endpoint	2016-12-05 13:13:21 +01:00
Frederic Branczyk	8f8cea4fbd	retrieval: refactor TargetManager to return flat list of Targets	2016-12-02 13:28:58 +01:00
Erdem Agaoglu	9986b28380	Set read-timeout for http.Server This also specifies a timeout for idle client connections, which may cause "too many open files" errors. See #2238	2016-12-01 16:29:45 +03:00


3681 Commits (6aee1551e1ec53054d5cbf6464b806339c26ced7)