prometheus

Commit Graph

Author	SHA1	Message	Date
Julius Volz	94666e20b7	Minor test error reporting cleanup. Change-Id: Ie11c16b4e60de7c179c6d2a86e063f4432e2000f	2014-02-03 12:27:01 +01:00
Julius Volz	fd2158e746	Store copy of metric during fingerprint caching Problem description: ==================== If a rule evaluation referencing a metric/timeseries M happens at a time when M doesn't have a memory timeseries yet, looking up the fingerprint for M (via TieredStorage.GetMetricForFingerprint()) will create a new Metric object for M which gets both: a) attached to a new empty memory timeseries (so we don't have to ask disk for the Metric's fingerprint next time), and b) returned to the rule evaluation layer. However, the rule evaluation layer replaces the name label (and possibly other labels) of the metric with the name of the recorded rule. Since both the rule evaluator and the memory storage share a reference to the same Metric object, the original memory timeseries will now also be incorrectly renamed. Fix: ==== Instead of storing a reference to a shared metric object, take a copy of the object when creating an empty memory timeseries for caching purposes. Change-Id: I9f2172696c16c10b377e6708553a46ef29390f1e	2014-02-02 17:11:08 +01:00
Julius Volz	7e9ecaac3a	Add count_scalar() function. Change-Id: I63f09dd0479d0a6b016f5f857dd39dcbda56c7f9	2014-01-30 13:07:26 +01:00
Julius Volz	718ad2224b	Fix LevelDB closing order. The storage itself should be closed before any of the objects passed into it are closed (otherwise closing the storage can randomly freeze). Defers are executed in reverse order, so closing the storage should be the last of the defer statements. Change-Id: Id920318b876f5b94767ed48c81221b3456770620	2014-01-28 15:16:06 +01:00
Julius Volz	18d9d00100	Upgrade to Go 1.2. Change-Id: If8451257487edc4b76f4248f6e6b47c073dea183	2014-01-24 16:13:36 +01:00
Julius Volz	b382e8b7bd	Remove overly verbose DNS-SD logging line. Change-Id: Ie4534437ab88b9a6b99f5cb6c2f32c9588c1fff6	2014-01-24 16:09:41 +01:00
Julius Volz	0378c2ca1f	Nonexistent labels in BY-clauses shouldn't propagate to result. This fixes bug 2. of https://github.com/prometheus/prometheus/issues/374 Change-Id: Ia4a13153616bafce5bf10597966b071434422d09	2014-01-24 16:05:30 +01:00
Bjoern Rabenstein	c342ad33a0	Fix OperatorError. This used to work with Go 1.1, but only because of a compiler bug. The bug is fixed in Go 1.2, so we have to fix our code now. Change-Id: I5a9f3a15878afd750e848be33e90b05f3aa055e1	2014-01-21 16:49:51 +01:00
Julius Volz	d5ef0c64dc	Merge "Add optional sample replication to OpenTSDB."	2014-01-08 17:45:08 +01:00
Julius Volz	61d26e8445	Add optional sample replication to OpenTSDB. Prometheus needs long-term storage. Since we don't have enough resources to build our own timeseries storage from scratch on top of Riak, Cassandra or a similar distributed datastore at the moment, we're planning on using OpenTSDB as long-term storage for Prometheus. Its data model is roughly compatible with that of Prometheus, with some caveats. As a first step, this adds write-only replication from Prometheus to OpenTSDB, with the following things worth noting: 1) I tried to keep the integration lightweight, meaning that anything related to OpenTSDB is isolated to its own package and only main knows about it (essentially it tees all samples to both the existing storage and TSDB). It's not touching the existing TieredStorage at all to avoid more complexity in that area. This might change in the future, especially if we decide to implement a read path for OpenTSDB through Prometheus as well. 2) Backpressure while sending to OpenTSDB is handled by simply dropping samples on the floor when the in-memory queue of samples destined for OpenTSDB runs full. Prometheus also only attempts to send samples once, rather than implementing a complex retry algorithm. Thus, replication to OpenTSDB is best-effort for now. If needed, this may be extended in the future. 3) Samples are sent in batches of limited size to OpenTSDB. The optimal batch size, timeout parameters, etc. may need to be adjusted in the future. 4) OpenTSDB has different rules for legal characters in tag (label) values. While Prometheus allows any characters in label values, OpenTSDB limits them to a to z, A to Z, 0 to 9, -, _, . and /. Currently any illegal characters in Prometheus label values are simply replaced by an underscore. Especially when integrating OpenTSDB with the read path in Prometheus, we'll need to reconsider this: either we'll need to introduce the same limitations for Prometheus labels or escape/encode illegal characters in OpenTSDB in such a way that they are fully decodable again when reading through Prometheus, so that corresponding timeseries in both systems match in their labelsets. Change-Id: I8394c9c55dbac3946a0fa497f566d5e6e2d600b5	2014-01-02 18:21:38 +01:00
Julius Volz	7b013e6491	Merge "Replace some uses of obsolete `/metrics.json` with `/metrics` (haven't touched test files yet)."	2013-12-18 16:56:30 +01:00
Julius Volz	f44f398ea7	Merge "Added DNS-SD lookup counter for successful/unsuccessful lookups"	2013-12-16 14:52:50 +01:00
Stuart Nelson	48a6326d25	Added DNS-SD lookup counter for successful/unsuccessful lookups Change-Id: I0a71e994a989cecace280b5134a31ebc2ace7591	2013-12-16 08:48:56 -05:00
Julius Volz	97d84239df	Merge "Don't keep extra labels in aggregations by default."	2013-12-16 12:54:55 +01:00
Julius Volz	6dc36d0c3e	Don't keep extra labels in aggregations by default. MIN/MAX/SUM/AVG/COUNT aggregations will now by default drop all labels that are not specifically part of a BY-clause, even if a label value is the same within all timeseries of an aggregation group. The old behavior of keeping extra labels may still be switched on by adding KEEPING_EXTRA to the end of an aggregation statement: sum(http_requests) by (job, method) keeping_extra I'm open to better syntax/naming suggestions. Change-Id: I21d3fe7af9e98552ce3dffa3ce7c0a4ba4c0b4a4	2013-12-16 12:53:10 +01:00
Stuart Nelson	0c58e388f6	rename curation metrics to prometheus_curation Change-Id: I6a0bf277e88ea8eb737670b7e865ae20f2cbfb91	2013-12-13 17:45:01 -05:00
Julius Volz	20bfaf80ab	Merge "Display filename when encountering bad rule file."	2013-12-13 15:01:02 +01:00
Stuart Nelson	28f59edf16	Added telemetry for counting stored samples Change-Id: I0f36f7c2738d070ca2f107fcb315f98e46803af3	2013-12-12 10:06:41 -05:00
Julius Volz	3bf3a555b2	Merge "add evalDuration histogram and ruleCount counter for rules"	2013-12-11 22:52:19 +01:00
Stuart Nelson	b75adfebad	add evalDuration histogram and ruleCount counter for rules Change-Id: I3508fe72526348d96b8158828388c3ac8d7c3fa9	2013-12-11 15:42:53 -05:00
Julius Volz	77a79d1fc0	Display filename when encountering bad rule file. Change-Id: I4729371be92c5659a6938145c5fde66771d7be22	2013-12-11 15:44:11 +01:00
Julius Volz	fb44580110	Cleanup/fix program termination sequence. Change-Id: I2bc58a2583fb079c9ef383cfc7a5e0fbe613f1cd	2013-12-11 15:40:32 +01:00
Tobias Schmidt	6947ee9bc9	Try to create metrics root directory if missing This change tries to be nice and create the metrics directory first before erroring out. Change-Id: I72691cdc32469708cd671c6ef1fb7db55fe60430	2013-12-03 18:16:13 +07:00
Tobias Schmidt	4300ce3dc8	Merge "Ensure that job names are unique in parsed configs."	2013-12-03 12:13:03 +01:00
Julius Volz	78ebc1a61f	Ensure that job names are unique in parsed configs. Change-Id: I6bd89e6401bd924315981db797af21bdf0b81252	2013-12-03 12:10:22 +01:00
Julius Volz	436f3df0e8	Merge "Add note that pbcopy is only available in OSX"	2013-12-03 12:08:55 +01:00
Tobias Schmidt	ee7f81b665	Add note that pbcopy is only available in OSX Change-Id: I4eda3a5a9117b5021fbc6e3625afa01100c39fa6	2013-12-03 18:06:04 +07:00
Julius Volz	740d448983	Use custom timestamp type for sample timestamps and related code. So far we've been using Go's native time.Time for anything related to sample timestamps. Since the range of time.Time is much bigger than what we need, this has created two problems: - there could be time.Time values which were out of the range/precision of the time type that we persist to disk, therefore causing incorrectly ordered keys. One bug caused by this was: https://github.com/prometheus/prometheus/issues/367 It would be good to use a timestamp type that's more closely aligned with what the underlying storage supports. - sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit Unix timestamp (possibly even a 32-bit one). Since we store samples in large numbers, this seriously affects memory usage. Furthermore, copying/working with the data will be faster if it's smaller. MEMORY USAGE RESULTS Initial memory usage comparisons for a running Prometheus with 1 timeseries and 100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my tests, this advantage for some reason decreased a bit the more samples the timeseries had (to 5-7% for millions of samples). This I can't fully explain, but perhaps garbage collection issues were involved. WHEN TO USE THE NEW TIMESTAMP TYPE The new clientmodel.Timestamp type should be used whenever time calculations are either directly or indirectly related to sample timestamps. For example: - the timestamp of a sample itself - all kinds of watermarks - anything that may become or is compared to a sample timestamp (like the timestamp passed into Target.Scrape()). When to still use time.Time: - for measuring durations/times not related to sample timestamps, like duration telemetry exporting, timers that indicate how frequently to execute some action, etc. NOTE ON OPERATOR OPTIMIZATION TESTS We don't use operator optimization code anymore, but it still lives in the code as dead code. It still has tests, but I couldn't get all of them to pass with the new timestamp format. I commented out the failing cases for now, but we should probably remove the dead code soon. I just didn't want to do that in the same change as this. Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f	2013-12-03 09:11:28 +01:00
Julius Volz	6b7de31a3c	Upgrade to LevelDB 1.14.0 to fix LevelDB bugs. This tentatively fixes https://github.com/prometheus/prometheus/issues/368 due to an upstream bugfix in snapshotted LevelDB iterator handling, which got fixed in LevelDB 1.14.0: https://code.google.com/p/leveldb/issues/detail?id=200 Change-Id: Ib0cc67b7d3dc33913a1c16736eff32ef702c63bf	2013-12-03 09:07:15 +01:00
Julius Volz	db015de65b	Comment and "go fmt" fixups in compaction tests. Change-Id: Iaa0eda6a22a5caa0590bae87ff579f9ace21e80a	2013-10-30 17:06:17 +01:00
Johannes 'fish' Ziemke	8c08a5031f	Add search domain support to SRV lookups This adds search domain support by trying to resolve a name by appending each search domain configured in /etc/resolv.conf until the query succeeds (NOERROR) and has at least one answer. Change-Id: Ibdc5138c5d8cc049e11fab90c3d5243d5a06852c	2013-10-29 17:19:49 +01:00
Julius Volz	39417f93ee	Merge "Remove usage of gorest."	2013-10-28 10:29:33 +01:00
Julius Volz	fceef4137c	Fix /metrics endpoint in sample config. Change-Id: I2daca6a31f536b87aa8e49a2190787ad9d803595	2013-10-28 08:03:58 +01:00
Julius Volz	51408bdfe8	Merge changes I3ffeb091,Idffefea4 * changes: Add chunk sanity checking to dumper tool. Add compaction regression tests.	2013-10-24 13:58:14 +02:00
Julius Volz	2162e57784	Merge "Fix watermarker default time / LevelDB key ordering bug."	2013-10-24 13:57:48 +02:00
Julius Volz	5e18255920	Merge "Fix chunk corruption compaction bug."	2013-10-24 13:57:31 +02:00
Julius Volz	6f6f56021a	Merge changes I53a24c06,Ibe1def5c,Ife68c9c6,Ia3284a90 * changes: fix link to CONTRIBUTING.md in README.md moved CONTRIBUTING.md to top of repo; link to CONTRIBUTING.md in README.md change double quotes to backticks for md awesomeness add contributing.md	2013-10-24 13:03:10 +02:00
Julius Volz	b70d5ca143	Merge changes I76203973,I38646c2b * changes: More updates for first time users. Update example config file from json to new protobuf format.	2013-10-24 12:45:55 +02:00
Julius Volz	98007b8289	Merge "Add a check for metrics directory existence."	2013-10-24 12:42:25 +02:00
Stuart Nelson	1e357cf859	fix link to CONTRIBUTING.md in README.md Change-Id: I53a24c061d0610a9c4b3c515c7d5ba7c04ae9f54	2013-10-23 16:26:39 -04:00
Stuart Nelson	28b055554f	moved CONTRIBUTING.md to top of repo; link to CONTRIBUTING.md in README.md Change-Id: Ibe1def5c0c5e1e7f6eb0da344badc53d18f2ecb3	2013-10-23 16:21:35 -04:00
Stuart Nelson	dd2b5e0e1c	change double quotes to backticks for md awesomeness Change-Id: Ife68c9c67d36ffec24927176ab519f7cb08976a8	2013-10-23 10:16:25 -04:00
Stuart Nelson	af5114d81e	add contributing.md Change-Id: Ia3284a90dfbbaaf655facd885a8ef13858bdb2c9	2013-10-23 10:11:43 -04:00
Conor Hennessy	eba01d1119	Remove usage of gorest. Due to ongoing issues, we've decided to remove gorest. It started with gorest not being thread-safe (it does introspection to create a new handler which is an easy process to mess up with multiple threads of execution): https://code.google.com/p/gorest/issues/detail?id=15 While the issue has been marked fixed, it looks like the patch has introduced more problems than the original issue and simply doesn't work properly. I'm not sure the behaviour was thought through properly. If a new instance is needed every request then a handler-factory is needed or the library needs to set expectations about how the new objects should interact with their constructor state. While it was tempting to try out another routing library, I think for now it's better to use dumb vanilla Go routing. At least until we decide which URL format we intend to standardize on. Change-Id: Ica3da135d05f8ab8fc206f51eeca4f684f8efa0e	2013-10-23 14:19:14 +02:00
Stuart Nelson	72b861bebb	remove duplicate users word from README Change-Id: I3a9c84f16731c76f957155e58d05beda26505924	2013-10-22 23:25:08 -04:00
Julius Volz	eb461a707d	Add chunk sanity checking to dumper tool. Also, move codecs/filters to common location so they can be used in subsequent test. Change-Id: I3ffeb09188b8f4552e42683cbc9279645f45b32e	2013-10-23 01:06:49 +02:00
Julius Volz	6ea22f2bf9	Add compaction regression tests. This adds regression tests that catch the two error cases reported in https://github.com/prometheus/prometheus/issues/367 It also adds a commented-out test case for the crash in https://github.com/prometheus/prometheus/issues/368 but there's no fix for the latter crash yet. Change-Id: Idffefea4ed7cc281caae660bcad2e3c13ec3bd17	2013-10-23 01:06:28 +02:00
Conor Hennessy	aada5ded85	Replace some uses of obsolete `/metrics.json` with `/metrics` (haven't touched test files yet). Change-Id: I48c7c0cf27a39d596627a06cbb4f5913fb3da13c	2013-10-22 20:54:43 +02:00
Conor Hennessy	2d2c434d48	More updates for first time users. - Modified sample conf so it is useable by default, also added some comments from the 'hello world' configuration. - Updated README so there's a clear two step start for newbies. - Added extra vim swap files to gitignore. Change-Id: I76203973db4a7b332014662fcfb2ce5e7d137bd8	2013-10-22 20:54:43 +02:00
Conor Hennessy	986adfa557	Update example config file from json to new protobuf format. Change-Id: I38646c2be53b6993abe464d9cdd9b211678de496	2013-10-22 20:54:43 +02:00
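
Note on commit 61d26e8445: the message says that characters which are illegal in OpenTSDB tag values are replaced by an underscore. The following is a minimal sketch of such a replacement, not the code from that commit. The names illegalTagChars and sanitizeTagValue are hypothetical; only the legal character set (a to z, A to Z, 0 to 9, -, _, . and /) comes from the commit message.

```go
package main

import "regexp"

// illegalTagChars matches any character outside the set the commit message
// lists as legal for OpenTSDB tag values: a-z, A-Z, 0-9, '-', '_', '.', '/'.
// (Hypothetical name; not taken from the Prometheus source.)
var illegalTagChars = regexp.MustCompile(`[^a-zA-Z0-9_./-]`)

// sanitizeTagValue replaces every illegal character with an underscore.
// The mapping is lossy: distinct inputs can yield the same output.
func sanitizeTagValue(v string) string {
	return illegalTagChars.ReplaceAllString(v, "_")
}
```

Under this sketch, sanitizeTagValue("a:b") and sanitizeTagValue("a b") both return "a_b". That collision is the reason the commit message says the scheme would need reconsidering before a read path through Prometheus could match timeseries in both systems.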

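Note on commit 718ad2224b: the message relies on Go running deferred calls in reverse order of registration, so whatever has to be closed first must be deferred last. A small sketch of that ordering follows. The function closeOrder and the resource names "index", "samples" and "storage" are placeholders for illustration, not the objects used in Prometheus.

```go
package main

// closeOrder registers three placeholder cleanup steps with defer and
// returns the order in which they actually ran.
func closeOrder() []string {
	var order []string
	record := func(name string) { order = append(order, name) }

	func() {
		defer record("index")   // deferred first, runs last
		defer record("samples") // deferred second, runs second
		defer record("storage") // deferred last, runs first
	}()

	return order
}
```

Calling closeOrder() yields storage, samples, index: the last defer statement runs first, which matches the fix described in the commit message (the storage is closed before the objects passed into it).
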
14726 Commits (9cf597c492a6af7316989deb99860e815bbaf13e)