prometheus

Commit Graph

Author	SHA1	Message	Date
Brian Brazil	60271d58bf	Change the 2nd argument of round to toNearest. This is more useful if you want get a multiple of 2 or 5, while still working for .001.	2015-02-05 16:13:40 +00:00
Julius Volz	82613527f3	Remove unnecessary float64() conversion in round().	2015-02-05 15:14:05 +01:00
Marko Mikulicic	8fdacbdf17	Add floor, ceil and round functions. Closes #402	2015-02-04 17:20:56 +01:00
Fabian Reinartz	fa1e90003b	Query timeout added. This is related to #454. Queries now time out after a duration set by the -query.timeout flag. The TotalEvalTimer is now started/stopped inside any of the ast.Eval* functions.	2015-02-03 08:04:27 +01:00
Bjoern Rabenstein	26e22e6ad6	Fix rule manager shutdown.	2015-01-29 15:05:10 +01:00
Julius Volz	d4374a9265	More efficient JSON query result format. This depends on https://github.com/prometheus/client_golang/pull/51. For vectors, the result format looks like this: ```json { "version": 1, "type" : "vector", "value" : [ { "timestamp" : 1421765411.045, "value" : "65.475000", "metric" : { "quantile" : "0.5", "instance" : "http://localhost:9090/metrics", "job" : "prometheus", "__name__" : "http_request_duration_microseconds", "handler" : "/static/", "method" : "get", "code" : "304" } }, { "timestamp" : 1421765411.045, "value" : "5826.339000", "metric" : { "quantile" : "0.9", "instance" : "http://localhost:9090/metrics", "job" : "prometheus", "__name__" : "http_request_duration_microseconds", "handler" : "prometheus", "method" : "get", "code" : "200" } }, /* ... */ ] } ``` For matrices, it looks like this: ```json { "version": 1, "type" : "matrix", "value" : [ { "metric" : { "quantile" : "0.99", "instance" : "http://localhost:9090/metrics", "job" : "prometheus", "__name__" : "http_request_duration_microseconds", "handler" : "/static/", "method" : "get", "code" : "200" }, "values" : [ [ 1421765547.659, "29162.953000" ], [ 1421765548.659, "29162.953000" ], [ 1421765549.659, "29162.953000" ], /* ... */ ] } ] } ```	2015-01-26 13:06:22 +01:00
Brian Brazil	a31730e88b	Make 2nd arg to delta optional. Add a deriv() function. The 2nd isCounter argument to delta is ugly, make it optional as the first step of deprecating it. This will make delta only ever applied to gauges. Add a deriv function to calculate the least squares slope of a gauge. This is more useful for prediction than delta, as it isn't as heavily influenced by outliers at the boundaries.	2015-01-23 14:50:27 +00:00
Bjoern Rabenstein	5859b74f1b	Clean up license issues. - Move CONTRIBUTORS.md to the more common AUTHORS. - Added the required NOTICE file. - Changed "Prometheus Team" to "The Prometheus Authors". - Reverted the erroneous changes to the Apache License.	2015-01-21 20:07:45 +01:00
Bjoern Rabenstein	b09453af1d	Adjust to new client_golang API.	2015-01-21 15:42:25 +01:00
Julius Volz	bb1e49383e	Log rule evaluation errors.	2015-01-08 17:50:55 +01:00
Julius Volz	d6b9e97655	Remove extraction.Result type, simplify code.	2015-01-08 16:34:01 +01:00
Julius Volz	9a4ca68a61	Add metrics for rule evaluation failures. Fixes https://github.com/prometheus/prometheus/issues/417	2015-01-08 16:33:35 +01:00
Brian Brazil	ffa2e73803	Fix regression from `5e8d57bec1` 0 is a false value, so shortcutting no longer works. Update other places in the code that assumed graph was the default.	2014-12-27 00:28:36 +00:00
Julius Volz	cc27fb8aab	Rename remaining all-caps constants in AST layer. Change-Id: Ibe97e30981969056ffcdb89e63c1468ea1ffa140	2014-12-25 01:30:47 +01:00
Julius Volz	895523ad14	Include necessary Makefile.INCLUDE from rules/Makefile. Change-Id: I077d018dbe4093cd40ddf38d66a996df222bf5e4	2014-12-25 01:13:59 +01:00
Julius Volz	2ade9d40cf	Clarify why we need int constants for expression types. Change-Id: I053fc5d32c118dbdb204dc8193337f981aff796e	2014-12-25 00:45:30 +01:00
Julius Volz	00a2a93a05	Add regression tests for metrics mutations in AST. It turned out in the end, that only drop_common_metrics() produced any erroneous output in the old system. The second expression in the test ("sum(testmetric) keeping_extra") already worked in the old code, but why not keep it in... The way to test ranged evaluations is a bit clumsy so far, so I want to build a nicer test framework in the end, where all the test cases can be specified as text files which specify desired inputs, outputs, query step widths, etc. Change-Id: I821859789e69b8232bededf670a1b76e9e8c8ca4	2014-12-12 20:34:55 +01:00
Julius Volz	c9618d11e8	Introduce copy-on-write for metrics in AST. This depends on changes in: https://github.com/prometheus/client_golang/tree/cow-metrics. Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142	2014-12-12 20:34:55 +01:00
Bjoern Rabenstein	b1e4956142	Apply a giant code cleanup. Essentially: - Remove unused code. - Make it 'go vet' clean. The only remaining warnings are in generated code. - Make it 'golint' clean. The only remaining warnings are in generated code. - Smoothed out some minor things. Change-Id: I3fe5c1fbead27b0e7a9c247fee2f5a45bc2d42c6	2014-12-10 16:16:49 +01:00
Bjoern Rabenstein	fee88a7a77	Remove the remaining races, new and old. Also, resolve a few other TODOs. Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691	2014-12-03 18:07:23 +01:00
Bjoern Rabenstein	7d11019aa2	Squash a few trivial TODOs. - Delete unneeded file view_adapter.go. - Assessed that we still need the fingerprints in nodes (to create iterators). - Turned numMemChunkDescs into a metric. Change-Id: I29be963c795a075ec00c095f76bf26405535609d	2014-11-27 18:26:06 +01:00
Julius Volz	6eecee55b7	Fix acronym caps in GeneratorURL. Change-Id: Ib18c1f617dcde1039e848059545a6d8831d9bf66	2014-11-25 17:13:04 +01:00
Bjoern Rabenstein	0ae1d8889a	Fix tests after merge. Change-Id: Ia90da9a3e48ed780ec38c4a6a1fd9ea34e7f6a58	2014-11-25 17:13:04 +01:00
Julius Volz	b7bf11230a	Add absent() function. A common problem in Prometheus alerting is to detect when no timeseries exist for a given metric name and label combination. Unfortunately, Prometheus alert expressions need to be of vector type, and "count(nonexistent_metric)" results in an empty vector, yielding no output vector elements to base an alert on. The newly introduced absent() function solves this issue: ALERT FooAbsent IF absent(foo{job="myjob"}) [...] absent() has the following behavior: - if the vector passed to it has any elements, it returns an empty vector. - if the vector passed to it has no elements, it returns a 1-element vector with the value 1. In the second case, absent() tries to be smart about deriving labels of the 1-element output vector from the input vector: absent(nonexistent{job="myjob"}) => {job="myjob"} absent(nonexistent{job="myjob",instance=~".*"}) => {job="myjob"} absent(sum(nonexistent{job="myjob"})) => {} That is, if the passed vector is a literal vector selector, it takes all "=" label matchers as the basis for the output labels, but ignores all non-equals or regex matchers. Also, if the passed vector results from a non-selector expression, no labels can be derived. Change-Id: I948505a1488d50265ab5692a3286bd7c8c70cd78	2014-11-25 17:13:04 +01:00
Julius Volz	3d47f94149	Drop metric names after transformations. After many transformations, it doesn't make sense to keep the metric names, since the result of the transformation is no longer that metric. This drops the metric name after such transformations and makes the web UI deal well with missing metric names. This depends on the current branch on the following things: - prometheus/client_golang needs to be at `e237cf15c6` in branch "julius/int-fingerprints" (to be merged with new storage) - prometheus/promdash needs to be at `dd7691c9c2` Change-Id: Ib3c8cad8d647d9854e8c653c424b8c235ccc231d	2014-11-25 17:13:04 +01:00
Bjoern Rabenstein	14bda4180c	Changes after pair code review. Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455	2014-11-25 17:12:59 +01:00
Bjoern Rabenstein	006b5517e2	Simplify makefiles. This removes the dependency on C leveldb and snappy. It also takes care of fewer dependencies as they would anyway not work on any non-Debian, non-Brew system. Change-Id: Ia70dce1ba8a816a003587927e0b3a3f8ad2fd28c	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	74c143c4c9	Improve scraper shutdown time. - Stop target pools in parallel. - Stop individual scrapers in goroutines, too. - Timing tweaks. Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696	2014-11-25 17:10:39 +01:00
Julius Volz	0712d738d1	Allow alternative "by"-clause position in grammar. In addition to the existing by-clause syntax: sum(<expression>) by (<labels>) [keeping_extra] ...this allows the following new syntax: sum by (<labels>) [keeping_extra] (<expression>) Both orderings may be used in a single expression. It is up to the users to establish guidelines around their usage. Change-Id: Iba10c9cc5fb6ac62edfcf246d281473e82467992	2014-11-25 17:09:04 +01:00
Julius Volz	0e48c18bbf	Allow omitting the metric name in queries. This allows the following expression syntaxes for selecting timeseries: foo (already valid before) foo{} (already valid before) {job="prometheus"} (new, select all timeseries for job "prometheus") Omitting both the metric name and any label matchers ("" or "{}") will still yield a syntax error. To get all timeseries, you could do: {__name__=~"."} or, without relying on knowledge about __metric__: {job=~"."} Change-Id: Ifee000b9ac0184ef6ced18411069c7f2699a2dda	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	096fa0f8b2	Squash a number of TODOs. - Staleness delta is now a proper function parameter and not replicated from package ast. - Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion. - For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'. - Verified that math.Modf is not a speed enhancement over conversion (actually 5x slower). - Renamed firstTimeField, lastTimeField into chunkFirstTime and chunkLastTime. - Verified unpin() is sufficiently goroutine-safe. - Decided not to update archivedFingerprintToTimeRange upon series truncation and added a rationale why. Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	b3ed9aa7a2	Clean up start-up and shut-down. Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	38fc24d0ed	Fix targetpool_test.go and other tests. Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624	2014-11-25 17:08:26 +01:00
Julius Volz	7f5d3c2c29	Fix and improve the fp locker. Benchmark: $ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4 OLD BenchmarkFingerprintLockerParallel 500000 3618 ns/op BenchmarkFingerprintLockerParallel-2 100000 12257 ns/op BenchmarkFingerprintLockerParallel-4 500000 10164 ns/op BenchmarkFingerprintLockerSerial 10000000 283 ns/op BenchmarkFingerprintLockerSerial-2 10000000 284 ns/op BenchmarkFingerprintLockerSerial-4 10000000 288 ns/op NEW BenchmarkFingerprintLockerParallel 1000000 1018 ns/op BenchmarkFingerprintLockerParallel-2 1000000 1164 ns/op BenchmarkFingerprintLockerParallel-4 2000000 910 ns/op BenchmarkFingerprintLockerSerial 50000000 56.0 ns/op BenchmarkFingerprintLockerSerial-2 50000000 47.9 ns/op BenchmarkFingerprintLockerSerial-4 50000000 54.5 ns/op Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581	2014-11-25 17:07:45 +01:00
Julius Volz	358f97791d	Minor cleanups. Change-Id: Ia8685d8439a421fe2143d9ec7120d5bb5ab88d78	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	f5f9f3514a	Major code cleanup. - Make it go-vet and golint clean. - Add comments, TODOs, etc. Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2	2014-11-25 17:02:53 +01:00
Julius Volz	e7ed39c9a6	Initial experimental snapshot of next-gen storage. Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d	2014-11-25 17:02:00 +01:00
Julius Volz	85497e3f38	Add function to drop common labels in a vector. This fixes https://github.com/prometheus/prometheus/issues/384. Change-Id: I2973c4baeb8a4618ec3875fb11c6fcf5d111784b	2014-11-25 17:02:00 +01:00
Julius Volz	3fdb74e571	Add more topk() / bottomk() tests. Test what happens if k > number of input elements. Change-Id: Ie724b850939e297ebf085f0a5a3522e9cfcc6534	2014-11-25 17:02:00 +01:00
Julius Volz	c582ae73c2	Implement topk() and bottomk() functions. To achieve O(log n * k) runtime, this uses a heap to track the current bottom-k or top-k elements while iterating over the full set of available elements. It would be possible to reuse more code between topk and bottomk, but I decided for some more duplication for the sake of clarity. This fixes https://github.com/prometheus/prometheus/issues/399 Change-Id: I7487ddaadbe7acb22ca2cf2283ba6e7915f2b336	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	1909686789	Make metrics exported by the Prometheus server itself more consistent. - Always spell out the time unit (e.g. milliseconds instead of ms). - Remove "_total" from the names of metrics that are not counters. - Make use of the "Namespace" and "Subsystem" fields in the options. - Removed the "capacity" facet from all metrics about channels/queues. These are all fixed via command line flags and will never change during the runtime of a process. Also, they should not be part of the same metric family. I have added separate metrics for the capacity of queues as convenience. (They will never change and are only set once.) - I left "metric_disk_latency_microseconds" unchanged, although that metric measures the latency of the storage device, even if it is not a spinning disk. "SSD" is read by many as "solid state disk", so it's not too far off. (It should be "solid state drive", of course, but "metric_drive_latency_microseconds" is probably confusing.) - Brian suggested to not mix "failure" and "success" outcome in the same metric family (distinguished by labels). For now, I left it as it is. We are touching some bigger issue here, especially as other parts in the Prometheus ecosystem are following the same principle. We still need to come to terms here and then change things consistently everywhere. Change-Id: If799458b450d18f78500f05990301c12525197d3	2014-11-25 17:02:00 +01:00
Julius Volz	00b9489f1c	Fix time() behavior. time() should return the timestamp for which the query is executed, not the actual current time. Change-Id: I430a45cabad7785cd58f95b1028a71dff4c87710	2014-11-25 17:02:00 +01:00
Julius Volz	c5984f1818	Add abs() and over-time aggregation functions. This implements aggregation functions over time as requested in https://github.com/prometheus/prometheus/issues/383. Change-Id: Ifd69b850de8cfdf6e7a6c0e042056fa4c672410e	2014-11-25 17:02:00 +01:00
Brian Brazil	f525ca5d9e	Let consoles get graph links from expressions. Rename ConsoleLinkFromExpression, as we now have consoles. Change-Id: I7ed2c9c83863adb390b51121dd9736845f7bcdfc	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	8956faeccb	Migrate to new client_golang. This change will only be submitted when the new client_golang has been moved to the new version. Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581	2014-11-25 17:01:59 +01:00
Brian Brazil	960ede66dc	Use html/template for console templates and add template library support. Add a function to bypass the new auto-escaping. Add a function to work around Go's templates only allowing passing in one argument. Change-Id: Id7aa3f95e7c227692dc22108388b1d9b1e2eec99	2014-11-25 17:01:59 +01:00
Brian Brazil	e041c0cd46	Add console and alert templates with access to all data. Move rulemanager to its own package to break circular dependency. Make NewTestTieredStorage available to tests, remove duplication. Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03	2014-05-30 16:24:56 +01:00
Bjoern Rabenstein	ca6a4fccef	Weed out our homegrown test.Tester. The Go stdlib has testing.TB now, which fulfills the exact same purpose. Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c	2014-05-21 19:27:24 +02:00
Julius Volz	6297a405f2	Do not indent API JSON responses. In one example response, this reduced the uncompressed size by 25% and the gzipped size by 11%. Change-Id: Ie80d44253124b9f8601b8ef9fc978e92dacff523	2014-04-22 15:16:37 +02:00
Julius Volz	01f652cb4c	Separate storage implementation from interfaces. This was initially motivated by wanting to distribute the rule checker tool under `tools/rule_checker`. However, this was not possible without also distributing the LevelDB dynamic libraries because the tool transitively depended on Levigo: rule checker -> query layer -> tiered storage layer -> leveldb This change separates external storage interfaces from the implementation (tiered storage, leveldb storage, memory storage) by putting them into separate packages: - storage/metric: public, implementation-agnostic interfaces - storage/metric/tiered: tiered storage implementation, including memory and LevelDB storage. I initially also considered splitting up the implementation into separate packages for tiered storage, memory storage, and LevelDB storage, but these are currently so intertwined that it would be another major project in itself. The query layers and most other parts of Prometheus now have no notion of the storage implementation anymore and just use whatever implementation they get passed in via interfaces. The rule_checker is now a static binary :) Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0	2014-04-16 13:30:19 +02:00
Julius Volz	d411a7d810	Allow reversing vector and scalar arguments in binops. This allows putting a scalar as the first argument of a binary operator in which the second argument is a vector: <scalar> <binop> <vector> For example, 1 / http_requests_total ...will output a vector in which every sample value is 1 divided by the respective input vector element. This even works for filter binary operators now: 1 == http_requests_total Returns a vector with all values set to 1 for every element in http_requests_total whose initial value was 1. Note: For filter binary operators, the resulting values are always taken from the left-hand-side of the operation, no matter whether the scalar or the vector argument is the left-hand-side. That is, 1 != http_requests_total ...will set all result vector sample values to 1, although these are exactly the sample elements that were != 1 in the input vector. If you want to just filter elements without changing their sample values, you still need to do: http_requests_total != 1 The new filter form is a bit exotic, and so probably won't be used often. But it was easier to implement it than disallow it completely or change its behavior. Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5	2014-04-08 17:16:18 +02:00
Julius Volz	c7c0b33d0b	Add regex-matching support for labels. There are four label-matching ops for selecting timeseries now: - Equal: = - NotEqual: != - RegexMatch: =~ - RegexNoMatch: !~ Instead of looking up labels by a simple clientmodel.LabelSet (basically an equals op for every key/value pair in the set), timeseries fingerprint selection is now done via a list of metric.LabelMatchers. Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96	2014-04-01 14:24:53 +02:00
Bjoern Rabenstein	0a65b691cc	Disallow ":" in identifiers, but still allow it in metric names. Change-Id: Iace925ab1b71a360bd63357e87f68e727f7afbcb	2014-03-21 13:44:37 +01:00
Julius Volz	86fc13a52e	Convert metric.Values to slice of values. The initial impetus for this was that it made unmarshalling sample values much faster. Other relevant benchmark changes in ns/op: Benchmark old new speedup ================================================================== BenchmarkMarshal 179170 127996 1.4x BenchmarkUnmarshal 404984 132186 3.1x BenchmarkMemoryGetValueAtTime 57801 50050 1.2x BenchmarkMemoryGetBoundaryValues 64496 53194 1.2x BenchmarkMemoryGetRangeValues 66585 54065 1.2x BenchmarkStreamAdd 45.0 75.3 0.6x BenchmarkAppendSample1 1157 1587 0.7x BenchmarkAppendSample10 4090 4284 0.95x BenchmarkAppendSample100 45660 44066 1.0x BenchmarkAppendSample1000 579084 582380 1.0x BenchmarkMemoryAppendRepeatingValues 22796594 22005502 1.0x Overall, this gives us good speedups in the areas where they matter most: decoding values from disk and accessing the memory storage (which is also used for views). Some of the smaller append examples take minimally longer, but the cost seems to get amortized over larger appends, so I'm not worried about these. Also, we're currently not bottlenecked on the write path and have plenty of other optimizations available in that area if it becomes necessary. Memory allocations during appends don't change measurably at all. Change-Id: I7dc7394edea09506976765551f35b138518db9e8	2014-03-11 18:23:37 +01:00
Julius Volz	bc6ee6611e	Rename persistence_adapter.go -> view_adapter.go Change-Id: Ib45081393b734531d2f85a02f46e87930aab3273	2014-02-22 22:43:11 +01:00
Julius Volz	3f226c9724	Rename {Scalar,Vector}Literal to {Scalar,Vector}Selector. Change-Id: Ie92301f47f5f49f30b3a62c365e377108982b080	2014-02-22 22:33:42 +01:00
Bjoern Rabenstein	682cf6fc51	Simplify QueryAnalizer.Visit(). Change-Id: I628582a1903b7273e78921e22a475f1dae5ebaae	2014-02-14 15:15:57 +01:00
Bjoern Rabenstein	fd63500ed3	Make rules/ast golint clean. Mostly, that means adding compliant doc strings to exported items. Also, remove 'go vet' warnings where possible. (Some are unfortunately not to avoid, arguably bugs in 'go vet'.) Change-Id: I2827b6dd317492864c1383c3de1ea9eac5a219bb	2014-02-14 15:01:39 +01:00
Björn Rabenstein	59febe771a	Merge "Minor code cleanups."	2014-02-13 15:29:16 +01:00
Julius Volz	c4adfc4f25	Minor code cleanups. Change-Id: Ib3729cf38b107b7f2186ccf410a745e0472e3630	2014-02-13 15:24:43 +01:00
Julius Volz	7e9ecaac3a	Add count_scalar() function. Change-Id: I63f09dd0479d0a6b016f5f857dd39dcbda56c7f9	2014-01-30 13:07:26 +01:00
Julius Volz	0378c2ca1f	Nonexistent labels in BY-clauses shouldn't propagate to result. This fixes bug 2. of https://github.com/prometheus/prometheus/issues/374 Change-Id: Ia4a13153616bafce5bf10597966b071434422d09	2014-01-24 16:05:30 +01:00
Julius Volz	6dc36d0c3e	Don't keep extra labels in aggregations by default. MIN/MAX/SUM/AVG/COUNT aggregations will now by default drop all labels that are not specifically part of a BY-clause, even if a label value is the same within all timeseries of an aggregation group. The old behavior of keeping extra labels may still be switched on by adding KEEPING_EXTRA to the end of an aggregation statement: sum(http_requests) by (job, method) keeping_extra I'm open to better syntax/naming suggestions. Change-Id: I21d3fe7af9e98552ce3dffa3ce7c0a4ba4c0b4a4	2013-12-16 12:53:10 +01:00
Julius Volz	20bfaf80ab	Merge "Display filename when encountering bad rule file."	2013-12-13 15:01:02 +01:00
Julius Volz	3bf3a555b2	Merge "add evalDuration histogram and ruleCount counter for rules"	2013-12-11 22:52:19 +01:00
Stuart Nelson	b75adfebad	add evalDuration histogram and ruleCount counter for rules Change-Id: I3508fe72526348d96b8158828388c3ac8d7c3fa9	2013-12-11 15:42:53 -05:00
Julius Volz	77a79d1fc0	Display filename when encountering bad rule file. Change-Id: I4729371be92c5659a6938145c5fde66771d7be22	2013-12-11 15:44:11 +01:00
Julius Volz	fb44580110	Cleanup/fix program termination sequence. Change-Id: I2bc58a2583fb079c9ef383cfc7a5e0fbe613f1cd	2013-12-11 15:40:32 +01:00
Julius Volz	740d448983	Use custom timestamp type for sample timestamps and related code. So far we've been using Go's native time.Time for anything related to sample timestamps. Since the range of time.Time is much bigger than what we need, this has created two problems: - there could be time.Time values which were out of the range/precision of the time type that we persist to disk, therefore causing incorrectly ordered keys. One bug caused by this was: https://github.com/prometheus/prometheus/issues/367 It would be good to use a timestamp type that's more closely aligned with what the underlying storage supports. - sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit Unix timestamp (possibly even a 32-bit one). Since we store samples in large numbers, this seriously affects memory usage. Furthermore, copying/working with the data will be faster if it's smaller. MEMORY USAGE RESULTS Initial memory usage comparisons for a running Prometheus with 1 timeseries and 100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my tests, this advantage for some reason decreased a bit the more samples the timeseries had (to 5-7% for millions of samples). This I can't fully explain, but perhaps garbage collection issues were involved. WHEN TO USE THE NEW TIMESTAMP TYPE The new clientmodel.Timestamp type should be used whenever time calculations are either directly or indirectly related to sample timestamps. For example: - the timestamp of a sample itself - all kinds of watermarks - anything that may become or is compared to a sample timestamp (like the timestamp passed into Target.Scrape()). When to still use time.Time: - for measuring durations/times not related to sample timestamps, like duration telemetry exporting, timers that indicate how frequently to execute some action, etc. NOTE ON OPERATOR OPTIMIZATION TESTS We don't use operator optimization code anymore, but it still lives in the code as dead code. 
It still has tests, but I couldn't get all of them to pass with the new timestamp format. I commented out the failing cases for now, but we should probably remove the dead code soon. I just didn't want to do that in the same change as this. Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f	2013-12-03 09:11:28 +01:00
Julius Volz	c7daedc840	Merge "Add scalar() function."	2013-10-16 15:49:54 +02:00
Julius Volz	be8024e18c	Add scalar() function. Change-Id: I1d1183e926a18fc98c9e94bbb9a808a3fb313102	2013-09-17 15:01:16 +02:00
Julius Volz	93a8d03221	Merge "Add alert-expression console links to notifications."	2013-08-24 19:40:50 +02:00
Julius Volz	1eb1ceac8c	Add alert-expression console links to notifications. The ConsoleLinkForExpression() function now escapes console URLs in such a way that works both in emails and in HTML. Change-Id: I917bae0b526cbbac28ccd2a4ec3c5ac03ee4c647	2013-08-20 15:45:41 +02:00
Matt T. Proud	7db518d3a0	Abstract high watermark cache into standard LRU. Conflicts: storage/metric/memory.go storage/metric/tiered.go storage/metric/watermark.go Change-Id: Iab2aedbd8f83dc4ce633421bd4a55990fa026b85	2013-08-19 12:26:55 +02:00
Julius Volz	0003027dce	Add needed trailing spaces in logs.	2013-08-12 18:22:48 +02:00
Julius Volz	aa5d251f8d	Use github.com/golang/glog for all logging.	2013-08-12 17:54:36 +02:00
Julius Volz	3b970c5133	Add variable interpolation to notification messages. This includes required refactorings to enable replacing the http client (for testing) and moving the NotificationReq type definitions to the "notifications" package, so that this package doesn't need to depend on "rules" anymore and that it can instead use a representation of the required data which only includes the necessary fields.	2013-08-12 12:29:08 +02:00
Julius Volz	35ee2cd3cb	Add alertmanager notification support to Prometheus. Alert definitions now also have mandatory SUMMARY and DESCRIPTION fields that get sent along a firing alert to the alert manager.	2013-07-30 17:23:41 +02:00
Julius Volz	81f0b85013	Return [] instead of null for empty result vectors.	2013-07-25 12:16:32 +02:00
Julius Volz	64b0ade171	Swap rules lexer for much faster one. This swaps github.com/kivikakk/golex for github.com/cznic/golex. The old lexer would have taken 3.5 years to load a set of 5000 test rules (quadratic time complexity for input length), whereas this one takes only 32ms. Furthermore, since the new lexer is embedded differently, this gets rid of the global parser variables and makes the rule loader fully reentrant without a lock.	2013-07-11 19:35:29 +02:00
Julius Volz	d2da21121c	Implement getValueRangeAtIntervalOp for faster range queries. This also short-circuits optimize() for now, since it is complex to implement for the new operator, and ops generated by the query layer already fulfill the needed invariants. We should still investigate later whether to completely delete operator optimization code or extend it to support getValueRangeAtIntervalOp operators.	2013-06-26 18:10:36 +02:00
Matt T. Proud	30b1cf80b5	WIP - Snapshot of Moving to Client Model.	2013-06-25 15:52:42 +02:00
Julius Volz	8ee7947b1e	Ensure metric name is dropped correctly from alert labels in UI.	2013-06-14 13:03:19 +02:00
Julius Volz	0226d1ac7a	Implement alerts dashboard and expression console links.	2013-06-13 22:35:40 +02:00
Julius Volz	ba29d07901	Show loaded rules in Status dashboard.	2013-06-11 11:39:31 +02:00
Julius Volz	fc97e688c6	Improve printing of rules and expressions.	2013-06-11 11:39:31 +02:00
Julius Volz	74cb676537	Implement Stringer interface for rules and all their children.	2013-06-07 15:54:32 +02:00
Matt T. Proud	2c3df44af6	Ensure database access waits until it is started. This commit introduces a channel message to ensure serving state has been reached with the storage stack before anything attempts to use it.	2013-06-06 10:42:21 +02:00
Julius Volz	51689d965d	Add debug timers to instant and range queries. This adds timers around several query-relevant code blocks. For now, the query timer stats are only logged for queries initiated through the UI. In other cases (rule evaluations), the stats are simply thrown away. My hope is that this helps us understand where queries spend time, especially in cases where they sometimes hang for unusual amounts of time.	2013-06-05 18:32:54 +02:00
Julius Volz	adb87816f4	Put RuleManager concurrency in hands of caller, fix races.	2013-06-05 13:56:56 +02:00
Julius Volz	138334fb31	Fix handling of negative deltas for non-counter values.	2013-05-28 17:36:53 +02:00
Julius Volz	66d4620061	Don't assume delta has at least one sample per vector element.	2013-05-28 14:02:36 +02:00
Julius Volz	21c3be0814	Skip any empty range/boundary elements, not only nil ones.	2013-05-28 14:02:08 +02:00
Matt T. Proud	c10780c966	Introduce telemetry for rule evaluator durations. This commit adds telemetry for the Prometheus expression rule evaluator, which will enable meta-Prometheus monitoring of customers to ensure that no instance is falling behind in answering routine queries. A few other sundry simplifications are introduced, too.	2013-05-23 21:29:27 +02:00
Julius Volz	750f862d9a	Use GetBoundaryValues() for non-counter deltas.	2013-05-22 19:13:47 +02:00
Julius Volz	5b105c77fc	Repointerize fingerprints.	2013-05-21 14:28:14 +02:00
Matt T. Proud	8f4c7ece92	Destroy naked returns in half of corpus. The use of naked return values is frowned upon. This is the first of two bulk updates to remove them.	2013-05-16 10:53:25 +03:00
juliusv	516101f015	Merge pull request #250 from prometheus/refactor/drop-unused-storage-setting Drop unused writeMemoryInterval	2013-05-14 08:45:59 -07:00
juliusv	9ff00b651d	Merge pull request #251 from prometheus/fix/memory-metric-mutability Fix GetMetricForFingerprint() metric mutability.	2013-05-14 08:12:45 -07:00
Bernerd Schaefer	63d9988b9c	Drop unused writeMemoryInterval	2013-05-14 17:03:03 +02:00
Bernerd Schaefer	aa96c7d141	Fix rules_test.go This is smelly, but for now we copy a helper method from the metric tests into rules.	2013-05-14 16:55:18 +02:00
Julius Volz	83c60ad43a	Fix GetMetricForFingerprint() metric mutability. Some users of GetMetricForFingerprint() end up modifying the returned metric labelset. Since the memory storage's implementation of GetMetricForFingerprint() returned a pointer to the metric (and maps are reference types anyways), the external mutation propagated back into the memory storage. The fix is to make a copy of the metric before returning it.	2013-05-14 16:46:30 +02:00
Bernerd Schaefer	428d91c86f	Rename test helper files to helpers_test.go This ensures that these files are properly included only in testing.	2013-05-14 16:30:47 +02:00
Matt T. Proud	244a4a9cdb	Update to go1.1. This commit updates the documentation, Makefiles, formatting, and code semantics to support the 1.1. runtime, which includes ... 1. ``make advice``, 2. ``make format``, and 3. ``go fix`` on various targets.	2013-05-14 12:39:08 +02:00
Matt T. Proud	161c8fbf9b	Include deletion processor for long-tail values. This commit extracts the model.Values truncation behavior into the actual tiered storage, which uses it and behaves in a peculiar way—notably the retention of previous elements if the chunk were to ever go empty. This is done to enable interpolation between sparse sample values in the evaluation cycle. Nothing necessarily new here—just an extraction. Now, the model.Values TruncateBefore functionality would do what a user would expect without any surprises, which is required for the DeletionProcessor, which may decide to split a large chunk in two if it determines that the chunk contains the cut-off time.	2013-05-10 12:19:12 +02:00
Julius Volz	0877680761	Implement a COUNT ... BY aggregation operator. This also removes the now obsolete scalar count() function and corrects the expressions test naming (broken in `2202cd71c9 (L6R59)`) so that the expression tests will actually run.	2013-05-08 16:35:16 +02:00
Julius Volz	56324d8ce2	Make AST query storage non-global.	2013-05-07 13:15:10 +02:00
Matt T. Proud	ce45787dbf	Storage interface to TieredStorage. This commit drops the Storage interface and just replaces it with a publicized TieredStorage type. Storage had been anticipated to be used as a wrapper for testability but just was not used due to practicality. Merely overengineered. My bad. Anyway, we will eventually instantiate the TieredStorage dependencies in main.go and pass them in for more intelligent lifecycle management. These changes will pave the way for managing the curators without Law of Demeter violations.	2013-05-03 15:54:14 +02:00
Julius Volz	9cea5d9df8	Convert the Prometheus configuration to protocol buffers.	2013-04-30 22:26:00 +02:00
Julius Volz	d8110fcd9c	Send sample arrays instead of single samples over channels.	2013-04-29 17:24:17 +02:00
Julius Volz	dcf2e82752	Cleanup and idiomaticize rule/expression dot graph output.	2013-04-29 12:57:34 +02:00
Matt T. Proud	b3e34c6658	Implement batch database sample curator. This commit introduces to Prometheus a batch database sample curator, which corroborates the high watermarks for sample series against the curation watermark table to see whether a curator of a given type needs to be run. The curator is an abstract executor, which runs various curation strategies across the database. It remarks the progress for each type of curation processor that runs for a given sample series. A curation procesor is responsible for effectuating the underlying batch changes that are request. In this commit, we introduce the CompactionProcessor, which takes several bits of runtime metadata and combine sparse sample entries in the database together to form larger groups. For instance, for a given series it would be possible to have the curator effectuate the following grouping: - Samples Older than Two Weeks: Grouped into Bunches of 10000 - Samples Older than One Week: Grouped into Bunches of 1000 - Samples Older than One Day: Grouped into Bunches of 100 - Samples Older than One Hour: Grouped into Bunches of 10 The benefits hereof of such a compaction are 1. a smaller search space in the database keyspace, 2. better employment of compression for repetious values, and 3. reduced seek times.	2013-04-27 17:38:18 +02:00
Julius Volz	2202cd71c9	Track alerts over time and write out alert timeseries.	2013-04-26 14:35:21 +02:00
Julius Volz	c0601abf46	Implement initial no-op alert parsing and rule parsing tests.	2013-04-23 13:48:24 +02:00
Matt T. Proud	f9e99bd08a	Refresh SampleValue to 64-bit floating point. We always knew that this needed to be fixed.	2013-04-21 20:31:50 +02:00
Julius Volz	99dcbe0f94	Integrate memory and disk layers in view rendering.	2013-04-19 16:01:27 +02:00
Julius Volz	63625bd244	Make view use memory persistence, remove obsolete code. This makes the memory persistence the backing store for views and adjusts the MetricPersistence interface accordingly. It also removes unused Get* method implementations from the LevelDB persistence so they don't need to be adapted to the new interface. In the future, we should rethink these interfaces. All staleness and interpolation handling is now removed from the storage layer and will be handled only by the query layer in the future.	2013-04-18 22:26:29 +02:00
Julius Volz	1eb586db7d	Fix rule evaluation closure.	2013-04-17 15:11:21 +02:00
Julius Volz	5f5ea03105	Run "make format".	2013-04-16 17:23:59 +02:00
Julius Volz	1cff4f3d91	Fix rate() per-second adjustment. This got broken during the depointerization of the Vector type.	2013-04-15 14:41:34 +02:00
juliusv	62f33f1fc2	Merge pull request #138 from prometheus/julius-fix-aliasing Correct delta()/rate() intervals and temporal aliasing.	2013-04-15 05:38:48 -07:00
Matt T. Proud	167504efd6	Merge pull request #142 from prometheus/julius-lowercase-by Allow lower-case BY operator.	2013-04-15 05:13:35 -07:00
Julius Volz	d53b8cf956	Correct delta()/rate() intervals and temporal aliasing.	2013-04-15 12:30:46 +02:00
Julius Volz	000f6a2e23	Allow lower-case BY operator.	2013-04-15 11:56:23 +02:00
Julius Volz	a0d311c9e6	Constantize job name label.	2013-04-15 11:47:54 +02:00
Julius Volz	1bc83e1b65	Also allow lower-cased aggregation ops.	2013-04-11 18:25:22 +02:00
juliusv	f9c291120f	Merge pull request #123 from prometheus/julius-propagate-rule-errors Propagate more errors during rule evaluation.	2013-04-11 06:38:33 -07:00
Julius Volz	9a81b9838f	Make expression parser goroutine-safe. See https://github.com/prometheus/prometheus/issues/127	2013-04-10 19:17:28 +02:00
Julius Volz	6cb3c51d24	Add sort() and sort_desc() expression language functions.	2013-04-10 18:05:45 +02:00
Julius Volz	c4d0969c00	Propagate more errors during rule evaluation.	2013-04-09 13:47:20 +02:00
Julius Volz	e31591e6fe	Allow single-letter identifiers (metric and label names).	2013-03-28 18:37:54 +01:00
Julius Volz	ec413459fa	Depointerize Matrix/Vector types as well as time.Time arguments.	2013-03-28 18:07:12 +01:00
Julius Volz	676845afaf	Implement sample interpolation in query layer.	2013-03-28 16:41:51 +01:00
Matt T. Proud	c53a72a894	Test data for the curator.	2013-03-27 18:13:43 +01:00
Julius Volz	b836066c71	Eliminate need to get fingerprints during query execution time.	2013-03-27 14:42:03 +01:00
Julius Volz	55ca65aa6e	More userfriendly output when we fail to create the tiered storage.	2013-03-27 11:25:05 +01:00
Matt T. Proud	c4e971d7d9	Merge pull request #101 from prometheus/refactor/test/directory-extraction Create temporary directory handler.	2013-03-26 10:46:28 -07:00
Matt T. Proud	b86b0ea41a	Create temporary directory handler.	2013-03-26 18:09:25 +01:00
Julius Volz	2b8f0b2cc7	Constantize metric name label name.	2013-03-26 16:20:23 +01:00
Julius Volz	3880a86c9c	In case of empty query results, return an empty matrix.	2013-03-25 12:14:48 +01:00
Julius Volz	8e4c5b0cea	Use AST query analyzer and views with tiered storage.	2013-03-21 18:16:52 +01:00
Julius Volz	2f814d0e6d	AST persistence adapter simplifications after storage changes.	2013-03-21 18:11:03 +01:00
Julius Volz	6001d22f87	Change Get* methods to receive fingerprints instead of metrics.	2013-03-21 18:11:03 +01:00
Matt T. Proud	5959cd9e53	Include Julius' feedback.	2013-03-21 18:08:48 +01:00
Matt T. Proud	a70ee43ad3	Niladic ``ToString()`` to idiomatic ``String()``.	2013-03-21 18:08:47 +01:00
Matt T. Proud	41068c2e84	Checkpoint.	2013-03-21 18:06:51 +01:00
Matt T. Proud	13ae29b304	Initial in-memory arena implementation. It is unbounded, and nothing uses it except for a gating flag in main.	2013-02-18 09:38:14 -06:00
Julius Volz	c3d31febd6	Move durationToString to common place and cleanup error handling.	2013-02-14 19:02:23 +01:00
Matt T. Proud	efbe0e8a12	Interface simplification. GetMetricForFingerprint(model.Fingerprint) (*Metric, error) -> GetMetricForFingerprint(model.Fingerprint) (Metric, error)	2013-02-14 08:43:02 -08:00
Matt T. Proud	e8a733b525	Interface simplifications. GetFingerprintsForLabelSet ([]*Fingerprint, error) -> GetFingerprintsForLabelSet ([]Fingerprint, error)	2013-02-14 08:07:59 -08:00

279 Commits (9cc7b393c50415f147616e3a87ae26e03f62ca20)