prometheus

Commit Graph

Author	SHA1	Message	Date
Fabian Reinartz	3ca11bcaf5	Switch Prometheus to promql package. This commit removes all functionality from rules/ that is now handled in promql/. All parts of Prometheus are changed to use the promql/ package.	2015-04-28 16:19:23 +02:00
Fabian Reinartz	624f27f4b6	Add ln, log2, log10 and exp functions to the query language.	2015-03-16 18:26:19 +01:00
Julius Volz	b2651027fc	Fix special value handling in division and modulo. This fixes https://github.com/prometheus/prometheus/issues/597	2015-03-16 14:23:40 +01:00
beorn7	13fcf1ddbc	Implement double-delta encoded chunks.	2015-03-05 20:33:26 +01:00
Fabian Reinartz	182de6b99f	Fix unary +/- expressions. Unary expressions cause parsing errors if they are done in the lexer by tokenizing them into the number. This fix moves unary expressions to the parser.	2015-03-03 13:30:08 +01:00
Fabian Reinartz	6f754073d5	Add OR operation and vector matching options. This commits implements the OR operation between two vectors. Vector matching using the ON clause is added to limit the set of labels that define a match between two elements. Group modifiers (GROUP_LEFT/GROUP_RIGHT) to request many-to-one matching are added.	2015-03-03 11:35:10 +01:00
Julius Volz	0ac931aed1	Also support parsing float formats like "2.".	2015-03-02 12:58:05 +01:00
Julius Volz	c2ab54e9a6	Support scientific notation and special float values. This adds support for scientific notation in the expression language, as well as for all possible literal forms of +Inf/-Inf/NaN. TODO: Keep enough state in the parser/lexer to distinguish contexts in which "Inf", "NaN", etc. should be parsed as a number vs. parsed as a label name. Currently, foo{nan="bar"} would be a syntax error. However, that is an existing bug for all our reserved words. E.g. foo{sum="bar"} is a syntax error as well. This should be fixed separately.	2015-03-01 19:31:16 +01:00
beorn7	9e7c3e3bcd	Add the histogram_quantile function. Since we are now getting really deep into floating point calculation, the tests had to take into account the precision loss. Since the rule tests are based on direct line matching in the output, implementing the "almost equal" semantics was pretty cumbersome, but here we are.	2015-02-22 01:04:51 +01:00
Julius Volz	42601acfde	Replace labelsToKey() with metric Fingerprint (fixes grouping bug).	2015-02-21 17:45:47 +01:00
Julius Volz	645cf57bed	Fix aggregation grouping key calculation.	2015-02-21 14:05:50 +01:00
Julius Volz	15b2b5aa66	Add tests for invalid uses of "offset".	2015-02-18 02:56:40 +01:00
Julius Volz	72d7b325a1	Implement offset operator. This allows changing the time offset for individual instant and range vectors in a query. For example, this returns the value of `foo` 5 minutes in the past relative to the current query evaluation time: foo offset 5m Note that the `offset` modifier always needs to follow the selector immediately. I.e. the following would be correct: sum(foo offset 5m) // GOOD. While the following would be incorrect: sum(foo) offset 5m // INVALID. The same works for range vectors. This returns the 5-minutes-rate that `foo` had a week ago: rate(foo[5m] offset 1w) This change touches the following components: * Lexer/parser: additions to correctly parse the new `offset`/`OFFSET` keyword. * AST: vector and matrix nodes now have an additional `offset` field. This is used during their evaluation to adjust query and result times appropriately. * Query analyzer: now works on separate sets of ranges and instants per offset. Isolating different offsets from each other completely in this way keeps the preloading code relatively simple. No storage engine changes were needed by this change. The rules tests have been changed to not probe the internal implementation details of the query analyzer anymore (how many instants and ranges have been preloaded). This would also become too cumbersome to test with the new model, and measuring the result of the query should be sufficient. This fixes https://github.com/prometheus/prometheus/issues/529 This fixed https://github.com/prometheus/promdash/issues/201	2015-02-18 02:41:27 +01:00
Brian Brazil	60271d58bf	Change the 2nd argument of round to toNearest. This is more useful if you want get a multiple of 2 or 5, while still working for .001.	2015-02-05 16:13:40 +00:00
Marko Mikulicic	8fdacbdf17	Add floor, ceil and round functions. Closes #402	2015-02-04 17:20:56 +01:00
Brian Brazil	a31730e88b	Make 2nd arg to delta optional. Add a deriv() function. The 2nd isCounter argument to delta is ugly, make it optional as the first step of deprecating it. This will makes delta only ever applied to gauges. Add a deriv function to calculate the least squares slope of a gauge. This is more useful for prediction than delta, as it isn't as heavily influenced by outliers at the boundaries.	2015-01-23 14:50:27 +00:00
Bjoern Rabenstein	5859b74f1b	Clean up license issues. - Move CONTRIBUTORS.md to the more common AUTHORS. - Added the required NOTICE file. - Changed "Prometheus Team" to "The Prometheus Authors". - Reverted the erroneous changes to the Apache License.	2015-01-21 20:07:45 +01:00
Julius Volz	cc27fb8aab	Rename remaining all-caps constants in AST layer. Change-Id: Ibe97e30981969056ffcdb89e63c1468ea1ffa140	2014-12-25 01:30:47 +01:00
Julius Volz	00a2a93a05	Add regression tests for metrics mutations in AST. It turned out in the end, that only drop_common_metrics() produced any erroneous output in the old system. The second expression in the test ("sum(testmetric) keeping_extra") already worked in the old code, but why not keep it in... The way to test ranged evaluations is a bit clumsy so far, so I want to build a nicer test framework in the end, where all the test cases can be specified as text files which specify desired inputs, outputs, query step widths, etc. Change-Id: I821859789e69b8232bededf670a1b76e9e8c8ca4	2014-12-12 20:34:55 +01:00
Bjoern Rabenstein	0ae1d8889a	Fix tests after merge. Change-Id: Ia90da9a3e48ed780ec38c4a6a1fd9ea34e7f6a58	2014-11-25 17:13:04 +01:00
Julius Volz	b7bf11230a	Add absent() function. A common problem in Prometheus alerting is to detect when no timeseries exist for a given metric name and label combination. Unfortunately, Prometheus alert expressions need to be of vector type, and "count(nonexistent_metric)" results in an empty vector, yielding no output vector elements to base an alert on. The newly introduced absent() function solves this issue: ALERT FooAbsent IF absent(foo{job="myjob"}) [...] absent() has the following behavior: - if the vector passed to it has any elements, it returns an empty vector. - if the vector passed to it has no elements, it returns a 1-element vector with the value 1. In the second case, absent() tries to be smart about deriving labels of the 1-element output vector from the input vector: absent(nonexistent{job="myjob"}) => {job="myjob"} absent(nonexistent{job="myjob",instance=~".*"}) => {job="myjob"} absent(sum(nonexistent{job="myjob"})) => {} That is, if the passed vector is a literal vector selector, it takes all "=" label matchers as the basis for the output labels, but ignores all non-equals or regex matchers. Also, if the passed vector results from a non-selector expression, no labels can be derived. Change-Id: I948505a1488d50265ab5692a3286bd7c8c70cd78	2014-11-25 17:13:04 +01:00
Julius Volz	3d47f94149	Drop metric names after transformations. After many transformations, it doesn't make sense to keep the metric names, since the result of the transformation is no longer that metric. This drops the metric name after such transformations and makes the web UI deal well with missing metric names. This depends on the current branch on the following things: - prometheus/client_golang needs to be at `e237cf15c6` in branch "julius/int-fingerprints" (to be merged with new storage) - prometheus/promdash needs to be at `dd7691c9c2` Change-Id: Ib3c8cad8d647d9854e8c653c424b8c235ccc231d	2014-11-25 17:13:04 +01:00
Julius Volz	0712d738d1	Allow alternative "by"-clause position in grammar. In addition to the existing by-clause syntax: sum(<expression>) by (<labels>) [keeping_extra] ...this allows the following new syntax: sum by (<labels>) [keeping_extra] (<expression>) Both orderings may be used in a single expression. It is up to the users to establish guidelines around their usage. Change-Id: Iba10c9cc5fb6ac62edfcf246d281473e82467992	2014-11-25 17:09:04 +01:00
Julius Volz	0e48c18bbf	Allow omitting the metric name in queries. This allows the following expression syntaxes for selecting timeseries: foo (already valid before) foo{} (already valid before) {job="prometheus"} (new, select all timeseries for job "prometheus") Omitting both the metric name and any label matchers ("" or "{}") will still yield a syntax error. To get all timeseries, you could do: {__name__=~"."} or, without relying on knowledge about __metric__: {job=~"."} Change-Id: Ifee000b9ac0184ef6ced18411069c7f2699a2dda	2014-11-25 17:09:04 +01:00
Julius Volz	7f5d3c2c29	Fix and improve the fp locker. Benchmark: $ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4 OLD BenchmarkFingerprintLockerParallel 500000 3618 ns/op BenchmarkFingerprintLockerParallel-2 100000 12257 ns/op BenchmarkFingerprintLockerParallel-4 500000 10164 ns/op BenchmarkFingerprintLockerSerial 10000000 283 ns/op BenchmarkFingerprintLockerSerial-2 10000000 284 ns/op BenchmarkFingerprintLockerSerial-4 10000000 288 ns/op NEW BenchmarkFingerprintLockerParallel 1000000 1018 ns/op BenchmarkFingerprintLockerParallel-2 1000000 1164 ns/op BenchmarkFingerprintLockerParallel-4 2000000 910 ns/op BenchmarkFingerprintLockerSerial 50000000 56.0 ns/op BenchmarkFingerprintLockerSerial-2 50000000 47.9 ns/op BenchmarkFingerprintLockerSerial-4 50000000 54.5 ns/op Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581	2014-11-25 17:07:45 +01:00
Julius Volz	e7ed39c9a6	Initial experimental snapshot of next-gen storage. Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d	2014-11-25 17:02:00 +01:00
Julius Volz	85497e3f38	Add function to drop common labels in a vector. This fixes https://github.com/prometheus/prometheus/issues/384. Change-Id: I2973c4baeb8a4618ec3875fb11c6fcf5d111784b	2014-11-25 17:02:00 +01:00
Julius Volz	3fdb74e571	Add more topk() / bottomk() tests. Test what happens if k > number of input elements. Change-Id: Ie724b850939e297ebf085f0a5a3522e9cfcc6534	2014-11-25 17:02:00 +01:00
Julius Volz	c582ae73c2	Implement topk() and bottomk() functions. To achieve O(log n * k) runtime, this uses a heap to track the current bottom-k or top-k elements while iterating over the full set of available elements. It would be possible to reuse more code between topk and bottomk, but I decided for some more duplication for the sake of clarity. This fixes https://github.com/prometheus/prometheus/issues/399 Change-Id: I7487ddaadbe7acb22ca2cf2283ba6e7915f2b336	2014-11-25 17:02:00 +01:00
Julius Volz	00b9489f1c	Fix time() behavior. time() should return the timestamp for which the query is executed, not the actual current time. Change-Id: I430a45cabad7785cd58f95b1028a71dff4c87710	2014-11-25 17:02:00 +01:00
Julius Volz	c5984f1818	Add abs() and over-time aggregation functions. This implements aggregation functions over time as request in https://github.com/prometheus/prometheus/issues/383. Change-Id: Ifd69b850de8cfdf6e7a6c0e042056fa4c672410e	2014-11-25 17:02:00 +01:00
Brian Brazil	e041c0cd46	Add console and alert templates with access to all data. Move rulemanager to it's own package to break cicrular dependency. Make NewTestTieredStorage available to tests, remove duplication. Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03	2014-05-30 16:24:56 +01:00
Bjoern Rabenstein	ca6a4fccef	Weed out our homegrown test.Tester. The Go stdlib has testing.TB now, which fulfills the exact same purpose. Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c	2014-05-21 19:27:24 +02:00
Julius Volz	01f652cb4c	Separate storage implementation from interfaces. This was initially motivated by wanting to distribute the rule checker tool under `tools/rule_checker`. However, this was not possible without also distributing the LevelDB dynamic libraries because the tool transitively depended on Levigo: rule checker -> query layer -> tiered storage layer -> leveldb This change separates external storage interfaces from the implementation (tiered storage, leveldb storage, memory storage) by putting them into separate packages: - storage/metric: public, implementation-agnostic interfaces - storage/metric/tiered: tiered storage implementation, including memory and LevelDB storage. I initially also considered splitting up the implementation into separate packages for tiered storage, memory storage, and LevelDB storage, but these are currently so intertwined that it would be another major project in itself. The query layers and most other parts of Prometheus now have notion of the storage implementation anymore and just use whatever implementation they get passed in via interfaces. The rule_checker is now a static binary :) Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0	2014-04-16 13:30:19 +02:00
Julius Volz	d411a7d810	Allow reversing vector and scalar arguments in binops. This allows putting a scalar as the first argument of a binary operator in which the second argument is a vector: <scalar> <binop> <vector> For example, 1 / http_requests_total ...will output a vector in which every sample value is 1 divided by the respective input vector element. This even works for filter binary operators now: 1 == http_requests_total Returns a vector with all values set to 1 for every element in http_requests_total whose initial value was 1. Note: For filter binary operators, the resulting values are always taken from the left-hand-side of the operation, no matter whether the scalar or the vector argument is the left-hand-side. That is, 1 != http_requests_total ...will set all result vector sample values to 1, although these are exactly the sample elements that were != 1 in the input vector. If you want to just filter elements without changing their sample values, you still need to do: http_requests_total != 1 The new filter form is a bit exotic, and so probably won't be used often. But it was easier to implement it than disallow it completely or change its behavior. Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5	2014-04-08 17:16:18 +02:00
Julius Volz	c7c0b33d0b	Add regex-matching support for labels. There are four label-matching ops for selecting timeseries now: - Equal: = - NotEqual: != - RegexMatch: =~ - RegexNoMatch: !~ Instead of looking up labels by a simple clientmodel.LabelSet (basically an equals op for every key/value pair in the set), timeseries fingerprint selection is now done via a list of metric.LabelMatchers. Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96	2014-04-01 14:24:53 +02:00
Julius Volz	7e9ecaac3a	Add count_scalar() function. Change-Id: I63f09dd0479d0a6b016f5f857dd39dcbda56c7f9	2014-01-30 13:07:26 +01:00
Julius Volz	0378c2ca1f	Nonexistent labels in BY-clauses shouldn't propagate to result. This fixes bug 2. of https://github.com/prometheus/prometheus/issues/374 Change-Id: Ia4a13153616bafce5bf10597966b071434422d09	2014-01-24 16:05:30 +01:00
Julius Volz	6dc36d0c3e	Don't keep extra labels in aggregations by default. MIN/MAX/SUM/AVG/COUNT aggregations will now by default drop all labels that are not specifically part of a BY-clause, even if a label value is the same within all timeseries of an aggregation group. The old behavior of keeping extra labels may still be switched on by adding KEEPING_EXTRA to the end of an aggregation statement: sum(http_requests) by (job, method) keeping_extra I'm open to better syntax/naming suggestions. Change-Id: I21d3fe7af9e98552ce3dffa3ce7c0a4ba4c0b4a4	2013-12-16 12:53:10 +01:00
Julius Volz	740d448983	Use custom timestamp type for sample timestamps and related code. So far we've been using Go's native time.Time for anything related to sample timestamps. Since the range of time.Time is much bigger than what we need, this has created two problems: - there could be time.Time values which were out of the range/precision of the time type that we persist to disk, therefore causing incorrectly ordered keys. One bug caused by this was: https://github.com/prometheus/prometheus/issues/367 It would be good to use a timestamp type that's more closely aligned with what the underlying storage supports. - sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit Unix timestamp (possibly even a 32-bit one). Since we store samples in large numbers, this seriously affects memory usage. Furthermore, copying/working with the data will be faster if it's smaller. MEMORY USAGE RESULTS Initial memory usage comparisons for a running Prometheus with 1 timeseries and 100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my tests, this advantage for some reason decreased a bit the more samples the timeseries had (to 5-7% for millions of samples). This I can't fully explain, but perhaps garbage collection issues were involved. WHEN TO USE THE NEW TIMESTAMP TYPE The new clientmodel.Timestamp type should be used whenever time calculations are either directly or indirectly related to sample timestamps. For example: - the timestamp of a sample itself - all kinds of watermarks - anything that may become or is compared to a sample timestamp (like the timestamp passed into Target.Scrape()). When to still use time.Time: - for measuring durations/times not related to sample timestamps, like duration telemetry exporting, timers that indicate how frequently to execute some action, etc. NOTE ON OPERATOR OPTIMIZATION TESTS We don't use operator optimization code anymore, but it still lives in the code as dead code. It still has tests, but I couldn't get all of them to pass with the new timestamp format. I commented out the failing cases for now, but we should probably remove the dead code soon. I just didn't want to do that in the same change as this. Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f	2013-12-03 09:11:28 +01:00
Julius Volz	35ee2cd3cb	Add alertmanager notification support to Prometheus. Alert definitions now also have mandatory SUMMARY and DESCRIPTION fields that get sent along a firing alert to the alert manager.	2013-07-30 17:23:41 +02:00
Julius Volz	64b0ade171	Swap rules lexer for much faster one. This swaps github.com/kivikakk/golex for github.com/cznic/golex. The old lexer would have taken 3.5 years to load a set of 5000 test rules (quadratic time complexity for input length), whereas this one takes only 32ms. Furthermore, since the new lexer is embedded differently, this gets rid of the global parser variables and makes the rule loader fully reentrant without a lock.	2013-07-11 19:35:29 +02:00
Matt T. Proud	30b1cf80b5	WIP - Snapshot of Moving to Client Model.	2013-06-25 15:52:42 +02:00
Julius Volz	fc97e688c6	Improve printing of rules and expressions.	2013-06-11 11:39:31 +02:00
Matt T. Proud	2c3df44af6	Ensure database access waits until it is started. This commit introduces a channel message to ensure serving state has been reached with the storage stack before anything attempts to use it.	2013-06-06 10:42:21 +02:00
Julius Volz	51689d965d	Add debug timers to instant and range queries. This adds timers around several query-relevant code blocks. For now, the query timer stats are only logged for queries initiated through the UI. In other cases (rule evaluations), the stats are simply thrown away. My hope is that this helps us understand where queries spend time, especially in cases where they sometimes hang for unusual amounts of time.	2013-06-05 18:32:54 +02:00
Julius Volz	138334fb31	Fix handling of negative deltas for non-counter values.	2013-05-28 17:36:53 +02:00
Julius Volz	750f862d9a	Use GetBoundaryValues() for non-counter deltas.	2013-05-22 19:13:47 +02:00
Bernerd Schaefer	63d9988b9c	Drop unused writeMemoryInterval	2013-05-14 17:03:03 +02:00
Bernerd Schaefer	aa96c7d141	Fix rules_test.go This is smelly, but for now we copy a helper method from the metric tests into rules.	2013-05-14 16:55:18 +02:00

1 2

76 Commits (95fbe51c5091aa74a4963aa718e2a63d719e1a18)