Commit Graph

405 Commits (1a38075f83831cb26162dd521b22b2da05bbe6e4)

Author SHA1 Message Date
Fabian Reinartz 5772f1a7ba retrieval/storage: adapt to new interface
This simplifies the interface to two add methods for
appends with labels or faster reference numbers.
2017-02-02 13:05:46 +01:00
Fabian Reinartz 1d3cdd0d67 Merge branch 'master' into dev-2.0-rebase 2017-01-30 17:43:01 +01:00
Fabian Reinartz 035976b275 retrieval: handle not found error correctly 2017-01-20 11:27:01 +01:00
Fabian Reinartz ad9bc62e4c storage: extend appender and adapt it 2017-01-13 14:48:01 +01:00
André Carvalho c43dfaba1c Add max concurrent and current queries engine metrics (#2326)
* Add max concurrent and current queries engine metrics

This commit adds two metrics to the promql/engine: the
number of max concurrent queries, as configured by the flag, and
the number of current queries being served+blocked in the engine.
2017-01-07 14:41:25 +00:00
Fabian Reinartz bc20d93f0a storage: rename iterator value getters to At() 2017-01-02 13:33:37 +01:00
Fabian Reinartz 28f547bcc7 api/v1: fix tests, restore series queries 2016-12-30 10:43:44 +01:00
Fabian Reinartz e94b0899ee rules: fix tests, remove model types 2016-12-29 17:31:14 +01:00
Fabian Reinartz f8fc1f5bb2 *: migrate ingestion to new batch Appender 2016-12-29 11:03:56 +01:00
Fabian Reinartz 71fe0c58a8 promql: misc fixes 2016-12-28 11:32:15 +01:00
Fabian Reinartz fecf9532b9 *: fix misc compile errors 2016-12-25 11:42:57 +01:00
Fabian Reinartz 0492ddbd4d *: fully decouple tsdb, add new storage interfaces 2016-12-25 01:43:22 +01:00
Fabian Reinartz 9ea10d5265 promql: use labels.Builder to modify labels 2016-12-24 14:35:24 +01:00
Fabian Reinartz c6cd998905 promql: use local labels, add conversion 2016-12-24 14:01:37 +01:00
Fabian Reinartz ff504af2aa promql: undo accidental exports 2016-12-24 11:41:37 +01:00
Fabian Reinartz 6dedf89cc3 promql: rename SampleStream to Series 2016-12-24 11:32:42 +01:00
Fabian Reinartz c5f225b920 promql: export Sample 2016-12-24 11:32:10 +01:00
Fabian Reinartz 65581a3d46 promql: export SmapleStream 2016-12-24 11:29:39 +01:00
Fabian Reinartz 6315d00942 promql: export String value 2016-12-24 11:25:26 +01:00
Fabian Reinartz ac5d3bc05e promql: scalar T/V and Point 2016-12-24 11:23:06 +01:00
Fabian Reinartz 09666e2e2a promql: make scalar public 2016-12-24 10:44:04 +01:00
Fabian Reinartz b3f71df350 promql: make matrix exported 2016-12-24 10:42:54 +01:00
Fabian Reinartz a62df87022 promql: rename vector 2016-12-24 10:40:09 +01:00
Fabian Reinartz 15a931dbdb promql: migrate model types, use tsdb interfaces 2016-12-24 00:39:52 +01:00
Tristan Colgate ab60bc3929 Fix export of grouping modifier 2016-11-21 14:42:45 +00:00
Tristan Colgate 68fc15fe4e Report type names in the form used in documentation 2016-11-18 10:12:55 +00:00
beorn7 4e3abc6cbf Simply use `math.Mod(float64, float64)` after all
This circumvents all the problems with int overflow, plus it is what was originally intended.
2016-11-08 21:03:31 +01:00
beorn7 5cf5bb427a Check for int64 overflow when converting from float64 2016-11-05 00:48:32 +01:00
beorn7 92c0ef1a92 Merge branch 'release-1.2' into beorn7/release 2016-11-03 22:48:39 +01:00
beorn7 07f1bdfe94 Fix MOD binop for scalars and vectors
Previously, a floating point number that would round down to 0 would
cause a "division by zero" panic.
2016-11-03 19:03:44 +01:00
Brian Brazil e1cfc994f7 Correctly handle on() in alerts. (#2096)
Fixes #2082
2016-10-28 14:15:24 +02:00
Brian Brazil c4b4a58e3a Correctly handle on() in alerts. (#2096)
Fixes #2082
2016-10-19 18:38:26 +01:00
Fabian Reinartz 8fa18d564a storage: enhance Querier interface usage
This extracts Querier as an instantiateable and closeable object
rather than just defining extending methods of the storage interface.
This improves composability and allows abstracting query transactions,
which can be useful for transaction-level caches, consistent data views,
and encapsulating teardown.
2016-10-16 10:39:29 +02:00
Fabian Reinartz ccbce0c51f promql: handle NaN in changes() correctly 2016-09-30 11:04:25 +02:00
Julius Volz c187308366 storage: Contextify storage interfaces.
This is based on https://github.com/prometheus/prometheus/pull/1997.

This adds contexts to the relevant Storage methods and already passes
PromQL's new per-query context into the storage's query methods.
The immediate motivation supporting multi-tenancy in Frankenstein, but
this could also be used by Prometheus's normal local storage to support
cancellations and timeouts at some point.
2016-09-19 16:29:07 +02:00
Julius Volz ed5a0f0abe promql: Allow per-query contexts.
For Weaveworks' Frankenstein, we need to support multitenancy. In
Frankenstein, we initially solved this without modifying the promql
package at all: we constructed a new promql.Engine for every
query and injected a storage implementation into that engine which would
be primed to only collect data for a given user.

This is problematic to upstream, however. Prometheus assumes that there
is only one engine: the query concurrency gate is part of the engine,
and the engine contains one central cancellable context to shut down all
queries. Also, creating a new engine for every query seems like overkill.

Thus, we want to be able to pass per-query contexts into a single engine.

This change gets rid of the promql.Engine's built-in base context and
allows passing in a per-query context instead. Central cancellation of
all queries is still possible by deriving all passed-in contexts from
one central one, but this is now the responsibility of the caller. The
central query context is now created in main() and passed into the
relevant components (web handler / API, rule manager).

In a next step, the per-query context would have to be passed to the
storage implementation, so that the storage can implement multi-tenancy
or other features based on the contextual information.
2016-09-19 15:38:17 +02:00
Tobias Schmidt 29ced0090f Fix common english misspellings 2016-09-14 23:23:28 -04:00
Matt Bostock a0201036fa PromQL: Add tests for time/date funcs with arg
Add tests for the date and time functions where an argument is
specified.

Suggested by @grobie:
https://github.com/prometheus/prometheus/pull/1984#issuecomment-246508286

`1136239445` is the reference time used by Go:
https://golang.org/src/time/format.go
2016-09-12 23:12:43 +01:00
Matt Bostock 9628eb5998 PromQL: Add minute() function
Returns the minutes from the current time in UTC. Related to the
`hour()` function.

Fixes #1983.
2016-09-12 20:34:23 +01:00
Tobias Schmidt 04ae6196f2 Fix parsing of label names which are also keywords
The current separation between lexer and parser is a bit fuzzy when it
comes to operators, aggregators and other keywords. The lexer already
tries to determine the type of a token, even though that type might
change depending on the context.

This led to the problematic behavior that no tokens known to the lexer
could be used as label names, including operators (and, by, ...),
aggregators (count, quantile, ...) or other keywords (for, offset, ...).

This change additionally checks whether an identifier is one of these
types. We might want to check whether the specific item identification
should be moved from the lexer to the parser.
2016-09-07 17:45:58 -04:00
Fabian Reinartz ab88057063 Merge pull request #1908 from prometheus/on-dates
Add various time and date functions
2016-08-30 11:03:23 +02:00
Brian Brazil 4680daf237 Default date functions to current time. 2016-08-29 18:22:12 +01:00
Fabian Reinartz 23ddbd64aa Merge pull request #1925 from hashmap/1898-test-race
Fix data race in lexer and lexer test
2016-08-29 09:28:02 +02:00
Alexey Miroshkin bf0e441576 Instantiate lexer inline for the test
Don't use the lex constructor, remove the constructor introduced in the
prevous commit.
2016-08-29 09:20:43 +02:00
Alexey Miroshkin 485f7dde08 Fix data race in lexer and lexer test
As described in #1898 'go test -race' detects a race in lexer code. This
pacth fixes it and also add '-race' option to test target to prevent
regression.
2016-08-26 17:07:17 +02:00
beorn7 71571a8ec4 promql: Fix (and simplify) populating iterators
This was only relevant so far for the benchmark suite as it would
recycle Expr for repetitions. However, the append is unnecessary as
each node is only inspected once when populating iterators, and
population must always start from scratch.

This also introduces error checking during benchmarks and fixes the so
far undetected test errors during benchmarking.

Also, remove a style nit (two golint warnings less…).
2016-08-24 18:37:09 +02:00
Brian Brazil ea1318f38b Short names of some date related functions 2016-08-23 22:34:22 +01:00
Brian Brazil d2ca2b496a Add days_in_month function. 2016-08-22 21:15:35 +01:00
Brian Brazil 0ed31c8c47 Sort list of functions. 2016-08-22 21:15:34 +01:00
Brian Brazil fd7822829c Add date related functions.
Add day_of_month, day_of_week, hour_of_day, month_of_year and year.
This only work for UTC, and ignore leap seconds the same as Go.
2016-08-22 21:15:30 +01:00
Fabian Stäber 08b6556ee6 Assume counters start at zero after reset. 2016-08-12 20:21:04 +02:00
Fabian Reinartz 98c0d33567 Merge pull request #1875 from brancz/idelta-function
add idelta function
2016-08-08 12:33:07 +02:00
Frederic Branczyk f02df4138c refactor duplication of irate and idelta functions implementations 2016-08-08 10:52:00 +02:00
Frederic Branczyk dbf83666bb add idelta function
similar to the irate function the idelta function calculates the delta
function with the last two values
2016-08-08 10:40:50 +02:00
Frederic Branczyk 0ce5e7fe6d move legacy test for delta function 2016-08-08 10:02:58 +02:00
Julius Volz 3bfec97d46 Make the storage interface higher-level.
See discussion in
https://groups.google.com/forum/#!topic/prometheus-developers/bkuGbVlvQ9g

The main idea is that the user of a storage shouldn't have to deal with
fingerprints anymore, and should not need to do an individual preload
call for each metric. The storage interface needs to be made more
high-level to not expose these details.

This also makes it easier to reuse the same storage interface for remote
storages later, as fewer roundtrips are required and the fingerprint
concept doesn't work well across the network.

NOTE: this deliberately gets rid of a small optimization in the old
query Analyzer, where we dedupe instants and ranges for the same series.
This should have a minor impact, as most queries do not have multiple
selectors loading the same series (and at the same offset).
2016-07-25 13:59:22 +02:00
Brian Brazil 0303ccc6a7 Add quantile aggregator. 2016-07-21 00:09:19 +01:00
Brian Brazil 15f9fe0a45 Factor out quantile fucntion. 2016-07-20 23:56:18 +01:00
Brian Brazil b0342ba9ec Add quantile_over_time function 2016-07-20 23:56:18 +01:00
beorn7 fc6737b7fb storage: improve index lookups
tl;dr: This is not a fundamental solution to the indexing problem
(like tindex is) but it at least avoids utilizing the intersection
problem to the greatest possible amount.

In more detail:

Imagine the following query:

    nicely:aggregating:rule{job="foo",env="prod"}

While it uses a nicely aggregating recording rule (which might have a
very low cardinality), Prometheus still intersects the low number of
fingerprints for `{__name__="nicely:aggregating:rule"}` with the many
thousands of fingerprints matching `{job="foo"}` and with the millions
of fingerprints matching `{env="prod"}`. This totally innocuous query
is dead slow if the Prometheus server has a lot of time series with
the `{env="prod"}` label. Ironically, if you make the query more
complicated, it becomes blazingly fast:

    nicely:aggregating:rule{job=~"foo",env=~"prod"}

Why so? Because Prometheus only intersects with non-Equal matchers if
there are no Equal matchers. That's good in this case because it
retrieves the few fingerprints for
`{__name__="nicely:aggregating:rule"}` and then starts right ahead to
retrieve the metric for those FPs and checking individually if they
match the other matchers.

This change is generalizing the idea of when to stop intersecting FPs
and go into "retrieve metrics and check them individually against
remaining matchers" mode:

- First, sort all matchers by "expected cardinality". Matchers
  matching the empty string are always worst (and never used for
  intersections). Equal matchers are in general consider best, but by
  using some crude heuristics, we declare some better than others
  (instance labels or anything that looks like a recording rule).

- Then go through the matchers until we hit a threshold of remaining
  FPs in the intersection. This threshold is higher if we are already
  in the non-Equal matcher area as intersection is even more expensive
  here.

- Once the threshold has been reached (or we have run out of matchers
  that do not match the empty string), start with "retrieve metrics
  and check them individually against remaining matchers".

A beefy server at SoundCloud was spending 67% of its CPU time in index
lookups (fingerprintsForLabelPairs), serving mostly a dashboard that
is exclusively built with recording rules. With this change, it spends
only 35% in fingerprintsForLabelPairs. The CPU usage dropped from 26
cores to 18 cores. The median latency for query_range dropped from 14s
to 50ms(!). As expected, higher percentile latency didn't improve that
much because the new approach is _occasionally_ running into the worst
case while the old one was _systematically_ doing so. The 99th
percentile latency is now about as high as the median before (14s)
while it was almost twice as high before (26s).
2016-07-20 17:35:53 +02:00
Brian Brazil 40f8da699e Merge pull request #1815 from prometheus/stddev
Add stddev_over_time and stdvar_over_time.
2016-07-19 15:48:32 +01:00
Brian Brazil 1edd6875f5 Add stddev_over_time and stdvar_over_time. 2016-07-16 00:34:44 +01:00
Fabian Reinartz f8bb0ee91f Merge pull request #1793 from prometheus/count_values
Add count_values() aggregator.
2016-07-08 11:50:42 +02:00
Brian Brazil 875818d060 Clean out old keywords 2016-07-07 05:30:48 +01:00
Brian Brazil 16690736ab Add count_values() aggregator.
This is useful for counting how many instances
of a job are running a particular version/build.

Fixes #622
2016-07-05 17:14:01 +01:00
Brian Brazil 7f23a4a099 Add type check on topk/bottomk parameter. 2016-07-04 18:03:05 +01:00
Brian Brazil fa9cc15573 Add topk/bottomk tests for multiple buckets. 2016-07-04 13:18:28 +01:00
Brian Brazil 3b0c182eee Move topk/bottomk unittests over to aggregators. 2016-07-04 13:18:28 +01:00
Brian Brazil 3e5136e36d Make topk/bottomk aggregators. 2016-07-04 13:18:19 +01:00
Fabian Reinartz 4d1985e405 Merge pull request #1778 from mattbostock/fix_annotations
promql: Fix annotations conflated with labels
2016-07-01 11:45:18 +02:00
Matt Bostock cc98e164d3 promql: Fix annotations conflated with labels
When converting `AlertStmt` to a string, the alert rule labels were
printed as `ANNOTATIONS` instead of the annotations themselves.

Fix and add a test to catch future regressions.
2016-07-01 10:39:17 +01:00
Brian Brazil 3b89616d82 Allow on, ignoring, by and without wit empty laberls.
This offers new semantics in allowing on() for matching
two single-element vectors with no known common labels.
Previosuly this was often done using on(dummy).

This also allows making it explict that you meant
to do an aggregation without labels via by().

Fixes #1597.
2016-06-24 14:12:51 +01:00
Brian Brazil 246a817300 Flip vector matching to be ignoring by default.
This is a noop semantically.
2016-06-23 17:23:44 +01:00
Julius Volz b7b6717438 Separate query interface out of local.Storage.
PromQL only requires a much narrower interface than local.Storage in
order to run queries. Narrower interfaces are easier to replace and
test, too.

We could also change the web interface to use local.Querier, except that
we'll probably use appending functions from there in the future.
2016-06-23 15:14:38 +02:00
Fabian Reinartz 0e281f5500 Merge pull request #1687 from royels/issue-1629
Added power binop
2016-06-23 10:28:57 +02:00
royels 2fdc5717a3 promql: add power binary operation 2016-06-22 23:34:46 -04:00
Fatih Arslan 362e44501a promql: fix printing annotations of an *AlertStmt
Currently the printer doesn't print the annotations of an `*AlertStmt`
declaration. I've added a test case as well, which fails for the current
master.
2016-06-16 17:43:54 +03:00
beorn7 e3ec8fa83b Merge branch 'release-0.19' 2016-05-29 21:06:44 +02:00
beorn7 5408666387 Correctly stringify GROUP_x modifiers without labels
Since rule evaluations work via String(), this fixes evaluation of
rules containing GROUP_x modifiers without labels. This change is the
minimal bugfix (so that we can release a fixed version without
risk). It does not intend to implement any additional features (like
allowing `GROUP_LEFT()` or `ON()` or even `ON` - see discussion in
https://github.com/prometheus/prometheus/issues/1597 ).
2016-05-28 20:15:02 +02:00
Ali Reza e7eba75690 remove keeping_extra because it's replaced with keep_common
change all keepExtra label into keepCommon, and move action into removed list

change incorrect token list
2016-05-27 00:02:04 +07:00
Brian Brazil 74094947ea effect -> affect 2016-05-12 15:14:48 +01:00
Brian Brazil 68aaea618a Merge pull request #1624 from dmitris/golint
(trivial) fix several minor golint style issues
2016-05-11 14:20:19 +01:00
Fabian Reinartz bbc4f11bcc Merge pull request #945 from msiebuhr/fuzz
Fuzz parsers
2016-05-11 14:31:31 +02:00
Dmitry Savintsev 7fdb62c253 fix several minor golint style issues 2016-05-11 14:26:18 +02:00
Morten Siebuhr ffc8cab39a Updates fuzzers to discard less interesting data 2016-05-10 11:46:03 +02:00
Brian Brazil ef55fd6176 Add unittest for using a metric for thresholds with group_left. 2016-05-08 16:58:23 +01:00
Morten Siebuhr 981b636004 Bring fuzzer error handling in line. 2016-04-29 22:50:24 +02:00
Morten Siebuhr 9eb2e98509 Fix up documentation + go fmt. 2016-04-29 22:50:24 +02:00
Morten Siebuhr 7371dcc787 Fuzzing corpus for ParseMetric. 2016-04-29 22:50:24 +02:00
Morten Siebuhr 5fec020b27 Initial fuzzing corpus for ParseExpr. 2016-04-29 22:50:24 +02:00
Morten Siebuhr 0ebcca5eb7 Add basic fuzzer of the parser. 2016-04-29 22:50:24 +02:00
Brian Brazil 68e70d992a Clarify error message around on(x) group_left(x) 2016-04-26 14:31:00 +01:00
Brian Brazil 7201c010c4 Rename On to MatchingLabels 2016-04-26 14:28:36 +01:00
Brian Brazil d991f0cf47 For many-to-one matches, always copy label from one side.
This is a breaking change for everyone using the machine roles
labeling approach.
2016-04-21 19:35:41 +01:00
Brian Brazil 768d09fd2a Change on+group_* to take copy from the one side.
If the label doesn't exist on the one side, it's not copied.

All labels on the many inside are included, this is a breaking change
but likely low impact.
2016-04-21 19:35:40 +01:00
Brian Brazil d1edfb25b3 Add support for OneToMany with IGNORING.
The labels listed in the group_ modifier will be copied from the one
side to the many side. It will be valid to specify no labels.

This is intended to replace the existing ON/GROUP_* support.,
2016-04-21 19:35:35 +01:00
Brian Brazil 1d08c4fef0 Add 'ignoring' as modifier for binops.
Where 'on' uses the given labels to match,
'ignoring' uses all other labels to match.

group_left/right is not supported yet.
2016-04-21 19:34:29 +01:00
Brian Brazil f5084ab1c5 Add tests for group_left/group_right 2016-04-21 16:52:53 +01:00
Fabian Reinartz fceedfa807 Add error message if old alert rule tokens are read 2016-04-16 22:25:51 +02:00
Julius Volz 6ac39700ea Fix missing printed keep_common without grouping. 2016-04-15 19:48:17 +02:00
Jonathan Boulle 38098f8c95 Add missing license headers
Prometheus is Apache 2 licensed, and most source files have the
appropriate copyright license header, but some were missing it without
apparent reason. Correct that by adding it.
2016-04-13 16:08:22 +02:00
Fabian Reinartz 9ee91062c4 Merge pull request #1522 from prometheus/unless-operator
Implement relative complement set operator "unless"
2016-04-04 21:36:17 +02:00
Tobias Schmidt 8cc86f25c0 Implement relative complement set operator "unless"
The `unless` set operator can be used to return all vector elements from
the LHS which do not match the elements on the RHS. A use case is to
return all metrics for nodes which do not have a specific role:

    node_load1 unless on(instance) chef_role{role="app"}
2016-04-04 01:29:44 -04:00
Tobias Schmidt e82ef154ee Remove unused code leftovers 2016-04-02 20:20:55 -04:00
Tobias Schmidt 4c3dc25e35 Fix whitespace in promql test data 2016-04-02 18:25:26 -04:00
Fabian Reinartz 235e6c554b Use ContainsRune 2016-04-01 10:36:17 +02:00
Brian Brazil 24a3ad3d16 Merge pull request #1485 from eliothedeman/master
Adds holt-winters query function
2016-03-28 20:53:01 +01:00
eliothedeman 1543ef92b2 Adds holt-winters query function 2016-03-28 15:42:27 -04:00
beorn7 507f550cd4 Merge branch 'master' into beorn7/storage7 2016-03-24 14:21:28 +01:00
Brian Brazil 070d663948 Merge pull request #1501 from prometheus/and-dummy
Pull in fix for and with empty labelsets
2016-03-24 12:52:28 +00:00
Fabian Reinartz ab3d7a0ec0 Remove old alerting syntax 2016-03-23 10:19:00 +01:00
beorn7 4b574e8a61 Switch chunk encoding to type 2 where it was hardcoded type 1 before
The chunk encoding was hardcoded there because it mostly doesn't
matter what encoding is chosen in that test. Since type 1 is
battle-hardened enough, I'm switching to type 2 here so that we can
catch unexpected problems as a byproduct. My expectation is that the
chunk encoding doesn't matter anyway, as said, but then "unexpected
problems" contains the word "unexpected".
2016-03-20 23:32:20 +01:00
Brian Brazil 8788701ce7 Add test for incorrect behaviour 2016-03-18 12:07:40 +00:00
Brian Brazil 39d556f0d5 Move all the operator tests into one file 2016-03-18 12:02:44 +00:00
beorn7 99854a84d7 Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-09 17:23:25 +01:00
beorn7 d0a4477446 Merge branch 'beorn7/storage3' into beorn7/storage4
Conflicts:
	storage/local/preload.go
	storage/local/storage.go
	storage/local/storage_test.go
2016-03-09 17:13:16 +01:00
beorn7 dad302144d Make a naked return less naked 2016-03-09 15:06:00 +01:00
beorn7 836f1db04c Improve MetricsForLabelMatchers
WIP: This needs more tests.

It now gets a from and through value, which it may opportunistically
use to optimize the retrieval. With possible future range indices,
this could be used in a very efficient way. This change merely applies
some easy checks, which should nevertheless solve the use case of
heavy rule evaluations on servers with a lot of series churn.

Idea is the following:

- Only archive series that are at least as old as the headChunkTimeout
  (which was already extremely unlikely to happen).

- Then maintain a high watermark for the last archival, i.e. no
  archived series has a sample more recent than that watermark.

- Any query that doesn't reach to a time before that watermark doesn't
  have to touch the archive index at all. (A production server at
  Soundcloud with the aforementioned series churn and heavy rule
  evaluations spends 50% of its CPU time in archive index
  lookups. Since rule evaluations usually only touch very recent
  values, most of those lookup should disappear with this change.)

- Federation with a very broad label matcher will profit from this,
  too.

As a byproduct, the un-needed MetricForFingerprint method was removed
from the Storage interface.
2016-03-09 00:25:59 +01:00
beorn7 f7fc542db6 Merge branch 'master' into beorn7/storage4
Conflicts:
	storage/local/persistence.go
2016-03-08 00:14:00 +01:00
beorn7 3d86130d8c Merge branch 'master' into beorn7/storage3 2016-03-07 23:39:12 +01:00
Björn Rabenstein 2a2cc52828 Merge pull request #1405 from prometheus/beorn7/storage
Streamline series iterator creation
2016-03-07 13:30:56 +01:00
Patrick Bogen 250344b344 use short variable assignment 2016-03-03 09:46:50 -08:00
Patrick Bogen 2062fbae0f rewrite operator balancing to be recursive 2016-03-02 15:56:40 -08:00
beorn7 0ea5801e47 Handle errors caused by data corruption more gracefully
This requires all the panic calls upon unexpected data to be converted
into errors returned. This pollute the function signatures quite
lot. Well, this is Go...

The ideas behind this are the following:

- panic only if it's a programming error. Data corruptions happen, and
  they are not programming errors.

- If we detect a data corruption, we "quarantine" the series,
  essentially removing it from the database and putting its data into
  a separate directory for forensics.

- Failure during writing to a series file is not considered corruption
  automatically. It will call setDirty, though, so that a
  crashrecovery upon the next restart will commence and check for
  that.

- Series quarantining and setDirty calls are logged and counted in
  metrics, but are hidden from the user of the interfaces in
  interface.go, whith the notable exception of Append(). The reasoning
  is that we treat corruption by removing the corrupted series, i.e. a
  query for it will return no results on its next call anyway, so
  return no results right now. In the case of Append(), we want to
  tell the user that no data has been appended, though.

Minor side effects:

- Now consistently using filepath.* instead of path.*.

- Introduced structured logging where I touched it. This makes things
  less consistent, but a complete change to structured logging would
  be out of scope for this PR.
2016-03-02 23:02:34 +01:00
beorn7 8766f99085 Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-02 23:02:06 +01:00
beorn7 162f6fa6f6 Merge branch 'beorn7/storage' into beorn7/storage2 2016-03-02 23:01:26 +01:00
beorn7 79a2ae2d2e Add missing test file 2016-03-02 23:00:23 +01:00
beorn7 b6840997a7 Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-02 16:11:25 +01:00
beorn7 ce58fd357b Merge branch 'beorn7/storage' into beorn7/storage2
Conflicts:
	storage/local/chunk.go
	storage/local/interface.go
2016-03-02 16:09:32 +01:00
beorn7 2581648f70 Separate iterators by offset
Add test that exposes the problem.
2016-03-02 16:01:03 +01:00
Fabian Reinartz 95c9706d2d Fix missing comment period. 2016-03-02 09:16:56 +01:00
Julius Volz 9ea2465b99 Fix typo in lexer test. 2016-03-02 01:13:27 +01:00
Tobias Schmidt 907b1380a7 Add tests to specify the string escaping behavior 2016-03-01 17:23:18 -05:00
beorn7 c740789ce3 Improve predict_linear
Fixes https://github.com/prometheus/prometheus/issues/1401

This remove the last (and in fact bogus) use of BoundaryValues.

Thus, a whole lot of unused (and arguably sub-optimal / ugly) code can
be removed here, too.
2016-02-25 12:10:55 +01:00
beorn7 454ecf3f52 Rework the way ranges and instants are handled
In a way, our instants were also ranges, just with the staleness delta
as range length. They are no treated equally, just that in one case,
the range length is set as range, in the other the staleness
delta. However, there are "real" instants where start and and time of
a query is the same. In those cases, we only want to return a single
value (the one closest before or at the equal start and end time). If
that value is the last sample in the series, odds are we have it
already in the series object. In that case, there is no need to pin or
load any chunks. A special singleSampleSeriesIterator is created for
that. This should greatly speed up instant queries as they happen
frequently for rule evaluations.
2016-02-22 01:47:18 +01:00
beorn7 0e202dacb4 Streamline series iterator creation
This will fix issue #1035 and will also help to make issue #1264 less
bad.

The fundamental problem in the current code:

In the preload phase, we quite accurately determine which chunks will
be used for the query being executed. However, in the subsequent step
of creating series iterators, the created iterators are referencing
_all_ in-memory chunks in their series, even the un-pinned ones. In
iterator creation, we copy a pointer to each in-memory chunk of a
series into the iterator. While this creates a certain amount of
allocation churn, the worst thing about it is that copying the chunk
pointer out of the chunkDesc requires a mutex acquisition. (Remember
that the iterator will also reference un-pinned chunks, so we need to
acquire the mutex to protect against concurrent eviction.) The worst
case happens if a series doesn't even contain any relevant samples for
the query time range. We notice that during preloading but then we
will still create a series iterator for it. But even for series that
do contain relevant samples, the overhead is quite bad for instant
queries that retrieve a single sample from each series, but still go
through all the effort of series iterator creation. All of that is
particularly bad if a series has many in-memory chunks.

This commit addresses the problem from two sides:

First, it merges preloading and iterator creation into one step,
i.e. the preload call returns an iterator for exactly the preloaded
chunks.

Second, the required mutex acquisition in chunkDesc has been greatly
reduced. That was enabled by a side effect of the first step, which is
that the iterator is only referencing pinned chunks, so there is no
risk of concurrent eviction anymore, and chunks can be accessed
without mutex acquisition.

To simplify the code changes for the above, the long-planned change of
ValueAtTime to ValueAtOrBefore time was performed at the same
time. (It should have been done first, but it kind of accidentally
happened while I was in the middle of writing the series iterator
changes. Sorry for that.) So far, we actively filtered the up to two
values that were returned by ValueAtTime, i.e. we invested work to
retrieve up to two values, and then we invested more work to throw one
of them away.

The SeriesIterator.BoundaryValues method can be removed once #1401 is
fixed. But I really didn't want to load even more changes into this
PR.

Benchmarks:

The BenchmarkFuzz.* benchmarks run 83% faster (i.e. about six times
faster) and allocate 95% fewer bytes. The reason for that is that the
benchmark reads one sample after another from the time series and
creates a new series iterator for each sample read.

To find out how much these improvements matter in practice, I have
mirrored a beefy Prometheus server at SoundCloud that suffers from
both issues #1035 and #1264. To reach steady state that would be
comparable, the server needs to run for 15d. So far, it has run for
1d. The test server currently has only half as many memory time series
and 60% of the memory chunks the main server has. The 90th percentile
rule evaluation cycle time is ~11s on the main server and only ~3s on
the test server. However, these numbers might get much closer over
time.

In addition to performance improvements, this commit removes about 150
LOC.
2016-02-19 16:24:38 +01:00
Julius Volz 9b6d69610a Fix various typos in comments.
Helpfully reported by
https://goreportcard.com/report/github.com/prometheus/prometheus :)
2016-02-10 03:47:00 +01:00
Brian Brazil 9d0112d7cf Add without aggregator modifier.
This has the advantage that the user doesn't need
to list all labels they want to keep (as with "by")
but without having to worry about inconsistent labels
as when there's only one time series (as with "keeping_common").

Almost all aggregation should use this rather than the existing
two options as it's much less error prone and easier to maintain
due to not having to always add in "job" plus whatever other common
job-level labels you have like "region".
2016-02-08 14:05:33 +00:00
Brian Brazil b7ef0b45e8 Break aggregation tests out. Add missing tests. 2016-02-07 18:02:51 +00:00
beorn7 a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Fabian Reinartz a6935024e1 Remove old WITH clause in alert printing 2016-01-26 15:45:27 +01:00
Tobias Schmidt 1a91cd6e09 Rename matrix to range selector in external error messages
The documentation speaks about range vectors and range vector selectors.
This change does not fix all issues, we might still expose the term
"Matrix" in error messages using %T.
2016-01-25 13:25:56 -05:00
Tobias Schmidt 411ca4dba1 Consolidate offset modifier parsing
Remove duplicated offset modifier parsing and ensure offset can only
appear at the end of a selector statement.
2016-01-24 23:11:44 -05:00
Fabian Reinartz 6b4a6962d2 Support old alerting rule syntax 2016-01-11 12:14:06 +01:00
Brian Brazil c77c3a8c56 promql: Limit extrapolation of delta/rate/increase
The new implementation detects the start and end of a series by
looking at the average sample interval within the range. If the first
(last) sample in the range is more than 1.1*interval distant from the
beginning (end) of the range, it is considered the first (last) sample
of the series as a whole, and extrapolation is limited to half the
interval (rather than all the way to the beginning (end) of the
range). In addition, if the extrapolated starting point of a counter
(where it is zero) is within the range, it is used as the starting
point of the series.

Fixes #581
2016-01-08 15:32:43 +01:00
Brian Brazil 89760dd77d Handle NaN for min/max.
Similar to topk and sort, prefer not returning NaN
where possible.
2016-01-06 12:41:40 +00:00
Brian Brazil bac1f28cad Similar to topk/bottomk, have sort/sort_desc put NaN at end.
This makes topk and bottomk consistent with the sorting functions,
as per #1271.
2015-12-31 14:52:48 +00:00
Fabian Reinartz 4209ec6864 Change WITH keyword to LABELS 2015-12-23 14:54:02 +01:00
Brian Brazil 88ca82304c Make topk/bottomk prefer returning real numbers over NaN. 2015-12-22 13:53:43 +00:00
Brian Brazil edf3e123f5 Move topk/bottomk tests from legacy. 2015-12-22 12:38:32 +00:00
Fabian Reinartz af3a6661ed Implement new alerting rule syntax 2015-12-11 17:02:34 +01:00
James Sanford 5b53262b7a promql: Add clamp_max/clamp_min functions. 2015-11-26 13:38:06 -08:00
Brian Brazil a287264989 Print offsets in promql. 2015-11-15 16:24:29 +00:00
Fabian Reinartz 33aab4169c Anchor regexes in vector matching
This commit makes the regex behavior of vector matching consistent with
configuration and label_replace() by anchoring it.

Fixes #1200
2015-11-05 11:23:43 +01:00
Fabian Reinartz 51e8badc7f Merge pull request #1159 from prometheus/scalar-bool
promql: Remove scalar/scalar comparisons.
2015-10-16 12:28:56 +02:00
Brian Brazil c36961130b promql: Remove scalar/scalar comparisons.
This change is breaking, use the 'bool' modifier for such comprisons.

After this change all comparisons without 'bool' will filter, and all
comparisons with 'bool' will return 0/1. This makes the language more
consistent and orthogonal, and ultimately easier to learn and use.

If we ever figure out sane semantics for filtering scalar/scalar
comparisons we can add them in, which will most likely come out of how
the new vector() function is used.
2015-10-11 08:51:04 +01:00
Brian Brazil 5740a8fade promql: Remove deprecated 2nd argument to delta()
This change is breaking, use increase() instead.

I'm not cleaning up the function in this PR, as my solution to #581 will
rewrite and simplify increase/rate/delta.
2015-10-10 15:41:23 +01:00
Brian Brazil 965a71dc4d Merge pull request #1155 from prometheus/irate
promql: Add irate() function
2015-10-10 08:05:05 +01:00
Brian Brazil f08abdb48b promql: Add irate() function
irate is a rate function that only looks at the most
recent two data points, and calucaltes a per-second value
from that. This produces much more granular graphs for
fast moving data, and works sanely across many scrape intervals.

It doesn't do so well for slowly moving data.
2015-10-09 21:44:35 +01:00
Julius Volz 0088aa4d45 Merge pull request #1132 from prometheus/fix-quoting-and-escaping
Support escape sequences in strings and add raw strings
2015-10-08 20:51:18 +02:00
Julius Volz 46c5260761 Support escape sequences in strings and add raw strings.
This adapts some functionality from the Go standard library for string
literal lexing and unquoting/unescaping.

The following string types are now supported:

Double- or single-quoted strings:

  These support all escape sequences that Go supports in double-quoted
  string literals. The difference is that Prometheus also has
  single-quoted strings (instead of single-quoted runes in Go). Raw
  newlines are not allowed.

Backtick-quoted raw strings:

  Strings quoted in backticks are treated as raw strings just like in Go
  and may contain raw newlines and other special characters directly.

Fixes https://github.com/prometheus/prometheus/issues/1122
Fixes https://github.com/prometheus/prometheus/issues/1121
2015-10-08 19:17:21 +02:00
Fabian Reinartz e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Brian Brazil 653ff71f1f promql: Reduce flakiness of concurrency test 2015-09-23 10:07:30 +01:00
Fabian Reinartz 171f50706a Fix unkeyed field errors. 2015-09-18 17:00:08 +02:00
Fabian Reinartz 36ec8ba460 Fix missing return on error 2015-09-18 16:50:13 +02:00
Fabian Reinartz e005f939fd Fix scalar construction in function 2015-09-18 16:49:32 +02:00
Fabian Reinartz eca41f5319 Run gofmt 2015-09-16 14:33:12 +02:00
Brian Brazil fa793d917e Merge pull request #1080 from prometheus/query-timeout-test
promql: Bump sleep in query timeout test
2015-09-14 13:00:47 +01:00
Brian Brazil ce7f31e03c promql: Bump sleep in query timeout test
This test is flaky, I'm presuming the time.AfterFunc
call is being delayed so the evaluation isn't getting
cancelled.
2015-09-14 11:49:18 +01:00
Julius Volz 347630431c Merge pull request #1077 from prometheus/cleanups
Fix some dead code, missing error checks, shadowings.
2015-09-14 12:37:26 +02:00
Julius Volz af513468eb Fix some dead code, missing error checks, shadowings.
I applied
https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567
and was greeted with a deluge of warnings, most of which were not
applicable or really fixable realistically. These are some of the first
ones I decided to fix.
2015-09-14 12:21:34 +02:00
Brian Brazil 29de4ee2b0 Merge pull request #1078 from prometheus/whats-our-vector-victor
Remove optional vector() 2nd argument
2015-09-13 14:14:20 +01:00
Brian Brazil 9b382647b5 Remove optional vector() 2nd argument 2015-09-13 09:13:22 +01:00
Fabian Reinartz a1617d90f4 Merge pull request #1073 from prometheus/whats-our-vector-victor
promql: Add vector function.
2015-09-12 08:36:13 +02:00
Brian Brazil 69f5fa0c1e promql: Add vector function.
Currently the only way to convert a scalar to a vector is to
use absent(), which isn't very clean. This adds a vector()
function that's the inverse of scalar() and lets your optionally
set labels.

Example usage would be
vector(time() % 86400) < 3600
to filter to only the first hour of the day.
2015-09-11 12:09:34 +01:00
Julius Volz 6d3e054692 Fix bool modifier in recording rules and printing.
Fixes https://github.com/prometheus/prometheus/issues/1065
2015-09-10 01:37:05 +02:00
Brian Brazil 9ec11b1847 Merge pull request #1049 from prometheus/bool-nofilter
promql: Add 'bool' modifier to comparison functions
2015-09-03 15:08:38 +01:00
Brian Brazil 29e8dc2c49 promql: Add 'bool' modifier to comparison functions
When doing comparison operations on vectors, filtering
sometimes gets in the way and you have to go to a fair bit of
effort to workaround it in order to always return a result.
The 'bool' modifier instead of filtering returns 0/1 depending
on the result of the compairson.

This is also a prerequisite to removing plain scalar/scalar comparisons,
as it maintains the current behaviour under a new syntax.
2015-09-02 14:51:44 +01:00
Julius Volz 61c42c8da0 Change relabel_replace() to do full-string matches.
THIS IS A BREAKING CHANGE.

Fixes part of https://github.com/prometheus/prometheus/issues/996
2015-09-01 15:49:28 +02:00
Julius Volz 744d5d5a7a Merge pull request #1029 from prometheus/vet-fixes
Fix "go vet" errors.
2015-08-26 12:50:18 +02:00
Julius Volz 995d3b831d Fix most golint warnings.
This is with `golint -min_confidence=0.5`.

I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
2015-08-26 12:44:46 +02:00
Julius Volz 963ad82dcb Fix "go vet" errors.
I ignored all errors of the type "composite literal uses unkeyed
fields". Most of them are wrong because of
https://github.com/golang/go/issues/9171.
2015-08-26 02:05:04 +02:00
Julius Volz 077a753e6b Merge pull request #1006 from prometheus/true-values
promql: Remove interpolation of vector values.
2015-08-25 16:11:07 +02:00
Fabian Reinartz d6b8da8d43 Switch promql types to common/model 2015-08-25 13:49:14 +02:00
Brian Brazil fb585e4591 promql: Remove interpolation of vector values.
The current behaviour produces values that are not
from rules or scrapes. So if for example I have
a boolean 0/1 it can be returned as 0.2344589. This
prevents a number of advanced use cases, introduces
race conditions and can produce misleading graphs.
2015-08-24 17:37:31 +01:00
Fabian Reinartz 1535ef1457 Replace metric.SamplePair with model.SamplePair 2015-08-22 14:52:35 +02:00
Fabian Reinartz 438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz 306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Brian Brazil 296f551418 Merge pull request #1014 from prometheus/scalar-rules
rules: Allow recorded rules expressions to be scalars.
2015-08-19 22:10:49 +01:00
Brian Brazil e6a67476c2 rules: Allow recorded rules expressions to be scalars.
This is useful if you want to build up a constant metric,
such as a set of alert thresholds that vary by label value.
2015-08-19 21:09:00 +01:00
Laurie Malau cdf38ab93a Log runtime errors during query evaluation instead of panicking. 2015-08-19 16:56:41 +02:00
Julius Volz 27ed874358 Implement label_replace()
Implements part of https://github.com/prometheus/prometheus/issues/959.
2015-08-18 14:20:07 +02:00
Fabian Reinartz 690b5f1575 Remove multi-statement queries
This commit removes the possibility to have multi-statement queries
which had no full support anyway. This makes the caller responsible
for multi-statement semantics.
Multiple tests are no longer timing-dependent.
2015-08-10 14:26:20 +02:00
Julius Volz e324910ff2 Merge pull request #936 from prometheus/predict
promql: Add support for predict(my_timeseries[1h], 2h)
2015-08-05 16:40:51 +02:00
Brian Brazil d6a80c2b76 promql: Add support for predict_linear(my_timeseries[1h], 7200)
This will give a prediction for the value of my_timeseries in 2 hours,
based on the last hour of data.
2015-08-05 15:16:49 +01:00
Fabian Reinartz 579fdf65e2 Implement unary expression for vector types.
Closes #956
2015-08-04 15:46:36 +02:00
Fabian Reinartz c322422412 Merge pull request #954 from prometheus/fabxc/fuzz-fix
Add missing check for nil expression
2015-08-03 16:48:20 +02:00
Fabian Reinartz adf109795c forbid unexpected (runtime) errors in parse tests 2015-08-03 12:53:31 +02:00
Fabian Reinartz c20e25f718 Add missing check for nil expression 2015-08-03 12:28:40 +02:00
Brian Brazil a0f0b82348 promql: Test errors aren't always ParseErr 2015-08-02 23:26:21 +01:00