Commit Graph

3288 Commits (0c64ad1a3db954a6ec97a7690457353d33c4ce54)

Author SHA1 Message Date
Julius Volz af5c7d5616 Merge pull request #1938 from prometheus/remote-write-experimental
Mark remote write address flag as experimental.
2016-09-01 01:19:48 +02:00
Julius Volz a88e950d1f Mark remote write address flag as experimental. 2016-09-01 00:58:53 +02:00
Julius Volz 1c271897be Merge pull request #1931 from tomwilkie/1843-remote-storage-ordering
Remote storage ordering
2016-08-30 17:49:13 +02:00
Tom Wilkie d41d91388f Update for new generic remote storage. 2016-08-30 17:43:29 +02:00
Tom Wilkie a6931b71e8 Rationalise retrieval metrics so we have the state (success/failed) on both samples and batches, in a consistent fashion.
Also, report total queue capacity of all queues, i.e. capacity * shards.
2016-08-30 17:42:42 +02:00
Tom Wilkie ece12bff93 Shard/parrallelise samples by fingerprint in StorageQueueManager
By splitting the single queue into multiple queues and flushing each individual queue in serially (and all queues in parallel), we can guarantee to preserve the order of timestampsin samples sent to downstream systems.
2016-08-30 17:42:36 +02:00
Julius Volz fe29e87824 Merge pull request #1930 from prometheus/generic-write-grpc
Generic write via gRPC
2016-08-30 17:28:48 +02:00
Julius Volz aa3f2b7216 Generic write cleanups and changes.
- fold metric name into labels
- return initialization errors back to main
- add snappy compression
- better context handling
- pre-allocation of labels
- remove generic naming
- other cleanups
2016-08-30 17:24:48 +02:00
Julius Volz d8ce6e5849 Update vendoring. 2016-08-30 17:19:18 +02:00
Brian Brazil 36d2c4bd0b Add generic write path using grpc.
This uses a new proto format, with scope for multiple samples per
timeseries in future. This will allow users to pump samples out to
whatever they like without having to change the core Prometheus code.

There's also an example receiver to save users figuring out the
boilerplate themselves.
2016-08-30 17:19:18 +02:00
Brian Brazil 72475cfa84 Update vendoring 2016-08-30 17:19:18 +02:00
Björn Rabenstein 2dd651770e Merge pull request #1923 from dmilstein/fix-delta-unmarshaling
Catch errors when unmarshalling delta/doubleDelta encoded chunks
2016-08-30 15:22:41 +02:00
Dan Milstein ec064c96f6 Add field names to table-driven test fixtures 2016-08-30 07:57:39 -04:00
Dan Milstein ac8788aca6 Convert to table-driven test and inline helper func 2016-08-30 07:57:39 -04:00
Dan Milstein f50f656a66 Fix double-delta unmarshaling to respect actual min header size
Turns out its valid to have an overall chunk which is smaller than the
full doubleDeltaHeaderBytes size -- if it has a single sample, it
doesn't fill the whole header.  Updated unmarshalling check to respect
this.
2016-08-30 07:57:39 -04:00
Dan Milstein b815956341 Catch errors when unmarshalling delta/doubleDelta encoded chunks
This is (hopefully) a fix for #1653

Specifically, this makes it so that if the length for the stored
delta/doubleDelta is somehow corrupted to be too small, the attempt to
unmarshal will return an error.

The current (broken) behavior is to return a malformed chunk, which can
then lead to a panic when there is an attempt to read header values.

The referenced issue proposed creating chunks with a minimum length -- I
instead opted to just error on the attempt to unmarshal, since I'm not
clear on how it could be safe to proceed when the length is
incorrect/unknown.

The issue also talked about possibly "quarantining series", but I don't
know the surrounding code well enough to understand how to make that
happen.
2016-08-30 07:57:39 -04:00
Tobias Schmidt ee9abf4535 Merge pull request #1932 from prometheus/sdurrheimer-circleci-tests-1.6-tag
Use 1.6 tag of the test image in circleci
2016-08-30 11:49:44 +02:00
Fabian Reinartz ab88057063 Merge pull request #1908 from prometheus/on-dates
Add various time and date functions
2016-08-30 11:03:23 +02:00
Steve Durrheimer 5792eaa1e8
Use 1.6 tag of the test image in circleci 2016-08-30 09:34:17 +02:00
Fabian Reinartz a83c8156b5 Merge pull request #1926 from dmilstein/add-tests-for-targetmanager
Add basic test for TargetManager.targetSet
2016-08-30 08:14:04 +02:00
Dan Milstein b9fb9742ed Move test helper function into scope of test func 2016-08-29 16:08:40 -04:00
Brian Brazil 4680daf237 Default date functions to current time. 2016-08-29 18:22:12 +01:00
Tobias Schmidt bc1178bf70 Merge pull request #1929 from hashmap/1900_validate_relabel_conf
Forbid invalid relabel configurations
2016-08-29 17:45:05 +02:00
Alexey Miroshkin e29d9394e5 Forbid invalid relabel configurations
This fix adds check if target_label value is set in case if action is replace or
hashmod
Issue [#1900]
2016-08-29 16:56:06 +02:00
Björn Rabenstein c9d9f564ff Merge pull request #1928 from prometheus/beorn7/testing
Testing: Add more test targets
2016-08-29 11:36:53 +02:00
Bjoern Rabenstein c7bd563b26 Testing: Add more test targets 2016-08-29 10:53:10 +02:00
Fabian Reinartz 23ddbd64aa Merge pull request #1925 from hashmap/1898-test-race
Fix data race in lexer and lexer test
2016-08-29 09:28:02 +02:00
Alexey Miroshkin bf0e441576 Instantiate lexer inline for the test
Don't use the lex constructor, remove the constructor introduced in the
prevous commit.
2016-08-29 09:20:43 +02:00
Julius Volz a614e3dd27 Merge pull request #1927 from mattbostock/add_crash_recovery_metric
Storage: Add crash recovery metric 'started_dirty'
2016-08-28 18:42:32 +02:00
Matt Bostock e618af5d0b Storage: Add crash recovery metric 'started_dirty'
...to indicate when crash recovery was invoked during Prometheus
startup.

Fixes #1918.
2016-08-27 21:41:06 +02:00
Dan Milstein 79216011cb Add basic test for TargetManager.targetSet
Verify that if the configs change, target groups are cleaned on
TargetManager.reload (rather than having old ones linger around, even if
they are no longer present in the configs).

This covers the bug fixed in #1907 -- I verified that by checking out
source from before that commit.

This is a start on #1906
2016-08-26 14:30:26 -04:00
Alexey Miroshkin 485f7dde08 Fix data race in lexer and lexer test
As described in #1898 'go test -race' detects a race in lexer code. This
pacth fixes it and also add '-race' option to test target to prevent
regression.
2016-08-26 17:07:17 +02:00
Björn Rabenstein 0ac2dbe6aa Merge pull request #1917 from prometheus/beorn7/promql
promql: Fix (and simplify) populating iterators
2016-08-26 08:42:44 +02:00
Björn Rabenstein b23169d86f Merge pull request #1913 from dmilstein/fix-remote-storage-test
Fix one of the tests for a remote storage QueueManager
2016-08-24 18:56:22 +02:00
beorn7 71571a8ec4 promql: Fix (and simplify) populating iterators
This was only relevant so far for the benchmark suite as it would
recycle Expr for repetitions. However, the append is unnecessary as
each node is only inspected once when populating iterators, and
population must always start from scratch.

This also introduces error checking during benchmarks and fixes the so
far undetected test errors during benchmarking.

Also, remove a style nit (two golint warnings less…).
2016-08-24 18:37:09 +02:00
Dan Milstein 764ceaa939 Add timeout to test, cap waiting at 1 second 2016-08-24 11:30:38 -04:00
Björn Rabenstein 4b8f963847 Merge pull request #1915 from prometheus/release-1.0
Forward-merge the bug fix from release-1.0
2016-08-24 13:04:45 +02:00
Björn Rabenstein d9b40793ac Merge pull request #1914 from prometheus/beorn7/release
Cut v1.0.2
2016-08-24 13:03:00 +02:00
beorn7 acd8937bab Cut v1.0.2 2016-08-24 12:04:10 +02:00
Björn Rabenstein 30cfee81a1 Merge pull request #1907 from prometheus/beorn7/sd
retrieval: Clean up target group map on config reload
2016-08-24 12:00:30 +02:00
Brian Brazil ea1318f38b Short names of some date related functions 2016-08-23 22:34:22 +01:00
Dan Milstein 007907b410 Fix one of the tests for a remote storage QueueManager
Specifically, the TestSpawnNotMoreThanMaxConcurrentSendsGoroutines was failing on a fresh checkout of master.

The test had a race condition -- it would only pass if one of the
spawned goroutines happened to very quickly pull a set of samples off an
internal queue.

This patch rewrites the test so that it deterministically waits until
all samples have been pulled off that queue.  In case of errors, it also
now reports on the difference between what it expected and what it found.

I verified that, if the code under test is deliberately broken, the test
successfully reports on that.
2016-08-23 16:26:33 -04:00
Brian Brazil d2ca2b496a Add days_in_month function. 2016-08-22 21:15:35 +01:00
Brian Brazil 0ed31c8c47 Sort list of functions. 2016-08-22 21:15:34 +01:00
Brian Brazil fd7822829c Add date related functions.
Add day_of_month, day_of_week, hour_of_day, month_of_year and year.
This only work for UTC, and ignore leap seconds the same as Go.
2016-08-22 21:15:30 +01:00
beorn7 e2b3626e0c retrieval: Clean up target group map on config reload
Also, remove unused `providers` field in targetSet.

If the config file changes, we recreate all providers (by calling
`providersFromConfig`) and retrieve all targets anew from the newly
created providers. From that perspective, it cannot harm to clean up
the target group map in the targetSet. Not doing so (as it was the
case so far) keeps stale targets around. This mattered if an existing
key in the target group map was not overwritten in the initial fetch
of all targets from the providers. Examples where that mattered:

```
scrape_configs:
- job_name: "foo"
  static_configs:
  - targets: ["foo:9090"]
  - targets: ["bar:9090"]
```
updated to:
```
scrape_configs:
- job_name: "foo"
  static_configs:
  - targets: ["foo:9090"]
```

`bar:9090` would still be monitored. (The static provider just
enumerates the target groups. If the number of target groups
decreases, the old ones stay around.

```
scrape_configs:
- job_name: "foo"
  dns_sd_configs:
  - names:
    - "srv.name.one.example.org"
```
updated to:
```
scrape_configs:
- job_name: "foo"
  dns_sd_configs:
  - names:
    - "srv.name.two.example.org"
```

Now both SRV records are still monitored. The SRV name is part of the
key in the target group map, thus the new one is just added and the
old ane stays around.

Obviously, this should have tests, and should have tests before, not
only for this case. This is the quick fix. I have created
https://github.com/prometheus/prometheus/issues/1906 to track test
creation.

Fixes https://github.com/prometheus/prometheus/issues/1610 .
2016-08-22 19:25:33 +02:00
Tobias Schmidt 16d70a8b6b Merge pull request #1901 from amorken/master
Run scrape loop with interval 1 instead of 0
2016-08-18 13:36:05 -04:00
Anders Daljord Morken 95cadd0702 Run scrape loop with interval 1 instead of 0
0 is considered an invalid interval by time.NewTicker() and will cause a
panic if control reaches that point. Given the vagaries of timekeeping,
this may occasionally happen and make this test unstable.
2016-08-18 09:39:11 +02:00
Fabian Reinartz fcac52ebbf Merge pull request #1899 from amorken/master
Trim stray whitespace from bearer token file
2016-08-17 16:01:58 +02:00
Anders Daljord Morken 8633ac180e Strip stray whitespace from bearer token file
Apart from not trying to send a newline in a HTTP header,
this also allows Prometheus to build and pass tests with Go 1.7,
which features stricter checking of HTTP headers.
2016-08-17 15:36:18 +02:00