prometheus

Commit Graph

Author	SHA1	Message	Date
Bartlomiej Plotka	fb79f515fc	Fixed second bug. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	5 years ago
Bartlomiej Plotka	2cf637fbf5	Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	5 years ago
Bartlomiej Plotka	34426766d8	Unify Iterator interfaces. All point to storage now. This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All todos I added will be fixed in follow-up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only unnecessary glue with incompatible structs. No logic was changed, only a different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	5 years ago
李国忠	3cd6a5b050	Storage concurrently tests and bug fix (#6808 ) * Storage concurrently tests and bug fix Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	5 years ago
李国忠	40dd13b074	Storage concurrently (#6770 ) * Storage concurrently Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	5 years ago
helenxu1221	7df4fe3faa	reset counter after collecting metric (#6798 ) Signed-off-by: HelenXu <helenxu@Helens-MacBook-Pro.local>	5 years ago
Robert Fratto	a53e00f9fd	pass registerer from storage to queue manager for its metrics (#6728 ) * pass registerer from storage to queue manager for its metrics Signed-off-by: Robert Fratto <robert.fratto@grafana.com>	5 years ago
Peter Štibraný	08c5549055	Document that NewMergeSeriesSet expects individual sets to be sorted. (#6718 ) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>	5 years ago
Brian Brazil	38d32e0686	Don't sort postings if we only have one block. Sorting the head's postings can be quite slow. We only need sorted series when merging with another querier, so only sort then. This will make big queries that only touch the head faster, though queries that touch both the head and a block will still be the same speed. This probably won't help much with graphing unless the range is under an hour, however it should make most recording rules faster. Add guarantee that remote read streaming produces sorted series. PromQL benchmarks for histograms show only 2-3% improvement, but they're only over 1k series.
benchmark                                             old ns/op   new ns/op   delta
BenchmarkQuerierSelect/Head/1of1000000-4              1375486282  507657736   -63.09%
BenchmarkQuerierSelect/Head/10of1000000-4             1387859004  507769850   -63.41%
BenchmarkQuerierSelect/Head/100of1000000-4            1387087935  506029110   -63.52%
BenchmarkQuerierSelect/Head/1000of1000000-4           1386869064  504521986   -63.62%
BenchmarkQuerierSelect/Head/10000of1000000-4          1386213685  505210422   -63.55%
BenchmarkQuerierSelect/Head/100000of1000000-4         1392754988  529842406   -61.96%
BenchmarkQuerierSelect/Head/1000000of1000000-4        1569414722  725059506   -53.80%
BenchmarkQuerierSelect/SortedHead/1of1000000-4        1381019902  1370495863  -0.76%
BenchmarkQuerierSelect/SortedHead/10of1000000-4       1375696209  1366789468  -0.65%
BenchmarkQuerierSelect/SortedHead/100of1000000-4      1386009422  1364519297  -1.55%
BenchmarkQuerierSelect/SortedHead/1000of1000000-4     1377700532  1364486191  -0.96%
BenchmarkQuerierSelect/SortedHead/10000of1000000-4    1383539536  1369545314  -1.01%
BenchmarkQuerierSelect/SortedHead/100000of1000000-4   1410089163  1394731339  -1.09%
BenchmarkQuerierSelect/SortedHead/1000000of1000000-4  1634744148  1581554956  -3.25%
BenchmarkQuerierSelect/Block/1of1000000-4             881741242   879839470   -0.22%
BenchmarkQuerierSelect/Block/10of1000000-4            880381562   882846038   +0.28%
BenchmarkQuerierSelect/Block/100of1000000-4           887519357   881016916   -0.73%
BenchmarkQuerierSelect/Block/1000of1000000-4          902194205   883433524   -2.08%
BenchmarkQuerierSelect/Block/10000of1000000-4         892321964   885130170   -0.81%
BenchmarkQuerierSelect/Block/100000of1000000-4        938604466   933527150   -0.54%
BenchmarkQuerierSelect/Block/1000000of1000000-4       1313510845  1295881124  -1.34%
benchmark                                             old allocs  new allocs  delta
BenchmarkQuerierSelect/Head/1of1000000-4              4000056     4000018     -0.00%
BenchmarkQuerierSelect/Head/10of1000000-4             4000074     4000036     -0.00%
BenchmarkQuerierSelect/Head/100of1000000-4            4000254     4000216     -0.00%
BenchmarkQuerierSelect/Head/1000of1000000-4           4002054     4002016     -0.00%
BenchmarkQuerierSelect/Head/10000of1000000-4          4020054     4020016     -0.00%
BenchmarkQuerierSelect/Head/100000of1000000-4         4200054     4200016     -0.00%
BenchmarkQuerierSelect/Head/1000000of1000000-4        6000054     6000016     -0.00%
BenchmarkQuerierSelect/SortedHead/1of1000000-4        4000071     4000071     +0.00%
BenchmarkQuerierSelect/SortedHead/10of1000000-4       4000089     4000089     +0.00%
BenchmarkQuerierSelect/SortedHead/100of1000000-4      4000269     4000269     +0.00%
BenchmarkQuerierSelect/SortedHead/1000of1000000-4     4002069     4002069     +0.00%
BenchmarkQuerierSelect/SortedHead/10000of1000000-4    4020069     4020069     +0.00%
BenchmarkQuerierSelect/SortedHead/100000of1000000-4   4200069     4200069     +0.00%
BenchmarkQuerierSelect/SortedHead/1000000of1000000-4  6000069     6000069     +0.00%
BenchmarkQuerierSelect/Block/1of1000000-4             6000023     6000022     -0.00%
BenchmarkQuerierSelect/Block/10of1000000-4            6000059     6000058     -0.00%
BenchmarkQuerierSelect/Block/100of1000000-4           6000419     6000418     -0.00%
BenchmarkQuerierSelect/Block/1000of1000000-4          6004019     6004018     -0.00%
BenchmarkQuerierSelect/Block/10000of1000000-4         6040019     6040018     -0.00%
BenchmarkQuerierSelect/Block/100000of1000000-4        6400019     6400018     -0.00%
BenchmarkQuerierSelect/Block/1000000of1000000-4       10000020    10000019    -0.00%
benchmark                                             old bytes   new bytes   delta
BenchmarkQuerierSelect/Head/1of1000000-4              229192200   176001176   -23.21%
BenchmarkQuerierSelect/Head/10of1000000-4             229193352   176002328   -23.21%
BenchmarkQuerierSelect/Head/100of1000000-4            229204872   176013848   -23.21%
BenchmarkQuerierSelect/Head/1000of1000000-4           229320072   176129048   -23.20%
BenchmarkQuerierSelect/Head/10000of1000000-4          230472072   177281048   -23.08%
BenchmarkQuerierSelect/Head/100000of1000000-4         241992072   188801048   -21.98%
BenchmarkQuerierSelect/Head/1000000of1000000-4        357192072   304001048   -14.89%
BenchmarkQuerierSelect/SortedHead/1of1000000-4        229193928   229193928   +0.00%
BenchmarkQuerierSelect/SortedHead/10of1000000-4       229195080   229195080   +0.00%
BenchmarkQuerierSelect/SortedHead/100of1000000-4      229206600   229206600   +0.00%
BenchmarkQuerierSelect/SortedHead/1000of1000000-4     229321800   229321800   +0.00%
BenchmarkQuerierSelect/SortedHead/10000of1000000-4    230473800   230473800   +0.00%
BenchmarkQuerierSelect/SortedHead/100000of1000000-4   241993800   241993800   +0.00%
BenchmarkQuerierSelect/SortedHead/1000000of1000000-4  357193800   357193800   +0.00%
BenchmarkQuerierSelect/Block/1of1000000-4             227201516   227201500   -0.00%
BenchmarkQuerierSelect/Block/10of1000000-4            227202924   227202908   -0.00%
BenchmarkQuerierSelect/Block/100of1000000-4           227217036   227217020   -0.00%
BenchmarkQuerierSelect/Block/1000of1000000-4          227358156   227358140   -0.00%
BenchmarkQuerierSelect/Block/10000of1000000-4         228769356   228769340   -0.00%
BenchmarkQuerierSelect/Block/100000of1000000-4        242881356   242881340   -0.00%
BenchmarkQuerierSelect/Block/1000000of1000000-4       384001616   384001600   -0.00%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	5 years ago
Anand Singh Kunwar	aa61e392b2	Make remote client `Store` use passed context (#6673 ) * Remote store client's `Store` API currently doesn't use passed context, but instead just constructs a new `context.Background()` Signed-off-by: Anand Singh Kunwar <anandkunwar95@gmail.com>	5 years ago
Julien Pivotto	cf42888e4d	Fix order of testutil.Equals (#6695 ) Equals takes the expected value as first parameter, and the actual value as second parameter. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Julien Pivotto	aad8f89ecb	Remote storage: propagate json marshal errors (#6622 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Chris Marchbanks	7f3aca62c4	Only reduce the number of shards when caught up. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Chris Marchbanks	9e24e1f9e8	Use samplesPending rather than integral The integral accumulator in the remote write sharding code is just a second way of keeping track of the number of samples pending. Remove integralAccumulator and use the samplesPending value we already calculate to calculate the number of shards. This has the added benefit of fixing a bug where the integralAccumulator was not being initialized correctly due to not taking into account the number of ticks being counted, causing the integralAccumulator initial value to be off by an order of magnitude in some cases. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Chris Marchbanks	847c66a843	Add sharding test Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Josh Soref	91d76c8023	Spelling (#6517 ) * spelling: alertmanager Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: attributes Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: autocomplete Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: bootstrap Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: caught Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: chunkenc Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: compaction Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: corrupted Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: deletable Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: expected Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: fine-grained Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: initialized Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: iteration Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: javascript Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: multiple Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: number Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: overlapping Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: possible Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: postings Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: procedure Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: programmatic Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: queuing Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: querier Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: repairing Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: received Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: reproducible Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: retention Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: sample Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: segements Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: semantic Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: software [LICENSE] Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: staging Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: timestamp Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: unfortunately Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: uvarint Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: subsequently Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: ressamples Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>	5 years ago
Julien Pivotto	31700a05df	Improve testutil.ErrorEqual (#6471 ) Also improves TestPopulateLabels: testutil.ErrorEqual just returned a bool without failing the test. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	5 years ago
Callum Styan	67838643ee	Add config option for remote job name (#6043 ) * Track remote write queues via a map so we don't care about index. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Support a job name for remote write/read so we can differentiate between them using the name. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Remote write/read has Name to not confuse the meaning of the field with scrape job names. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Split queue/client label into remote_name and url labels. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allow for duplicate remote write/read configs. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Ensure we restart remote write queues if the hash of their config has not changed, but the remote name has changed. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Include name in remote read/write config hashes, simplify duplicates check, update test accordingly. Signed-off-by: Callum Styan <callumstyan@gmail.com>	5 years ago
Garrett	5a9c4acfbf	Pushdown aggregator group by through read hint (#6401 ) * Pushdown aggregator group by through read hint Implement https://github.com/prometheus/prometheus/issues/6400 * add temporal aggregation pushdown support Signed-off-by: xiancli <xiancli@ebay.com>	5 years ago
Chris Marchbanks	5000c05378	Merge pull request #6378 from prometheus/accurate-desired-shards-metric Change desired shards metric to report raw calculated value	5 years ago
Callum Styan	5830e03691	Merge pull request #6337 from cstyan/rw-log-replay Log the start and end of the WAL replay within the WAL watcher.	5 years ago
Callum Styan	6a24eee340	Simplify duration check for watcher WAL replay. Signed-off-by: Callum Styan <callumstyan@gmail.com>	5 years ago
Chris Marchbanks	6f34e35b3e	Record the exact value of desired shards in metric It is possible that desired shards is always a bit higher than the number of shards (less than 30%) and by exporting desired shards as the raw number it will be easy to tell if a Prometheus is in that situation. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Chris Marchbanks	0e684ca205	Fix unknown type in sharding up log Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Callum Styan	c2cb1e4103	Add a metric to track total bytes sent per remote write queue. (#6344 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	5 years ago
Tom Wilkie	de0a772b8e	Port tsdb to use pkg/labels. (#6326 ) * Port tsdb to use pkg/labels. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Get tests passing. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Remove useless cast. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Appease linters. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	5 years ago
Callum Styan	5f1be2cf45	Refactor calculateDesiredShards + don't reshard if we're having issues sending samples. (#6111 ) * Refactor calculateDesiredShards + don't reshard if we're having issues sending samples. * Track lastSendTimestamp via an int64 with atomic add/load, add a test for reshard calculation. * Simplify conditional for skipping resharding, add samplesIn/Out to shard testcase struct. Signed-off-by: Callum Styan <callumstyan@gmail.com>	5 years ago
Krasi Georgiev	81d284f806	Merge the 2.13 release branch to master (#6117 )	5 years ago
Callum Styan	84ff928606	Make sure the remote write storage uses a dedupe logger. (#6113 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	5 years ago
Chris Marchbanks	8df4bca470	Garbage collect asynchronously in the WAL Watcher The WAL Watcher replays a checkpoint after it is created in order to garbage collect series that no longer exist in the WAL. Currently the garbage collection process is done serially with reading from the tip of the WAL which can cause large delays in writing samples to remote storage just after compaction occurs. This also fixes a memory leak where dropped series are not cleaned up as part of the SeriesReset process. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
George Felix	895abbb7d0	Replaced test validations with testutils on storage/remote/codec_test.go (#6097 ) * Replaced test validations with testutils on storage/remote/codec_test.go Signed-off-by: George Felix <george.felix@ubeeqo.com> * gofmt Signed-off-by: George Felix <george.felix@ubeeqo.com> * Removed shouldPass assertion Signed-off-by: George Felix <gfelixc@gmail.com> * Fixes to improve readability Signed-off-by: George Felix <george.felix@ubeeqo.com> * Fixes based on code review comments Signed-off-by: George Felix <george.felix@ubeeqo.com>	5 years ago
Joe Elliott	95dc59ec7e	Replaced t.Fatalf() with testutil.Assert() in buffer_test.go (#6084 ) * Added Fatal method and used it in buffer_test Signed-off-by: Joe Elliott <number101010@gmail.com> * Added period to meet contributing guidelines Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed fatal testutil method. Refactored test cases to use testutil.Assert Signed-off-by: Joe Elliott <number101010@gmail.com> * Added if found condition for clarity Signed-off-by: Joe Elliott <number101010@gmail.com>	5 years ago
陈谭军	103f26d188	fix the wrong word (#6069 ) Signed-off-by: chentanjun <2799194073@qq.com>	5 years ago
Callum Styan	3344bb5c33	Move WAL watcher code to tsdb/wal package. (#5999 ) * Move WAL watcher code to tsdb/wal package. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix tests after moving WAL watcher code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Lint fixes. Signed-off-by: Callum Styan <callumstyan@gmail.com>	5 years ago
Björn Rabenstein	3b3eaf3496	Merge pull request #5787 from cstyan/reshard-max-logging Add metrics for max/min/desired shards to queue manager.	5 years ago
Chris Marchbanks	b4317768b9	Merge pull request #5849 from csmarchbanks/rw-use-labels Cache labels.Labels to Identify Series in Remote Write	5 years ago
Yao Zengzeng	f65b7c296d	fix TODO: only stop & recreate remote write queues which have changes (#5540 ) Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>	5 years ago
Chris Marchbanks	791a2409a2	Pre-allocate pendingSamples to reduce allocations Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Chris Marchbanks	160186da18	Store labels.Labels instead of []prompb.Label This will use half the steady state memory as required by prompb.Label. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Stanislav Putrya	6141a8bd7c	Show warnings in UI if query have returned some warnings (#5964 ) * Show warnings in UI if query have returned some warnings + improve warning (error) text if query to remote was finished with error * Add prefixes for remote_read errors Signed-off-by: Stan Putrya <root.vagner@gmail.com>	5 years ago
Bartek Płotka	48b2c9c8ea	remote-read: streamed chunked server side; Extended protobuf; Added chunked, checksummed reader (#5703 ) Part of: https://github.com/prometheus/prometheus/issues/4517 and https://github.com/improbable-eng/thanos/issues/488 Changes: * Extended protobuf for chunked remote read and negotiation. * Added checksummed, chunked Writer/Reader. * Added Server side implementation for chunked streamed remote-read. Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	5 years ago
Julius Volz	b5c833ca21	Update go.mod dependencies before release (#5883 ) * Update go.mod dependencies before release Signed-off-by: Julius Volz <julius.volz@gmail.com> * Add issue for showing query warnings in promtool Signed-off-by: Julius Volz <julius.volz@gmail.com> * Revert json-iterator back to 1.1.6 It produced errors when marshaling Point values with special float values. Signed-off-by: Julius Volz <julius.volz@gmail.com> * Fix expected step values in promtool tests after client_golang update Signed-off-by: Julius Volz <julius.volz@gmail.com> * Update generated protobuf code after proto dep updates Signed-off-by: Julius Volz <julius.volz@gmail.com>	5 years ago
Bartek Płotka	32be514845	Merge pull request #5805 from codesome/merge-tsdb Merge tsdb into prometheus	5 years ago
Chris Marchbanks	a6a55c433c	Improve desired shards calculation (#5763 ) The desired shards calculation now properly keeps track of the rate of pending samples, and uses the previously unused integralAccumulator to adjust for missing information in the desired shards calculation. Also, configure more capacity for each shard. The default 10 capacity causes shards to block on each other while sending remote requests. Default to a 500 sample capacity and explain in the documentation that having more capacity will help throughput. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Ganesh Vernekar	5ecef3542d	Cleanup after merging tsdb into prometheus Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	5 years ago
ethan	38ccf0157e	cleanup: correct func name in log message (#5852 ) Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>	5 years ago
Chris Marchbanks	529ccff07b	Remove all usages of stretchr/testify Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Chris Marchbanks	0685eb5395	Refactor testutil.NewStorage into a new package This avoids a circular dependency between the testutil and storage packages. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	5 years ago
Vadym Martsynovskyy	8318aa2d5d	Check for duplicate label names in remote read (#5829 ) * Check for duplicate label names in remote read Also add test to confirm that #5731 is fixed * Use subtests in TestValidateLabelsAndMetricName * Really check that expectedErr matches err Signed-off-by: Vadym Martsynovskyy <vmartsynovskyy@gmail.com>	5 years ago
Callum Styan	c40a83f386	Add metrics for max shards, min shards, and desired shards. Signed-off-by: Callum Styan <callumstyan@gmail.com>	5 years ago
AllenZMC	758c71b980	fix word `encourter` to `encounter` Signed-off-by: czm <zhongming.chang@daocloud.io>	5 years ago
Devin Trejo	d77f2aa29c	Only check last directory when discovering checkpoint number (#5756 ) * Only check last directory when discovering checkpoint number Signed-off-by: Devin Trejo <dtrejo@palantir.com> * Comments for checkpointNum Signed-off-by: Devin Trejo <dtrejo@palantir.com>	5 years ago
Yao Zengzeng	3cde8a9941	pass error up if WALWatcher.segments() returns err (#5741 ) Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>	5 years ago
Xigang Wang	445bcd1251	Update the runShard method and change len(pendingSamples) to n=len(pendingSamples) (#5708 ) Signed-off-by: xigang <wangxigang2014@gmail.com>	5 years ago
Chris Marchbanks	06f1ba73eb	Provide flag to compress the tsdb WAL Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Chris Marchbanks	475ca2ecd0	Update to tsdb 0.9.1 Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Chris Marchbanks	06bdaf076f	Remote Write Allocs Improvements (#5614 ) * Add benchmark for sample delivery * Simplify StoreSeries to have only one loop * Reduce allocations for pending samples in runShard * Only allocate one send slice per segment * Cache a buffer in each shard for snappy to use * Remove queue manager seriesMtx It is not possible for any of the places protected by the seriesMtx to be called concurrently so it is safe to remove. By removing the mutex we can simplify the Append code to one loop. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Chris Marchbanks	a38a54fa11	Split remote write storage into its own type This allows other processes to reuse just the remote write code without having to use the remote read code as well. This will be used to create a sidecar capable of sending remote write payloads. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Thomas Jackson	91d7175eaa	Add storage.Warnings to LabelValues and LabelNames (#5673 ) Fixes #5661 Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>	6 years ago
Dmitry Shmulevich	0c0638b080	resolve race condition in maxGauge (#5647 ) * resolve race condition in maxGauge Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com>	6 years ago
Chris Marchbanks	840872a6f8	Fix remote storage config not updating correctly (#5555 ) * Update remote write and remote read separately * Add external labels to the remote write conf hash * Add unit tests for remote storage lifecycle Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Simon Pasquier	45506841e6	*: enable all default linters (#5504 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Callum Styan	3639d51eb6	Remote Storage: string interner should not panic in release (#5487 ) * Don't panic if we try to release a string that is not in the interner. * Move seriesMtx locking in QueueManager's StoreSeries function. This stops us from calling release for strings that aren't interned if there's a race between reading a checkpoint and storing new series labels, which could happen during checkpointing or reloading config. Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Callum Styan	e87449b59d	Remote Write: Queue Manager specific metrics shouldn't exist if the queue no longer exists (#5445 ) * Unregister remote write queue manager specific metrics when stopping the queue manager. * Use DeleteLabelValues instead of Unregister to remove queue and watcher related metrics when we stop them. Create those metrics in the structs' start functions rather than in their constructors because of the ordering of creation, start, and stop in remote storage ApplyConfig. * Add setMetrics function to WAL watcher so we can set the watcher's metrics in its Start function, but not have to call Start in some tests (causes data race). Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Callum Styan	b7538e7b49	Don't stop, recreate, and start remote storage QueueManagers if the (#5485 ) remote write config hasn't changed at all. Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Romain Baugue	95193fa027	Exhaust every request body before closing it (#5166 ) (#5479 ) From the documentation: > The default HTTP client's Transport may not > reuse HTTP/1.x "keep-alive" TCP connections if the Body is > not read to completion and closed. This effectively enable keep-alive for the fixed requests. Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>	6 years ago
Vasily Sliouniaev	5be9a1426f	Prevent reshard concurrent with calling stop (#5460 ) * Prevent reshard concurrent with calling stop Signed-off-by: Vasily <v.sliouniaev@gmail.com>	6 years ago
Callum Styan	c2b88992a3	Remote Write: fix checkpoint reading (#5429 ) * Fix ReadCheckpoint to ensure that it actually reads all the contents of each segment in a checkpoint dir, or returns an error. Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Tariq Ibrahim	8fdfa8abea	refine error handling in prometheus (#5388 ) i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com>	6 years ago
Tom Wilkie	807fd33ecc	Review feedback. - Update read path to use labels.Labels. - Fix the tests. - Remove pack. - Remove unused function. - Fix race in tests. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Callum Styan	1a7923dde3	Add ref counting to string interning so we can remove a string when there are no longer any refs. Add tests for interning. Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Tom Wilkie	cbf5f13285	Naive string interning for labels & values in the remote_write path. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	c7b3535997	Use pkg/relabelling in remote write. - Unmarshall external_labels config as labels.Labels, add tests. - Convert some more uses of model.LabelSet to labels.Labels. - Remove old relabel pkg (fixes #3647). - Validate external label names. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	2fa93595d6	More WAL remote_write tweaks. (#5300 ) * Consistently pre-lookup the metrics for a given queue in queue manager. * Don't open the WAL (for writing) in the remote_write code. * Add some more logging. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Krasi Georgiev	1684dc750a	updated tsdb to 0.6.0 (#5292 ) * updated tsdb to 0.6.0 as part of the update also added the new storage.tsdb.allow-overlapping-blocks flag and mark it as experimental.	6 years ago
Tariq Ibrahim	1adb91738d	fix typo in recordType method of wal_watcher.go (#5297 ) Signed-off-by: tariqibrahim <tariq181290@gmail.com>	6 years ago
Tariq Ibrahim	ab8e9b7423	fix typo in queue_manager.go comment (#5294 ) Signed-off-by: tariqibrahim <tariq181290@gmail.com>	6 years ago
Tom Wilkie	67da8e7b46	Refactor and fix queue resharding (#5286 ) - Remove prometheus_remote_queue_last_send_timestamp_seconds metric. It's not particularly useful, we have highest_timestamp_seconds. - Factor out maxGauge, a gauge that only increases. - Change sharding calculations to use max samples in timestamp - max samples out timestamp (not rates). - Also include the ratio of samples dropped to correctly predict number of pending samples. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Callum Styan	b8106dd459	Review feedback: - Add a dropped samples EWMA and use it in calculating desired shards. - Update metric names and log messages. - Limit number of entries in the dedupe logging middleware to prevent potential OOM. Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Callum Styan	512f549064	Refactor: inline decodeRecord in readSegment and don't bother decoding samples records if we're not tailing the segment, add a benchmark test and fix some other tests Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Tom Wilkie	f795942572	Decrement pending sample when queue exits. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	ee7efa93fe	Fix some tests. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Callum Styan	b69bdfb4d1	Store the checkpoint we read last, so that we don't keep reading the same checkpoint on each tick. Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Tom Wilkie	efbd9559f4	Deal with corruptions in the WAL: - If we're replaying the WAL to get series records, skip that segment when we hit corruptions. - If we're tailing the WAL for samples, fail the watcher. - When the watcher fails, restart from the latest checkpoint - and only send new samples by updating startTime. - Tidy up log lines and error handling, don't return so many errors on quitting. - Expect EOF when processing checkpoints. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	adf5307470	Update wal LiveReader to ensure EOF is correctly propagated. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Callum Styan	d6258aea8f	Fix up remote write tests: - Tests that created a QueueManager were leaving behind files at the end of tests. - WAL replaying (readToEnd)tests seem to require extra time to finish now. - Some fixes to make staticcheck happy Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Tom Wilkie	184f06a981	Combine the record decoding metrics into one; break out garbage collection into a separate function. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	859cda27ff	Remove some 'global' state, moving segment numbers to parameters. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	bdc6b764b0	If reading the WAL fails, try again. Also, read from the segment containing the index for the last checkpoint, not the first segment. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	d6f911b511	Factor out logging ratelimit & dedupe middleware. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	a5c20642b3	Refactor WAL watcher to remove some duplication. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Tom Wilkie	37ad4db485	Export timestamps in seconds since epoch. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
JoeWrightss	362873f72b	Fix .Log() error message (#5257 ) Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>	6 years ago
Simon Pasquier	b41d6d54f2	storage/remote: increase timeouts for Travis CI (#5224 ) * storage/remote: adapt tests for Travis CI Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Check filesystems on Travis environment Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Run remote/storage tests on CircleCI for troubleshooting Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Try using tmpfs partition Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Revert "Try using tmpfs partition" This reverts commit `85a30deb72`. Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Don't store labels in writeToMock Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Fix data race Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Bump retries to 100 meaning that the total timeout is 10s Signed-off-by: Simon Pasquier <spasquie@redhat.com> * clean up .travis.yml Signed-off-by: Simon Pasquier <spasquie@redhat.com> * code fixup Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Remove unneeded empty line Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Callum Styan	37e35f9e0c	Various improvements to WAL based remote write. - Use the queue name in WAL watcher logging. - Don't return from watch if the reader error was EOF. - Fix sample timestamp check logic regarding what samples we send. - Refactor so we don't need readToEnd/readSeriesRecords - Fix wal_watcher tests since readToEnd no longer exists Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Tom Wilkie	b93bafeee1	Various fixes to locking & shutdown for WAL-based remote write. - Remove datarace in the exported highest scrape timestamp. - Backoff on enqueue should be per-sample - reset the result for each sample. - Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor. - Reorder functions in WALWatcher depth-first according to call graph. - Fix vendor/modules.txt. - Split out the various timer periods into consts at the top of the file. - Move w.currentSegmentMetric.Set close to where we set the currentSegment. - Combine r.Next() and isClosed(w.quit) into a single loop. - Unnest some ifs in WALWatcher.watch, propagate errors in decodeRecord, add some new lines to make it easier to read. - Reorganise checkpoint handling to reduce nesting and make it easier to follow. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Callum Styan	6f69e31398	Tail the TSDB WAL for remote_write This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down. We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes. Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to achieve parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases. As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s). This change also includes the following optimisations: - only marshal the proto request once, not once per retry - maintain a single copy of the labels for given series to reduce GC pressure Other minor tweaks: - only reshard if we've also successfully sent recently - add pending samples, latest sent timestamp, WAL events processed metrics Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype) Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes) Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Goutham Veeramachaneni	384cba1211	Add flag for size based retention (#5109 ) * Add flag for size based retention Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Deprecate the old retention flag for a new one. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Add ability to take a suffix for size flag Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Address feedback Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	6 years ago
Krasi Georgiev	3bd41cc92c	Update tsdb to 0.4 (#5110 ) * update tsdb to v0.4.0 Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com> * remove unused struct field Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	6 years ago
Matt Layher	302148fd69	*: apply gofmt -s Signed-off-by: Matt Layher <mdlayher@gmail.com>	6 years ago
Callum Styan	5358f76c5c	update remote write path proto so that Labels/Timeseries can't be nil (#4957 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	6 years ago
Simon Pasquier	f678e27eb6	*: use latest release of staticcheck (#5057 ) *: use latest release of staticcheck It also fixes a couple of things in the code flagged by the additional checks. Signed-off-by: Simon Pasquier <spasquie@redhat.com> Use official release of staticcheck Also run 'go list' before staticcheck to avoid failures when downloading packages. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
glutamatt	5ddde1965b	tune the "WAL segment size" with a flag (#5029 ) Add WALSegmentSize as an option, and the corresponding flag "storage.tsdb.wal-segment-size", to tune the max size of WAL segment files. The underlying problem is the disk space used by WAL segment files: on a Raspberry Pi, for instance, we often want to reduce the write load on the SD card, so the WAL directory is mounted on a memory-backed (space-limited) partition. The default segment max file size pushed the directory to 128 MB for each segment, which is too much RAM consumption on a Raspberry Pi. The initial discussion is at https://github.com/prometheus/tsdb/pull/450	6 years ago
Tom Wilkie	6e08029b56	Move err to be the last return value from storage.Select. (#5054 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
AixesHunter	806632790e	update inconsistent comment (#5046 ) Co-Authored-By: aixeshunter <44970652+aixeshunter@users.noreply.github.com> Signed-off-by: aixeshunter <aixeshunter@gmail.com>	6 years ago
Bartek Płotka	62c8337e77	Moved configuration into `relabel` package. (#4955 ) Adapted top dir relabel to use pkg relabel structs. Removal of this in a separate tracked here: https://github.com/prometheus/prometheus/issues/3647 Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	6 years ago
Alin Sinpalean	44bec482fb	Minor optimization for BufferedSeriesIterator: actually drop the samples falling outside of the new delta from the underlying sampleRing, when ReduceDelta is called. (#4849 ) Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	6 years ago
Alin Sinpalean	d6adfe2ae2	Use a fake SeriesIterator (that generates samples on the fly instead of using a slice) for BufferedSeriesIterator, to reduce the variance of benchmark results due to memory pressure. (#4847 ) Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	6 years ago
Ryota Arai	135d580ab2	Introduce min_shards for remote write to set minimum number of shards. (#4924 ) Signed-off-by: Ryota Arai <ryota.arai@gmail.com>	6 years ago
mknapphrt	f0e9196dca	Return warnings on a remote read fail (#4832 ) Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>	6 years ago
Ben Kochie	c6399296dc	Fix spelling/typos (#4921 ) * Fix spelling/typos Fix spelling/typos reported by codespell/misspell. * UK -> US spelling changes. Signed-off-by: Ben Kochie <superq@gmail.com>	6 years ago
Daniele Sluijters	f25a6baedb	remote: Set User-Agent header in requests (#4891 ) Currently Prometheus requests show up with a UA of Go-http-client/1.1 which isn't super helpful. Though the X-Prometheus-Remote-* headers exist they need to be explicitly configured when logging the request in order to be able to deduce this is a request originating from Prometheus. By setting the header we remove this ambiguity and make default server logs just a bit more useful. This also updates a few other places to consistently capitalize the 'P' in the user agent, as well as ensure we set a UA to begin with. Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>	6 years ago
Krasi Georgiev	bd100182b2	added tsdb/head mint maxt metrics (#4888 ) added the head metrics with the correct suffix. Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	6 years ago
Simon Pasquier	ed19373a78	*: remove use of golang.org/x/net/context (#4869 ) *: remove use of golang.org/x/net/context Signed-off-by: Simon Pasquier <spasquie@redhat.com> scrape: fix TestTargetScrapeScrapeCancel Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Ganesh Vernekar	ca93fd544b	/api/v1/labels endpoint for getting all label names (#4835 ) * vendor: update tsdb Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * /api/v1/labels endpoint Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * regex matchers for API Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add docs Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Matchers behaving as OR Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Removed the matchers Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * vendor: update tsdb using go mod Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * vendor update: tsdb Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Added LabelNames() to storage.Querier Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Test for api.labelNames Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Nits Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	6 years ago
fengyuceNv	94fff219ab	improve remote storage enqueue performance (#4772 ) Signed-off-by: fyc <fyc22788@ly.com>	6 years ago
Tariq Ibrahim	3f7ed7de49	Adding new metric type to track in-flight remote read queries. (#4677 ) Signed-off-by: tariqibrahim <tariq.ibrahim@microsoft.com>	6 years ago
Tom Wilkie	d3a1ff1abf	Reduce memory usage of remote read by reducing pointer usage. (#4655 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
yzpeninsula	4ae3bce260	Fix typo (#4497 ) Signed-off-by: yzpeninsula <yzpeninsula@gmail.com>	6 years ago
Tom Wilkie	457e4bb58e	Limit the number of samples remote read can return. (#4532 ) * Limit the number of samples remote read can return. - Return 413 entity too large. - Limit can be set by a flag. Allow 0 to mean no limit. - Include limit in error message. - Set default limit to 50M (* 16 bytes = 800MB). Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	6 years ago
Daisy T	7d01ead689	change time.duration to model.duration for standardization (#4479 ) Signed-off-by: Daisy T <daisyts@gmx.com>	6 years ago
Julius Volz	8fbe1b5133	Handle a bunch of unchecked errors (#4461 ) There are many more (mostly finalizers like Close/Stop/etc.), but most of the others seemed like one couldn't do much about them anyway. Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Henri DF	ffb7836c14	Send "Accept-Encoding" header in read request (#4421 ) We should be doing this since we only accept Snappy-encoded responses. Signed-off-by: Henri DF <henridf@gmail.com>	6 years ago
Henri DF	3abb2cc349	Fix typo (#4423 ) Signed-off-by: Henri DF <henridf@gmail.com>	6 years ago
Alin Sinpalean	372e7652b7	Reuse (copy) overlapping matrix samples between range evaluation steps (#4315 ) * Reuse (copy) overlapping matrix samples between range evaluation steps. Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	6 years ago
Goutham Veeramachaneni	c28cc5076c	Saner defaults and metrics for remote-write (#4279 ) * Rename queueCapacity to shardCapacity * Saner defaults for remote write * Reduce allocs on retries Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	6 years ago
Alin Sinpalean	e3b775b78b	Simplify BufferedSeriesIterator usage (#4294 ) * Allow for BufferedSeriesIterator instances to be created without an underlying iterator, to simplify their usage. Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	6 years ago
Thomas Jackson	92c6f0c92e	Add offset to selectParams (#4226 ) * Add Start/End to SelectParams * Make remote read use the new selectParams for start/end This commit will continue sending the start/end time of the remote read query as the overarching promql time and the specific range of data that the query is interested in receiving a response to is now part of the ReadHints (upstream discussion in #4226). * Remove unused vendored code The genproto.sh script was updated, but the code wasn't regenerated. This simply removes the vendored deps that are no longer part of the codegen output. Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>	6 years ago
Brian Brazil	fb695fb435	Merge pull request #4285 from prometheus/release-2.3 Merge release-2.3 back to master	7 years ago
Tom Wilkie	b8217720ac	Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Corentin Chary	db9dbeeaec	federation: nil pointer dereference when using remote read ``` level=error ts=2018-06-13T07:19:04.515149169Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56202: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.516199547Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56204: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.51717692Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56206: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.564952878Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56208: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.566575791Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56210: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.567106063Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56212: runtime error: invalid memory address or nil pointer dereference" ``` When remote read is enabled, federation will call `q.Select(nil, mset...)` which will break remote reads because it currently doesn't handle empty SelectParams. Signed-off-by: Corentin Chary <c.chary@criteo.com>	7 years ago
Brian Brazil	78efdc6d6b	Avoid infinite loop on duplicate NaN values. (#4275 ) Fixes #4254 NaNs don't equal themselves, so a duplicate NaN would always hit the break statement and never get popped. We should not be returning multiple data points for the same timestamp, so don't compare values at all. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	7 years ago
Tom Wilkie	0b189b2da9	Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Corentin Chary	530107f8ef	federation: nil pointer dereference when using remote read ``` level=error ts=2018-06-13T07:19:04.515149169Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56202: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.516199547Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56204: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.51717692Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56206: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.564952878Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56208: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.566575791Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56210: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.567106063Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56212: runtime error: invalid memory address or nil pointer dereference" ``` When remote read is enabled, federation will call `q.Select(nil, mset...)` which will break remote reads because it currently doesn't handle empty SelectParams. Signed-off-by: Corentin Chary <c.chary@criteo.com>	7 years ago
Andreas Auernhammer	37d1bcf495	limit size of POST requests against remote read endpoint (#4239 ) This commit fixes a denial-of-service issue of the remote read endpoint. It limits the size of the POST request body to 32 MB such that clients cannot write arbitrary amounts of data to the server memory. Fixes #4238 Signed-off-by: Andreas Auernhammer <aead@mail.de>	7 years ago
Fabian Reinartz	fe80dddbc4	Merge pull request #4210 from bboreham/log-remote-name Add queue name to logger for remote writes	7 years ago
Brian Brazil	dd6781add2	Optimise PromQL (#3966 ) * Move range logic to 'eval' Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make aggregate range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * PromQL is statically typed, so don't eval to find the type. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Extend rangewrapper to multiple exprs Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Start making function evaluation ranged Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make instant queries a special case of range queries Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Eliminate evalString Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Evaluate range vector functions one series at a time Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make unary operators range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make binops range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Pass time to range-aware functions. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make simple _over_time functions range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reduce allocs when working with matrix selectors Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add basic benchmark for range evaluation Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse objects for function arguments Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Do dropmetricname and allocating output vector only once. 
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add range-aware support for range vector functions with params Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise holt_winters, cut cpu and allocs by ~25% Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make rate&friends range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make more functions range aware. Document calling convention. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make date functions range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make simple math functions range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Convert more functions to be range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make more functions range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Specialcase timestamp() with vector selector arg for range awareness Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove transition code for functions Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove the rest of the engine transition code Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove more obsolete code Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove the last uses of the eval* functions Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove engine finalizers to prevent corruption The finalizers set by matrixSelector were being called just before the value they were returning to the pool was then being provided to the caller. Thus a concurrent query could corrupt the data that the user has just been returned. 
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add new benchmark suite for range functions Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Migrate existing benchmarks to new system Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Expand promql benchmarks Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Simplify test by removing unused range code Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * When testing instant queries, check range queries too. To protect against subsequent steps in a range query being affected by the previous steps, add a test that evaluates an instant query that we know works again as a range query with the timestamp we care about not being the first step. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse ring for matrix iters. Put query results back in pool. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse buffer when iterating over matrix selectors Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Unary minus should remove metric name Cut down benchmarks for faster runs. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reduce repetition in benchmark test cases Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Work series by series when doing normal vectorSelectors Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise benchmark setup, cuts time by 60% Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Have rangeWrapper use an evalNodeHelper to cache across steps Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Use evalNodeHelper with functions Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Cache dropMetricName within a node evaluation. This saves both the calculations and allocs done by dropMetricName across steps. 
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse input vectors in rangewrapper Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse the point slices in the matrixes input/output by rangeWrapper Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make benchmark setup faster using AddFast Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Simplify benchmark code. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add caching in VectorBinop Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Use xor to have one-level resultMetric hash key Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add more benchmarks Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Call Query.Close in apiv1 This allows point slices allocated for the response data to be reused by later queries, saving allocations. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise histogram_quantile It's now 5-10% faster with 97% less garbage generated for 1k steps Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make the input collection in rangeVector linear rather than quadratic Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise label_replace, for 1k steps 15x fewer allocs and 3x faster Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise label_join, 1.8x faster and 11x less memory for 1k steps Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Expand benchmarks, cleanup comments, simplify numSteps logic. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Address Fabian's comments Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Comments from Alin. 
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Address jrv's comments Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove dead code Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Address Simon's comments. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Rename populateIterators, pre-init some sizes Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Handle case where function has non-matrix args first Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Split rangeWrapper out to rangeEval function, improve comments Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Cleanup and make things more consistent Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make EvalNodeHelper public Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Fabian's comments. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	7 years ago
Bryan Boreham	3277aeefaa	Add queue name to logger for remote writes More than one remote_write destination can be configured, in which case it's essential to know which one each log message refers to. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	7 years ago
Tom Wilkie	b58199bf12	Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Tom Wilkie	3353bbd018	Add proper unclean shutdown handling with a cancellable context. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Tom Wilkie	e51d6c4b6c	Make remote flush deadline a command line param. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Tom Wilkie	a6c353613a	Make the flush deadline configurable. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Tom Wilkie	aa17263edd	Remove WaitGroup and extra goroutine. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Tom Wilkie	f3c61f8bb2	Only give remote queues 1 minute to flush samples on shutdown. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Tom Wilkie	ba418780be	Dedupe samples in the mergeIterator. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Henri DF	2952387ed1	Pass query hints down into remote read query proto (#4122 ) Signed-off-by: Henri DF <henridf@gmail.com>	7 years ago
Adam Shannon	809881d7f5	support reading basic_auth password_file for HTTP basic auth (#4077 ) Issue: https://github.com/prometheus/prometheus/issues/4076 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	7 years ago
Mario Trangoni	464e747f1e	fix some comments typos (#4059 )	7 years ago
ferhat elmas	ec8e4d8a7c	all: remove unnecessary type conversions (#3992 ) except promql, to avoid creating a conflict with #3966.	7 years ago
Tom Wilkie	02a154ced6	Merge pull request #3941 from prometheus/3809-correctly-stop-timer Correctly stop the timer used in the remote write path.	7 years ago
Tom Wilkie	dc860e7d0e	Fix nit.	7 years ago
Tom Wilkie	390b018c90	Test sample timeout delivery.	7 years ago
Tom Wilkie	22d820ef8e	Review feedback.	7 years ago
Brian Brazil	a8c22c85cc	Correctly handle pruning wraparound after ring expansion (#3942 ) Fixes #3939	7 years ago
Tom Wilkie	f8c9d375b6	Correctly stop the timer used in the remote write path.	7 years ago
ferhat elmas	ffa673f7d8	General simplifications (#3887 ) Another try as in #1516	7 years ago
Fabian Reinartz	7ccd4b39b8	*: implement query params This adds a parameter to the storage selection interface which allows query engine(s) to pass information about the operations surrounding a data selection. This can for example be used by remote storage backends to infer the correct downsampling aggregates that need to be provided.	7 years ago
Tom Wilkie	a730083cbf	Merge pull request #3731 from bboreham/reuse-timer Re-use timer in remote storage queue	7 years ago
Krasi Georgiev	b75428ec19	rename package retrieve to scrape no functional changes, just renaming retrieval to scrape	7 years ago
Tom Wilkie	3dc5b8eef5	Use sub benchmarks.	7 years ago
Tom Wilkie	da29c09dca	Some benchmarks for the mergeSeries set.	7 years ago
Tom Wilkie	749781edf3	Also, don't make a mergeSeriesSet if there is only one SeriesSet.	7 years ago
Tom Wilkie	48e39068bd	Don't allocate a mergeSeries if there is only one series to merge.	7 years ago
Bryan Boreham	8a4535e6ad	Re-use timer instead of creating new ones on every sample The docs for `time.After()` note that "The underlying Timer is not recovered by the garbage collector until the timer fires".	7 years ago
Tom Wilkie	f2c5399e39	Merge pull request #3561 from twiedenbein/master fixed bug with initialization of queueconfig	7 years ago
Shubheksha Jalan	0471e64ad1	Use shared types from the `common` repo (#3674 ) * refactor: use shared types from common repo, remove util/config * vendor: add common/config * fix nit	7 years ago
Shubheksha Jalan	ec94df49d4	Refactor SD configuration to remove `config` dependency (#3629 ) * refactor: move targetGroup struct and CheckOverflow() to their own package * refactor: move auth and security related structs to a utility package, fix import error in utility package * refactor: Azure SD, remove SD struct from config * refactor: DNS SD, remove SD struct from config into dns package * refactor: ec2 SD, move SD struct from config into the ec2 package * refactor: file SD, move SD struct from config to file discovery package * refactor: gce, move SD struct from config to gce discovery package * refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil * refactor: consul, move SD struct from config into consul discovery package * refactor: marathon, move SD struct from config into marathon discovery package * refactor: triton, move SD struct from config to triton discovery package, fix test * refactor: zookeeper, move SD structs from config to zookeeper discovery package * refactor: openstack, remove SD struct from config, move into openstack discovery package * refactor: kubernetes, move SD struct from config into kubernetes discovery package * refactor: notifier, use targetgroup package instead of config * refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup * refactor: retrieval, use targetgroup package instead of config.TargetGroup * refactor: storage, use config util package * refactor: discovery manager, use targetgroup package instead of config.TargetGroup * refactor: use HTTPClient and TLS config from configUtil instead of config * refactor: tests, use targetgroup package instead of config.TargetGroup * refactor: fix targetgroup.Group pointers that were removed by mistake * refactor: openstack, kubernetes: drop prefixes * refactor: remove import aliases forced due to vscode bug * refactor: move main SD struct out of config into discovery/config * refactor: rename configUtil 
to config_util * refactor: rename yamlUtil to yaml_config * refactor: kubernetes, remove prefixes * refactor: move the TargetGroup package to discovery/ * refactor: fix order of imports	7 years ago
Ed Schouten	bb724f1bef	Deprecate DeduplicateSeriesSet() in favor of NewMergeSeriesSet(). Federation makes use of dedupedSeriesSet to merge SeriesSets for every query into one output stream. If many match[] arguments are provided, many dedupedSeriesSet objects will get chained. This has the downside of causing a potential O(nk) running time, where n is the number of series and k the number of match[] arguments. In the mean time, the storage package provides a mergeSeriesSet that accomplishes the same with an O(nlog(k)) running time by making use of a binary heap. Let's just get rid of dedupedSeriesSet and change all existing callers to use mergeSeriesSet.	7 years ago
Tom Wiedenbein	937ac8c060	fixed bug with initialization of queueconfig QueueConfigs would only ever initialize to the default settings, and would not pick up their respective values from YAML.	7 years ago
Fabian Reinartz	83cd270ea4	*: adapt to storage interface changes	7 years ago
Tobias Schmidt	7098c56474	Add remote read filter option For special remote read endpoints which have only data for specific queries, it is desired to limit the number of queries sent to the configured remote read endpoint to reduce latency and performance overhead.	7 years ago
Tobias Schmidt	434f0374f7	Refactor remote storage querier handling * Decouple remote client from ReadRecent feature. * Separate remote read filter into a small, testable function. * Use storage.Queryable interface to compose independent functionalities.	7 years ago
Tobias Schmidt	9b0091d487	Add storage.Queryable and storage.QueryableFunc In order to compose different querier implementations more easily, this change introduces a separate storage.Queryable interface grouping the query (Querier) function of the storage. Furthermore, it adds a QueryableFunc type to ease writing very simple queryable implementations.	7 years ago
Julius Volz	9f10c63cff	Fix remote read labelset corruption (#3456) The labelsets returned from remote read are mutated in higher levels (like seriesFilter.Labels()) and since the concreteSeriesSet didn't return a copy, the external mutation affected the labelset in the concreteSeries itself. This resulted in bizarre bugs where local and remote series would show with identical label sets in the UI, but not be deduplicated, since internally, a series might come to look like: {__name__="node_load5", instance="192.168.1.202:12090", job="node_exporter", node="odroid", node="odroid"} (note the repetition of the last label)	7 years ago
Krasi Georgiev	5d8f93a22a	now using only github.com/gogo/protobuf bumped all grpc-gateway packages to v1.2.2 updated and run the denproto.sh script	7 years ago
Fabian Reinartz	30e777d10d	tsdb: default too small max block duration	7 years ago
Tom Wilkie	48a7a00a38	Fast path the merge querier (#3358) * Fast path the merge querier such that it is completely removed from query path when there is no remote storage. * Add NoopQuerier * Add copyright notice. * Avoid global, use a function.	7 years ago
Tom Wilkie	0e572686db	Revert "Bypass the fanout storage merging if no remote storage is configured."	7 years ago
Tom Wilkie	1af3ef431d	s/TestRemoveLabels/TestSeriesSetFilter/	7 years ago
Tom Wilkie	9c3c98e8de	Revert "Port 'Don't disable HTTP keep-alives for remote storage connections.' to 2.0 (see #3173)" This reverts commit `0997191b18`.	7 years ago
Tom Wilkie	746752b946	Merge external labels in order.	7 years ago
Tom Wilkie	6e4d4ea402	Initialise some counters in remote storage API.	7 years ago
Tom Wilkie	2ae04d0e79	Add license header.	7 years ago
Tom Wilkie	e8c264e47a	Add comment.	7 years ago
Tom Wilkie	ee011d906d	Port remote read server to 2.0.	7 years ago
Bryan Boreham	0997191b18	Port 'Don't disable HTTP keep-alives for remote storage connections.' to 2.0 (see #3173) Removes configurability introduced in #3160 in favour of hard-coding, per advice from @brian-brazil.	7 years ago
Tom Wilkie	56820726fa	Move a couple of the encoding/decoding functions into codec.go	7 years ago
Conor Broderick	08b7328669	Port Metric name validation to 2.0 (see #2975)	7 years ago
Tom Wilkie	8fe0212ff7	Port 'Make queue manager configurable.' to 2.0, see #2991	7 years ago
Tom Wilkie	3760f56c0c	remote: Expose ClientConfig type (see #3165)	7 years ago
Tom Wilkie	16f71a7723	Port codec.go over form 1.8 branch.	7 years ago
Fabian Reinartz	e53040e2ac	Merge pull request #3339 from tomwilkie/3065-remote-read-bypass Bypass the fanout storage merging if no remote storage is configured.	7 years ago
Fabian Reinartz	bf56ad4233	Merge branch 'master' into master	7 years ago
Paul Gier	c4c3205d76	storage/tsdb: check that max block duration is larger than min If the user accidentally sets the max block duration smaller than the min, the current error is not informative. This change just performs the check earlier and improves the error message.	7 years ago
Fabian Reinartz	ce63a5a855	Merge pull request #3352 from prometheus/rc2 Cut v2.0.0-rc.2	7 years ago
Thibault Chataigner	fc4406201e	Tsdb StartTime : Use a simplier way to compute StartTime	7 years ago
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	7 years ago
Tom Wilkie	4bbef0ec30	Bypass the fanout storage merging if no remote storage is configured.	7 years ago
Fabian Reinartz	a57ea79660	Close index reader properly	7 years ago
Julius Volz	c3d6abc8e6	Fix some lint errors (#3334) I left the promql ones and some others untouched as I remember that @fabxc prefers them that way.	7 years ago

1176 Commits (b277571b24ba7acefc168c22c540a929dca58286)