prometheus

Commit Graph

Author	SHA1	Message	Date
Julius Volz	d5ef0c64dc	Merge "Add optional sample replication to OpenTSDB."	11 years ago
Julius Volz	61d26e8445	Add optional sample replication to OpenTSDB. Prometheus needs long-term storage. Since we don't have enough resources to build our own timeseries storage from scratch on top of Riak, Cassandra or a similar distributed datastore at the moment, we're planning on using OpenTSDB as long-term storage for Prometheus. Its data model is roughly compatible with that of Prometheus, with some caveats. As a first step, this adds write-only replication from Prometheus to OpenTSDB, with the following things worth noting: 1) I tried to keep the integration lightweight, meaning that anything related to OpenTSDB is isolated to its own package and only main knows about it (essentially it tees all samples to both the existing storage and TSDB). It's not touching the existing TieredStorage at all to avoid more complexity in that area. This might change in the future, especially if we decide to implement a read path for OpenTSDB through Prometheus as well. 2) Backpressure while sending to OpenTSDB is handled by simply dropping samples on the floor when the in-memory queue of samples destined for OpenTSDB runs full. Prometheus also only attempts to send samples once, rather than implementing a complex retry algorithm. Thus, replication to OpenTSDB is best-effort for now. If needed, this may be extended in the future. 3) Samples are sent in batches of limited size to OpenTSDB. The optimal batch size, timeout parameters, etc. may need to be adjusted in the future. 4) OpenTSDB has different rules for legal characters in tag (label) values. While Prometheus allows any characters in label values, OpenTSDB limits them to a to z, A to Z, 0 to 9, -, _, . and /. Currently any illegal characters in Prometheus label values are simply replaced by an underscore. Especially when integrating OpenTSDB with the read path in Prometheus, we'll need to reconsider this: either we'll need to introduce the same limitations for Prometheus labels or escape/encode illegal characters in OpenTSDB in such a way that they are fully decodable again when reading through Prometheus, so that corresponding timeseries in both systems match in their labelsets. Change-Id: I8394c9c55dbac3946a0fa497f566d5e6e2d600b5	11 years ago
Stuart Nelson	0c58e388f6	rename curation metrics to prometheus_curation Change-Id: I6a0bf277e88ea8eb737670b7e865ae20f2cbfb91	11 years ago
Stuart Nelson	28f59edf16	Added telemetry for counting stored samples Change-Id: I0f36f7c2738d070ca2f107fcb315f98e46803af3	11 years ago
Tobias Schmidt	6947ee9bc9	Try to create metrics root directory if missing This change tries to be nice and create the metrics directory first before erroring out. Change-Id: I72691cdc32469708cd671c6ef1fb7db55fe60430	11 years ago
Julius Volz	740d448983	Use custom timestamp type for sample timestamps and related code. So far we've been using Go's native time.Time for anything related to sample timestamps. Since the range of time.Time is much bigger than what we need, this has created two problems: - there could be time.Time values which were out of the range/precision of the time type that we persist to disk, therefore causing incorrectly ordered keys. One bug caused by this was: https://github.com/prometheus/prometheus/issues/367 It would be good to use a timestamp type that's more closely aligned with what the underlying storage supports. - sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit Unix timestamp (possibly even a 32-bit one). Since we store samples in large numbers, this seriously affects memory usage. Furthermore, copying/working with the data will be faster if it's smaller. MEMORY USAGE RESULTS Initial memory usage comparisons for a running Prometheus with 1 timeseries and 100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my tests, this advantage for some reason decreased a bit the more samples the timeseries had (to 5-7% for millions of samples). This I can't fully explain, but perhaps garbage collection issues were involved. WHEN TO USE THE NEW TIMESTAMP TYPE The new clientmodel.Timestamp type should be used whenever time calculations are either directly or indirectly related to sample timestamps. For example: - the timestamp of a sample itself - all kinds of watermarks - anything that may become or is compared to a sample timestamp (like the timestamp passed into Target.Scrape()). When to still use time.Time: - for measuring durations/times not related to sample timestamps, like duration telemetry exporting, timers that indicate how frequently to execute some action, etc. NOTE ON OPERATOR OPTIMIZATION TESTS We don't use operator optimization code anymore, but it still lives in the code as dead code. It still has tests, but I couldn't get all of them to pass with the new timestamp format. I commented out the failing cases for now, but we should probably remove the dead code soon. I just didn't want to do that in the same change as this. Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f	11 years ago
Julius Volz	6b7de31a3c	Upgrade to LevelDB 1.14.0 to fix LevelDB bugs. This tentatively fixes https://github.com/prometheus/prometheus/issues/368 due to an upstream bugfix in snapshotted LevelDB iterator handling, which got fixed in LevelDB 1.14.0: https://code.google.com/p/leveldb/issues/detail?id=200 Change-Id: Ib0cc67b7d3dc33913a1c16736eff32ef702c63bf	11 years ago
Julius Volz	db015de65b	Comment and "go fmt" fixups in compaction tests. Change-Id: Iaa0eda6a22a5caa0590bae87ff579f9ace21e80a	11 years ago
Julius Volz	51408bdfe8	Merge changes I3ffeb091,Idffefea4 * changes: Add chunk sanity checking to dumper tool. Add compaction regression tests.	11 years ago
Julius Volz	2162e57784	Merge "Fix watermarker default time / LevelDB key ordering bug."	11 years ago
Julius Volz	5e18255920	Merge "Fix chunk corruption compaction bug."	11 years ago
Julius Volz	eb461a707d	Add chunk sanity checking to dumper tool. Also, move codecs/filters to common location so they can be used in subsequent test. Change-Id: I3ffeb09188b8f4552e42683cbc9279645f45b32e	11 years ago
Julius Volz	6ea22f2bf9	Add compaction regression tests. This adds regression tests that catch the two error cases reported in https://github.com/prometheus/prometheus/issues/367 It also adds a commented-out test case for the crash in https://github.com/prometheus/prometheus/issues/368 but there's no fix for the latter crash yet. Change-Id: Idffefea4ed7cc281caae660bcad2e3c13ec3bd17	11 years ago
Conor Hennessy	9a48010cec	Add a check for metrics directory existence. Previously on startup the program would just quit without stating explicitly why. Change-Id: I833b85eb74d2dd27cdc3f0f2e65d7bb1c42caa39	11 years ago
Julius Volz	b5f6e3c90c	Fix watermarker default time / LevelDB key ordering bug. This fixes part 2) of https://github.com/prometheus/prometheus/issues/367 (uninitialized time.Time mapping to a higher LevelDB key than "normal" timestamps). Change-Id: Ib079974110a7b7c4757948f81fc47d3d29ae43c9	11 years ago
Julius Volz	a1a97ed064	Fix chunk corruption compaction bug. This fixes part 1) of https://github.com/prometheus/prometheus/issues/367 (the storing of samples with the wrong fingerprint into a compacted chunk, thus corrupting it). Change-Id: I4c36d0d2e508e37a0aba90b8ca2ecc78ee03e3f1	11 years ago
Matt T. Proud	86fcbe5bde	Retain DTO on each cycle. Change-Id: Ifc6f68f98eacb01097771d0dbf043c98bba1d518	11 years ago
Matt T. Proud	4a87c002e8	Update low-level i'faces to reflect wireformats. This commit fixes a critique of the old storage API design, whereby the input parameters were always as raw bytes and never Protocol Buffer messages that encapsulated the data, meaning every place a read or mutation was conducted needed to manually perform said translations on its own. This is taxing. Change-Id: I4786938d0d207cefb7782bd2bd96a517eead186f	11 years ago
Matt T. Proud	7910f6e863	Prevent total storage locking during memory flush. While a hack, this change should allow us to serve queries expeditiously during a flush operation. Change-Id: I9a483fd1dd2b0638ab24ace960df08773c4a5079	11 years ago
Matt T. Proud	12d5e6ca5a	Curation should not starve user-interactive ops. The background curation should be staggered to ensure that disk I/O yields to user-interactive operations in a timely manner. The lack of routine prioritization necessitates this. Change-Id: I9b498a74ccd933ffb856e06fedc167430e521d86	11 years ago
Matt T. Proud	2b42fd0068	Snapshot of no more frontier. Change-Id: Icd52da3f52bfe4529829ea70b4865ed7c9f6c446	11 years ago
Matt T. Proud	7db518d3a0	Abstract high watermark cache into standard LRU. Conflicts: storage/metric/memory.go storage/metric/tiered.go storage/metric/watermark.go Change-Id: Iab2aedbd8f83dc4ce633421bd4a55990fa026b85	11 years ago
Matt T. Proud	d74c2c54d4	Interfacification of stream. Move the stream to an interface, for a number of additional changes around it are underway. Conflicts: storage/metric/memory.go Change-Id: I4a5fc176f4a5274a64ebdb1cad52600954c463c3	11 years ago
Matt T. Proud	c262907fec	Kill interface cruft. These pieces were never used and should be thusly removed. Change-Id: I8dd151ec4c40b6d3ccffad1bb9b8b75a92e9ee37	11 years ago
Matt T. Proud	b23acccea8	Kill AppendSample interface definition. AppendSample will be replaced with AppendSamples, which will take advantage of bulk appends. This is a necessary step for indexing pipeline decoupling. Change-Id: Ia83811a87bcc89973d3b64d64b85a28710253ebc	11 years ago
Matt T. Proud	aaaf3367d6	Include forgotten imports. This fixes the build. Change-Id: Id132f4342adb9ed20116191086f157ca7f7cf515	11 years ago
Matt T. Proud	acf91f38bd	Build layered indexers. The indexers will be extracted in a short while and wrapped accordingly with these types. Change-Id: I4d1abda4e46117210babad5aa0d42f9ca1f6594f	11 years ago
Matt T. Proud	972e856d9b	Kill the curation state channel. The use of the channels for curation state was always unidiomatic. Change-Id: I1cb1d7175ebfb4faf28dff84201066278d6a0d92	11 years ago
Matt T. Proud	1ceb25b701	Publication of LevelDBMetricPersistence Fields. This will enable us to break down the onerous construction method. Change-Id: Ia89337ba39d6745af6757180af2485ec8a990a3b	11 years ago
Julius Volz	0003027dce	Add needed trailing spaces in logs.	11 years ago
Julius Volz	aa5d251f8d	Use github.com/golang/glog for all logging.	11 years ago
Matt T. Proud	a5141e4d0a	Depointerize storage conf. and chain ingester. The storage builders need to work with the assumption that they have a copy of the underlying configuration data if any mutations are made.	11 years ago
Matt T. Proud	820e551988	Code Review: Nits.	11 years ago
Matt T. Proud	a3bf2efdd5	Replace index writes with wrapped interface. This commit is the first of several and should not be regarded as the desired end state for these cleanups. What this one does, however, is wrap the query index writing behind an interface type that can be injected into the storage stack and have its lifecycle managed separately as needed. It also would mean we can swap out underlying implementations to support remote indexing, buffering, no-op indexing very easily. In the future, most of the individual index interface members in the tiered storage will go away in favor of agents that can query and resolve what they need from the datastore without the user knowing how and why they work.	11 years ago
Matt T. Proud	52664f701a	Hot Fix: Use extracted time.	11 years ago
Matt T. Proud	38dac35b3e	Code Review: Short name consistency.	11 years ago
Matt T. Proud	a00f18d78b	Code Review: Manual re-alignment.	11 years ago
Matt T. Proud	cc989c68e1	Replace direct curation table access with wrapper.	11 years ago
Matt T. Proud	07ac921aec	Code Review: First pass.	11 years ago
Matt T. Proud	d8792cfd86	Extract HighWatermarking. Clean up the rest.	11 years ago
Matt T. Proud	f4669a812c	Extract index storage into separate types.	11 years ago
Matt T. Proud	772d3d6b11	Consolidate LevelDB storage construction. There are too many parameters to constructing a LevelDB storage instance for a construction method, so I've opted to take an idiomatic approach of embedding them in a struct for easier mediation and versioning.	11 years ago
Julius Volz	e3415e953f	Add notifications telemetry.	11 years ago
juliusv	927435d68e	Merge pull request #333 from prometheus/round-time Round time to nearest second in memory storage.	12 years ago
Julius Volz	5d88e8cc45	Round time to nearest second in memory storage. When samples get flushed to disk, they lose sub-second precision anyways. By already dropping sub-second precision, data fetched from memory vs. disk will behave the same. Later, we should consider also storing a more compact representation than time.Time in memory if we're not going to use its full precision.	12 years ago
Matt T. Proud	f7704af4f8	Code Review: Formatting comments.	12 years ago
Julius Volz	a76a797f3f	Always treat series without watermarks as too old. Current series always get watermarks written out upon append now. This drops support for old series without any watermarks by always reporting them as too old (stale) during queries.	12 years ago
Julius Volz	d2da21121c	Implement getValueRangeAtIntervalOp for faster range queries. This also short-circuits optimize() for now, since it is complex to implement for the new operator, and ops generated by the query layer already fulfill the needed invariants. We should still investigate later whether to completely delete operator optimization code or extend it to support getValueRangeAtIntervalOp operators.	12 years ago
Julius Volz	e7f049c85b	Fix expunging of empty memory series (loop var pointerization bug)	12 years ago
Julius Volz	baa5b07829	Fix condition for dropping empty memory series.	12 years ago
Matt T. Proud	30b1cf80b5	WIP - Snapshot of Moving to Client Model.	12 years ago
juliusv	42198c1f1c	Merge pull request #311 from prometheus/fix/watermarking/on-first-write Ensure new metrics are watermarked early.	12 years ago
Matt T. Proud	b811ccc161	Disable paranoid checks and expose max FDs option. We shouldn't need paranoid checks now. We also shouldn't need too many FDs being open due to rule evaluator hitting in-memory values stream.	12 years ago
Matt T. Proud	4137c75523	Shrink default LRU cache sizes. Observing Prometheus in production confirms we can lower these values safely.	12 years ago
Matt T. Proud	ecb9c7bb9d	Code Review: Swap ordering of elements.	12 years ago
Matt T. Proud	5daa0a09ea	Code Review: Swap ordering of watermark getting. A test for Julius.	12 years ago
Matt T. Proud	ee840904d2	Code Review: !Before -> After.	12 years ago
Matt T. Proud	2d5de99fbf	Regard in-memory series as new. This commit ensures that series that exist only in-memory and not on-disk are not regarded as too old for operation exclusion.	12 years ago
Matt T. Proud	81c406630a	Merge pull request #312 from prometheus/fix/sample-append-logging Log correct sample count when appending to disk.	12 years ago
Matt T. Proud	a1a23fbaf8	Ensure new metrics are watermarked early. With the checking of fingerprint freshness to cull stale metrics from queries, we should write watermarks early to aid in more accurate responses.	12 years ago
Julius Volz	ba8c122147	Log correct sample count when appending to disk.	12 years ago
Julius Volz	f2b4067b7b	Speedup and clean up operation optimization.	12 years ago
Julius Volz	008bc09da8	Move check for empty memory series to separate method.	12 years ago
Julius Volz	16364eda37	Drop empty series from memory after flushing.	12 years ago
Julius Volz	71199e2c93	Cache disk fingerprint->metric lookups in memory.	12 years ago
Matt T. Proud	a73f061d3c	Persist solely Protocol Buffers. A design question that was open for me in the beginning was whether to serialize other types to disk, but Protocol Buffers quickly won out, which allows us to drop support for other types. This is a good start to cleaning up a lot of cruft in the storage stack and can let us eventually decouple the various moving parts into separate subsystems for easier reasoning. This commit is not strictly required, but it is a start to making the rest a lot more enjoyable to interact with.	12 years ago
juliusv	95400cb785	Merge pull request #290 from prometheus/fix/go-vet Minor "go tool vet" cleanups	12 years ago
Julius Volz	558281890b	Minor "go tool vet" cleanups	12 years ago
juliusv	615972dd01	Merge pull request #288 from prometheus/fix/curator/fallthrough-compaction-ordering Fix fallthrough compaction value ordering.	12 years ago
Matt T. Proud	86f63b078b	Fix fallthrough compaction value ordering. We discovered a regression whereby data chunks could be appended out of order if the fallthrough case was hit.	12 years ago
Julius Volz	7b9ee95030	Minor LevelDB watermark handling cleanups.	12 years ago
Julius Volz	84741b227d	Use LRU cache to avoid querying stale series.	12 years ago
Julius Volz	f98853d7b7	Fix type error in watermark list handling.	12 years ago
Matt T. Proud	ef1d5fd8a2	Introduce semaphores for tiered storage. This commit wraps the tiered storage access components in semaphores, since we can handle several concurrent memory reads.	12 years ago
Matt T. Proud	819045541e	Code Review: Make double-drain a panic.	12 years ago
Matt T. Proud	e217a9fb41	Race Work: Make memory arena locks more coarse. We can optimize these as needed later.	12 years ago
Matt T. Proud	beaaf386e7	Add storage state guards and transition callbacks. To ensure that we access tiered storage in the proper way, we have guards now.	12 years ago
Matt T. Proud	abb5353ade	Merge pull request #283 from prometheus/feature/storage/consult-watermark Include LRU cache for fingerprint watermarks.	12 years ago
Matt T. Proud	2c3df44af6	Ensure database access waits until it is started. This commit introduces a channel message to ensure serving state has been reached with the storage stack before anything attempts to use it.	12 years ago
Matt T. Proud	cbe2f3a7b1	Include LRU cache for fingerprint watermarks.	12 years ago
Julius Volz	51689d965d	Add debug timers to instant and range queries. This adds timers around several query-relevant code blocks. For now, the query timer stats are only logged for queries initiated through the UI. In other cases (rule evaluations), the stats are simply thrown away. My hope is that this helps us understand where queries spend time, especially in cases where they sometimes hang for unusual amounts of time.	12 years ago
Matt T. Proud	8339a189cb	Code Review: Fix seriesPresent scope. The seriesPresent scope should be constrained to the scope of a scanJob, since this is keyed to given series.	12 years ago
Matt T. Proud	fe41ce0b19	Conditionalize disk initializations. This commit conditionalizes the creation of the diskFrontier and seriesFrontier along with the iterator such that they are provisioned once something is actually required from disk.	12 years ago
Julius Volz	a8468a2e5e	Fix reversed disk flush cutoff behavior.	12 years ago
Julius Volz	eb1f956909	Revert "Revert "Ensure that all extracted samples are added to view."" This reverts commit `4b30fb86b4`.	12 years ago
Matt T. Proud	4b30fb86b4	Revert "Ensure that all extracted samples are added to view." This reverts commit `008314b5a8`. By running an automated git bisection described in https://gist.github.com/matttproud-soundcloud/22a371a8d2cba382ea64 this commit was found.	12 years ago
Julius Volz	750f862d9a	Use GetBoundaryValues() for non-counter deltas.	12 years ago
Julius Volz	f2b48b8c4a	Make getValuesAtIntervalOp consume all chunk data in one pass. This is mainly a small performance improvement, since we skip past the last extracted time immediately if it was also the last sample in the chunk, instead of trying to extract non-existent values before the chunk end again and again and only gradually approaching the end of the chunk.	12 years ago
Julius Volz	83d60bed89	extractValuesAroundTime() code simplification.	12 years ago
Julius Volz	008314b5a8	Ensure that all extracted samples are added to view. The current behavior only adds those samples to the view that are extracted by the last pass of the last processed op and throws other ones away. This is a bug. We need to append all samples that are extracted by each op pass. This also makes view.appendSamples() take an array of samples.	12 years ago
Matt T. Proud	b586801830	Code Review: Fix to-disk queue infinite growth. We discovered a bug while manually testing this branch on a live instance, whereby the to-disk queue was never actually dumped to disk.	12 years ago
Matt T. Proud	285a8b701b	Code Review: Extend lock.	12 years ago
Matt T. Proud	2526ab8c81	Code Review: Extend lock scope for appending.	12 years ago
Matt T. Proud	f994482d15	Code Review: Avenues for future improvement noted.	12 years ago
Matt T. Proud	298a90c143	Code Review: Initial arena size name.	12 years ago
Matt T. Proud	c07abf8521	Initial move away from skiplist.	12 years ago
Matt T. Proud	74a66fd938	Spawn grouping of fingerprints with free semaphore. The previous implementation spawned N goroutines to group samples together and would not start work until the semaphore unblocked. While this didn't leak, it polluted the scheduling space. Thusly, the routine only starts after a semaphore has been acquired.	12 years ago
Julius Volz	5b105c77fc	Repointerize fingerprints.	12 years ago
Matt T. Proud	ec5b5bae28	Fuck you, Travis.	12 years ago
Matt T. Proud	e5ac91222b	Benchmark memory arena; simplify map generation. The one-off keys have been replaced with ``model.LabelPair``, which is indexable. The performance impact is negligible, but it represents a cognitive simplification.	12 years ago
juliusv	360477f66c	Merge pull request #257 from prometheus/feature/better-memory-behaviors Pointerize memorySeriesArena.	12 years ago
Matt T. Proud	e1f20de2e9	Pointerize memorySeriesArena.	12 years ago
Matt T. Proud	8f4c7ece92	Destroy naked returns in half of corpus. The use of naked return values is frowned upon. This is the first of two bulk updates to remove them.	12 years ago
Matt T. Proud	4e0c932a4f	Simplify Encoder's encoding signature. The reality is that if we ever try to encode a Protocol Buffer and it fails, it's likely that such an error is ultimately not a runtime error and should be fixed forthwith. Thusly, we should rename ``Encoder.Encode`` to ``Encoder.MustEncode`` and drop the error return value.	12 years ago
juliusv	516101f015	Merge pull request #250 from prometheus/refactor/drop-unused-storage-setting Drop unused writeMemoryInterval	12 years ago
juliusv	9ff00b651d	Merge pull request #251 from prometheus/fix/memory-metric-mutability Fix GetMetricForFingerprint() metric mutability.	12 years ago
Bernerd Schaefer	63d9988b9c	Drop unused writeMemoryInterval	12 years ago
Julius Volz	83c60ad43a	Fix GetMetricForFingerprint() metric mutability. Some users of GetMetricForFingerprint() end up modifying the returned metric labelset. Since the memory storage's implementation of GetMetricForFingerprint() returned a pointer to the metric (and maps are reference types anyways), the external mutation propagated back into the memory storage. The fix is to make a copy of the metric before returning it.	12 years ago
Bernerd Schaefer	428d91c86f	Rename test helper files to helpers_test.go This ensures that these files are properly included only in testing.	12 years ago
juliusv	98e512d755	Merge pull request #246 from prometheus/fix/interval-value-extraction Fix and optimize getValuesAtIntervalOp data extraction.	12 years ago
Julius Volz	71a3172abb	Fix and optimize getValuesAtIntervalOp data extraction. - only the data extracted in the last loop iteration of ExtractSamples() was emitted as output - if e.g. op interval < sample interval, there were situations where the same sample was added multiple times to the output	12 years ago
Matt T. Proud	244a4a9cdb	Update to go1.1. This commit updates the documentation, Makefiles, formatting, and code semantics to support the 1.1 runtime, which includes ... 1. ``make advice``, 2. ``make format``, and 3. ``go fix`` on various targets.	12 years ago
Matt T. Proud	b224251981	Simplify compaction and expose database sizes. This commit simplifies the way that compactions across a database's keyspace occur due to reading the LevelDB internals. Secondarily it introduces the database size estimation mechanisms. Include database health and help interfaces. Add database statistics; remove status goroutines. This commit kills the use of Go routines to expose status throughout the web components of Prometheus. It also dumps raw LevelDB status on a separate /databases endpoint.	12 years ago
juliusv	92ad65ff13	Merge pull request #232 from prometheus/optimize/granular-storage-locking Synchronous memory appends and more fine-grained storage locks.	12 years ago
Matt T. Proud	1f7f89b4e3	Simplify compaction and expose database sizes. This commit simplifies the way that compactions across a database's keyspace occur due to reading the LevelDB internals. Secondarily it introduces the database size estimation mechanisms.	12 years ago
Matt T. Proud	d538b0382f	Include long-tail data deletion mechanism. This commit introduces the long-tail deletion mechanism, which will automatically cull old sample values. It is an acceptable hold-over until we get a resampling pipeline implemented. Kill legacy OS X documentation, too.	12 years ago
Julius Volz	ce1ee444f1	Synchronous memory appends and more fine-grained storage locks. This does two things: 1) Make TieredStorage.AppendSamples() write directly to memory instead of buffering to a channel first. This is needed in cases where a rule might immediately need the data generated by a previous rule. 2) Replace the single storage mutex by two new ones: - memoryMutex - needs to be locked at any time that two concurrent goroutines could be accessing (via read or write) the TieredStorage memoryArena. - memoryDeleteMutex - used to prevent any deletion of samples from memoryArena as long as renderView is running and assembling data from it. The LevelDB disk storage does not need to be protected by a mutex when rendering a view since renderView works off a LevelDB snapshot. The rationale against adding memoryMutex directly to the memory storage: taking a mutex does come with a small inherent time cost, and taking it is only required in few places. In fact, no locking is required for the memory storage instance which is part of a view (and not the TieredStorage).	12 years ago
Matt T. Proud	fa6a1f97d0	Expose interfaces for pruner and make pruner tool. In order to run database cleanups and diagnostics, we should have a means for pruning a database---even if LevelDB does this for us.	12 years ago
Matt T. Proud	161c8fbf9b	Include deletion processor for long-tail values. This commit extracts the model.Values truncation behavior into the actual tiered storage, which uses it and behaves in a peculiar way—notably the retention of previous elements if the chunk were to ever go empty. This is done to enable interpolation between sparse sample values in the evaluation cycle. Nothing necessarily new here—just an extraction. Now, the model.Values TruncateBefore functionality would do what a user would expect without any surprises, which is required for the DeletionProcessor, which may decide to split a large chunk in two if it determines that the chunk contains the cut-off time.	12 years ago
Matt Proud	7f0d816574	Schedule the background compactors to run. This commit introduces three background compactors, which compact sparse samples together. 1. Older than five minutes is grouped together into chunks of 50 every 30 minutes. 2. Older than 60 minutes is grouped together into chunks of 250 every 50 minutes. 3. Older than one day is grouped together into chunks of 5000 every 70 minutes.	12 years ago
Julius Volz	caab131ada	Repointerize TieredStorage method receiver types.	12 years ago
juliusv	89de116ea9	Merge pull request #225 from prometheus/refactor/fmt-cleanups Slice expression simplifications.	12 years ago
Julius Volz	05afa970d2	Slice expression simplifications.	12 years ago
Matt T. Proud	f897164bcf	Expose TieredStorage.DiskStorage.	12 years ago
Matt T. Proud	ce45787dbf	Storage interface to TieredStorage. This commit drops the Storage interface and just replaces it with a publicized TieredStorage type. Storage had been anticipated to be used as a wrapper for testability but just was not used due to practicality. Merely overengineered. My bad. Anyway, we will eventually instantiate the TieredStorage dependencies in main.go and pass them in for more intelligent lifecycle management. These changes will pave the way for managing the curators without Law of Demeter violations.	12 years ago
Bernerd Schaefer	5eb9840ed7	Fix goroutine leak in leveldb.AppendSamples The error channels in AppendSamples need to be buffered, since in the presence of errors their values may not be consumed.	12 years ago
Matt T. Proud	a3f1d81e24	Publicize a few storage components for curation. This commit introduces the publicization of Stop and other components, which the compaction curator shall take advantage of.	12 years ago
Matt T. Proud	4298bab2b0	Publicize Curator and Processors. This commit publicizes the curation and processor frameworks for purposes of making them available in the main processor loop.	12 years ago
Julius Volz	368a792dd2	Adjust memory queue size after change to send arrays over channel.	12 years ago
juliusv	b02debd69c	Merge pull request #205 from prometheus/julius-channel-arrays Send sample arrays instead of single samples over channels.	12 years ago
Julius Volz	d8110fcd9c	Send sample arrays instead of single samples over channels.	12 years ago
Matt T. Proud	3362bf36e2	Include curator status in web heads-up-display.	12 years ago
Matt T. Proud	6fac20c8af	Harden the tests against OOMs. This commit employs explicit memory freeing for the in-memory storage arenas. Secondarily, we take advantage of smaller channel buffer sizes in the test.	12 years ago
Matt T. Proud	66bc3711ea	Merge pull request #197 from prometheus/feature/storage/curation-table Add curation remark table and refactor error mgmt.	12 years ago
Matt T. Proud	d46cd089b5	Merge pull request #199 from prometheus/refactor/telemetry/api-refresh Refresh Prometheus client API usage.	12 years ago
Matt T. Proud	3fa260f180	Complete sentence.	12 years ago
Matt T. Proud	e527941b6a	Use tagged struct fields.	12 years ago
Matt T. Proud	a48ab34dd0	Refresh Prometheus client API usage. The client API has been updated per https://github.com/prometheus/client_golang/pull/9.	12 years ago
Matt T. Proud	561974308d	Add curation remark table and refactor error mgmt. The curator requires the existence of a curator remark table, which stores the progress for a given curation policy. The tests for the curator create an ad hoc table, but core Prometheus presently lacks said table, which this commit adds. Secondarily, the error handling for the LevelDB lifecycle functions in the metric persistence have been wrapped into an UncertaintyGroup, which mirrors some of the functions of sync.WaitGroup but adds error capturing capability to the mix.	12 years ago
Matt T. Proud	b3e34c6658	Implement batch database sample curator. This commit introduces to Prometheus a batch database sample curator, which corroborates the high watermarks for sample series against the curation watermark table to see whether a curator of a given type needs to be run. The curator is an abstract executor, which runs various curation strategies across the database. It remarks the progress for each type of curation processor that runs for a given sample series. A curation processor is responsible for effectuating the underlying batch changes that are requested. In this commit, we introduce the CompactionProcessor, which takes several bits of runtime metadata and combines sparse sample entries in the database together to form larger groups. For instance, for a given series it would be possible to have the curator effectuate the following grouping: - Samples Older than Two Weeks: Grouped into Bunches of 10000 - Samples Older than One Week: Grouped into Bunches of 1000 - Samples Older than One Day: Grouped into Bunches of 100 - Samples Older than One Hour: Grouped into Bunches of 10 The benefits of such a compaction are 1. a smaller search space in the database keyspace, 2. better employment of compression for repetitious values, and 3. reduced seek times.	12 years ago
Julius Volz	2202cd71c9	Track alerts over time and write out alert timeseries.	12 years ago
Johannes 'fish' Ziemke	1ad41d4c00	Call closer.Close() earlier.	12 years ago
Johannes 'fish' Ziemke	22da76e8ab	Close of reportTicker to exit goroutine.	12 years ago
Johannes 'fish' Ziemke	5043c6fce7	Have goroutine exit on signal via defer block.	12 years ago
juliusv	af7ddc36e2	Merge pull request #176 from prometheus/optimization/view-materialization/slice-chunking Truncate irrelevant chunk values.	12 years ago
Julius Volz	9b8c671ec9	Fixes/cleanups to renderView() samples truncation.	12 years ago
Matt T. Proud	05504d3642	WIP - Truncate irrelevant chunk values. This does not work with the view tests.	12 years ago
Matt T. Proud	a32602140e	Convert the TestInstant value into UTC. For the forthcoming Curator, we don't record timezone information in the samples, nor do we in the curation remarks. All times are recorded UTC. That said, for the test environment to better match production, the special instant should be in UTC.	12 years ago
Matt T. Proud	b1a8e51b07	Extract dto.SampleValueSeries into model.Values.	12 years ago
Matt T. Proud	422003da8e	Convert trailing float64s.	12 years ago
Matt T. Proud	db4ffbb262	Wrap dto.SampleKey with business logic type. The curator work can be done easier if dto.SampleKey is no longer directly accessed but rather has a higher level type around it that captures a certain modicum of business logic. This doesn't look terribly interesting today, but it will get more so.	12 years ago
Matt T. Proud	f9e99bd08a	Refresh SampleValue to 64-bit floating point. We always knew that this needed to be fixed.	12 years ago
Matt T. Proud	092c7bd88e	Stochastic test support plural SampleValueSeries. After SampleValue was refactored into SampleValueSeries, which involves plural values under a common super key, the stochastic test was never refreshed to reflect this reality. We had other tests that validated the functionality, but this one was insufficiently forward-ported.	12 years ago
Julius Volz	99dcbe0f94	Integrate memory and disk layers in view rendering.	12 years ago
Julius Volz	63625bd244	Make view use memory persistence, remove obsolete code. This makes the memory persistence the backing store for views and adjusts the MetricPersistence interface accordingly. It also removes unused Get* method implementations from the LevelDB persistence so they don't need to be adapted to the new interface. In the future, we should rethink these interfaces. All staleness and interpolation handling is now removed from the storage layer and will be handled only by the query layer in the future.	12 years ago
Matt T. Proud	d468271e2f	Fix append queue telemetry and parameterize sizes. The original append queue telemetry never worked, because it was updated only upon the exit of the select statement, which would usually liberate the queues of contents. This has been fixed to be reported arbitrarily. The queue sizes are now parameterizable via flags.	12 years ago
Julius Volz	95b081f9bc	Stop serving tiered storage after draining it.	12 years ago
Matt T. Proud	a55602df4a	Validate diskFrontier domain for series candidate. It is the case with the benchmark tool that we thought that we generated multiple series and saved them to the disk as such, when in reality, we overwrote the fields of the outgoing metrics via Go map reference behavior. This was accidental. In the course of diagnosing this, a few errors were found: 1. ``newSeriesFrontier`` should check to see if the candidate fingerprint is within the given domain of the ``diskFrontier``. If not, as the contract in the docstring stipulates, a ``nil`` ``seriesFrontier`` should be emitted. 2. In the interests of aiding debugging, the raw LevelDB ``levigoIterator`` type now includes a helpful forensics ``String()`` method. This work produced additional cleanups: 1. ``Close() error`` with the storage stack is technically incorrect, since nowhere in the bowels of it does an error actually occur. The interface has been simplified to remove this for now.	12 years ago
Matt T. Proud	d79c932a8e	Merge pull request #120 from prometheus/feature/storage/compaction Spin up curator run in the tests.	12 years ago
Matt T. Proud	c3e3460ca6	Spin up curator run in the tests. After this commit, we'll need to add validations that it does the desired work, which we presently know that it doesn't. Given the changes I made with a plethora of renamings, I want to commit this now before it gets even larger.	12 years ago
Matt T. Proud	461da0b3a8	Merge pull request #117 from prometheus/feature/storage/compaction Spin up storage layers for made fixtures.	12 years ago
Matt T. Proud	d0ad6cbeaa	Spin up storage layers for made fixtures.	12 years ago
Julius Volz	c59f3fc538	Fix formatting in tiered_test.go.	12 years ago
juliusv	39826d7335	Merge pull request #107 from prometheus/julius-fix-get-fingerprints Fix bug in GetFingerprintsForLabelSet().	12 years ago
Julius Volz	2668700e54	Fix bug in GetFingerprintsForLabelSet().	12 years ago
Matt T. Proud	c53a72a894	Test data for the curator.	12 years ago
Matt T. Proud	6dcaa28806	Include LevelDB fixture generators for curator. This will help reduce common boilerplate for our test process with respect to LevelDB-related things.	12 years ago
Julius Volz	55ca65aa6e	More userfriendly output when we fail to create the tiered storage.	12 years ago
Matt T. Proud	c4e971d7d9	Merge pull request #101 from prometheus/refactor/test/directory-extraction Create temporary directory handler.	12 years ago
Matt T. Proud	b86b0ea41a	Create temporary directory handler.	12 years ago
Julius Volz	8cf2af3923	Abort view job processing on timeout.	12 years ago
Julius Volz	2b8f0b2cc7	Constantize metric name label name.	12 years ago
Julius Volz	e096896932	PR comment fixups.	12 years ago
Julius Volz	dd67ab115b	Change GetAllMetricNames() to GetAllValuesForLabel().	12 years ago
Julius Volz	42bdf921d1	Fetch integrated memory/disk data for simple Get* functions.	12 years ago
Julius Volz	11bb94a7e5	Implement GetAllMetricNames() for memory storage.	12 years ago
Julius Volz	991dc68d78	Rename misnamed oldestSampleTimestamp variable.	12 years ago
Matt T. Proud	3e97a3630d	Include nascent curator scaffolding. The curator doesn't do anything yet; rather, this is the type definition including the ancillary testing scaffold. Improve Makefile and Git developer experience. The top-level Makefile was a bit overloaded in terms of generation of assets and their management. This has been offloaded into separate Makefiles. The Git developer experience sucked due to lack of .gitignore policies. Also: Fix faulty skiplist naming from old merge.	12 years ago
Matt T. Proud	b2e4c88b80	Wrap LevelDB iterator operations behind interface. The LevelDB storage types return an interface type now that wraps around the underlying iterator. This both enhances testability and improves upon, in my opinion, the interface design for the LevelDB iterator. Secondarily, the resource reaping behaviors for the LevelDB iterators have been improved by dropping the externalized io.Closer object. Finally, the iterator provisioning methods provide the option for indicating whether one wants a snapshotted iterator or not.	12 years ago
Matt T. Proud	f2a30cf20c	Several important cleanups and deprecations. EachFunc is deprecated. Remove deprecated ``Pair`` and ``GetAll``. These were originally used for forensic and the old gorest impl. Nothing today in the user-facing path nor the tests uses them, especially since the advent of the ForEach protocol in the interface.	12 years ago
Matt T. Proud	70448711ec	Merge pull request #95 from prometheus/feature/persistence/batching Several interface cleanups.	12 years ago
Matt T. Proud	8f6b55be71	Several interface cleanups. - Kill Close in Persistent and document interface. - Extract batching behavior into interface. - Kill IteratorManager, which was used for unknown reasons.	12 years ago
Julius Volz	a33d2726bc	Mark range op as consumed if it receives no data points in range.	12 years ago
Julius Volz	3c9d6cb66c	Add several needed persistence proxy methods to tiered storage.	12 years ago
Julius Volz	081d250929	Fix view's GetRangeValues() reverse iteration behavior.	12 years ago
Julius Volz	0be0aa59c2	Wait until storage is drained before closing the underlying leveldb.	12 years ago
Julius Volz	becc278eb6	Fix two bugs in range op time advancement.	12 years ago
Matt T. Proud	ceb6611957	Fix regression in subsequent range op. compactions. We have an anomaly whereby subsequent range operations fail to be compacted into one single range operation. This fixes such behavior.	12 years ago
Matt T. Proud	669abdfefe	``make format`` invocation.	12 years ago
Julius Volz	bdb067b47f	Implement remaining View Get* methods.	12 years ago
Julius Volz	1f42364733	Fix typo in comment.	12 years ago
Matt T. Proud	758a3f0764	Add documentation and cull junk.	12 years ago
Matt T. Proud	bd8bb0edfd	One additional reduction.	12 years ago
Matt T. Proud	73b463e814	Additional simplifications.	12 years ago
Matt T. Proud	fd47ac570f	Implied simplifications.	12 years ago
Matt T. Proud	51a0f21cf8	Interim documentation	12 years ago
Matt T. Proud	b470f925b7	Extract rewriting of interval queries.	12 years ago
Matt T. Proud	eb721fd220	Include note about greediest range.	12 years ago
Julius Volz	e50de005f9	Populate metric in SampleSet returned from GetRangeValues()	12 years ago
Julius Volz	6001d22f87	Change Get* methods to receive fingerprints instead of metrics.	12 years ago
Julius Volz	95f8885c8a	Adopt new ops sorting interface in view rendering.	12 years ago
Julius Volz	4d79dc3602	Replace renderView() by cleaner and more correct reimplementation.	12 years ago
Julius Volz	e0dbc8c561	Fix edge cases in data extraction for point and interval ops.	12 years ago
Julius Volz	a4361e4116	Rename extractSampleValue -> extractSampleValues.	12 years ago
Julius Volz	4e7db57e76	Fix iterator behavior in view.GetSampleAtTime()	12 years ago
Julius Volz	bb9c5ed7aa	Fix nil pointer exception in frontier building.	12 years ago
Matt T. Proud	896e172463	Extract time group optimizations.	12 years ago
Matt T. Proud	5a71814778	Additional greediness.	12 years ago
Matt T. Proud	b00ca7e422	Refactor some greediness computations.	12 years ago
Matt T. Proud	978acd4e96	Simplify time group optimizations. The old code performed well according to the benchmarks, but the new code shaves 1/6th of the time off the original and with less code.	12 years ago
Matt T. Proud	d7b534e624	Update documentation.	12 years ago
Matt T. Proud	1f7ed52b46	Start writing high watermarks.	12 years ago
Julius Volz	a224dda9f0	Fix diskFrontier.ContainsFingerprint() return value.	12 years ago
Matt T. Proud	47ce7ad302	Extract appending from goroutine.	12 years ago
Matt T. Proud	187cd4cdbc	Extract indexing of Fingerprint to Metrics.	12 years ago
Matt T. Proud	532589f728	Extract Label Pair to Fingerprint indexing.	12 years ago
Matt T. Proud	84acfed061	Extract finding unindexed metrics.	12 years ago
Matt T. Proud	67300af137	Extract indexing to separate routine.	12 years ago
Matt T. Proud	582354f6de	Fix remaining ``make advice`` issues.	12 years ago
Matt T. Proud	615e6d13d7	Run ``make format``.	12 years ago
Julius Volz	caeb759ed7	Add tests for and fix getValuesAlongRangeOp value extraction.	12 years ago
Julius Volz	69a24427b7	Minor tiered storage fixups.	12 years ago
Julius Volz	3621148e7f	Comment out panicking test until proper support is implemented.	12 years ago
Julius Volz	e2fb497eba	Add operator value extraction tests.	12 years ago
Julius Volz	12a8863582	Add data extraction methods to operator types.	12 years ago
Julius Volz	1d5df867d1	Set test time to fixed value.	12 years ago
Julius Volz	2f06b8bea6	Fix tiered storage test to trigger iterator rewinding case.	12 years ago
Julius Volz	894ecfe161	Small cleanups and comments in tiered storage.	12 years ago
Julius Volz	f238b23b04	Set -leveldbFlushOnMutate to false by default.	12 years ago
Julius Volz	8939e0723a	Make LevelDB chunk size a flag.	12 years ago
Julius Volz	ce4f560e48	Encapsulate fingerprint frontier checks in renderView().	12 years ago
Julius Volz	1a1cba1bb2	Address outstanding PR comments.	12 years ago
Matt T. Proud	62b5d7ce20	Oops.	12 years ago
Matt T. Proud	1e0d740f2a	Conditionalize LevelDB index retrievals. The LevelDB index retrievals could be repeated in a given operation batch if multiple queued mutations affect the same (Label Name) singles and (Label Name, Label Value) doubles. This is wasteful and inefficient, as a single retrieval suffices. Thusly this commit retrieves the canonical index mappings if the said mapping has not been looked up in a given batch.	12 years ago
Matt T. Proud	34a921e16d	Checkpoint.	12 years ago
Matt T. Proud	8cc5cdde0b	checkpoint.	12 years ago
Matt T. Proud	d5380897c3	Cleanups and adds performance regression.	12 years ago
Matt T. Proud	f39b9c3c8e	Checkpoint.	12 years ago
Matt T. Proud	41068c2e84	Checkpoint.	12 years ago
Matt T. Proud	13ae29b304	Initial in-memory arena implementation. It is unbounded, and nothing uses it except for a gating flag in main.	12 years ago
Matt T. Proud	efbe0e8a12	Interface simplification. GetMetricForFingerprint(model.Fingerprint) (*Metric, error) -> GetMetricForFingerprint(model.Fingerprint) (Metric, error)	12 years ago
Matt T. Proud	f1245e8dda	Interface simplifications. GetFingerprintsForLabelName ([]*Fingerprint, error) -> GetFingerprintsForLabelName ([]Fingerprint, error)	12 years ago
Matt T. Proud	e8a733b525	Interface simplifications. GetFingerprintsForLabelSet ([]*Fingerprint, error) -> GetFingerprintsForLabelSet ([]Fingerprint, error)	12 years ago
Matt T. Proud	f03091b139	Interface simplifications: GetRangeValues From pointers to copies.	12 years ago
Matt T. Proud	14788cf4f3	Interface simplifications. GetBoundaryValues() from pointers to values.	12 years ago
Matt T. Proud	56f069b3ec	Interface simplifications: GetValueAtTime(). Pointer arguments to copies.	12 years ago
Matt T. Proud	1e2d6c7418	GetFingerprintsForLabelName simplifications. ``MetricPersistence.GetFingerprintsForLabelName(l *model.LabelName)`` -> ``MetricPersistence.GetFingerprintsForLabelName(l model.LabelName)``	12 years ago
Matt T. Proud	900bb988c1	Simplifications of GetFingerprintsForLabelSet. ``MetricPersistence.GetFingerprintsForLabelSet(s *model.LabelSet)`` -> ``MetricPersistence.GetFingerprintsForLabelSet(s model.LabelSet)``.	12 years ago
Matt T. Proud	4fbcea73f5	MetricPersistence.AppendSample signature changes. ``MetricPersistence.AppendSample(*model.Sample)`` -> ``MetricPersistence.AppendSample(model.Sample)``.	12 years ago
Matt T. Proud	4502b49524	Swap out fingerprinting infrastructure. All old database entries should be deleted. :-(	12 years ago

478 Commits (9ca47869ed54832228e6891751b5b950217d63e2)