prometheus

Commit Graph

Author	SHA1	Message	Date
Oleg Zaytsev	cd7d0b69a2	Check nil err first when committing (#12625 ) The most common case is to have a nil error when appending series, so let's check that first instead of checking the 3 error types first. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>	1 year ago
cui fliter	f26dfc95e6	fix struct name in comment (#12624 ) Signed-off-by: cui fliter <imcusg@gmail.com>	1 year ago
Łukasz Mierzwa	3c80963e81	Use a linked list for memSeries.headChunk (#11818 ) Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks. When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call. If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending our sample to it. Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed. When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes. Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait for it to be mmapped. If there's enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting queries and scrapes. Queries might timeout, since by default they have a 2 minute timeout set. Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything. To avoid this we need to remove mmapping from append path, since mmapping is blocking. But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later. This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples, while older, yet to be mmapped, chunks are linked to it. Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger it manually, which reduces the risk that it will have to compete for mmap locks with other chunks. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	1 year ago
Robert Fratto	886945cda7	tsdb/agent: ensure that new series get written to WAL on rollback (#12592 ) If a new series is introduced in a storage.Appender instance, that series should be written to the WAL once the storage.Appender is closed, even on Rollback. Previously, new series would only be written to the WAL when calling Commit. However, because the series is stored in memory regardless, subsequent calls to Commit may write samples to the WAL which reference a series ID that was never written. Related to #11589. It's likely that this fix also resolves this issue, but we need more testing from users to see if the problem persists after this fix; there may be more cases where samples get written to the WAL in Prometheus Agent mode without the corresponding series record. Signed-off-by: Robert Fratto <robertfratto@gmail.com>	1 year ago
George Krajcsovits	6cd2d1621f	Hide histogram chunk append and reset header internals (#12352 ) tsdb: Hide histogram chunk append and reset header internals Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>	1 year ago
György Krajcsovits	d4e355243a	tsdbutil/ChunkFromSamplesGeneric should not panic Add error handling instead. Prepares for #12352 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	1 year ago
cui fliter	096ceca44f	remove repetitive words (#12556 ) Signed-off-by: cui fliter <imcusg@gmail.com>	1 year ago
beorn7	0e3f35324b	scrape: Enable ingestion of multiple exemplars per sample This has become a requirement for native histograms, as a single histogram sample commonly has many buckets, so that providing many exemplars makes sense. Since OM text doesn't support native histograms yet, the test had to be expanded to also support protobuf test cases. Signed-off-by: beorn7 <beorn@grafana.com>	1 year ago
Justin Lei	32d87282ad	Add Zstandard compression option for wlog (#11666 ) Snappy remains as the default compression but there is now a flag to switch the compression algorithm. Signed-off-by: Justin Lei <justin.lei@grafana.com>	1 year ago
Julien Pivotto	bf5bf1a4b3	TSDB: Remove unused import of sort Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	1 year ago
Merrick Clay	70e41fc5ac	improve incorrect doc comment Signed-off-by: Merrick Clay <merrick.e.clay@gmail.com>	1 year ago
Bryan Boreham	ce153e3fff	Replace sort.Sort with faster slices.SortFunc The generic version is more efficient. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	1 year ago
Marc Tudurí	4851ced266	tsdb: Support native histograms in snapshot on shutdown (#12258 ) Signed-off-by: Marc Tuduri <marctc@protonmail.com>	1 year ago
Patrick Oyarzun	68e5937474	Apply relevant label matchers in LabelValues before fetching extra postings (#12274 ) * Apply matchers when fetching label values Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com> * Avoid extra copying of label values Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com> --------- Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>	1 year ago
Bryan Boreham	5255bf06ad	Replace sort.Slice with faster slices.SortFunc The generic version is more efficient. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	1 year ago
Marco Pracucci	35069910f5	Fix infinite loop in index Writer when a series contains duplicated label names Signed-off-by: Marco Pracucci <marco@pracucci.com>	1 year ago
Marco Pracucci	031d22df9e	Fix race condition in ChunkDiskMapper.Truncate() (#12500 ) * Fix race condition in ChunkDiskMapper.Truncate() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added unit test Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update tsdb/chunks/head_chunks.go Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Marco Pracucci <marco@pracucci.com> --------- Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	1 year ago
Nidhey Nitin Indurkar	a8772a4178	Feat: Get block by id directly on promtool analyze & get latest block if ID not provided (#12031 ) * feat: analyze latest block or block by ID in CLI (promtool) Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io> * address remarks Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io> * address latest review comments Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io> --------- Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io> Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>	2 years ago
Alan Protasio	73078bf738	Optimizing Group Regex (#12375 ) Signed-off-by: Alan Protasio <alanprot@gmail.com>	2 years ago
Justin Lei	e73d8b2084	Also pass chunkOpts into appendPreprocessor Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
Justin Lei	4c4454e4c9	Group args to append to memSeries in chunkOpts Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
Justin Lei	89af351730	Remove samplesPerChunk from memSeries (#12390 ) Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
zenador	37e5249e33	Use DefaultSamplesPerChunk in tsdb (#12387 ) Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2 years ago
Baskar Shanmugam	905a0bd63a	Added 'limit' query parameter support to /api/v1/status/tsdb endpoint (#12336 ) * Added 'topN' query parameter support to /api/v1/status/tsdb endpoint Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com> * Updated query parameter for tsdb status to 'limit' Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com> * Corrected Stats() parameter name from topN to limit Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com> * Fixed p.Stats CI failure Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com> --------- Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>	2 years ago
Alan Protasio	8c5d4b4add	Optimize MatchNotEqual (#12377 ) Signed-off-by: Alan Protasio <alanprot@gmail.com>	2 years ago
Matthieu MOREL	c8e7f95a3c	ci(lint): enable predeclared linter Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2 years ago
George Krajcsovits	92d6980360	Fix populateWithDelChunkSeriesIterator and gauge histograms (#12330 ) Use AppendableGauge to detect corrupt chunk with gauge histograms. Detect if first sample is a gauge but the chunk is not set up to contain gauge histograms. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>	2 years ago
Baskar Shanmugam	f731a90a7f	Fix LabelValueStats in posting stats (#12342 ) Problem: LabelValueStats - This will provide a list of the label names and memory used in bytes. It is calculated by adding the length of all values for a given label name. But internally Prometheus stores the name and the value independently for each series. Solution: MemPostings struct maintains the values to seriesRef map which is used to get the number of series which contains the label values. Using that LabelValueStats is calculated as: seriesCnt * len(value name) Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>	2 years ago
Xiaochao Dong	80b7f73d26	Copy tombstone intervals to avoid race (#12245 ) Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>	2 years ago
Callum Styan	0d2108ad79	[tsdb] re-implement WAL watcher to read via a "notification" channel (#11949 ) * WIP implement WAL watcher reading via notifications over a channel from the TSDB code Signed-off-by: Callum Styan <callumstyan@gmail.com> * Notify via head appenders Commit (finished all WAL logging) rather than on each WAL Log call Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix misspelled Notify plus add a metric for dropped Write notifications Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update tests to handle new notification pattern Signed-off-by: Callum Styan <callumstyan@gmail.com> * this test maybe needs more time on windows? Signed-off-by: Callum Styan <callumstyan@gmail.com> * does this test need more time on windows as well? Signed-off-by: Callum Styan <callumstyan@gmail.com> * read timeout is already a time.Duration Signed-off-by: Callum Styan <callumstyan@gmail.com> * remove mistakenly commited benchmark data files Signed-off-by: Callum Styan <callumstyan@gmail.com> * address some review feedback Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix missed changes from previous commit Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix issues from wrapper function Signed-off-by: Callum Styan <callumstyan@gmail.com> * try fixing race condition in test by allowing tests to overwrite the read ticker timeout instead of calling the Notify function Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix linting Signed-off-by: Callum Styan <callumstyan@gmail.com> --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2 years ago
György Krajcsovits	c6618729c9	Fix HistogramAppender.Appendable array out of bound error The code did not handle spans with 0 length properly. Spans with length zero are now skipped in the comparison. Span index check not done against length-1, since length is a uint32, thus subtracting 1 leads to 2^32, not -1. Fixes and unit tests for both integer and float histograms added. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2 years ago
Alan Protasio	c0f1abb574	MatchNotRegexp optimization Signed-off-by: Alan Protasio <alanprot@gmail.com>	2 years ago
Robert Fratto	9e4e2a4a51	wlog: use filepath for getting checkpoint number This changes usage of path to be replaced with path/filepath, allowing for filepath.Base to properly return the base directory on systems where `/` is not the standard path separator. This resolves an issue on Windows where intermediate folders containing a `.` were incorrectly considered to be a part of the checkpoint name. Related to grafana/agent#3826. Signed-off-by: Robert Fratto <robertfratto@gmail.com>	2 years ago
Bryan Boreham	0ab9553611	tsdb: drop deleted series from the WAL sooner (#12297 ) `head.deleted` holds the WAL segment in use at the time each series was removed from the head. At the end of `truncateWAL()` we will delete all segments up to `last`, so we can drop any series that were last seen in a segment at or before that point. (same change in Prometheus Agent too) Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2 years ago
cui fliter	276ca6a883	fix some comments Signed-off-by: cui fliter <imcusg@gmail.com>	2 years ago
beorn7	5b53aa1108	style: Replace `else if` cascades with `switch` Wiser coders than myself have come to the conclusion that a `switch` statement is almost always superior to a statement that includes any `else if`. The exceptions that I have found in our codebase are just these two: * The `if else` is followed by an additional statement before the next condition (separated by a `;`). * The whole thing is within a `for` loop and `break` statements are used. In this case, using `switch` would require tagging the `for` loop, which probably tips the balance. Why are `switch` statements more readable? For one, fewer curly braces. But more importantly, the conditions all have the same alignment, so the whole thing follows the natural flow of going down a list of conditions. With `else if`, in contrast, all conditions but the first are "hidden" behind `} else if `, harder to spot and (for no good reason) presented differently from the first condition. I'm sure the aforementioned wise coders can list even more reasons. In any case, I like it so much that I have found myself recommending it in code reviews. I would like to make it a habit in our code base, without making it a hard requirement that we would test on the CI. But for that, there has to be a role model, so this commit eliminates all `if else` occurrences, unless it is autogenerated code or fits one of the exceptions above. Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
beorn7	c3c7d44d84	lint: Adjust to the lint warnings raised by current versions of golint-ci We haven't updated golint-ci in our CI yet, but this commit prepares for that. There are a lot of new warnings, and it is mostly because the "revive" linter got updated. I agree with most of the new warnings, mostly around not naming unused function parameters (although it is justified in some cases for documentation purposes – while things like mocks are a good example where not naming the parameter is clearer). I'm pretty upset about the "empty block" warning to include `for` loops. It's such a common pattern to do something in the head of the `for` loop and then have an empty block. There is still an open issue about this: https://github.com/mgechev/revive/issues/810 I have disabled "revive" altogether in files where empty blocks are used excessively, and I have made the effort to add individual `// nolint:revive` where empty blocks are used just once or twice. It's borderline noisy, though, but let's go with it for now. I should mention that none of the "empty block" warnings for `for` loop bodies were legitimate. Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
Đurica Yuri Nikolić	b028112331	Making the number of CPU cores used for sorting postings lists editable (#12247 ) Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>	2 years ago
Justin Lei	c3e6b85631	Reverse test changes Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
Justin Lei	052993414a	Add storage.tsdb.samples-per-chunk flag Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
Matthieu MOREL	fb3eb21230	enable gocritic, unconvert and unused linters Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2 years ago
beorn7	817a2396cb	Name float values as "floats", not as "values" In the past, every sample value was a float, so it was fine to call a variable holding such a float "value" or "sample". With native histograms, a sample might have a histogram value. And a histogram value is still a value. Calling a float value just "value" or "sample" or "V" is therefore misleading. Over the last few commits, I already renamed many variables, but this cleans up a few more places where the changes are more invasive. Note that we do not attempt to change the naming in the JSON APIs or in the protobufs. That would be quite a disruption. However, internally, we can call variables as we want, and we should go with the option of avoiding misunderstandings. Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
beorn7	630bcb494b	storage: Use separate sample types for histogram vs. float Previously, we had one “polymorphous” `sample` type in the `storage` package. This commit breaks it up into `fSample`, `hSample`, and `fhSample`, each still implementing the `tsdbutil.Sample` interface. This reduces allocations in `sampleRing.Add` but inflicts the penalty of the interface wrapper, which makes things worse in total. This commit therefore just demonstrates the step taken. The next commit will tackle the interface overhead problem. Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
Alex Le	01d0dda4fc	Rename PopulateBlockFunc to BlockPopulator Signed-off-by: Alex Le <leqiyue@amazon.com>	2 years ago
Arve Knudsen	cca7178a12	tsdb: Improve a couple of histogram documentation comments Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2 years ago
Justin Lei	83f43982c9	Add support for native histograms to concreteSeriesIterator Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
Justin Lei	73ff91d182	Test fixes Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
Justin Lei	c770ba8047	Add comment linking to PR Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
Justin Lei	79db04eb12	Adjust samplesPerChunk from 120 to 220 Signed-off-by: Justin Lei <justin.lei@grafana.com>	2 years ago
Alex Le	1936868e9d	Allow populate block logic in compact to be overridden outside Prometheus (#11711 ) Signed-off-by: Alex Le <leqiyue@amazon.com> Signed-off-by: Alex Le <emoc1989@gmail.com>	2 years ago
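
The idea described in commit 3c80963e81 (keep cut head chunks on a linked list and mmap them later, outside the append path) can be illustrated with a small sketch. This is not the Prometheus implementation: the types, the `appendSample` and `mmapOldChunks` names, and the tiny `samplesPerChunk` limit are all hypothetical, locking is omitted, and the "mmap" step only counts and detaches chunks.

```go
package main

import "fmt"

// samplesPerChunk is a tiny, illustrative limit (hypothetical value).
const samplesPerChunk = 3

// headChunk is a simplified, hypothetical chunk. The newest chunk is the
// list head; prev points at older chunks that still wait to be mmapped.
type headChunk struct {
	samples []float64
	prev    *headChunk
}

// memSeries here only tracks the open head chunk (hypothetical stand-in).
type memSeries struct {
	head *headChunk
}

// appendSample never mmaps. When the open chunk is full it cuts a new one
// and keeps the old chunk reachable through prev.
func (s *memSeries) appendSample(v float64) {
	if s.head == nil || len(s.head.samples) >= samplesPerChunk {
		s.head = &headChunk{prev: s.head}
	}
	s.head.samples = append(s.head.samples, v)
}

// mmapOldChunks simulates the scheduled pass: it counts every chunk behind
// the open head, detaches them, and reports how many were handled.
func (s *memSeries) mmapOldChunks() int {
	if s.head == nil {
		return 0
	}
	n := 0
	for c := s.head.prev; c != nil; c = c.prev {
		n++
	}
	s.head.prev = nil
	return n
}

func main() {
	s := &memSeries{}
	for i := 0; i < 7; i++ {
		s.appendSample(float64(i))
	}
	fmt.Println(s.mmapOldChunks())   // 2
	fmt.Println(len(s.head.samples)) // 1
}
```

In this sketch the append path only allocates and links a chunk, so the slow step can run whenever the caller schedules it.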

971 Commits (d1abc3f2557660728dc6a34eb2da1f32461b6665)
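
The unsigned-subtraction pitfall mentioned in commit c6618729c9 can be reproduced in a few lines of Go. The `span` type and both helper functions below are hypothetical and only illustrate the arithmetic, not the actual `HistogramAppender.Appendable` code. Subtracting 1 from a `uint32` holding 0 wraps around to 4294967295 (2^32 - 1), so a bounds check written as `idx >= length-1` misbehaves for empty spans. Comparing `idx+1` against the length in a signed type avoids the subtraction.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a histogram span; only Length matters.
type span struct {
	Length uint32
}

// lastIndexUnsafe subtracts in uint32 space, so a zero length wraps around.
func lastIndexUnsafe(s span) uint32 {
	return s.Length - 1
}

// atOrPastEnd reports whether idx is the last valid index or beyond it,
// without subtracting from the unsigned length.
func atOrPastEnd(idx int, s span) bool {
	return int64(idx)+1 >= int64(s.Length)
}

func main() {
	empty := span{Length: 0}
	fmt.Println(lastIndexUnsafe(empty)) // 4294967295
	fmt.Println(atOrPastEnd(0, empty))  // true
}
```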