* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir
Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
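For illustration, a minimal sketch of the pattern this migration follows (the directory and printed fields are arbitrary): os.ReadDir replaces ioutil.ReadDir and returns []fs.DirEntry, so the full os.FileInfo is only fetched where it is actually needed.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	// os.ReadDir (Go 1.16+) replaces ioutil.ReadDir and returns
	// []fs.DirEntry instead of []os.FileInfo, avoiding a stat per entry.
	entries, err := os.ReadDir(".")
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		// Call Info() only where the full os.FileInfo is actually needed.
		info, err := e.Info()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(e.Name(), info.Size())
	}
}
```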
Calling `wal.NewSegmentBufReader()` without any segments would cause a
`panic`, crashing Prometheus. This patch fixes the panic by making
segmentBufReader return an EOF if there are no segments.
This also means that an empty checkpoint directory (which should never
occur unless it has been tampered with, or the underlying filesystem,
e.g. NFS, has issues) is now ignored, and Prometheus continues to run
instead of panicking as it did before.
Fixes: https://github.com/prometheus/prometheus/issues/9605
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
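A minimal sketch of the guard this patch describes, using illustrative stand-in types rather than the real segmentBufReader: with an empty segment list, Read reports io.EOF instead of indexing into the slice and panicking.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

type segmentBufReader struct {
	segs []*bytes.Reader // stand-ins for WAL segments
	cur  int
}

func (r *segmentBufReader) Read(b []byte) (int, error) {
	// Guard: with no segments there is nothing to read; return io.EOF
	// rather than panicking on an empty slice.
	if len(r.segs) == 0 {
		return 0, io.EOF
	}
	n, err := r.segs[r.cur].Read(b)
	if err == io.EOF && r.cur < len(r.segs)-1 {
		r.cur++
		return n, nil
	}
	return n, err
}

func main() {
	var r segmentBufReader
	_, err := r.Read(make([]byte, 8))
	fmt.Println(err) // EOF, not a panic
}
```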
Snappy cannot encode records larger than ~3.7 GB and will panic if an
encoding is attempted. Check to make sure that the record is smaller
than this before encoding.
In the future, we could improve this behavior to still compress large
records (or break them up into smaller records), but this avoids the
panic for users with very large single scrape targets.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
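A hedged sketch of one way such a guard can look, assuming the github.com/golang/snappy package, whose MaxEncodedLen returns a negative value when the input is too large to encode (Encode would panic on such input):

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
)

// maybeCompress compresses a record unless it is too large for snappy.
func maybeCompress(rec []byte) []byte {
	if snappy.MaxEncodedLen(len(rec)) < 0 {
		// Record is too large for snappy; keep it uncompressed
		// instead of letting Encode panic.
		return rec
	}
	return snappy.Encode(nil, rec)
}

func main() {
	fmt.Println(len(maybeCompress([]byte("hello, wal record"))))
}
```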
Right now a new segment might be created unnecessarily if the
uncompressed record would not fit, even though it would fit after
compression (which typically halves the record size).
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Exports metric for WAL write errors
Signed-off-by: John McBride <jpmmcbride@gmail.com>
* Correct name for counter
Signed-off-by: John McBride <jpmmcbride@gmail.com>
* Move WAL write failure to wal.go
Signed-off-by: John McBride <jpmmcbride@gmail.com>
* WAL write fail metric moved to Log for external consumers
Signed-off-by: John McBride <jpmmcbride@gmail.com>
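A hedged sketch of what exporting such a counter looks like with client_golang; the metric, type, and method names below are illustrative and the wiring is simplified, not the exact Prometheus code:

```go
package main

import (
	"errors"
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

type wal struct {
	writesFailed prometheus.Counter
}

func newWAL(reg prometheus.Registerer) *wal {
	w := &wal{
		writesFailed: prometheus.NewCounter(prometheus.CounterOpts{
			Name: "wal_writes_failed_total", // illustrative name
			Help: "Total number of WAL write operations that failed.",
		}),
	}
	if reg != nil {
		reg.MustRegister(w.writesFailed)
	}
	return w
}

// Log wraps the underlying write and increments the counter on failure, so
// external consumers of the WAL have their failed writes counted too.
func (w *wal) Log(write func() error) error {
	if err := write(); err != nil {
		w.writesFailed.Inc()
		return err
	}
	return nil
}

func main() {
	w := newWAL(prometheus.NewRegistry())
	fmt.Println(w.Log(func() error { return errors.New("disk full") }))
}
```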
In running Prometheus instances, compressing the records was shown to
reduce disk usage by half while incurring a negligible CPU cost.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
With the next release of client_golang, Summaries will not have
objectives by default.
As it turns out, for prometheus_tsdb_head_gc_duration_seconds and
prometheus_tsdb_wal_truncate_duration_seconds, the objective-less
default makes more sense than the current default.
To make sure we do the right thing before and after the upcoming
release of client_golang, I have set the objectives explicitly
wherever that was not the case so far:
- prometheus_tsdb_head_gc_duration_seconds and
prometheus_tsdb_wal_truncate_duration_seconds now explicitly have no
objectives.
- prometheus_tsdb_wal_fsync_duration_seconds now explicitly uses the
previous default objectives.
Signed-off-by: beorn7 <beorn@grafana.com>
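A hedged sketch of setting Summary objectives explicitly with client_golang, so behaviour does not depend on the library default; the metric names here are examples, not the real Prometheus ones:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// No objectives: the Summary only exposes _sum and _count.
	gcDuration := prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "example_gc_duration_seconds",
		Help:       "Duration of example GC runs.",
		Objectives: map[float64]float64{},
	})

	// The previous client_golang default objectives, set explicitly.
	fsyncDuration := prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "example_fsync_duration_seconds",
		Help:       "Duration of example fsync calls.",
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
	})

	gcDuration.Observe(0.01)
	fsyncDuration.Observe(0.002)
	fmt.Println("summaries created with explicit objectives")
}
```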
This reduces disk space usage so that small setups no longer require a
minimum of three 128MB files. It may also help with debugging WAL data
issues by making things a bit more deterministic.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Always create a new clean segment when starting the WAL.
* Ensure we flush the last page after repairing and before recreating the
new segment in Repair.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* fix wal panic when page flush fails.
New records should be added to the page only when the last flush
succeeded. Otherwise the page would still be full, and trying to add a
new record would panic.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
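A hedged sketch (not the actual WAL code) of the invariant described above: records are only appended once the previous flush has succeeded, otherwise the flush is retried first, so a still-full page is never appended to.

```go
package main

import (
	"errors"
	"fmt"
)

const pageSize = 8 // tiny page for illustration

type pagedWriter struct {
	page    []byte
	flushFn func([]byte) error
	lastErr error // outcome of the most recent flush
}

func (w *pagedWriter) flush() error {
	w.lastErr = w.flushFn(w.page)
	if w.lastErr == nil {
		w.page = w.page[:0]
	}
	return w.lastErr
}

func (w *pagedWriter) log(rec []byte) error {
	// If the previous flush failed, the page may still be full. Retry the
	// flush before appending rather than overrunning the page.
	if w.lastErr != nil {
		if err := w.flush(); err != nil {
			return err
		}
	}
	if len(w.page)+len(rec) > pageSize {
		if err := w.flush(); err != nil {
			return err
		}
	}
	w.page = append(w.page, rec...)
	return nil
}

func main() {
	fail := true
	w := &pagedWriter{flushFn: func([]byte) error {
		if fail {
			return errors.New("disk full")
		}
		return nil
	}}
	fmt.Println(w.log([]byte("12345678"))) // <nil>: fills the page
	fmt.Println(w.log([]byte("abc")))      // disk full: flush failed, page kept
	fail = false
	fmt.Println(w.log([]byte("abc"))) // <nil>: flush retried, record appended
}
```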
Since Go 1.12, no special handling is required for file.Sync().
@pborzenkov thanks for the pointer.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
If the corrupt segment is full, then we set donePages on open,
c59ed492b2/wal/wal.go (L235-L243)
Then when we try to repair, we set the segment to be a new segment but
we don't update the donePages: c59ed492b2/wal/wal.go (L334)
When we try to log to this segment, because donePages is full, we never
log anything to it and instead create a new one: c59ed492b2/wal/wal.go (L486)
This does not cause issues because we simply concatenate the segments on
read, thereby transparently skipping this `0b` segment.
Make WAL live tailer return EOF when there is a half-written record at the end of the file.
Previously, this would cause an infinite loop as we ignored EOFs when filling the buffer. We now differentiate between EOFs that read >0 bytes, and EOFs that didn't.
Add some more unit tests for tailing a corrupt WAL, and unify interfaces Reader and LiveReader for the purposes of testing.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
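A hedged sketch of the distinction described above, with an illustrative reader standing in for the live segment: an EOF that arrives together with bytes means a record is still being written and the tailer should retry, while an EOF with zero bytes means it is fully caught up.

```go
package main

import (
	"fmt"
	"io"
)

// tailReader simulates reading the live end of a WAL segment: it returns the
// bytes written so far together with io.EOF, as an io.Reader is allowed to.
type tailReader struct {
	data []byte
	off  int
}

func (t *tailReader) Read(p []byte) (int, error) {
	n := copy(p, t.data[t.off:])
	t.off += n
	return n, io.EOF
}

// fill reads into buf and reports whether the tailer is fully caught up.
func fill(r io.Reader, buf []byte) (n int, caughtUp bool, err error) {
	n, err = r.Read(buf)
	if err == io.EOF {
		// EOF after reading some bytes: likely a half-written record,
		// retry later. EOF with zero bytes read: nothing more for now.
		return n, n == 0, nil
	}
	return n, false, err
}

func main() {
	r := &tailReader{data: []byte("partial-record")}
	buf := make([]byte, 64)
	n, caughtUp, _ := fill(r, buf)
	fmt.Println(n, caughtUp) // 14 false: got bytes alongside EOF, retry later
	n, caughtUp, _ = fill(r, buf)
	fmt.Println(n, caughtUp) // 0 true: EOF with nothing read, caught up
}
```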
Test to corrupt segments mid-WAL, repair and check we can read the correct number of records.
Make segmentBufReader pad short segments with zeros, and only advance curr segment index after fully reading segment.
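A hedged sketch of zero-padding a short segment up to a page boundary so that page-aligned readers downstream always see complete pages; the 32KB page size matches the WAL's, but the helper itself is illustrative.

```go
package main

import "fmt"

const pageSize = 32 * 1024

// padToPage extends a segment that ends mid-page with zero bytes.
func padToPage(seg []byte) []byte {
	if rem := len(seg) % pageSize; rem != 0 {
		seg = append(seg, make([]byte, pageSize-rem)...)
	}
	return seg
}

func main() {
	short := make([]byte, 1000)        // a segment that ends mid-page
	fmt.Println(len(padToPage(short))) // 32768: padded to a full page
}
```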
* refactor NewSegmentsRangeReader to take multi WAL ranges
In case of an error when checkpointing the WAL, the error doesn't show
the exact WAL segment that is corrupted. This is because it uses
MultiReader to read multiple WAL files.
This refactoring allows NewSegmentsRangeReader to take more than a
single WAL range and read all of the ranges by iterating over each one.
This changes the logs from
create checkpoint: read segments: corruption after 4841144384 bytes:...
to
create checkpoint: read segments: corruption in segment
data/wal/00017351 at 123142208: ...
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
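A hedged sketch of the pattern behind this refactoring; the names below (segmentRange, readRange) are illustrative, not the real Prometheus API. Reading the ranges one segment at a time lets an error name the exact segment file instead of a byte offset into a concatenation of all of them.

```go
package main

import (
	"fmt"
	"path/filepath"
)

type segmentRange struct {
	dir         string
	first, last int
}

// readRange visits every segment in every range in order, so any error can
// be attributed to a specific segment file.
func readRange(ranges []segmentRange, read func(path string) error) error {
	for _, r := range ranges {
		for i := r.first; i <= r.last; i++ {
			path := filepath.Join(r.dir, fmt.Sprintf("%08d", i))
			if err := read(path); err != nil {
				return fmt.Errorf("corruption in segment %s: %w", path, err)
			}
		}
	}
	return nil
}

func main() {
	ranges := []segmentRange{{dir: "data/wal", first: 17350, last: 17352}}
	_ = readRange(ranges, func(path string) error {
		fmt.Println("reading", path)
		return nil
	})
}
```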
* repair wal when the record cannot be decoded
Currently repair is run only when the error happens in the reader.
A corruption can occur after the record is read and when it is decoded.
This change wraps the error at decoding as a CorruptionErr as this error
is expected to trigger a repair.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
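A hedged sketch of wrapping a decode failure so it is treated like other corruptions and triggers a repair; corruptionErr here is illustrative, standing in for the WAL package's own corruption error type that carries segment and offset.

```go
package main

import (
	"errors"
	"fmt"
)

type corruptionErr struct {
	segment int
	offset  int64
	err     error
}

func (e *corruptionErr) Error() string {
	return fmt.Sprintf("corruption in segment %d at %d: %v", e.segment, e.offset, e.err)
}

func decodeRecord(rec []byte, segment int, offset int64) error {
	if len(rec) == 0 {
		// Wrap the decoding failure as a corruption error so callers that
		// repair on corruption also repair on undecodable records.
		return &corruptionErr{segment: segment, offset: offset, err: errors.New("empty record")}
	}
	// ... decode the record ...
	return nil
}

func main() {
	err := decodeRecord(nil, 17351, 123142208)
	var cerr *corruptionErr
	fmt.Println(errors.As(err, &cerr)) // true: a repair would be triggered
}
```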
* return an error when the last wal segment record is torn.
This ensures that a repair will be run when the last record in a segment
is torn.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>