Commit Graph

21 Commits (b0e1ea7c6c27f0a0c5cdfab01225f9d5e6eb589b)

Author SHA1 Message Date
beorn7 4fcc73a04c storage: Recover from corrupted indices for archived series
An unopenable archived_fingerprint_to_timerange is simply deleted and
will be rebuilt during crash recovery (wich can then take quite some time).

An unopenable archived_fingerprint_to_metric is not deleted but
instructions to the user are logged. A deletion has to be done by the
user explicitly as it means losing all archived series (and a repair
with a 3rd party tool might still be possible).
2017-04-06 19:26:39 +02:00
beorn7 0ea5801e47 Handle errors caused by data corruption more gracefully
This requires all the panic calls upon unexpected data to be converted
into errors returned. This pollute the function signatures quite
lot. Well, this is Go...

The ideas behind this are the following:

- panic only if it's a programming error. Data corruptions happen, and
  they are not programming errors.

- If we detect a data corruption, we "quarantine" the series,
  essentially removing it from the database and putting its data into
  a separate directory for forensics.

- Failure during writing to a series file is not considered corruption
  automatically. It will call setDirty, though, so that a
  crashrecovery upon the next restart will commence and check for
  that.

- Series quarantining and setDirty calls are logged and counted in
  metrics, but are hidden from the user of the interfaces in
  interface.go, whith the notable exception of Append(). The reasoning
  is that we treat corruption by removing the corrupted series, i.e. a
  query for it will return no results on its next call anyway, so
  return no results right now. In the case of Append(), we want to
  tell the user that no data has been appended, though.

Minor side effects:

- Now consistently using filepath.* instead of path.*.

- Introduced structured logging where I touched it. This makes things
  less consistent, but a complete change to structured logging would
  be out of scope for this PR.
2016-03-02 23:02:34 +01:00
Julius Volz af513468eb Fix some dead code, missing error checks, shadowings.
I applied
https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567
and was greeted with a deluge of warnings, most of which were not
applicable or really fixable realistically. These are some of the first
ones I decided to fix.
2015-09-14 12:21:34 +02:00
Fabian Reinartz c9d396f476 Replace metric.LabelPair with model.LabelPair 2015-08-22 13:32:13 +02:00
Fabian Reinartz 306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
beorn7 699946bf32 Fix chunk desc loading.
If all samples in consecutive chunks have the same timestamp, the way
we used to load chunks will fail. With this change, the persist
watermark is used to load the right amount of chunkDescs from disk.

This bug is a possible reason for the rare storage corruption we have
observed.
2015-07-16 13:09:20 +02:00
Fabian Reinartz b105e26f4d storage: remove global flags 2015-06-15 19:01:06 +02:00
Bjoern Rabenstein ab386d1f5d Declare storage.local.index-cache-size.* default values as tweaked. 2015-01-29 13:04:54 +01:00
Bjoern Rabenstein 5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein 904acd43da Add crash recovery.
Fix the behavior if preload for non-existent series is requested.

Instead of returning an error (which triggers a panic further up),
simply count those incidents. They can happen regularly, we just want
to know if they happen too frequently because that would mean the
indexing is behind or broken.

Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein c7aad110fb Add an indexing queue and batch the ops.
Some other improvements on the way, in particular codec -> codable
renaming and addition of LookupSet methods.

Change-Id: I978f8f3f84ca8e4d39a9d9f152ae0ad274bbf4e2
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein 71206dbc06 More code cleanups.
Add license text everywhere.
And others....

Change-Id: I11ccde267a2ef7eb366c4788ba7aeae14ba7545c
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein f5f9f3514a Major code cleanup.
- Make it go-vet and golint clean.
- Add comments, TODOs, etc.

Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2
2014-11-25 17:02:53 +01:00
Bjoern Rabenstein bbf49200ab Implement methods in persistence.go.
Change-Id: I804cdd0b30420e171825fd86fe1281eca0d5e638
2014-11-25 17:02:23 +01:00
Bjoern Rabenstein 5a128a04a9 Major reorganization of the storage.
Most important, the heads file will now persist all the chunk descs,
too. Implicitly, it will serve as the persisted form of the
fp-to-series map.

Change-Id: Ic867e78f2714d54c3b5733939cc5aef43f7bd08d
2014-11-25 17:02:01 +01:00
Bjoern Rabenstein 4770cf76a4 Make index package more self-contained.
Moved interna from diskPersistence into the indexer.
TotalIndexer now called diskIndexer.

Change-Id: I6c8c62cb171f12bbd8a5474773af7786d71ba388
2014-11-25 17:02:01 +01:00
Bjoern Rabenstein 89f10e8eb2 Move to using the standard library interfaces for encoding/decoding.
BinaryMarshaler instead of encodable.
BinaryUnmarshaler instead of decodable.

Left 'codable' in place for lack of a better word.

Change-Id: I8a104be7d6db916e8dbc47ff95e6ff73b845ac22
2014-11-25 17:02:01 +01:00
Bjoern Rabenstein af77d5ef0b Added a few missing implementations in index.go.
Also, added closing of persistence and mem storage.

Change-Id: Iacf0d22c3520dd2584d9546984c1f8a5ed6cd54e
2014-11-25 17:02:01 +01:00
Julius Volz cca7ebe906 Some more cleanups / obsolete code removals.
Change-Id: I584144ceeeedafdb114266d8a6d2513e67b1d010
2014-11-25 17:02:00 +01:00
Julius Volz 7e85711df0 Beginnings of a tiered index implementation.
This reintroduces a LevelDB-based metrics index.

Change-Id: I4111540301c52255a07b2f570761707a32f72c05
2014-11-25 17:02:00 +01:00