* Make head block ULIDs descriptive
As far as I understand, these ULIDs aren't persisted anywhere, so it
should be safe to change them.
When debugging an issue, seeing an ULID like
`2ZBXFNYVVFDXFPGSB1CHFNYQTZ` or `33DXR7JA39CHDKMQ9C40H6YVVF` isn't very
helpful, so I propose to make them readable in their ULID string
version.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Set a different ULID for RangeHead
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
---------
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Fix and improve ingesting exemplars for native histograms.
See code comment for a detailed explanation of the algorithm.
Note that this changes the current behavior for all kind of samples slightly: We now allow exemplars with the same timestamp as during the last scrape if the value or the labels have changed.
Also note that we now do not ingest exemplars without timestamps for native histograms anymore.
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
---------
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
* Labels: reduce allocations when creating from TSDB
When reading the WAL, by passing references into the buffer we can avoid
copying strings under `-tags stringlabels`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Fix panic during tsdb Commit
Fixes the following
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x19deb45]
goroutine 651118930 [running]:
github.com/prometheus/prometheus/tsdb.(*headAppender).Commit(0xc19100f7c0)
/drone/src/vendor/github.com/prometheus/prometheus/tsdb/head_append.go:855 +0x245
github.com/prometheus/prometheus/tsdb.dbAppender.Commit({{0x35bd6f0?, 0xc19100f7c0?}, 0xc000fa4c00?})
/drone/src/vendor/github.com/prometheus/prometheus/tsdb/db.go:1159 +0x2f
We theorize that the panic happened due the the series referenced by the
exemplar being removed between AppendExemplar and Commit due to being idle.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
When samples are committed in the head, they are also written to the WAL.
The order of WAL records should be sample then exemplar, but this was
not the case for native histogram samples. This PR fixes that.
The problem with the wrong order is that remote write reads the WAL and
sends the recorded timeseries in the WAL order, which means exemplars
arrived before histogram samples. If the receiving side is Prometheus
TSDB and the series has not existed before then the exemplar does not
currently create the series. Which means the exemplar is rejected and lost.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Drop context argument from tsdb/index.Symbols.Lookup since lookup
should be fast and the context checking is a performance hit.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Remove NewPossibleNonCounterInfo until it can be made more efficient, and avoid creating empty annotations as much as possible
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
When reading the WAL this method is called with buffers from a pool, on
multiple goroutines. Pre-allocating sufficient size avoids slow growth
and many reallocations in `append`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
On a 32 bit architecture the size of int is 32 bits. Thus converting from
int64, uint64 can overflow it and flip the sign.
Try for yourself in playground:
package main
import "fmt"
func main() {
x := int64(0x1F0000001)
y := int64(1)
z := int32(x - y) // numerically this is 0x1F0000000
fmt.Printf("%v\n", z)
}
Prints -268435456 as if x was smaller.
Followup to #12650
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Additionally wrap WBL replay error
Although WBL replay is already wrapped with errLoadWbl,
there are other errors that can happen during a WBL replay.
We should not try to repair WAL in those cases.
This commit additionally wraps the final error in Head.Init again
with errLoadWbl so that WBL replay errors can be identified properly.
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Additionally wrap WBL replay error
Although WBL replay is already wrapped with errLoadWbl,
there are other errors that can happen during a WBL replay.
We should not try to repair WAL in those cases.
This commit additionally wraps the final error in Head.Init again
with errLoadWbl so that WBL replay errors can be identified properly.
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
Reverts change from https://github.com/prometheus/prometheus/pull/12906
The benchmarks show that it's slower when intersecting, which is a
common usage for ListPostings (when intersecting matchers from Head)
(old is before #12906, new is #12906):
│ old │ new │
│ sec/op │ sec/op vs base │
Intersect/LongPostings1-16 20.54µ ± 1% 21.11µ ± 1% +2.76% (p=0.000 n=20)
Intersect/LongPostings2-16 51.03m ± 1% 52.40m ± 2% +2.69% (p=0.000 n=20)
Intersect/ManyPostings-16 194.2m ± 3% 332.1m ± 1% +71.00% (p=0.000 n=20)
geomean 5.882m 7.161m +21.74%
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
It's implicit, but should be explicit. It is invalid to call At() after
a failed call to Next() or Seek().
Following up on https://github.com/prometheus/prometheus/pull/12906
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
The Next() call of ListPostings() was updating two values, while we can
just update the position. This is up to 30% faster for high number of
Postings.
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb/index
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
│ old │ new │
│ sec/op │ sec/op vs base │
ListPostings/count=100-16 819.2n ± 0% 732.6n ± 0% -10.58% (p=0.000 n=20)
ListPostings/count=1000-16 2.685µ ± 1% 2.017µ ± 0% -24.88% (p=0.000 n=20)
ListPostings/count=10000-16 21.43µ ± 1% 14.81µ ± 0% -30.91% (p=0.000 n=20)
ListPostings/count=100000-16 209.4µ ± 1% 143.3µ ± 0% -31.55% (p=0.000 n=20)
ListPostings/count=1000000-16 2.086m ± 1% 1.436m ± 1% -31.18% (p=0.000 n=20)
geomean 29.02µ 21.41µ -26.22%
We're talking about microseconds here, but they just keep adding.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>