prometheus

Commit Graph

Author	SHA1	Message	Date
beorn7	6fcd225aee	promql(native histograms): Introduce exponential interpolation The linear interpolation (assuming that observations are uniformly distributed within a bucket) is a solid and simple assumption in the absence of any other information. However, the exponential bucketing used by standard schemas of native histograms has been chosen to cover the whole range of observations in a way that bucket populations are spread out over buckets in a reasonable way for typical distributions encountered in real-world scenarios. This is the origin of the idea implemented here: If we divide a given bucket into two (or more) smaller exponential buckets, we "most naturally" expect that the samples in the original bucket will split among those smaller buckets in a more or less uniform fashion. With this assumption, we end up with an "exponential interpolation", which therefore appears to be a better match for histograms with exponential bucketing. This commit leaves the linear interpolation in place for NHCB, but changes the interpolation for exponential native histograms to exponential. This affects `histogram_quantile` and `histogram_fraction` (because the latter is more or less the inverse of the former). The zero bucket has to be treated specially because the assumption above would lead to an "interpolation to zero" (the bucket density approaches infinity around zero, and with the postulated uniform usage of buckets, we would end up with an estimate of zero for all quantiles ending up in the zero bucket). We simply fall back to linear interpolation within the zero bucket. At the same time, this commit makes the call to stick with the assumption that the zero bucket only contains positive observations for native histograms without negative buckets (and vice versa). (This is an assumption relevant for interpolation. It is a mostly academic point, as the zero bucket is supposed to be very small anyway.
However, in cases where it _is_ relevantly broad, the assumption helps a lot in practice.) This commit also updates and completes the documentation to match both details about interpolation. As a more high level note: The approach here attempts to strike a balance between a more simplistic approach without any assumption, and a more involved approach with more sophisticated assumptions. I will briefly describe both for reference: The "zero assumption" approach would be to not interpolate at all, but _always_ return the harmonic mean of the bucket boundaries of the bucket the quantile ends up in. This has the advantage of minimizing the maximum possible relative error of the quantile estimation. (Depending on the exact definition of the relative error of an estimation, there is also an argument to return the arithmetic mean of the bucket boundaries.) While limiting the maximum possible relative error is a good property, this approach would throw away the information on whether a quantile is closer to the upper or lower end of the population within a bucket. This can be valuable trending information in a dashboard. With any kind of interpolation, the maximum possible error of a quantile estimation increases to the full width of a bucket (i.e. it more than doubles for the harmonic mean approach, and precisely doubles for the arithmetic mean approach). However, in return the _expectation value_ of the error decreases. The increase of the theoretical maximum only has practical relevance for pathological distributions. For example, if there are a thousand observations within a bucket, they could _all_ be at the upper bound of the bucket. If the quantile calculation picks the 1st observation in the bucket as the relevant one, an interpolation will yield a value close to the lower bucket boundary, while the true quantile value is close to the upper boundary. The "fancy interpolation" approach would be one that analyses the _actual_ distribution of samples in the histogram.
A lot of statistics could be applied based on the information we have available in the histogram. This would include the population of neighboring (or even all) buckets in the histogram. In general, the resolution of a native histogram should be quite high, and therefore, those "fancy" approaches would increase the computational cost quite a bit with very little practical benefit (i.e. just tiny corrections of the estimated quantile value). The results are also much harder to reason about. Signed-off-by: beorn7 <beorn@grafana.com>	2 months ago
Bryan Boreham	a1af3c27d4	refactor: extract almost.Equal() to new package To avoid a circular reference between promql and promqltest. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	7 months ago
zenador	81862aabd7	[nhcb branch] Add basic unit tests for native histograms with custom buckets converted from classic histograms (#13794) * modify unit test framework to automatically generate native histograms with custom buckets from classic histogram series * add very basic tests for classic histogram converted into native histogram with custom bounds * fix histogram_quantile for native histograms with custom buckets * make loading with nhcb explicit * evaluate native histograms with custom buckets on queries with explicit keyword * use regex replacer * use temp histogram struct for automatically loading converted nhcb Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>	7 months ago
machine424	f477e0539a	Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21 Prevent adding back golang.org/x/exp/slices. Signed-off-by: machine424 <ayoubmrini424@gmail.com>	9 months ago
zenador	ccfe14d7e7	PromQL: ignore small errors for bucketQuantile (#13153) promql: Improve histogram_quantile calculation for classic buckets Tiny differences between classic buckets are most likely caused by floating point precision issues. With this commit, relative changes below a certain threshold are ignored. This makes the result of histogram_quantile more meaningful, and also avoids triggering the _input to histogram_quantile needed to be fixed for monotonicity_ annotations in unactionable cases. This commit also adds an explanation of the new adjustment and of the monotonicity annotation to the documentation of `histogram_quantile`. --------- Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	1 year ago
Jeanette Tan	0cbf0c1c68	Revise according to code review Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	1 year ago
Jeanette Tan	feaa93da77	Add warning when monotonicity is forced in the input to histogram_quantile Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	1 year ago
Goutham Veeramachaneni	86729d4d7b	Update exp package (#12650)	1 year ago
beorn7	162612ea86	histograms: Improve comment Oversight during review of #12525. Signed-off-by: beorn7 <beorn@grafana.com>	1 year ago
Ziqi Zhao	42d9169ba1	enhance histogram_quantile to get min/max value Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>	1 year ago
Bryan Boreham	ce153e3fff	Replace sort.Sort with faster slices.SortFunc The generic version is more efficient. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	1 year ago
beorn7	5b53aa1108	style: Replace `else if` cascades with `switch` Wiser coders than myself have come to the conclusion that a `switch` statement is almost always superior to a statement that includes any `else if`. The exceptions that I have found in our codebase are just these two: * The `else if` is followed by an additional statement before the next condition (separated by a `;`). * The whole thing is within a `for` loop and `break` statements are used. In this case, using `switch` would require tagging the `for` loop, which probably tips the balance. Why are `switch` statements more readable? For one, fewer curly braces. But more importantly, the conditions all have the same alignment, so the whole thing follows the natural flow of going down a list of conditions. With `else if`, in contrast, all conditions but the first are "hidden" behind `} else if `, harder to spot and (for no good reason) presented differently from the first condition. I'm sure the aforementioned wise coders can list even more reasons. In any case, I like it so much that I have found myself recommending it in code reviews. I would like to make it a habit in our code base, without making it a hard requirement that we would test on the CI. But for that, there has to be a role model, so this commit eliminates all `else if` occurrences, unless it is autogenerated code or fits one of the exceptions above. Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
beorn7	c0879d64cf	promql: Separate `Point` into `FPoint` and `HPoint` In other words: Instead of having a “polymorphous” `Point` that can either contain a float value or a histogram value, use an `FPoint` for floats and an `HPoint` for histograms. This seemingly small change has a _lot_ of repercussions throughout the codebase. The idea here is to avoid the increase in size of `Point` arrays that happened after native histograms had been added. The higher-level data structures (`Sample`, `Series`, etc.) are still “polymorphous”. The same idea could be applied to them, but at each step the trade-offs needed to be evaluated. The idea with this change is to do the minimum necessary to get back to pre-histogram performance for functions that do not touch histograms. Here are comparisons for the `changes` function. The test data doesn't include histograms yet. Ideally, there would be no change in the benchmark result at all. First runtime v2.39 compared to directly prior to this commit:
```
name                                                  old time/op  new time/op  delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16         391µs ± 2%   542µs ± 1%   +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16        452µs ± 2%   617µs ± 2%   +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16       1.12ms ± 1%  1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16      7.83ms ± 1%  8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16         2.98ms ± 0%  3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16        3.66ms ± 1%  4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16       10.5ms ± 0%  11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16      77.6ms ± 1%  87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16     30.4ms ± 2%  32.8ms ± 1%  +8.01%   (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16    37.1ms ± 2%  40.6ms ± 2%  +9.64%   (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16   105ms ± 1%   117ms ± 1%   +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16  783ms ± 3%   876ms ± 1%   +11.83%  (p=0.000 n=9+10)
```
And then runtime v2.39 compared to after this commit:
```
name                                                  old time/op  new time/op  delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16         391µs ± 2%   547µs ± 1%   +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16        452µs ± 2%   616µs ± 2%   +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16       1.12ms ± 1%  1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16      7.83ms ± 1%  7.95ms ± 1%  +1.59%   (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16         2.98ms ± 0%  3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16        3.66ms ± 1%  4.02ms ± 1%  +9.80%   (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16       10.5ms ± 0%  10.8ms ± 1%  +3.08%   (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16      77.6ms ± 1%  78.1ms ± 1%  +0.58%   (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16     30.4ms ± 2%  33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16    37.1ms ± 2%  40.0ms ± 1%  +7.98%   (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16   105ms ± 1%   107ms ± 1%   +1.92%   (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16  783ms ± 3%   775ms ± 1%   -1.02%   (p=0.019 n=9+9)
```
In summary, the runtime doesn't really improve with this change for queries with just a few steps. For queries with many steps, this commit essentially reinstates the old performance. This is good because the many-step queries are the ones that matter most (longest absolute runtime). In terms of allocations, though, this commit doesn't make a dent at all (numbers not shown). The reason is that most of the allocations happen in the sampleRingIterator (in the storage package), which has to be addressed in a separate commit. Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
beorn7	bf0847073d	histogram: Modify getBound to deal properly with infinity The bucket receiving math.MaxFloat64 observations now has math.MaxFloat64 as upper bound, while the bucket after it (the last possible bucket) has +Inf. This also adds a test for getBound and moves the getBound code to generic.go (where it should have been in the first place). Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
Björn Rabenstein	dccfb9db4e	histogram: Remove code replication via generics (#11361) * histogram: Simplify iterators We don't really need currLower and currUpper and can calculate it when needed (as already done for the floatBucketIterator). The calculation is cheap, while keeping those extra variables around costs RAM (potentially a lot with many iterators). * histogram: Convert Bucket/FloatBucket to one generic type * histogram: Move some bucket iterator code into generic base iterator * histogram: Remove cumulative iterator for FloatHistogram We added it in the past for completeness (Histogram has one), but it has never been used. Plus, even the cumulative iterator for Histogram is only there for test reasons. We can always add it back, and then maybe even using generics. Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
beorn7	a3a8f58bb3	promql: Add histogram_fraction function Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
beorn7	ffaabea91a	promql: Refine zero bucket treatment in histogramQuantile Essentially, this mirrors the existing behavior for negative buckets: If a histogram has only negative buckets, the upper bound of the zero bucket is assumed to be zero. Furthermore, it makes sure that the zero bucket boundaries are not modified for a histogram that has no buckets at all but has samples in the zero bucket. Also, add a TODO to vet if we really want this behavior. Signed-off-by: beorn7 <beorn@grafana.com>	2 years ago
beorn7	106e20cde5	Histogram: Fix and simplify histogram_quantile For conventional histograms, we need to gather all the individual bucket timeseries at a data point to do the quantile calculation. The code so far mirrored this behavior for the new native histograms. However, since a single data point contains all the buckets already, that's actually not needed. This PR simplifies the code while still detecting a mix of conventional and native histograms. The weird signature calculation for the conventional histograms is getting even weirder because of that. If this PR turns out to do the right thing, I will implement a proper fix for the signature calculation upstream. Signed-off-by: beorn7 <beorn@grafana.com>	3 years ago
jyz0309	e40deb1086	address comment Signed-off-by: jyz0309 <45495947@qq.com>	3 years ago
jyz0309	02e032884a	add doc Signed-off-by: jyz0309 <45495947@qq.com>	3 years ago
jyz0309	7f32a5d0d6	add NaN case Signed-off-by: jyz0309 <45495947@qq.com>	3 years ago
beorn7	947810b0f2	promql: Tweak histogramQuantile - Simplify the code a bit. - Cover more corner cases. - Remove TODO for negative buckets. (I think they are handled. Tests will reveal if not.) Signed-off-by: beorn7 <beorn@grafana.com>	3 years ago
beorn7	a6acdfe346	histograms: Doc comment and naming improvements Signed-off-by: beorn7 <beorn@grafana.com>	3 years ago
Ganesh Vernekar	4a43349aca	`histogram_quantile` for sparse histograms (#9935) * MergeFloatBucketIterator for []FloatBucketIterator Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * histogram_quantile for histograms Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix histogram_quantile Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Unit test and enhancements Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Iterators to iterate buckets in reverse and all buckets together including zero bucket Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Consider all buckets for histogram_quantile and fix the implementation Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Remove unneeded code Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix lint Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	3 years ago
beorn7	c954cd9d1d	Move packages out of deprecated pkg directory This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com>	3 years ago
Linas Medžiūnas	7eaffa7180	Fix off-by-one error in funcHistogramQuantile / ensureMonotonic (#7393) * Fix off-by-one error in funcHistogramQuantile / ensureMonotonic * Additional coverage for nonmonotonic histogram buckets Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>	5 years ago
B++	d6374ae1b6	Return NaN for histogram_quantile when buckets have 0 observations (#7318) Signed-off-by: jberny <f.bernardi89@gmail.com>	5 years ago
ethan	8928094b56	func name ref correct "qauntile" -> "quantile" (#5834) Signed-off-by: ethan <guangming.wang@daocloud.io>	5 years ago
Brian Brazil	c66aeb3fff	In histogram_quantile merge buckets with equivalent le values (#5158) This makes things generally more resilient, and will help with OpenMetrics transitions (and inconsistencies). Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	6 years ago
Mario Trangoni	0e2aa35771	promql: fix unconvert issues (#4040) See, $ gometalinter --vendor --disable-all --enable=unconvert --deadline 6m ./... promql/engine.go:1396:26:warning: unnecessary conversion (unconvert) promql/engine.go:1396:40:warning: unnecessary conversion (unconvert) promql/engine.go:1398:26:warning: unnecessary conversion (unconvert) promql/engine.go:1398:40:warning: unnecessary conversion (unconvert) promql/engine.go:1427:26:warning: unnecessary conversion (unconvert) promql/engine.go:1427:40:warning: unnecessary conversion (unconvert) promql/engine.go:1429:26:warning: unnecessary conversion (unconvert) promql/engine.go:1429:40:warning: unnecessary conversion (unconvert) promql/engine.go:1505:50:warning: unnecessary conversion (unconvert) promql/engine.go:1573:46:warning: unnecessary conversion (unconvert) promql/engine.go:1578:46:warning: unnecessary conversion (unconvert) promql/engine.go:1591:80:warning: unnecessary conversion (unconvert) promql/engine.go:1602:94:warning: unnecessary conversion (unconvert) promql/engine.go:1630:18:warning: unnecessary conversion (unconvert) promql/engine.go:1631:24:warning: unnecessary conversion (unconvert) promql/engine.go:1634:18:warning: unnecessary conversion (unconvert) promql/engine.go:1635:34:warning: unnecessary conversion (unconvert) promql/functions.go:302:42:warning: unnecessary conversion (unconvert) promql/functions.go:315:42:warning: unnecessary conversion (unconvert) promql/functions.go:334:26:warning: unnecessary conversion (unconvert) promql/functions.go:395:31:warning: unnecessary conversion (unconvert) promql/functions.go:406:31:warning: unnecessary conversion (unconvert) promql/functions.go:454:27:warning: unnecessary conversion (unconvert) promql/functions.go:701:46:warning: unnecessary conversion (unconvert) promql/functions.go:701:78:warning: unnecessary conversion (unconvert) promql/functions.go:730:43:warning: unnecessary conversion (unconvert) promql/functions.go:1220:23:warning: unnecessary conversion (unconvert) promql/functions.go:1249:23:warning: unnecessary conversion (unconvert) promql/quantile.go:107:54:warning: unnecessary conversion (unconvert) promql/quantile.go:182:16:warning: unnecessary conversion (unconvert) promql/quantile.go:182:64:warning: unnecessary conversion (unconvert) Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>	7 years ago
Jack Neely	896f951e68	Force buckets in a histogram to be monotonic for quantile estimation (#2610) * Force buckets in a histogram to be monotonic for quantile estimation The assumption that bucket counts increase monotonically with increasing upperBound may be violated during: * Recording rule evaluation of histogram_quantile, especially when rate() has been applied to the underlying bucket timeseries. * Evaluation of histogram_quantile computed over federated bucket timeseries, especially when rate() has been applied This is because scraped data is not made available to RR evaluation or federation atomically, so some buckets are computed with data from the N most recent scrapes, but the other buckets are missing the most recent observations. Monotonicity is usually guaranteed because if a bucket with upper bound u1 has count c1, then any bucket with a higher upper bound u > u1 must have counted all c1 observations and perhaps more, so that c >= c1. Randomly interspersed partial sampling breaks that guarantee, and rate() exacerbates it. Specifically, suppose bucket le=1000 has a count of 10 from 4 samples but the bucket with le=2000 has a count of 7, from 3 samples. The monotonicity is broken. It is exacerbated by rate() because under normal operation, cumulative counting of buckets will cause the bucket counts to diverge such that small differences from missing samples are not a problem. rate() removes this divergence. bucketQuantile depends on that monotonicity to do a binary search for the bucket with the qth percentile count, so breaking the monotonicity guarantee causes bucketQuantile() to return undefined (nonsense) results. As a somewhat hacky solution until the Prometheus project is ready to accept the changes required to make scrapes atomic, we calculate the "envelope" of the histogram buckets, essentially removing any decreases in the count between successive buckets.
* Fix up comment docs for ensureMonotonic * ensureMonotonic: Use switch statement Use switch statement rather than if/else for better readability. Process the most frequent cases first.	8 years ago
Fabian Reinartz	9ea10d5265	promql: use labels.Builder to modify labels	8 years ago
Fabian Reinartz	c6cd998905	promql: use local labels, add conversion	8 years ago
Fabian Reinartz	ff504af2aa	promql: undo accidental exports	8 years ago
Fabian Reinartz	ac5d3bc05e	promql: scalar T/V and Point	8 years ago
Fabian Reinartz	a62df87022	promql: rename vector	8 years ago
Fabian Reinartz	15a931dbdb	promql: migrate model types, use tsdb interfaces	8 years ago
Brian Brazil	0303ccc6a7	Add quantile aggregator.	8 years ago
Brian Brazil	15f9fe0a45	Factor out quantile function.	8 years ago
Fabian Reinartz	d6b8da8d43	Switch promql types to common/model	9 years ago
Fabian Reinartz	306e8468a0	Switch from client_golang/model to common/model	9 years ago
Brian Brazil	f34de493d5	Add increase() function, to replace delta(..., 1). This calculates how much a counter increases over a given period of time, which is the area under the curve of its rate. increase(x[5m]) is equivalent to rate(x[5m]) * 300.	10 years ago
Fabian Reinartz	5602328c7c	Refactor query evaluation. This copies the evaluation logic from the current rules/ package. The new engine handles the execution process from query string to final result. It provides query timeout and cancellation and general flexibility for future changes. functions.go: Add evaluation implementation. Slight changes to in/out data but not to the processing logic. quantile.go: No changes. analyzer.go: No changes. engine.go: Actually new part. Mainly consists of evaluation methods which were not changed. setup_test.go: Copy of rules/helpers_test.go to setup test storage. promql_test.go: Copy of rules/rules_test.go.	10 years ago
Brian Brazil	941f585164	Avoid +InfYs and similar, just display +Inf.	10 years ago
beorn7	9e85ab0eef	Apply the new signature/fingerprinting functions from client_golang. This requires the new version of client_golang (vendoring will follow in the next commit), which changes the fingerprinting for clientmodel.Metric.	10 years ago
beorn7	17443d288b	Avoid copying of the COWMetric if we already have the metric available.	10 years ago
beorn7	9e7c3e3bcd	Add the histogram_quantile function. Since we are now getting really deep into floating point calculation, the tests had to take into account the precision loss. Since the rule tests are based on direct line matching in the output, implementing the "almost equal" semantics was pretty cumbersome, but here we are.	10 years ago

46 Commits (88818c9cb313a669ba3202fe42c2f030c4f2e52f)