mirror of https://github.com/prometheus/prometheus
Force buckets in a histogram to be monotonic for quantile estimation (#2610)
* Force buckets in a histogram to be monotonic for quantile estimation

The assumption that bucket counts increase monotonically with increasing
upperBound may be violated during:

  * Recording rule evaluation of histogram_quantile, especially when rate()
    has been applied to the underlying bucket timeseries.
  * Evaluation of histogram_quantile computed over federated bucket
    timeseries, especially when rate() has been applied.

This is because scraped data is not made available to recording rule (RR)
evaluation or federation atomically, so some buckets are computed with data
from the N most recent scrapes, while the other buckets are missing the most
recent observations.

Monotonicity is usually guaranteed because if a bucket with upper bound u1
has count c1, then any bucket with a higher upper bound u > u1 must have
counted all c1 observations and perhaps more, so that its count c satisfies
c >= c1.

Randomly interspersed partial sampling breaks that guarantee, and rate()
exacerbates it. Specifically, suppose the bucket with le=1000 has a count of
10 from 4 samples, but the bucket with le=2000 has a count of 7 from 3
samples. The monotonicity is broken. (It is exacerbated by rate() because,
under normal operation, cumulative counting of buckets causes the bucket
counts to diverge far enough that small differences from missing samples are
not a problem. rate() removes this divergence.)

bucketQuantile depends on that monotonicity to do a binary search for the
bucket that contains the count for the requested quantile q, so breaking the
monotonicity guarantee causes bucketQuantile() to return undefined (nonsense)
results.

As a somewhat hacky solution until the Prometheus project is ready to accept
the changes required to make scrapes atomic, we calculate the "envelope" of
the histogram buckets, essentially removing any decreases in the count
between successive buckets.

* Fix up comment docs for ensureMonotonic

* ensureMonotonic: Use switch statement

Use switch statement rather than if/else for better readability.
Process the most frequent cases first.

pull/2619/head
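The "envelope" idea from the commit message can be illustrated with a minimal Go sketch. This is not the code from the patch: the `bucket` type, the `findBucket` helper, and the 500 and +Inf buckets with their counts are assumptions made for illustration, and only the name `ensureMonotonic` and the le=1000/le=2000 counts come from the message. The sketch shows how one decreasing count can send a binary search to the wrong bucket, and how raising each count to the running maximum restores a valid search.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is a hypothetical stand-in for one cumulative histogram bucket.
type bucket struct {
	upperBound float64
	count      float64
}

// ensureMonotonic computes the "envelope" of the bucket counts in place:
// any count lower than the largest count seen so far is raised to that
// maximum, so counts never decrease as upperBound increases.
// Buckets are assumed to be sorted by ascending upperBound.
func ensureMonotonic(buckets []bucket) {
	if len(buckets) == 0 {
		return
	}
	maxCount := buckets[0].count
	for i := 1; i < len(buckets); i++ {
		switch {
		case buckets[i].count > maxCount:
			// Common case: the count grew, so track the new maximum.
			maxCount = buckets[i].count
		case buckets[i].count < maxCount:
			// Monotonicity violated: lift the count to the envelope.
			buckets[i].count = maxCount
		}
	}
}

// findBucket is an illustrative helper (an assumption, not the real
// bucketQuantile): it binary-searches for the first bucket whose count
// reaches the given rank. The result is only meaningful when the counts
// are non-decreasing.
func findBucket(buckets []bucket, rank float64) int {
	return sort.Search(len(buckets), func(i int) bool {
		return buckets[i].count >= rank
	})
}

func main() {
	// le=1000 and le=2000 use the counts from the commit message;
	// the other two buckets are made up to complete the histogram.
	buckets := []bucket{
		{upperBound: 500, count: 4},
		{upperBound: 1000, count: 10},
		{upperBound: 2000, count: 7},
		{upperBound: math.Inf(1), count: 12},
	}

	before := findBucket(buckets, 9)
	ensureMonotonic(buckets)
	after := findBucket(buckets, 9)

	fmt.Println("bucket index before fix:", before)
	fmt.Println("bucket index after fix: ", after)
	fmt.Println("le=2000 count after fix:", buckets[2].count)
}
```

With these counts, the search for rank 9 skips the le=1000 bucket before the fix, because the dip at le=2000 misleads the binary search. After `ensureMonotonic` the le=2000 count is lifted to 10 and the search lands on le=1000. The trade-off is that the lifted count is an estimate rather than an observed value, which fits the commit's own description of the change as a stopgap.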
parent 283756c503
commit 896f951e68