Update sent timestamp when write irrecoverably fails.

We have an alert that fires when prometheus_remote_storage_highest_timestamp_in_seconds - prometheus_remote_storage_queue_highest_sent_timestamp_seconds
becomes too high. But we have an agent that fires this when the remote "rate-limits" the user.

This is because prometheus_remote_storage_queue_highest_sent_timestamp_seconds doesn't get updated
when the remote sends a 429.

I think we should update the metrics, and the change I made makes sense. Because if the requests fails
because of connectivity issues, etc. we will never exit the `sendWriteRequestWithBackoff` function. It only
exits the function when there is a non-recoverable error, like a bad status code, and in that case, I think
the metric needs to be updated.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
pull/10102/head
Goutham Veeramachaneni 3 years ago
parent a961062c37
commit 1af81dc5c9
No known key found for this signature in database
GPG Key ID: F1C217E8E9023CAD

@ -1311,12 +1311,11 @@ func (s *shards) sendSamplesWithBackoff(ctx context.Context, samples []prompb.Ti
} }
err = sendWriteRequestWithBackoff(ctx, s.qm.cfg, s.qm.logger, attemptStore, onRetry) err = sendWriteRequestWithBackoff(ctx, s.qm.cfg, s.qm.logger, attemptStore, onRetry)
if err != nil {
return err
}
s.qm.metrics.sentBytesTotal.Add(float64(reqSize)) s.qm.metrics.sentBytesTotal.Add(float64(reqSize))
s.qm.metrics.highestSentTimestamp.Set(float64(highest / 1000)) s.qm.metrics.highestSentTimestamp.Set(float64(highest / 1000))
return nil
return err
} }
func sendWriteRequestWithBackoff(ctx context.Context, cfg config.QueueConfig, l log.Logger, attempt func(int) error, onRetry func()) error { func sendWriteRequestWithBackoff(ctx context.Context, cfg config.QueueConfig, l log.Logger, attempt func(int) error, onRetry func()) error {

Loading…
Cancel
Save