Question
I am looking at this article
# TYPE prometheus_http_request_duration_seconds histogram
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.1"} 25547
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.2"} 26688
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.4"} 27760
prometheus_http_request_duration_seconds_bucket{handler="/",le="1"} 28641
prometheus_http_request_duration_seconds_bucket{handler="/",le="3"} 28782
I am confused about why
histogram_quantile(0.9,
rate(prometheus_http_request_duration_seconds_bucket[5m])
)
doesn't give the quantile of the rate, in units of observed events per second, but instead gives the quantile of the request duration, in units of seconds per observed event.
rate(prometheus_http_request_duration_seconds_bucket[5m])
should give the number of observed events in a given bucket per second, averaged over 5 minutes. I would expect histogram_quantile to then return quantiles of that rate. I must be misunderstanding something.
Answer 1:
The rate() function specifies the time window for the quantile calculation, as described for the histogram_quantile() function. The query translates to: "over the last 5 minutes, what is the maximum HTTP response time experienced by 90% of my users?"
The histogram_quantile() function interpolates quantile values by assuming a linear distribution of observations within each bucket, with le giving the upper bound of the observations counted in that bucket. Each bucket is a counter of the number of observations at or below le since the start of the process. rate() makes the link by computing the average number of observations per second in each bucket, from which the response time quantile over the time window can be interpolated.
You are right that it is not a 100% accurate measure because of the averaging, but the function already makes several assumptions, and the choice of buckets introduces bias of its own.
You could use irate() to compute instantaneous quantiles, but the result would probably be noisier.
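To make the role of rate() concrete, here is a minimal Python sketch of what it does to the cumulative bucket counters. The later scrape reuses the counts from the question; the earlier scrape and the helper name per_second_rate are made up for illustration, and counter resets and extrapolation, which the real rate() handles, are ignored.

```python
# Cumulative bucket counters keyed by their `le` upper bound (seconds).
# `later` reuses the counts from the question; `earlier` is a hypothetical
# scrape taken 5 minutes before, invented purely for illustration.
earlier = {0.1: 25247, 0.2: 26358, 0.4: 27400, 1.0: 28251, 3.0: 28382}
later = {0.1: 25547, 0.2: 26688, 0.4: 27760, 1.0: 28641, 3.0: 28782}

WINDOW_SECONDS = 5 * 60


def per_second_rate(start, end, window_seconds):
    """Simplified stand-in for rate(): counter increase divided by the window.

    Unlike the real rate(), this ignores counter resets and extrapolation.
    """
    return {le: (end[le] - start[le]) / window_seconds for le in end}


rates = per_second_rate(earlier, later, WINDOW_SECONDS)
for le, value in rates.items():
    print(f"le={le}: {value:.3f} observations/second")
```

Only the counts are rescaled. The le labels, which carry the durations in seconds, pass through unchanged, which is why a quantile interpolated between them is still a duration.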
Answer 2:
This and here is the code for histogram_quantile in Prometheus.
Take an example.
Assume the original buckets are:
[50][100][150][200][200] with corresponding upper bounds 5s, 10s, 15s, 20s, +Inf.
Then rate(xx[5m]) returns buckets like this (the counters having increased by 20, 40, 60, 80, 80 over the 5 minutes):
[20/(5*60)][40/(5*60)][60/(5*60)][80/(5*60)][80/(5*60)]
histogram_quantile delegates the returned buckets to another function, bucketQuantile.
It uses roughly the following logic to compute the percentile:
1) get the total rank of the percentile
for example, the 90th percentile is 0.9 * total count = 0.9 * (80/(5*60))
2) compute the value of the 90th percentile
the last upper bound before the total rank position is 15 sec;
the upper bound of the bucket containing the total rank is 20 sec;
the count in the bucket that the 90th percentile falls in is (80/(5*60)) - (60/(5*60));
the internal rank of the 90th percentile within that single bucket is (0.9 * 80/(5*60)) - (60/(5*60));
finally, the value of the 90th percentile is: 15 sec + (internal rank / that bucket's count) * (20 sec - 15 sec) = 15 + 5 * ( ((0.9 * 80/(5*60)) - (60/(5*60))) / ((80/(5*60)) - (60/(5*60))) ) =
15 + 5 * ( (0.9*80 - 60)/(80 - 60) ) = 15 + 5 * (12/20) = 15 + 5*0.6 = 18 sec
That's it. As you can see, the denominator 5*60 has no effect on the computation, so the rate() function is only used to specify the 5-minute time window.
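The steps above can be sketched in Python as follows. This is a simplified illustration under the example's assumptions, not the actual Prometheus code: the function name bucket_quantile and the (upper_bound, cumulative_count) input layout are made up here, and edge cases that a real implementation has to handle (a quantile landing in the +Inf bucket, empty or malformed histograms) are left out.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Simplified sketch: assumes buckets are sorted by upper bound, the last
    one is +Inf, and the quantile does not fall into the +Inf bucket.
    """
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    raise ValueError("rank not reached; check that the last bucket is +Inf")


bounds = [5.0, 10.0, 15.0, 20.0, math.inf]
increases = [20, 40, 60, 80, 80]  # counter increases over the 5-minute window

raw = list(zip(bounds, increases))
rated = [(le, n / (5 * 60)) for le, n in zip(bounds, increases)]

print(bucket_quantile(0.9, raw))
print(bucket_quantile(0.9, rated))
```

Both calls return the same value up to floating-point rounding, because the factor 1/(5*60) appears in both the numerator and the denominator of the interpolation fraction and cancels out.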
Source: https://stackoverflow.com/questions/60962520/how-to-get-the-quantile-of-rate-in-prometheus