Question
I don't understand the following behavior of quantile. With type=2 it should average at discontinuities, but this doesn't always seem to happen. If I create a list of 100 numbers and look at the percentiles, shouldn't I get the average at every percentile? This happens for some percentiles, but not for all (e.g. the 7th percentile).
quantile(seq(1, 100, 1), 0.05, type=2)
# 5%
# 5.5
quantile(seq(1, 100, 1), 0.06, type=2)
# 6%
# 6.5
quantile(seq(1, 100, 1), 0.07, type=2)
# 7%
# 8
quantile(seq(1, 100, 1), 0.08, type=2)
# 8%
# 8.5
Is this related to floating-point issues?
100*0.06 == 6
#TRUE
100*0.07 == 7
#FALSE
sprintf("%.20f", 100*0.07)
#"7.00000000000000088818"
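Answer 1:
As far as I can tell, it is related to floating-point arithmetic: 0.07 is not exactly representable as a floating-point number, so 100*0.07 comes out slightly larger than 7. The product is then not treated as an integer, no averaging takes place, and the next order statistic (8) is returned.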
p <- seq(0, 0.1, by = 0.001)
q <- quantile(seq(1, 100, 1), p, type=2)
plot(p, q, type = "b")
abline(v = 0.07, col = "grey")
If you think of the quantile (type 2) as a function of p, you will never evaluate the function at exactly 0.07, hence your results. Try, for example, decreasing the by argument of seq() in the above. In that sense, the function returns exactly what is expected. In practice, with continuous data, I cannot imagine it would be of any consequence (but that is a poor argument, I know).
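If the averaged value is what you want at p = 0.07, one option is to decide whether n * p is an integer using a tolerance rather than an exact comparison. The following is only a minimal sketch: quantile_type2 and its tol argument are hypothetical names, not part of base R. It assumes the usual type 2 rule (average the two neighbouring order statistics when n * p is an integer, otherwise take the next order statistic up).

```r
# Hypothetical helper (not a base R function): type 2 quantile that treats
# n * p as an integer when it is within `tol` of one.
quantile_type2 <- function(x, p, tol = 1e-9) {
  x <- sort(x)
  n <- length(x)
  np <- n * p
  j <- round(np)
  is_int <- abs(np - j) < tol               # near-integers count as integers
  lo <- ifelse(is_int, j, ceiling(np))      # lower order statistic index
  hi <- ifelse(is_int, j + 1, ceiling(np))  # upper order statistic index
  lo <- pmin(pmax(lo, 1), n)                # clamp indices to 1..n
  hi <- pmin(pmax(hi, 1), n)
  (x[lo] + x[hi]) / 2
}

quantile_type2(seq(1, 100, 1), 0.07)
# 7.5
quantile_type2(seq(1, 100, 1), 0.075)
# 8
```

The tolerance is an assumption about what the caller wants: it also changes the result for any p that genuinely lies very close to, but not on, a discontinuity.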
Source: https://stackoverflow.com/questions/61393855/issue-with-quantile-type-2