Question
I was trying to understand how pandas calculates lower/upper percentiles and got a bit confused. Here is the sample code and its output.
import pandas as pd
test = pd.Series([7, 15, 36, 39, 40, 41])
test.describe()
output:
I am only interested in the 25% and 75% percentiles. Which method does pandas use to calculate them?
According to the Wikipedia article https://en.wikipedia.org/wiki/Quartile, the results are different, as follows:
So what statistical/mathematical method does pandas use to calculate percentiles?
Answer 1:
As I mentioned in the comments, I finally figured out how it works by trying the quantile function (from pandas.core.algorithms import quantile), as @Abdou suggested.
I am not good at explaining this in writing, so I will walk through the given example only, for the 25% and 75% percentiles. Here is a brief (maybe poor) explanation:
For the example list [7, 15, 36, 39, 40, 41], the values map to quantiles as follows:
7 -> 0%
15 -> 20%
36 -> 40%
39 -> 60%
40 -> 80%
41 -> 100%
Since we want the 25% percentile, it lies between 15 (20%) and 36 (40%). It sits at 20% + 5%, so its value is 15 + (36-15)/4 = 15 + 5.25 = 20.25.
(36-15)/4 is used because the distance between 15 and 36 covers 40% - 20% = 20%, so we divide it by 4 to get the share that corresponds to 5%.
We can find the 75% percentile the same way. It sits at 60% + 15%, between 39 (60%) and 40 (80%):
39 + 3*(40-39)/4 = 39.75
That's it. I am really sorry for the poor explanation.
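The interpolation worked through above can be sketched in plain Python. This is only an illustration of the linear-interpolation rule, not pandas' actual source code; the helper name linear_percentile is hypothetical and was chosen for this example.

```python
import math

def linear_percentile(values, q):
    """Percentile by linear interpolation between neighbouring sorted values.

    Hypothetical helper for illustration; q is a fraction in [0, 1].
    """
    data = sorted(values)
    # Fractional position of the requested quantile among the n - 1 gaps.
    pos = q * (len(data) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    # Move `frac` of the way from the lower neighbour to the upper one.
    return data[lower] + frac * (data[upper] - data[lower])

sample = [7, 15, 36, 39, 40, 41]
q25 = linear_percentile(sample, 0.25)  # 15 + 0.25 * (36 - 15) = 20.25
q75 = linear_percentile(sample, 0.75)  # 39 + 0.75 * (40 - 39) = 39.75
print(q25, q75)
```

With six values there are five equal gaps of 20% each, which is why 25% lands a quarter of the way from 15 to 36 and 75% lands three quarters of the way from 39 to 40.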
Answer 2:
It does [series.quantile(x) for x in percentiles], where percentiles defaults to np.array([0.25, 0.5, 0.75]) if it is not provided.
You can see that in pandas/pandas/core/generic.py
So it is using Series.quantile: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.quantile.html
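A quick way to check this on the question's data is to compare describe() with direct quantile() calls. This is a minimal sketch, assuming a reasonably recent pandas where Series.quantile uses its default interpolation="linear"; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])
percentiles = [0.25, 0.5, 0.75]

# Compute each percentile directly with Series.quantile.
from_quantile = [test.quantile(q) for q in percentiles]

# Read the same percentiles from the describe() summary.
summary = test.describe()
from_describe = [summary["25%"], summary["50%"], summary["75%"]]

print(from_quantile)
print(from_describe)
```

Both lists should hold the same numbers, which matches the claim that describe() delegates to quantile() for its percentile rows.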
Source: https://stackoverflow.com/questions/41744275/which-method-does-pandas-use-for-percentile