I am trying to calculate a quantile of a column's values manually, but the value I get from the formula does not match the result pandas returns. I looked around for different solutions but did not find the right answer.
In [54]: df
Out[54]:
      data1     data2 key1 key2
0 -0.204708  1.393406    a  one
1  0.478943  0.092908    a  two
2  1.965781  1.246435    a  one
In [55]: grouped = df.groupby('key1')
In [56]: grouped['data1'].quantile(0.9)
Out[56]:
key1
a 1.668413
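To reproduce this outside the IPython session, here is a minimal sketch that rebuilds the three rows shown above and runs the same `groupby` call. It assumes the displayed (rounded) values are the actual data, since the original construction of `df` is not shown.

```python
import pandas as pd

# Rebuild the three rows from the displayed DataFrame (values copied from the printout).
df = pd.DataFrame({
    "data1": [-0.204708, 0.478943, 1.965781],
    "data2": [1.393406, 0.092908, 1.246435],
    "key1": ["a", "a", "a"],
    "key2": ["one", "two", "one"],
})

grouped = df.groupby("key1")
# Same call as in the question; GroupBy.quantile defaults to interpolation="linear".
result = grouped["data1"].quantile(0.9)  # Series indexed by key1
print(result)
```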
Using the formula to find it manually, n is 3, as there are 3 values in the data1 column:
position = p(n+1)
Applying it to the data1 column with p = 0.9:
= 0.9(n+1)
= 0.9(4)
= 3.6
So the value at position 3.6 is 1.965781 (position 3.6 lies past the last of the 3 sorted values). So how does pandas give 1.668413?
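For comparison, the p(n+1) rule used in the question can be sketched as a small function. This is an illustration under stated assumptions: positions are 1-based on the sorted data, fractional positions are interpolated linearly, and positions outside 1..n are clamped to the smallest or largest value (the function name `n_plus_1_quantile` is hypothetical).

```python
import math

def n_plus_1_quantile(values, p):
    """Hypothetical helper: quantile at 1-based position p*(n+1) on sorted data.

    Positions below 1 or above n fall outside the data and are clamped
    to the smallest or largest value.
    """
    xs = sorted(values)
    n = len(xs)
    pos = p * (n + 1)              # 1-based fractional position
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    k = math.floor(pos)            # 1-based index of the value just below pos
    frac = pos - k                 # how far past xs[k - 1] the target lies
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

data1 = [-0.204708, 0.478943, 1.965781]
print(0.9 * (len(data1) + 1))         # target position, past the 3rd (last) value
print(n_plus_1_quantile(data1, 0.9))  # clamped to the largest value
```

Because position 3.6 is beyond the last of three values, this convention can only return the maximum, which is why it disagrees with pandas.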
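pandas' quantile function (with its default interpolation='linear') assigns percentiles based on each value's rank in the sorted data: the smallest value is the 0th percentile, the largest is the 100th, and the values in between are spaced evenly, 1/(n-1) apart.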
In your case:
- -0.204708 would be considered the 0th percentile,
- 0.478943 would be considered the 50th percentile and
- 1.965781 would be considered the 100th percentile.
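The rank-based assignment listed above generalizes to any n: the target sits at the fractional 0-based index h = p(n-1) of the sorted data, and the result is interpolated between the two neighbouring values. A minimal sketch of that rule in plain Python (the name `linear_quantile` is hypothetical, not a pandas function):

```python
import math

def linear_quantile(values, p):
    """Hypothetical helper: quantile by linear interpolation between closest ranks.

    Sorted values are indexed 0..n-1 and the i-th one is treated as the
    i/(n-1) quantile, so the target sits at fractional index h = p*(n-1).
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    h = p * (len(xs) - 1)          # fractional 0-based position
    lo = math.floor(h)             # index of the value at or just below h
    hi = min(lo + 1, len(xs) - 1)  # index just above h (clamped at the end)
    frac = h - lo                  # fraction of the way from xs[lo] to xs[hi]
    return xs[lo] + frac * (xs[hi] - xs[lo])

data1 = [-0.204708, 0.478943, 1.965781]
print(linear_quantile(data1, 0.9))  # matches pandas' 1.668413 up to rounding
```

For p = 0.9 and n = 3, h = 1.8, so the result is 80% of the way from 0.478943 to 1.965781.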
So you could calculate the 90th percentile as follows, using linear interpolation between the 50th and 100th percentiles:
>>> import numpy as np
>>> x = np.array([-0.204708, 1.965781, 0.478943])
>>> # x[1] is the 100th percentile (1.965781), x[2] the 50th (0.478943)
>>> ninetieth_percentile = (x[1] - x[2]) / 0.5 * 0.4 + x[2]
>>> ninetieth_percentile
1.6684133999999999
Note that the values 0.5 and 0.4 come from the spacing of your data: with 3 points, adjacent sorted values are 0.5 apart in percentile terms (1/(n-1) = 0.5), and 0.4 is how far above the 50th percentile the target lies (0.5 + 0.4 = 0.9). Hope this makes sense.
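Both conventions can also be checked against NumPy directly. This sketch assumes NumPy 1.22 or later, where `np.quantile` accepts a `method` argument (older releases used `interpolation`). The default `"linear"` method uses the p(n-1) position and agrees with pandas' default; `"weibull"` uses the p(n+1) position from the question, which here lands past the last value, so NumPy returns the maximum.

```python
import numpy as np

x = np.array([-0.204708, 0.478943, 1.965781])

# Default method="linear": 0-based position p*(n-1), same result as pandas' default.
linear = np.quantile(x, 0.9)
# method="weibull": position p*(n+1); 3.6 is beyond the 3 values, so the max is returned.
weibull = np.quantile(x, 0.9, method="weibull")
print(linear, weibull)
```

Neither answer is "wrong": quantile definitions differ in how they map p to a position, and pandas simply uses the linear p(n-1) one by default.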
Source: https://stackoverflow.com/questions/44887383/python-pandas-quantile-calculation-manually