Python Pandas - Quantile calculation manually

Submitted by 家住魔仙堡 on 2019-12-06 08:33:37

Question


I am trying to calculate a quantile for a column's values manually, but the value I get from the formula does not match the result from Pandas. I looked around for different solutions but did not find the right answer.

In [54]: df

Out[54]:
      data1     data2 key1 key2
0 -0.204708  1.393406    a  one
1  0.478943  0.092908    a  two
2  1.965781  1.246435    a  one

In [55]: grouped = df.groupby('key1')
In [56]: grouped['data1'].quantile(0.9)
Out[56]:
key1
a 1.668413

Using the formula to find it manually, where n is 3 because there are 3 values in the data1 column:

position = quantile * (n + 1)

Applying it to the values of the data1 column:

= 0.9 * (n + 1)
= 0.9 * 4
= 3.6

So the 3.6th position is 1.965781. How does pandas give 1.668413?


Answer 1:


With its default linear interpolation, the quantile function sorts your data and spreads the values evenly across the range from the 0th to the 100th percentile.

In your case:

  • -0.204708 would be considered the 0th percentile,
  • 0.478943 would be considered the 50th percentile and
  • 1.965781 would be considered the 100th percentile.

So you could calculate the 90th percentile the following way (using linear interpolation between the 50th and 100th percentiles):

>>> import numpy as np
>>> # x[2] is the 50th percentile (middle value), x[1] is the 100th (maximum)
>>> x = np.array([-0.204708, 1.965781, 0.478943])
>>> ninetieth_percentile = (x[1] - x[2]) / 0.5 * 0.4 + x[2]
>>> ninetieth_percentile
1.6684133999999999

Note that 0.5 comes from the fact that these two data points span 50% of the percentile range (from the 50th to the 100th percentile), and 0.4 is the distance above the 50th percentile that you want to reach (0.5 + 0.4 = 0.9). Hope this makes sense.
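The same interpolation can be written for any number of values. The following is a minimal sketch, not pandas' actual implementation: the helper name linear_quantile is hypothetical, and it assumes the default linear interpolation, where the fractional 0-based position in the sorted data is q * (n - 1) rather than q * (n + 1).

```python
import math


def linear_quantile(values, q):
    """Quantile by linear interpolation between the two nearest sorted values.

    Hypothetical helper for illustration; assumes 0 <= q <= 1 and a
    non-empty input.
    """
    data = sorted(values)
    n = len(data)
    # Fractional 0-based position: q = 0 maps to index 0, q = 1 to index n - 1.
    pos = q * (n - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, n - 1)
    # How far the position lies between the two neighbouring values.
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])


data1 = [-0.204708, 0.478943, 1.965781]
print(round(linear_quantile(data1, 0.9), 6))  # 1.668413
```

For the three values in the question, the position is 0.9 * 2 = 1.8, so the result lies 80% of the way from 0.478943 to 1.965781, which matches the pandas output.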



Source: https://stackoverflow.com/questions/44887383/python-pandas-quantile-calculation-manually
