Pandas dataframe groupby to calculate population standard deviation

野趣味 2021-02-20 15:02

I am trying to use groupby and np.std to calculate a standard deviation, but it seems to be calculating a sample standard deviation (with degrees of freedom, ddof, equal to 1). How can I get the population standard deviation instead?

2 Answers
  • 2021-02-20 15:26

    For degrees of freedom = 0 (population standard deviation):

    (This means that groups containing a single value will end up with std=0 instead of NaN.)

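    import numpy as np


    def std(x):
        # Population standard deviation: ddof=0 divides by N instead of N - 1.
        return np.std(x, ddof=0)


    # df is assumed to be an existing DataFrame with a grouping column 'A'.
    df.groupby('A').agg(['mean', 'max', std])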
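
As a rough, self-contained illustration of the single-value case mentioned above, the sketch below builds a small made-up DataFrame (the data and the name pop_std are hypothetical, not from the question) and puts the population and sample results side by side:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; group 3 has a single row on purpose.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "values": [10.0, 15.0, 20.0, 25.0, 7.0],
})


def pop_std(x):
    """Population standard deviation (divide by N, i.e. ddof=0)."""
    return np.std(x, ddof=0)


# 'std' is the built-in groupby aggregation, which defaults to ddof=1.
result = df.groupby("A")["values"].agg(["mean", "max", pop_std, "std"])
print(result)
```

For the single-row group, the pop_std column should come out as 0 while the built-in std column is NaN, since the sample formula divides by N - 1 = 0.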
    
  • 2021-02-20 15:40

    You can pass additional keyword arguments, such as ddof, through agg to np.std:

    In [202]:
    
    df.groupby('A').agg(np.std, ddof=0)
    
    Out[202]:
         B  values
    A             
    1  0.5     2.5
    2  0.5     2.5
    
    In [203]:
    
    df.groupby('A').agg(np.std, ddof=1)
    
    Out[203]:
              B    values
    A                    
    1  0.707107  3.535534
    2  0.707107  3.535534
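
If a callable is not required, the groupby object's own std method also accepts a ddof argument, which avoids routing through np.std altogether. The sketch below is only an illustration: the DataFrame is made up, with values chosen so that the results line up with the numbers in the outputs above.

```python
import pandas as pd

# Hypothetical data: within each group, B differs by 1 and values by 5.
df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 2, 3, 4],
    "values": [10, 15, 20, 25],
})

# ddof=0 gives the population standard deviation (divide by N).
population = df.groupby("A").std(ddof=0)

# ddof=1 is the pandas default: sample standard deviation (divide by N - 1).
sample = df.groupby("A").std(ddof=1)

print(population)
print(sample)
```

With two values per group, the population result is half the difference between them and the sample result is the difference divided by the square root of 2.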
    