Groupby Pandas DataFrame and calculate mean and stdev of one column and add the std as a new column with reset_index

余生分开走 2020-12-09 16:26

I have a Pandas DataFrame as below:

   a      b      c      d
0  Apple  3      5      7
1  Banana 4      4      8
2  Cherry 7      1      3
3  Apple  3               


        
1 Answer
  • 2020-12-09 16:45

    You could use a groupby-agg operation:

    In [38]: result = df.groupby(['a'], as_index=False).agg(
                          {'c':['mean','std'],'b':'first', 'd':'first'})
    

    and then rename and reorder the columns:

    In [39]: result.columns = ['a','c','e','b','d']
    
    In [40]: result.reindex(columns=sorted(result.columns))
    Out[40]: 
            a  b    c  d         e
    0   Apple  3  4.5  7  0.707107
    1  Banana  4  4.0  8       NaN
    2  Cherry  7  1.0  3       NaN
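As a self-contained sketch of the same idea, the rename and reorder steps can be avoided with named aggregation (available in pandas 0.25 and later). The sample data is partly an assumption: the last row of the question's table is cut off, so `c=4` is inferred from the Apple mean of 4.5 in the output above, and `d=7` is only a placeholder.

```python
import pandas as pd

# Sample data. The last row is truncated in the question: c=4 is inferred
# from the Apple mean of 4.5 shown above, and d=7 is an assumed placeholder.
df = pd.DataFrame({
    'a': ['Apple', 'Banana', 'Cherry', 'Apple'],
    'b': [3, 4, 7, 3],
    'c': [5, 4, 1, 4],
    'd': [7, 8, 3, 7],
})

# Named aggregation: each keyword names an output column and maps it to
# an (input column, aggregation) pair, so the columns come out already
# named and ordered.
result = df.groupby('a', as_index=False).agg(
    b=('b', 'first'),
    c=('c', 'mean'),
    d=('d', 'first'),
    e=('c', 'std'),
)
print(result)
```

Here `e` holds the sample standard deviation of `c` for each group, and the columns come out in the order the keywords are written.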
    

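    Pandas computes the sample standard deviation (ddof=1) by default, which is why groups with a single row give NaN above. To compute the population standard deviation (ddof=0) instead: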

    def pop_std(x):
        return x.std(ddof=0)
    
    result = df.groupby(['a'], as_index=False).agg({'c':['mean',pop_std],'b':'first', 'd':'first'})
    
    result.columns = ['a','c','e','b','d']
    result = result.reindex(columns=sorted(result.columns))
    

    yields

            a  b    c  d    e
    0   Apple  3  4.5  7  0.5
    1  Banana  4  4.0  8  0.0
    2  Cherry  7  1.0  3  0.0
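To see the difference between the two estimators side by side, here is a minimal sketch that passes `ddof` directly to the grouped `std` method. The values in column `c` are an assumption reconstructed from the outputs above (Apple has 5 and 4, Banana has 4, Cherry has 1).

```python
import pandas as pd

# Column c reconstructed from the outputs above (an assumption, since the
# question's table is truncated).
df = pd.DataFrame({
    'a': ['Apple', 'Banana', 'Cherry', 'Apple'],
    'c': [5, 4, 1, 4],
})

grouped = df.groupby('a')['c']

# Sample std divides by (n - 1), so a single-row group is undefined (NaN).
sample_std = grouped.std()

# Population std divides by n, so a single-row group gives 0.0.
population_std = grouped.std(ddof=0)

print(pd.DataFrame({'sample': sample_std, 'population': population_std}))
```

For a group of n values the two differ by a factor of sqrt((n - 1) / n), which is why Apple goes from about 0.707 to 0.5.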
    