Pandas DataFrame conditional .mean() depending on values in certain columns

终归单人心 2020-12-31 14:08

I'm trying to create a new column which returns the mean of values from an existing column in the same df. However, the mean should be computed based on a grouping in three

2 answers
  • 2020-12-31 14:32

    Here's one way to do it

    In [19]: def cust_mean(grp):
       ....:     grp['mean'] = grp['option_value'].mean()
       ....:     return grp
       ....:
    
    In [20]: o2.groupby(['YEAR', 'daytype', 'hourtype']).apply(cust_mean)
    Out[20]:
       YEAR daytype hourtype  scenario  option_value       mean
    0  2015     SAT     of_h         0      0.134499  28.282946
    1  2015     SUN     of_h         1     63.019250  63.019250
    2  2015      WD     of_h         2     52.113516  52.113516
    3  2015      WD     pk_h         3     43.126513  43.126513
    4  2015     SAT     of_h         4     56.431392  28.282946
    

    So, what was going wrong with your attempt?

    It returns an aggregate with a different shape from the original DataFrame: one value per group rather than one value per row.

    In [21]: o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
    Out[21]:
    YEAR  daytype  hourtype
    2015  SAT      of_h        28.282946
          SUN      of_h        63.019250
          WD       of_h        52.113516
                   pk_h        43.126513
    Name: option_value, dtype: float64
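
The shape mismatch can be checked directly. The following is a minimal sketch, assuming a frame `o2` rebuilt by hand from the five rows shown in the output above; the construction and the `keys` and `group_means` names are illustrative and not part of the original question.

```python
import pandas as pd

# Hand-built copy of the five sample rows shown in the answer (an assumption:
# the asker's real data source is not shown).
o2 = pd.DataFrame({
    "YEAR": [2015] * 5,
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]

# Aggregating collapses every group to a single value, indexed by the group keys.
group_means = o2.groupby(keys)["option_value"].mean()

print(len(o2))           # 5 original rows
print(len(group_means))  # 4 distinct (YEAR, daytype, hourtype) groups
```

Because the aggregate has four entries keyed by group while the frame has five rows keyed by 0..4, assigning it straight to a new column cannot line up row by row.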
    

    Or use transform, which returns a result with the same index as the original rows:

    In [1461]: o2['premium'] = (o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value']
                                  .transform('mean'))
    
    In [1462]: o2
    Out[1462]:
       YEAR daytype hourtype  scenario  option_value    premium
    0  2015     SAT     of_h         0      0.134499  28.282946
    1  2015     SUN     of_h         1     63.019250  63.019250
    2  2015      WD     of_h         2     52.113516  52.113516
    3  2015      WD     pk_h         3     43.126513  43.126513
    4  2015     SAT     of_h         4     56.431392  28.282946
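
For readers who want to run the transform approach end to end, here is a self-contained sketch. It assumes the same hand-built `o2` frame as in the output above; the intermediate `premium` variable is only there to make the index check explicit.

```python
import pandas as pd

# Hand-built copy of the five sample rows (an assumption for illustration).
o2 = pd.DataFrame({
    "YEAR": [2015] * 5,
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]

# transform("mean") computes the mean per group and broadcasts it back,
# so the result has one value per original row.
premium = o2.groupby(keys)["option_value"].transform("mean")

# The result carries the same index as the frame, so it can be assigned directly.
same_index = premium.index.equals(o2.index)
print(same_index)

o2["premium"] = premium
print(o2)
```

Rows 0 and 4 share the group (2015, SAT, of_h), so both receive the same group mean.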
    
  • 2020-12-31 14:36

    You can do it the way you intended by tweaking your code in the following way:

    o2 = o2.set_index(['YEAR', 'daytype', 'hourtype'])
    
    o2['premium'] = o2.groupby(level=['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
    

    Why the original error? As explained by John Galt, the data coming out of groupby().mean() is not the same shape (length) as the original DataFrame.

    Pandas can handle this if you first put the 'grouping columns' in the index. It then aligns the group means on those index labels and propagates each mean to every row with a matching label.

    John's solution follows the same logic, because groupby naturally puts the grouping columns in the index during execution.
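
The index-alignment behaviour described above can be sketched as follows. This assumes the same hand-built five-row frame; the `indexed` and `result` names are illustrative, and the final `reset_index()` is an optional extra step (not in the answer) that turns the grouping keys back into ordinary columns.

```python
import pandas as pd

# Hand-built copy of the five sample rows (an assumption for illustration).
o2 = pd.DataFrame({
    "YEAR": [2015] * 5,
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]

# Move the grouping columns into the index, then aggregate by index level.
indexed = o2.set_index(keys)
group_means = indexed.groupby(level=keys)["option_value"].mean()

# Column assignment aligns on index labels: every row whose
# (YEAR, daytype, hourtype) label matches a group receives that group's mean.
indexed["premium"] = group_means

# Optional: restore the keys as regular columns.
result = indexed.reset_index()
print(result)
```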
