Pandas aggregation ignoring NaN's

臣服心动 2021-02-03 10:27

I am aggregating a pandas DataFrame called data. Specifically, I want the mean and the sum of amount for each (origin, type) pair, ignoring NaN values.

2 Answers
  • 2021-02-03 11:00

    This may be late, but it might still be useful to others.

    Try apply with a custom function:

    import numpy as np
    import pandas as pd

    def nan_agg(x):
        # Keep only the rows whose 'amount' is not NaN.
        # (Python's `not` cannot negate a Series; use notnull() or ~ instead.)
        amount = x.loc[x['amount'].notnull(), 'amount']

        res = {}
        res['nansum'] = amount.sum()
        res['nanmean'] = amount.mean()

        return pd.Series(res, index=['nansum', 'nanmean'])

    result = data.groupby(groupbyvars).apply(nan_agg).reset_index()
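For comparison, here is a minimal sketch on made-up data showing that pandas' built-in groupby sum and mean already skip NaN by default, so a custom function may not be needed at all. The sample rows are hypothetical, and groupbyvars is assumed to be ['origin', 'type'] based on the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
data = pd.DataFrame({
    "origin": ["A", "A", "B", "B"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ["origin", "type"]  # assumed grouping columns

# The built-in "sum" and "mean" reductions ignore NaN by default.
result = (
    data.groupby(groupbyvars)["amount"]
    .agg(["sum", "mean"])
    .reset_index()
)
print(result)
```

Note that with these defaults a group containing only NaN values gets a sum of 0 and a mean of NaN.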
    
  • 2021-02-03 11:21

    Use numpy's nansum and nanmean:

    from numpy import nansum
    from numpy import nanmean
    data.groupby(groupbyvars).agg({'amount': [ nansum, nanmean]}).reset_index() 
    

    As a workaround for older versions of numpy, and also a way to fix your last attempt:

    Writing pd.Series.sum(skipna=True) calls the method on the spot rather than passing a function to agg. To pass a function with skipna already set, define a partial. So if you don't have nanmean, define s_na_mean and use that instead:

    from functools import partial
    import pandas as pd

    # A callable that computes the mean of a Series with skipna fixed to True
    s_na_mean = partial(pd.Series.mean, skipna=True)
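To illustrate how such a partial might be plugged into the aggregation, here is a small sketch. The sample data, the s_na_sum helper, and the output column names nansum and nanmean are assumptions made for the example, and groupbyvars is assumed to be ['origin', 'type'] as in the question.

```python
from functools import partial

import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
data = pd.DataFrame({
    "origin": ["A", "A", "B", "B"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ["origin", "type"]  # assumed grouping columns

# Callables with skipna preset; s_na_sum is a hypothetical companion
# to the s_na_mean defined in the answer.
s_na_sum = partial(pd.Series.sum, skipna=True)
s_na_mean = partial(pd.Series.mean, skipna=True)

# Each (name, function) pair becomes one output column.
result = (
    data.groupby(groupbyvars)["amount"]
    .agg([("nansum", s_na_sum), ("nanmean", s_na_mean)])
    .reset_index()
)
print(result)
```

Giving each partial an explicit name keeps the output columns distinct and readable, since a partial object has no __name__ of its own.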
    