I aggregate my Pandas dataframe, data. Specifically, I want to get the average and the sum of the amounts by tuples of [origin, type].
It might be too late, but this may still be useful for others.
Try the apply function:
import numpy as np
import pandas as pd

def nan_agg(x):
    # Keep only rows whose amount is not NaN.
    # A boolean Series is negated with ~; `not` raises a ValueError here.
    amount = x.loc[~x['amount'].isnull(), 'amount']
    res = {}
    res['nansum'] = amount.sum()
    res['nanmean'] = amount.mean()
    return pd.Series(res, index=['nansum', 'nanmean'])

# groupbyvars is the list of grouping columns, e.g. ['origin', 'type']
result = data.groupby(groupbyvars).apply(nan_agg).reset_index()
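One detail worth checking in a row filter like the one above: Python's `not` cannot negate a boolean Series element-wise, whereas `~` can. A minimal sketch on a made-up Series (the values are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical amounts with one missing value
amount = pd.Series([1.0, np.nan, 3.0])

try:
    not amount.isnull()        # asks for the truth value of a whole Series
    raised = False
except ValueError:
    raised = True              # pandas refuses: the truth value is ambiguous

# Element-wise negation keeps the rows that are not NaN
kept = amount[~amount.isnull()]
```

`amount.notnull()` would give the same mask as `~amount.isnull()`.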
Use numpy's nansum and nanmean:
from numpy import nansum, nanmean
data.groupby(groupbyvars).agg({'amount': [nansum, nanmean]}).reset_index()
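To see how these two functions differ from the plain reductions, here is a small sketch on a made-up array (the values are illustrative only):

```python
import numpy as np

# Hypothetical amounts with one missing value
amounts = np.array([1.0, np.nan, 3.0])

plain_sum = np.sum(amounts)      # NaN propagates into the result
nan_sum = np.nansum(amounts)     # NaN is ignored
nan_mean = np.nanmean(amounts)   # mean over the non-NaN values only
```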
As a workaround for older versions of numpy, and also a way to fix your last attempt:
When you write pd.Series.sum(skipna=True), you actually call the method instead of passing a function. If you want to use it like this, you need to define a partial. So if you don't have nanmean, let's define s_na_mean and use that:
from functools import partial
s_na_mean = partial(pd.Series.mean, skipna=True)
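A minimal sketch of how such partials could then be passed to agg, assuming a small made-up frame with origin, type and amount columns. The sample values and the name s_na_sum are illustrative and not from the question:

```python
from functools import partial

import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    'origin': ['A', 'A', 'B', 'B', 'B'],
    'type': ['x', 'x', 'y', 'y', 'y'],
    'amount': [1.0, np.nan, 2.0, 4.0, np.nan],
})

# Callables that agg can invoke later, once per group
s_na_sum = partial(pd.Series.sum, skipna=True)
s_na_mean = partial(pd.Series.mean, skipna=True)

grouped = data.groupby(['origin', 'type'])['amount']
result = pd.DataFrame({
    'nansum': grouped.agg(s_na_sum),
    'nanmean': grouped.agg(s_na_mean),
}).reset_index()
```

Each partial is an uncalled function object, so agg applies it to every group's amount Series with skipna=True already filled in.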