Pandas aggregation ignoring NaNs

I aggregate my pandas DataFrame, data. Specifically, I want the average and the sum of the amounts for each [origin, type] pair.

2 answers

青春惊慌失措

2021-02-03 11:00

This may be too late, but it might still be useful for others.

Try apply with a custom function:

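```
import numpy as np
import pandas as pd

def nan_agg(x):
    # Keep only the rows where 'amount' is not NaN.
    # A boolean mask is negated with ~; `not` raises a ValueError on a Series.
    amount = x.loc[~x['amount'].isnull(), 'amount']

    res = {}
    res['nansum'] = amount.sum()
    res['nanmean'] = amount.mean()

    return pd.Series(res, index=['nansum', 'nanmean'])

# groupbyvars is the list of grouping columns, e.g. ['origin', 'type']
result = data.groupby(groupbyvars).apply(nan_agg).reset_index()
```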
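To see what the apply approach returns, here is a minimal sketch run on made-up data. The sample rows are hypothetical; the column names origin, type and amount are taken from the question. Selecting [["amount"]] before apply is my own choice, so that the custom function only receives the column it needs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "origin": ["A", "A", "A", "B", "B"],
    "type": ["x", "x", "y", "x", "x"],
    "amount": [1.0, np.nan, 4.0, 2.0, 6.0],
})
groupbyvars = ["origin", "type"]

def nan_agg(x):
    # Drop the NaN rows first, then aggregate what is left.
    amount = x.loc[x["amount"].notnull(), "amount"]
    return pd.Series({"nansum": amount.sum(), "nanmean": amount.mean()})

# One row per (origin, type) group, with columns nansum and nanmean.
result = data.groupby(groupbyvars)[["amount"]].apply(nan_agg).reset_index()
print(result)
```

The NaN in the (A, x) group is ignored, so that group's sum and mean are both computed from the single remaining value.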

孤城傲影

2021-02-03 11:21
Use numpy's nansum and nanmean:
```
from numpy import nansum
from numpy import nanmean
data.groupby(groupbyvars).agg({'amount': [ nansum, nanmean]}).reset_index() 
```
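The numpy-based aggregation can be tried end to end with the sketch below. The sample rows and the output names amount_sum and amount_mean are hypothetical. It uses pandas' named-aggregation syntax rather than the dict form so that the result has flat column names. For comparison it also passes the string names "sum" and "mean", since pandas' own groupby reducers skip NaN by default. Some pandas versions may emit a FutureWarning when numpy callables are passed to agg.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "origin": ["A", "A", "A", "B", "B"],
    "type": ["x", "x", "y", "x", "x"],
    "amount": [1.0, np.nan, 4.0, 2.0, 6.0],
})
groupbyvars = ["origin", "type"]

# numpy's NaN-aware reducers passed as callables.
with_numpy = data.groupby(groupbyvars).agg(
    amount_sum=("amount", np.nansum),
    amount_mean=("amount", np.nanmean),
).reset_index()

# pandas' built-in reducers referenced by name; they skip NaN by default.
with_pandas = data.groupby(groupbyvars).agg(
    amount_sum=("amount", "sum"),
    amount_mean=("amount", "mean"),
).reset_index()

print(with_numpy)
print(with_pandas)
```

On this data both variants give the same numbers, which suggests the string form is a reasonable alternative when numpy's functions are not needed.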
As a workaround for older versions of numpy, and also a way to fix your last attempt:

When you write pd.Series.sum(skipna=True) you actually call the method, instead of handing agg a function it can call on each group. To pass the method with skipna already set, define a partial. So if you don't have nanmean, define s_na_mean and use that instead:
```
from functools import partial
import pandas as pd
# Series.mean with skipna preset to True
s_na_mean = partial(pd.Series.mean, skipna=True)
```
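The answer stops before showing the partial in use. One way it could be passed to agg is sketched below, on hypothetical sample rows and with the amount column selected first. This is an illustration under those assumptions, not necessarily the exact call the answer had in mind.

```python
from functools import partial

import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "origin": ["A", "A", "A", "B", "B"],
    "type": ["x", "x", "y", "x", "x"],
    "amount": [1.0, np.nan, 4.0, 2.0, 6.0],
})
groupbyvars = ["origin", "type"]

# Series.mean with skipna preset; agg calls it once per group.
s_na_mean = partial(pd.Series.mean, skipna=True)

means = data.groupby(groupbyvars)["amount"].agg(s_na_mean).reset_index()
print(means)
```

The partial is a plain callable, so agg treats it like any other user-defined aggregation function.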