pandas groupby: can I select an agg function by one level of a column MultiIndex?

Question

I have a pandas DataFrame with a MultiIndex on the columns:

import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
df = pd.DataFrame(np.random.randn(4, 6),
                  index=[0, 0, 1, 1],
                  columns=columns)
print(df)

          a                             b                    
          0         1         2         0         1         2
0  0.582804  0.753118 -0.900950 -0.914657 -0.333091 -0.965912
0  0.498002 -0.842624  0.155783  0.559730 -0.300136 -1.211412
1  0.727019  1.522160  1.679025  1.738350  0.593361  0.411907
1  1.253759 -0.806279 -2.177582 -0.099210 -0.839822 -0.211349

I want to group by the index, applying the 'min' aggregation to every column under 'a' and the 'sum' aggregation to every column under 'b'.

I know I can do this by creating a dict that specifies the agg function for each column:

agg_dict = {'a': 'min', 'b': 'sum'}
full_agg_dict = {(c, i): agg_dict[c] for c in ['a', 'b'] for i in range(3)}
print(df.groupby(level=0).agg(full_agg_dict))

          a                             b                    
          0         1         2         0         1         2
0  0.498002 -0.842624 -0.900950 -0.354927 -0.633227 -2.177324
1  0.727019 -0.806279 -2.177582  1.639140 -0.246461  0.200558

Is there a simpler way? It seems like there should be a way to do this with agg_dict without using full_agg_dict.
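A small step in that direction, short of dropping the full dict entirely, is to derive it from df.columns rather than hard-coding the labels. The following is a minimal sketch under the same setup as above; the seeded random generator is an assumption added only to make the example reproducible.

```python
import numpy as np
import pandas as pd

# Same layout as in the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
df = pd.DataFrame(rng.standard_normal((4, 6)),
                  index=[0, 0, 1, 1],
                  columns=columns)

agg_dict = {'a': 'min', 'b': 'sum'}

# Look up each column's function by its top-level label (col[0]),
# so the labels and the number of sub-columns are never spelled out.
full_agg_dict = {col: agg_dict[col[0]] for col in df.columns}

result = df.groupby(level=0).agg(full_agg_dict)
print(result)
```

This still builds one entry per column, but it keeps working if sub-columns are added or renamed.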

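Answer 1:

I would use your approach as well, but here is another way that should work. Stacking the inner column level into the index leaves only 'a' and 'b' as columns, so the short dict can be used directly, and unstacking afterwards restores the original layout: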

(df.stack(level=1)
   .groupby(level=[0,1])
   .agg({'a':'min','b':'sum'})
   .unstack(-1)
)

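For some reason groupby(level=[0,1]) doesn't work for me, so I came up with the following, where reset_index() turns the two unnamed index levels into columns named level_0 and level_1: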

(df.stack(level=1)
   .reset_index()
   .groupby(['level_0','level_1'])
   .agg({'a':'min','b':'sum'})
   .unstack('level_1')
)
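Another option that avoids reshaping is to aggregate each top-level block separately and glue the pieces back together. This is a sketch of a different technique from the stack/unstack one above, not something taken from the original answer; the name pieces and the seeded random generator are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same layout as in the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
df = pd.DataFrame(rng.standard_normal((4, 6)),
                  index=[0, 0, 1, 1],
                  columns=columns)

agg_dict = {'a': 'min', 'b': 'sum'}

# df[key] selects the sub-frame under one top-level label (columns 0, 1, 2).
# Each sub-frame is grouped by the row index and reduced with its own function.
pieces = {key: df[key].groupby(level=0).agg(func)
          for key, func in agg_dict.items()}

# Concatenating a dict along axis=1 uses its keys as the outer column level,
# which rebuilds the two-level column layout.
result = pd.concat(pieces, axis=1)
print(result)
```

Here agg_dict is used as-is, at the cost of one groupby call per top-level label.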

Source: https://stackoverflow.com/questions/57810172/pandas-groupby-can-i-select-an-agg-function-by-one-level-of-a-column-multiindex

Tags

python

pandas

pandas-groupby