问题
I have a pandas DataFrame with a MultiIndex of columns:
columns=pd.MultiIndex.from_tuples(
[(c, i) for c in ['a', 'b'] for i in range(3)])
df = pd.DataFrame(np.random.randn(4, 6),
index=[0, 0, 1, 1],
columns=columns)
print(df)
a b
0 1 2 0 1 2
0 0.582804 0.753118 -0.900950 -0.914657 -0.333091 -0.965912
0 0.498002 -0.842624 0.155783 0.559730 -0.300136 -1.211412
1 0.727019 1.522160 1.679025 1.738350 0.593361 0.411907
1 1.253759 -0.806279 -2.177582 -0.099210 -0.839822 -0.211349
I want to group by the index, and use the 'min' aggregation on the a
columns, and the 'sum' aggregation on the b
columns.
I know I can do this by creating a dict that specifies the agg function for each column:
agg_dict = {'a': 'min', 'b': 'sum'}
full_agg_dict = {(c, i): agg_dict[c] for c in ['a', 'b'] for i in range(3)}
print(df.groupby(level=0).agg(full_agg_dict))
a b
0 1 2 0 1 2
0 0.498002 -0.842624 -0.900950 -0.354927 -0.633227 -2.177324
1 0.727019 -0.806279 -2.177582 1.639140 -0.246461 0.200558
Is there a simpler way? It seems like there should be a way to do this with agg_dict
without using full_agg_dict
.
回答1:
I would use your approach as well. But here's another way that (should) work:
(df.stack(level=1)
.groupby(level=[0,1])
.agg({'a':'min','b':'sum'})
.unstack(-1)
)
For some reason groupby(level=[0,1]
doesn't work for me, so I came up with:
(df.stack(level=1)
.reset_index()
.groupby(['level_0','level_1'])
.agg({'a':'min','b':'sum'})
.unstack('level_1')
)
来源:https://stackoverflow.com/questions/57810172/pandas-groupby-can-i-select-an-agg-function-by-one-level-of-a-column-multiindex