I understand that passing a function as a group key calls the function once per index value with the return values being used as the group names. What I can\'t figure out is how
To group by a > 1, you can define your function like:
>>> def GroupColFunc(df, ind, col):
... if df[col].loc[ind] > 1:
... return 'Group1'
... else:
... return 'Group2'
...
An then call it like
>>> people.groupby(lambda x: GroupColFunc(people, x, 'a')).sum()
a b c d e
Group2 -2.384614 -0.762208 3.359299 -1.574938 -2.65963
Or you can do it only with anonymous function:
>>> people.groupby(lambda x: 'Group1' if people['b'].loc[x] > people['a'].loc[x] else 'Group2').sum()
a b c d e
Group1 -3.280319 -0.007196 1.525356 0.324154 -1.002439
Group2 0.895705 -0.755012 1.833943 -1.899092 -1.657191
As said in documentation, you can also group by passing Series providing a label -> group name mapping:
>>> mapping = np.where(people['b'] > people['a'], 'Group1', 'Group2')
>>> mapping
Joe Group2
Steve Group1
Wes Group2
Jim Group1
Travis Group1
dtype: string48
>>> people.groupby(mapping).sum()
a b c d e
Group1 -3.280319 -0.007196 1.525356 0.324154 -1.002439
Group2 0.895705 -0.755012 1.833943 -1.899092 -1.657191