Here is my code
from pandas import DataFrame, Series
import pandas as pd
import numpy as np
income = DataFrame({'name': ['Adam', 'Bill', 'Chris', 'Dave
Since groupby preserves the order of rows within each group, you can sort income before calling groupby. Then pick the first row of each group using head:
# Sort by income (highest first), then group by the age bins.
grouped = income.sort_values('income', ascending=False).groupby([ageBin])
highestIncome = income.loc[grouped.head(1).index]
# highestIncome is no longer ordered by age.
# If you want to recover this, sort it again.
highestIncome = highestIncome.sort_values('age')
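For reference, here is a minimal self-contained sketch of the sort-then-head approach. The data, the bin edges, and the ageBin column are made up for illustration, since the frame in the question is cut off. It also takes head(1) directly instead of going back through the index.

```python
import pandas as pd

# Hypothetical data; the original frame in the question is truncated.
income = pd.DataFrame({
    'name': ['Adam', 'Bill', 'Chris', 'Dave', 'Edison', 'Frank'],
    'age': [22, 24, 31, 45, 51, 55],
    'income': [1000, 2500, 1200, 1500, 1300, 1600],
})

# Assumed bin edges: (20, 30], (30, 40], (40, 50], (50, 60].
income['ageBin'] = pd.cut(income['age'], [20, 30, 40, 50, 60])

# Sort so the highest income comes first, then keep the first row per bin.
by_income = income.sort_values('income', ascending=False)
top = by_income.groupby('ageBin', observed=True).head(1)

# head() keeps the sorted-by-income order, so restore the age order.
highest_income = top.sort_values('age')
print(highest_income[['name', 'age', 'income']])
```

Because head(1) returns whole rows of the sorted frame, there is no need for a second lookup; the final sort is only there to put the result back in age order.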
By the way, beware that the reference manual does not mention that groupby will preserve the order. I think the cleanest solution would be to fix pandas's idxmax so that it works here. To me, it is a little strange that idxmax does not work while max does.
Just apply a lambda function on the groups like so:
grouped.apply(lambda x: x.max())
grouped['income'].agg(lambda x : x.idxmax())
Out[]:
age
(20, 30] 1
(30, 40] NaN
(40, 50] 2
(50, 60] 4
Name: income, dtype: float64
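Then, with that output stored in result, you can do the following to get the data (dropping the empty bin first, because NaN is not a valid row label):
income.loc[result.dropna().astype(int)]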
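As a rough end-to-end sketch of the idxmax route: the data and bin edges below are hypothetical, and it assumes a recent pandas version where idxmax can be called directly on a grouped column. Passing observed=True is my own choice here, so that the empty (30, 40] bin is skipped rather than producing NaN.

```python
import pandas as pd

# Hypothetical data with nobody aged 30 to 40, so one bin stays empty.
income = pd.DataFrame({
    'name': ['Adam', 'Bill', 'Chris', 'Dave', 'Edison'],
    'age': [22, 24, 45, 51, 55],
    'income': [1000, 2500, 1200, 1500, 1600],
})

# Assumed bin edges: (20, 30], (30, 40], (40, 50], (50, 60].
income['ageBin'] = pd.cut(income['age'], [20, 30, 40, 50, 60])

# Row label of the highest income in each non-empty age bin.
idx = income.groupby('ageBin', observed=True)['income'].idxmax()

# Look the labels up in the original frame to get the full rows.
highest_income = income.loc[idx]
print(highest_income[['name', 'age', 'income']])
```

Since idx already holds integer row labels and contains no NaN, the lookup needs no dropna step.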