I have a series that looks like this:
delivery
2007-04-26 706 23
2007-04-27 705 10
706 1089
708
Your example code doesn't work because idxmax is executed after the groupby operation, so it runs on the whole dataframe rather than on each group.
I'm not sure how to use idxmax on multilevel indexes, so here's a simple workaround.
Setting up data :
import pandas as pd
d= {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
'2007-04-27', '2007-04-28', '2007-04-28'],
'DeliveryNb': [706, 705, 708, 450, 283, 45, 89],
'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11]}
df = pd.DataFrame.from_dict(d, orient='columns').set_index('Date')
print(df)
output
            DeliveryCount  DeliveryNb
Date
2007-04-26             23         706
2007-04-27             10         705
2007-04-27           1089         708
2007-04-27             82         450
2007-04-27             34         283
2007-04-28            100          45
2007-04-28             11          89
creating custom function :
The trick is to use the reset_index() method (so you easily get the integer index of the group)
def func(df):
    idx = df.reset_index()['DeliveryCount'].idxmax()
    return df['DeliveryNb'].iloc[idx]
applying it :
g = df.groupby(df.index)
g.apply(func)
result :
Date
2007-04-26 706
2007-04-27 708
2007-04-28 45
dtype: int64
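To see why func resets the index first, here is a minimal sketch on a single group (the hand-built `group` frame is an illustration, taken from the 2007-04-27 rows above). Every row in a group shares the same Date label, so idxmax on the original index only returns that label, while idxmax after reset_index() returns an integer position that iloc can use.

```python
import pandas as pd

# One group as groupby would pass it to func: all rows share the same Date label
group = pd.DataFrame(
    {"DeliveryNb": [705, 708, 450, 283], "DeliveryCount": [10, 1089, 82, 34]},
    index=pd.Index(["2007-04-27"] * 4, name="Date"),
)

# On the duplicated index, idxmax can only return the shared label
label = group["DeliveryCount"].idxmax()

# After reset_index(), the index is 0..n-1, so idxmax is a usable position
position = group.reset_index()["DeliveryCount"].idxmax()

print(label)
print(position)
print(group["DeliveryNb"].iloc[position])
```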
If you have the following dataframe (you can always reset the index if needed with df = df.reset_index()):
   Date  Del_Count  Del_Nb
0   1/1         14      19  <
1               11      17
2   2/2         25      29  <
3               21      27
4               22      28
5   3/3         34      36
6               37      37
7               31      39  <
To find the max per Date and extract the relevant Del_Count you can use:
df = df.loc[df.groupby(['Date'], sort=False)['Del_Nb'].idxmax()][['Date','Del_Count','Del_Nb']]
Which would yield:
   Date  Del_Count  Del_Nb
0   1/1         14      19
2   2/2         25      29
7   3/3         31      39
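For anyone who wants to run this end to end, here is a small self-contained sketch of the same lookup with label-based .loc indexing. It assumes the blank Date cells in the table belong to the date shown above them; that fill is my assumption, not something stated in the table.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # Assumption: blank Date cells repeat the date of the row above
        "Date": ["1/1", "1/1", "2/2", "2/2", "2/2", "3/3", "3/3", "3/3"],
        "Del_Count": [14, 11, 25, 21, 22, 34, 37, 31],
        "Del_Nb": [19, 17, 29, 27, 28, 36, 37, 39],
    }
)

# idxmax returns, for each Date, the row label where Del_Nb is largest
idx = df.groupby("Date", sort=False)["Del_Nb"].idxmax()

# Select those rows by label
top = df.loc[idx]
print(top)
```

Because the frame has a unique integer index, the labels returned by idxmax point at exactly one row each.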
Suppose you have this series:
            delivery
2001-01-02  0           2
            1           3
            6           2
            7           2
            9           3
2001-01-03  3           2
            6           1
            7           1
            8           3
            9           1
dtype: int64
If you want one delivery per date with the maximum value, you could use idxmax:
dates = series.index.get_level_values(0)
series.loc[series.groupby(dates).idxmax()]
yields
            delivery
2001-01-02  1           3
2001-01-03  8           3
dtype: int64
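Here is a sketch of the same idea on a hand-built copy of the series, so it runs without the random setup. The dates are plain strings here for brevity (an assumption; the real series uses date objects), and the idxmax result is converted to a list of tuples before it is passed to .loc, which is a cautious choice on my part rather than a requirement.

```python
import pandas as pd

# Hand-built copy of the example series (dates as strings for brevity)
dates_col = ["2001-01-02"] * 5 + ["2001-01-03"] * 5
deliveries = [0, 1, 6, 7, 9, 3, 6, 7, 8, 9]
counts = [2, 3, 2, 2, 3, 2, 1, 1, 3, 1]
index = pd.MultiIndex.from_arrays([dates_col, deliveries], names=[None, "delivery"])
series = pd.Series(counts, index=index)

# Group by the first index level; idxmax returns one (date, delivery) tuple per date
dates = series.index.get_level_values(0)
best = series.groupby(dates).idxmax()

# A list of tuples selects exactly those MultiIndex entries
top = series.loc[best.tolist()]
print(top)
```

Note that idxmax keeps only the first occurrence of the maximum, so ties within a date are dropped.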
If you want all deliveries per date with the maximum value, use transform to generate a boolean mask:
mask = series.groupby(dates).transform(lambda x: x==x.max()).astype('bool')
series.loc[mask]
yields
            delivery
2001-01-02  1           3
            9           3
2001-01-03  8           3
dtype: int64
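A variation on the mask, sketched on a hand-built copy of the series (dates as strings, which is my simplification): transform("max") broadcasts each date's maximum back to every row, so comparing it with the series gives the same boolean mask without a lambda. Grouping with level=0 is used here in place of the dates index; both should group by the first index level.

```python
import pandas as pd

# Hand-built copy of the example series (dates as strings for brevity)
dates_col = ["2001-01-02"] * 5 + ["2001-01-03"] * 5
deliveries = [0, 1, 6, 7, 9, 3, 6, 7, 8, 9]
counts = [2, 3, 2, 2, 3, 2, 1, 1, 3, 1]
index = pd.MultiIndex.from_arrays([dates_col, deliveries], names=[None, "delivery"])
series = pd.Series(counts, index=index)

# Each row gets the maximum of its own date
group_max = series.groupby(level=0).transform("max")

# Keep every row that equals its date's maximum, ties included
ties = series[series == group_max]
print(ties)
```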
This is the code I used to generate series:
import pandas as pd
import numpy as np
np.random.seed(1)
N = 20
rng = pd.date_range('2001-01-02', periods=N//2, freq='4H')
rng = np.random.choice(rng, N, replace=True)
rng.sort()
df = pd.DataFrame(np.random.randint(10, size=(N,)), columns=['delivery'], index=rng)
series = df.groupby([df.index.date, 'delivery']).size()