Python pandas idxmax for multiple indexes in a dataframe

小蘑菇 2021-02-10 00:52

I have a series that looks like this:

            delivery
2007-04-26  706           23
2007-04-27  705           10
            706         1089
            708         


        
3 Answers
  • 2021-02-10 01:03

    Your example code doesn't work because idxmax is executed after the groupby operation, so it runs on the whole dataframe rather than on each group.

    I'm not sure how to use idxmax on multilevel indexes, so here's a simple workaround.

    Setting up the data:

    import pandas as pd

    d = {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
                  '2007-04-27', '2007-04-28', '2007-04-28'],
         'DeliveryNb': [706, 705, 708, 450, 283, 45, 89],
         'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11]}

    df = pd.DataFrame.from_dict(d, orient='columns').set_index('Date')
    print(df)
    

    Output:

                DeliveryCount  DeliveryNb
    Date                                 
    2007-04-26             23         706
    2007-04-27             10         705
    2007-04-27           1089         708
    2007-04-27             82         450
    2007-04-27             34         283
    2007-04-28            100          45
    2007-04-28             11          89
    

    Creating a custom function:

    The trick is to call reset_index() on each group, so that idxmax returns the integer position of the maximum within that group.

    def func(df):
        idx = df.reset_index()['DeliveryCount'].idxmax()
        return df['DeliveryNb'].iloc[idx]
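
    To see why the reset helps, here is a small sketch for a single date group. The group frame below is an illustrative reconstruction of three of the 2007-04-27 rows, and the variable names are made up for this example.

```python
import pandas as pd

# Hypothetical single group: every row shares the same 'Date' label.
group = pd.DataFrame(
    {'DeliveryNb': [705, 708, 450], 'DeliveryCount': [10, 1089, 82]},
    index=pd.Index(['2007-04-27'] * 3, name='Date'),
)

# Without the reset, idxmax returns the duplicated label,
# which cannot identify a single row.
label = group['DeliveryCount'].idxmax()
ambiguous = group.loc[label]          # selects all three rows

# After reset_index(), idxmax returns an integer position usable with iloc.
position = group.reset_index()['DeliveryCount'].idxmax()
winner = group['DeliveryNb'].iloc[position]
```

    Here label is the shared date string, while position points at the one row holding the largest DeliveryCount.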
    

    Applying it:

    g = df.groupby(df.index)
    g.apply(func)
    

    Result:

    Date
    2007-04-26    706
    2007-04-27    708
    2007-04-28     45
    dtype: int64
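
    As a possible alternative to the custom function, the same result can probably be reached without apply by keeping Date as an ordinary column, so that the row labels stay unique and idxmax can be used per group directly. This is only a sketch on the same sample data; best_rows and result are names chosen for illustration.

```python
import pandas as pd

d = {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
              '2007-04-27', '2007-04-28', '2007-04-28'],
     'DeliveryNb': [706, 705, 708, 450, 283, 45, 89],
     'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11]}

# Date stays a column here, so the default integer index is unique.
df = pd.DataFrame(d)

# One row label per date: the row with the largest DeliveryCount.
best_rows = df.loc[df.groupby('Date')['DeliveryCount'].idxmax()]

# Same shape as the result above: DeliveryNb indexed by Date.
result = best_rows.set_index('Date')['DeliveryNb']
```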
    
  • 2021-02-10 01:20

    If you have the following dataframe (you can always reset the index if needed with df = df.reset_index()):

      Date  Del_Count  Del_Nb
    0  1/1         14      19   <
    1              11      17
    2  2/2         25      29   <
    3              21      27
    4              22      28
    5  3/3         34      36
    6              37      37
    7              31      39   <
    

    To find the max per Date and extract the relevant Del_Count you can use:

    df = df.loc[df.groupby(['Date'], sort=False)['Del_Nb'].idxmax()][['Date','Del_Count','Del_Nb']]
    

    Which would yield:

      Date  Del_Count  Del_Nb
    0  1/1         14      19
    2  2/2         25      29
    7  3/3         31      39
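
    For reference, a self-contained sketch of this approach. It assumes the blank Date cells in the table repeat the date above them, which the listing does not state explicitly, and it uses .loc because .ix was removed in pandas 1.0. The name top is made up for this example.

```python
import pandas as pd

# Assumption: blank Date cells in the table mean "same date as the row above".
df = pd.DataFrame({
    'Date': ['1/1', '1/1', '2/2', '2/2', '2/2', '3/3', '3/3', '3/3'],
    'Del_Count': [14, 11, 25, 21, 22, 34, 37, 31],
    'Del_Nb': [19, 17, 29, 27, 28, 36, 37, 39],
})

# idxmax gives the row label of the largest Del_Nb within each date;
# .loc then pulls out exactly those rows.
top = df.loc[df.groupby('Date', sort=False)['Del_Nb'].idxmax()]
```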
    
  • 2021-02-10 01:24

    Suppose you have this series:

                delivery
    2001-01-02  0           2
                1           3
                6           2
                7           2
                9           3
    2001-01-03  3           2
                6           1
                7           1
                8           3
                9           1
    dtype: int64
    

    If you want one delivery per date with the maximum value, you could use idxmax:

    dates = series.index.get_level_values(0)
    series.loc[series.groupby(dates).idxmax()]
    

    yields

                delivery
    2001-01-02  1           3
    2001-01-03  8           3
    dtype: int64
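
    A self-contained sketch of the same idea, with the series typed in by hand from the listing above. It assumes string dates in the first index level (the generated series uses date objects), and it converts the idxmax result to a list of tuples before passing it to .loc, which is a cautious choice rather than a requirement stated in the answer.

```python
import pandas as pd

# Hand-built copy of the series shown above (string dates are an assumption).
dates_level = ['2001-01-02'] * 5 + ['2001-01-03'] * 5
delivery_level = [0, 1, 6, 7, 9, 3, 6, 7, 8, 9]
index = pd.MultiIndex.from_arrays([dates_level, delivery_level],
                                  names=[None, 'delivery'])
series = pd.Series([2, 3, 2, 2, 3, 2, 1, 1, 3, 1], index=index)

dates = series.index.get_level_values(0)

# On a MultiIndex, idxmax returns one (date, delivery) tuple per group,
# pointing at the first occurrence of the group's maximum.
labels = series.groupby(dates).idxmax()

# A list of tuples selects those exact rows.
picked = series.loc[list(labels)]
```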
    

    If you want all deliveries per date with the maximum value, use transform to generate a boolean mask:

    mask = series.groupby(dates).transform(lambda x: x==x.max()).astype('bool')
    series.loc[mask]
    

    yields

                delivery
    2001-01-02  1           3
                9           3
    2001-01-03  8           3
    dtype: int64
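
    A possible variation on the mask, sketched here on a hand-built copy of the series: instead of a lambda, compare the series against the per-date maximum broadcast back by transform('max'). This swaps in a slightly different technique from the one above and assumes string dates in the first index level; ties is a name chosen for illustration.

```python
import pandas as pd

# Hand-built copy of the series shown above (string dates are an assumption).
dates_level = ['2001-01-02'] * 5 + ['2001-01-03'] * 5
delivery_level = [0, 1, 6, 7, 9, 3, 6, 7, 8, 9]
index = pd.MultiIndex.from_arrays([dates_level, delivery_level],
                                  names=[None, 'delivery'])
series = pd.Series([2, 3, 2, 2, 3, 2, 1, 1, 3, 1], index=index)

# transform('max') repeats each date's maximum on every row of that date,
# so the comparison is True wherever a row ties for the maximum.
mask = series == series.groupby(level=0).transform('max')
ties = series[mask]
```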
    

    This is the code I used to generate series:

    import pandas as pd
    import numpy as np
    
    np.random.seed(1)
    N = 20
    rng = pd.date_range('2001-01-02', periods=N//2, freq='4H')
    rng = np.random.choice(rng, N, replace=True)
    rng.sort()
    df = pd.DataFrame(np.random.randint(10, size=(N,)), columns=['delivery'], index=rng)
    series = df.groupby([df.index.date, 'delivery']).size()
    