Python pandas idxmax for multiple indexes in a dataframe

小蘑菇 2021-02-10 00:52

I have a series that looks like this:

            delivery
2007-04-26  706           23
2007-04-27  705           10
            706         1089
            708         


        
3 answers
  •  旧时难觅i
    2021-02-10 01:24

    Suppose you have this series:

                delivery
    2001-01-02  0           2
                1           3
                6           2
                7           2
                9           3
    2001-01-03  3           2
                6           1
                7           1
                8           3
                9           1
    dtype: int64
    

    If you want a single delivery per date with the maximum value (the first one, when several tie), you could use idxmax:

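    # The first index level holds the dates; group on it.
    dates = series.index.get_level_values(0)
    # idxmax gives one (date, delivery) label per date; select those rows.
    series.loc[series.groupby(dates).idxmax()]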
    

    yields

                delivery
    2001-01-02  1           3
    2001-01-03  8           3
    dtype: int64
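
    For anyone who wants to try the idxmax step without the random generator, here is a minimal self-contained sketch. It rebuilds the example series by hand; the date strings (instead of date objects), the level names "date" and "delivery", and the variable names best and one_per_date are assumptions made for illustration. It groups with level=0 rather than a separate dates variable, which should be equivalent here.

```python
import pandas as pd

# The example series from above, rebuilt by hand: (date, delivery) -> count.
# Plain strings stand in for the dates; the level names are assumptions.
index = pd.MultiIndex.from_tuples(
    [
        ("2001-01-02", 0), ("2001-01-02", 1), ("2001-01-02", 6),
        ("2001-01-02", 7), ("2001-01-02", 9),
        ("2001-01-03", 3), ("2001-01-03", 6), ("2001-01-03", 7),
        ("2001-01-03", 8), ("2001-01-03", 9),
    ],
    names=["date", "delivery"],
)
series = pd.Series([2, 3, 2, 2, 3, 2, 1, 1, 3, 1], index=index)

# One (date, delivery) label per date: the first row holding that date's maximum.
best = series.groupby(level=0).idxmax()

# Turn the labels into a plain list of tuples before indexing the MultiIndex.
one_per_date = series.loc[best.tolist()]
print(one_per_date)
```

    Passing a list of tuples to .loc is a cautious choice: it selects exactly those (date, delivery) pairs and does not rely on how .loc interprets a Series of tuples.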
    

    If you want all deliveries per date with the maximum value, use transform to generate a boolean mask:

    mask = series.groupby(dates).transform(lambda x: x==x.max()).astype('bool')
    series.loc[mask]
    

    yields

                delivery
    2001-01-02  1           3
                9           3
    2001-01-03  8           3
    dtype: int64
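
    The same mask can probably be built without a lambda by comparing the series to its per-date maximum. This is a sketch under the same assumptions as the hand-built example data (string dates, level names "date" and "delivery"); group_max and all_max are names introduced here.

```python
import pandas as pd

# Hand-built copy of the example series; level names are assumptions.
series = pd.Series(
    [2, 3, 2, 2, 3, 2, 1, 1, 3, 1],
    index=pd.MultiIndex.from_arrays(
        [
            ["2001-01-02"] * 5 + ["2001-01-03"] * 5,
            [0, 1, 6, 7, 9, 3, 6, 7, 8, 9],
        ],
        names=["date", "delivery"],
    ),
)

# Broadcast each date's maximum back onto every row of that date.
group_max = series.groupby(level="date").transform("max")

# Keep every row equal to its date's maximum, so ties all survive.
all_max = series[series == group_max]
print(all_max)
```

    Because the comparison already produces a boolean Series, no astype call is needed in this variant.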
    

    This is the code I used to generate series:

    import pandas as pd
    import numpy as np
    
    np.random.seed(1)
    N = 20
    rng = pd.date_range('2001-01-02', periods=N//2, freq='4H')
    rng = np.random.choice(rng, N, replace=True)
    rng.sort()
    df = pd.DataFrame(np.random.randint(10, size=(N,)), columns=['delivery'], index=rng)
    series = df.groupby([df.index.date, 'delivery']).size()
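
    To see what the last line does without the random data, here is a small deterministic sketch. The timestamps and delivery ids are made up for illustration: grouping by the calendar date and the delivery column, then calling size(), counts the rows per (date, delivery) pair and yields the two-level series used above.

```python
import pandas as pd

# Hypothetical data: four timestamped rows, each with a delivery id.
df = pd.DataFrame(
    {"delivery": [1, 2, 1, 1]},
    index=pd.to_datetime(
        [
            "2001-01-02 08:00",
            "2001-01-03 09:00",
            "2001-01-03 10:00",
            "2001-01-03 11:00",
        ]
    ),
)

# Group by calendar date and delivery id, then count the rows in each group.
counts = df.groupby([df.index.date, "delivery"]).size()
print(counts)
```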
    
