With Pandas in Python, select the highest value row for each group

礼貌的吻别 2021-01-25 05:29

With Pandas, for the following data set

author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
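author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00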
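A minimal sketch for loading these rows into a DataFrame, assuming they are stored as a CSV file with no header row. The column names author, cat, val are an assumption (the question does not give any; the first answer below uses the same ones), and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# The rows from the question, assumed to be a headerless CSV.
csv_text = """\
author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
"""

# names= supplies the column labels because the data has no header row;
# author/cat/val are assumed labels, not part of the original data.
df = pd.read_csv(io.StringIO(csv_text), names=["author", "cat", "val"])
print(df)
```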


        
2 answers
  • 2021-01-25 06:10

    Since you want to retrieve the category column as well, a standard .agg on column val won't give you what you want. Also, since two of author3's values are 7, the approach by @Padraic Cunningham using .max() will return only one instance instead of both. You can define a custom function and pass it to apply to accomplish your task.

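    import pandas as pd
    
    # your data; assume the column names are: author, cat, val
    # ===============================
    df = pd.DataFrame({
        'author': ['author1'] * 3 + ['author2'] * 4 + ['author3'] * 3,
        'cat': ['category1', 'category2', 'category3',
                'category1', 'category2', 'category3', 'category4',
                'category1', 'category2', 'category3'],
        'val': [10, 15, 12, 5, 6, 4, 9, 7, 4, 7],
    })
    print(df)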
    
    
        author        cat  val
    0  author1  category1   10
    1  author1  category2   15
    2  author1  category3   12
    3  author2  category1    5
    4  author2  category2    6
    5  author2  category3    4
    6  author2  category4    9
    7  author3  category1    7
    8  author3  category2    4
    9  author3  category3    7
    
    # processing
    # ====================================
    def func(group):
        # keep every row whose val equals the group's maximum, so ties survive
        return group.loc[group['val'] == group['val'].max()]
    
    print(df.groupby('author', as_index=False).apply(func).reset_index(drop=True))
    
    
        author        cat  val
    0  author1  category2   15
    1  author2  category4    9
    2  author3  category1    7
    3  author3  category3    7    
    
  • 2021-01-25 06:14
    import pandas as pd
    
    df = pd.read_csv("in.csv", names=("Author","Cat","Val"))
    
    print(df.groupby(['Author'])['Val'].max())
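The line above prints only the maximum Val for each author, as a Series indexed by Author, so the Cat column is lost. A minimal sketch contrasting that with groupby transform, which returns one value per row of df and can therefore be compared with df['Val'] row by row. The data is inlined with io.StringIO in place of in.csv, which is not shown (an assumption about its contents, taken from the question).

```python
import io

import pandas as pd

# Assumed contents of in.csv: the question's rows, no header row.
csv_text = """\
author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
"""
df = pd.read_csv(io.StringIO(csv_text), names=["Author", "Cat", "Val"])

# One value per group: a Series indexed by Author; Cat is not carried along.
per_group = df.groupby("Author")["Val"].max()

# transform broadcasts each group's max back onto every row of that group,
# so the result has the same index (and length) as df.
per_row = df.groupby("Author")["Val"].transform("max")

print(per_group)
print(per_row.tolist())
```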
    

    To get the matching rows as a DataFrame:

    # True for every row whose Val equals its author's maximum, ties included
    inds = df.groupby(['Author'])['Val'].transform('max') == df['Val']
    df = df[inds].reset_index(drop=True)
    print(df)
        Author        Cat  Val
    0  author1  category2   15
    1  author2  category4    9
    2  author3  category1    7
    3  author3  category3    7
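Another way to get the same rows, not taken from either answer, is to compute each author's maximum as a small frame and inner-merge it back on both Author and Val; only rows that match their author's maximum survive, ties included. A sketch with the data inlined, since in.csv is not shown.

```python
import pandas as pd

# Same data as the question, with the column names used in this answer.
df = pd.DataFrame({
    "Author": ["author1"] * 3 + ["author2"] * 4 + ["author3"] * 3,
    "Cat": ["category1", "category2", "category3",
            "category1", "category2", "category3", "category4",
            "category1", "category2", "category3"],
    "Val": [10.0, 15.0, 12.0, 5.0, 6.0, 4.0, 9.0, 7.0, 4.0, 7.0],
})

# One row per author holding that author's maximum Val.
max_vals = df.groupby("Author", as_index=False)["Val"].max()

# Inner merge on both keys keeps exactly the rows whose Val equals
# their author's maximum, including both author3 rows with 7.
top_rows = df.merge(max_vals, on=["Author", "Val"], how="inner")
print(top_rows)
```

Like the transform filter, this keeps tied rows; it just builds an intermediate one-row-per-author frame instead of a per-row mask.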
    