Find names of top-n highest-value columns in each pandas dataframe row

自闭症患者 2020-12-01 09:53

I have the following dataframe:

  id  p1  p2  p3  p4
   1   0   9   1   4
   2   0   2   3   4
   3   1   3  10   7
   4   1   5   3   1
   5   2   3   7  10


        
2 Answers
  • 2020-12-01 10:21

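    You can sort each row in descending order and keep the labels of the first three values:

    import pandas as pd

    # Sort each row from largest to smallest and keep the first three column labels.
    df = df.set_index('id').apply(
        lambda x: pd.Series(x.sort_values(ascending=False).iloc[:3].index,
                            index=['top1', 'top2', 'top3']),
        axis=1).reset_index()
    print(df)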
       id top1 top2 top3
    0   1   p2   p4   p3
    1   2   p4   p3   p2
    2   3   p3   p4   p2
    3   4   p2   p3   p4
    4   5   p4   p3   p2
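
The same row-wise idea can also be sketched with `Series.nlargest`, which selects the top values directly instead of sorting the whole row. This is only an illustration: the helper name `top_n_labels` and its `n` parameter are hypothetical and not part of the answer above, and the dataframe is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'p1': [0, 0, 1, 1, 2],
                   'p2': [9, 2, 3, 5, 3],
                   'p3': [1, 3, 10, 3, 7],
                   'p4': [4, 4, 7, 1, 10]})


def top_n_labels(frame, n=3):
    """Hypothetical helper: labels of the n largest values in each row."""
    names = ['top{}'.format(i) for i in range(1, n + 1)]
    return frame.apply(
        # nlargest returns the n largest values in descending order;
        # only their index labels (the column names) are kept.
        lambda row: pd.Series(row.nlargest(n).index, index=names),
        axis=1,
    )


top3 = top_n_labels(df.set_index('id'), n=3).reset_index()
print(top3)
```

Like the `apply` version above, this calls a Python function once per row, so it is convenient for small frames but may be slow on large ones.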
    
  • 2020-12-01 10:37

    You could use np.argsort to find the indices of the n largest items for each row:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
     'p1': [0, 0, 1, 1, 2],
     'p2': [9, 2, 3, 5, 3],
     'p3': [1, 3, 10, 3, 7],
     'p4': [4, 4, 7, 1, 10]})
    df = df.set_index('id')
    
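    nlargest = 3
    # Negate the values so that an ascending argsort lists the largest entries first.
    order = np.argsort(-df.values, axis=1)[:, :nlargest]
    # Index a plain NumPy array: recent pandas versions reject 2-D indexing on an Index.
    result = pd.DataFrame(df.columns.to_numpy()[order],
                          columns=['top{}'.format(i) for i in range(1, nlargest+1)],
                          index=df.index)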
    
    print(result)
    

    yields

       top1 top2 top3
    id               
    1    p2   p4   p3
    2    p4   p3   p2
    3    p3   p4   p2
    4    p2   p3   p1
    5    p4   p3   p2
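
Row `id` 4 contains a tie (`p1` and `p4` are both 1), so its third label depends on how the sort breaks ties, which is why the two answers can disagree on that row. A minimal sketch, assuming you want ties resolved in favour of the leftmost column: request a stable sort from `np.argsort`. The helper name `top_n_columns` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'p1': [0, 0, 1, 1, 2],
                   'p2': [9, 2, 3, 5, 3],
                   'p3': [1, 3, 10, 3, 7],
                   'p4': [4, 4, 7, 1, 10]}).set_index('id')


def top_n_columns(frame, n):
    """Hypothetical helper: names of the n largest columns per row."""
    # Negating the values turns an ascending sort into a descending one.
    # kind='stable' keeps tied values in their original column order.
    order = np.argsort(-frame.to_numpy(), axis=1, kind='stable')[:, :n]
    labels = frame.columns.to_numpy()[order]
    names = ['top{}'.format(i) for i in range(1, n + 1)]
    return pd.DataFrame(labels, columns=names, index=frame.index)


result = top_n_columns(df, 3)
print(result)
```

With a stable sort, the tie in row 4 goes to `p1` because it comes before `p4` in the column order.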
    