Get both the top-n values and the names of columns they occur in, within each row in dataframe

后端 未结 1 1209
挽巷
挽巷 2021-01-23 10:45

I have a dataframe like in this one:

df = pd.DataFrame({\'a\':[1,2,1],\'b\':[4,6,0],\'c\':[0,4,8]})
+---+---+---+
| a | b | c |
+---+---+---+
| 1 | 4 | 0 |
+---+         


        
相关标签:
1条回答
  • 2021-01-23 11:06

    Here are two ways, both adapt from @unutbu's answer to "Find names of top-n highest-value columns in each pandas dataframe row"

    1) Use Python Decorate-Sort-Undecorate with a .apply(lambda ...) on each row to insert the column names, do the np.argsort, keep the top-n, reformat the answer. (I think this is cleaner).

    import numpy as np
    
    # Apply Decorate-Sort row-wise to our df, and slice the top-n columns within each row...
    
    sort_decr2_topn = lambda row, nlargest=2:
        sorted(pd.Series(zip(df.columns, row)), key=lambda cv: -cv[1]) [:nlargest]
    
    tmp = df.apply(sort_decr2_topn, axis=1)
    
    0    [(b, 4), (a, 1)]
    1    [(b, 6), (c, 4)]
    2    [(c, 8), (a, 1)]
    
    # then your result (as a pandas DataFrame) is...
    np.array(tmp)
    array([[('b', 4), ('a', 1)],
           [('b', 6), ('c', 4)],
           [('c', 8), ('a', 1)]], dtype=object)
    # ... or as a list of rows is
    tmp.values.tolist()
    #... and you can insert the row-indices 0,1,2 with 
    zip(tmp.index, tmp.values.tolist())
    [(0, [('b', 4), ('a', 1), ('c', 0)]), (1, [('b', 6), ('c', 4), ('a', 2)]), (2, [('c', 8), ('a', 1), ('b', 0)])]
    

    2) Get the matrix of topnlocs as follows, then use it both to reindex into df.columns, and df.values, and combine that output:

    import numpy as np
    
    nlargest = 2
    topnlocs = np.argsort(-df.values, axis=1)[:, 0:nlargest]
    # ... now you can use topnlocs to reindex both into df.columns, and df.values, then reformat/combine them somehow
    # however it's painful trying to apply that NumPy array of indices back to df or df.values,
    

    See How to get away with a multidimensional index in pandas

    0 讨论(0)
提交回复
热议问题