Finding the highest values in each row of a data frame in Python

Posted by 狂风中的少年 on 2020-02-28 04:43:30

Question


I'd like to find the highest values in each row of a pandas DataFrame and return the column headers for those values. For example, I'd like to find the top two in each row:

df =  
       A    B    C    D  
       5    9    8    2  
       4    1    2    3  

I'd like my output to look like this:

df =        
       B    C  
       A    D
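
To make the example reproducible, the input frame can be built roughly as follows. This is a minimal sketch: the default integer row index (0, 1) is an assumption, since the table above does not show row labels.

```python
import pandas as pd

# Rebuild the two-row example frame from the question.
# Row labels default to 0 and 1 (an assumption; the question shows none).
df = pd.DataFrame({"A": [5, 4], "B": [9, 1], "C": [8, 2], "D": [2, 3]})
print(df)
```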

Answer 1:


You can use a dictionary comprehension to collect the column labels of the top_n largest values in each row of the dataframe. I transposed the dataframe so that each original row becomes a column, and then applied nlargest to each of those columns. I used .index.tolist() to extract the labels of the top_n columns. Finally, I transposed the result to get the dataframe back into the desired shape.

>>> import pandas as pd
>>> top_n = 2
>>> pd.DataFrame({n: df.T[col].nlargest(top_n).index.tolist()
...               for n, col in enumerate(df.T)}).T
   0  1
0  B  C
1  A  D
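
The one-liner above can also be wrapped in a small helper so the steps are easier to follow. This is only a sketch: the function name top_n_columns is hypothetical, the dictionary is keyed by the original row labels rather than by enumerate(), and it assumes the row labels are unique.

```python
import pandas as pd


def top_n_columns(frame: pd.DataFrame, top_n: int = 2) -> pd.DataFrame:
    """Return, for each row, the labels of its top_n largest columns.

    Hypothetical helper; assumes the row labels of `frame` are unique.
    """
    transposed = frame.T  # each original row is now a column
    ranked = {
        row_label: transposed[row_label].nlargest(top_n).index.tolist()
        for row_label in transposed.columns
    }
    # The dict keys become columns, so transpose back to one row per input row.
    return pd.DataFrame(ranked).T


df = pd.DataFrame({"A": [5, 4], "B": [9, 1], "C": [8, 2], "D": [2, 3]})
result = top_n_columns(df, top_n=2)
print(result)
```

Keying by row label instead of by position keeps the original index on the result, which may or may not be what you want.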



Answer 2:


I decided to go with an alternative: apply the pd.Series.nlargest() function to each row.

Path to Solution

>>> df.apply(pd.Series.nlargest, axis=1, n=2)
     A    B    C    D
0  NaN  9.0  8.0  NaN
1  4.0  NaN  NaN  3.0

This gives us the highest values for each row but keeps the original columns, leaving NaN wherever a column is not among a row's top n values. What we actually want is the index of the nlargest() result.

>>> df.apply(lambda s, n: s.nlargest(n).index, axis=1, n=2)
0    Index(['B', 'C'], dtype='object')
1    Index(['A', 'D'], dtype='object')
dtype: object

Almost there. The only thing left is to convert the Index objects into Series.

Solution

>>> df.apply(lambda s, n: pd.Series(s.nlargest(n).index), axis=1, n=2)
   0  1
0  B  C
1  A  D

Note that I'm not using the Index.to_series() function since I do not want to preserve the original index.
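
To see why that matters, here is a small side-by-side sketch using the example frame from the question. The variable names by_rank and by_label are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 4], "B": [9, 1], "C": [8, 2], "D": [2, 3]})

# pd.Series(index) gets a fresh 0..n-1 index, so the per-row results
# line up by rank (largest first, second largest next).
by_rank = df.apply(lambda s: pd.Series(s.nlargest(2).index), axis=1)

# Index.to_series() keeps the labels as the index, so pandas aligns the
# per-row results on the original column names and fills the gaps with NaN.
by_label = df.apply(lambda s: s.nlargest(2).index.to_series(), axis=1)

print(by_rank)
print(by_label)
```

With to_series(), the result falls back into the original column layout, with the same NaN gaps as the first attempt above.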



Source: https://stackoverflow.com/questions/34518634/finding-highest-values-in-each-row-in-a-data-frame-for-python
