Finding top N columns for each row in data frame

盖世英雄少女心 2021-01-05 01:24

Given a data frame with one descriptive column and X numeric columns, for each row I'd like to identify the top N columns with the highest values and save them as rows in a new data frame.

5 answers

臣服心动 (OP)

2021-01-05 01:52

Let's assume

N = 3

First, build a matrix of the input values in which each value is paired with the name of the option column it came from:

# one inner list per option column; each value is paired with its column name
matrix = [[(value, f'option{i}') for value in df[f'option{i}']] for i in range(1, 6)]

The result of this line will be:

[
 [(1, 'option1'), (5, 'option1'), (3, 'option1'), (7, 'option1'), (9, 'option1'), (3, 'option1')],
 [(8, 'option2'), (4, 'option2'), (5, 'option2'), (6, 'option2'), (9, 'option2'), (2, 'option2')],
 [(9, 'option3'), (9, 'option3'), (1, 'option3'), (3, 'option3'), (9, 'option3'), (5, 'option3')],
 [(3, 'option4'), (8, 'option4'), (3, 'option4'), (5, 'option4'), (7, 'option4'), (0, 'option4')],
 [(2, 'option5'), (3, 'option5'), (4, 'option5'), (9, 'option5'), (4, 'option5'), (2, 'option5')]
]
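To make this first step reproducible, here is a minimal sketch that rebuilds the sample input from the values shown above. It assumes a plain dict of lists as a stand-in for the pandas DataFrame `df`, and the `index` labels `'a'` to `'f'` are hypothetical, since the original data frame is not shown.

```python
# Stand-in for the pandas DataFrame: a dict mapping column name -> list of values.
# The option values are read off the matrix printed above; the 'index'
# labels are hypothetical placeholders.
df = {
    "index": ["a", "b", "c", "d", "e", "f"],
    "option1": [1, 5, 3, 7, 9, 3],
    "option2": [8, 4, 5, 6, 9, 2],
    "option3": [9, 9, 1, 3, 9, 5],
    "option4": [3, 8, 3, 5, 7, 0],
    "option5": [2, 3, 4, 9, 4, 2],
}

# One inner list per option column, each value tagged with its column name.
matrix = [[(value, f"option{i}") for value in df[f"option{i}"]] for i in range(1, 6)]

for column in matrix:
    print(column)
```

Because iterating over a pandas Series yields its values, the same comprehension should work unchanged on a real DataFrame with these column names.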

Then we can easily transpose the matrix with zip, sort each resulting row by the first element of its tuples in descending order, and take the first N items:

# zip(*matrix) turns the option columns back into data-frame rows; keep the N largest per row
transformed = [sorted(row, key=lambda x: x[0], reverse=True)[:N] for row in zip(*matrix)]

The transformed list will look like:

[
 [(9, 'option3'), (8, 'option2'), (3, 'option4')],
 [(9, 'option3'), (8, 'option4'), (5, 'option1')],
 [(5, 'option2'), (4, 'option5'), (3, 'option1')],
 [(9, 'option5'), (7, 'option1'), (6, 'option2')],
 [(9, 'option1'), (9, 'option2'), (9, 'option3')],
 [(5, 'option3'), (3, 'option1'), (2, 'option2')]
]
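The transpose-and-sort step can be checked in isolation. This sketch hard-codes the option values from the matrix shown earlier and, as a swapped-in alternative to `sorted(...)[:N]`, also uses the standard library's `heapq.nlargest`, which the Python documentation describes as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`. Since Python's sort is stable, tied values (such as the three 9s in the fifth row) keep their original column order in both versions.

```python
import heapq

N = 3

# Option values per column, copied from the example matrix above.
columns = {
    "option1": [1, 5, 3, 7, 9, 3],
    "option2": [8, 4, 5, 6, 9, 2],
    "option3": [9, 9, 1, 3, 9, 5],
    "option4": [3, 8, 3, 5, 7, 0],
    "option5": [2, 3, 4, 9, 4, 2],
}
# Rebuild the (value, column name) matrix: one inner list per column.
matrix = [[(v, name) for v in values] for name, values in columns.items()]

# zip(*matrix) transposes it, so each tuple now holds one data-frame row.
top_sorted = [sorted(row, key=lambda x: x[0], reverse=True)[:N] for row in zip(*matrix)]

# heapq.nlargest returns the same lists without fully sorting each row.
top_heap = [heapq.nlargest(N, row, key=lambda x: x[0]) for row in zip(*matrix)]

assert top_sorted == top_heap
for row in top_heap:
    print(row)
```

For rows as short as these the difference is negligible; `heapq.nlargest` mainly pays off when there are many columns and N is small.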

The last step is to pair each value of the index column with the option names in its top-N tuples:

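# print each row label together with the columns holding its top-N values
for idx, top in zip(df['index'], transformed):
    for value, option in top:
        print(f'{idx},{option}')
    print()  # blank line between rows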
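The question asks for the result as rows of a new data frame rather than as printed text. A minimal sketch of that final step collects one `(index, rank, option)` record per match instead of printing. It assumes hypothetical `index` labels `'a'` to `'f'` and copies the `transformed` list shown above; the `rank` field is an illustrative addition that is not part of the original answer.

```python
# Top-N results per row, copied from the transformed list above.
transformed = [
    [(9, "option3"), (8, "option2"), (3, "option4")],
    [(9, "option3"), (8, "option4"), (5, "option1")],
    [(5, "option2"), (4, "option5"), (3, "option1")],
    [(9, "option5"), (7, "option1"), (6, "option2")],
    [(9, "option1"), (9, "option2"), (9, "option3")],
    [(5, "option3"), (3, "option1"), (2, "option2")],
]
# Hypothetical row labels standing in for df['index'].
index = ["a", "b", "c", "d", "e", "f"]

# Flatten to long format: one record per (row, top-N column) pair.
rows = [
    (idx, rank, option)
    for idx, top in zip(index, transformed)
    for rank, (_value, option) in enumerate(top, start=1)
]

for row in rows:
    print(row)
```

With pandas available, `pd.DataFrame(rows, columns=['index', 'rank', 'option'])` turns these records into the requested data frame.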
