Given a data frame with one descriptive column and X numeric columns, for each row I'd like to identify the top N columns with the highest values and save them as rows on a ne
Let's assume
N = 3
First, I build a matrix from the option columns, tagging each value with the name of the column it came from:
matrix = [[(j, 'option' + str(i)) for j in df['option' + str(i)]] for i in range(1,6)]
The result of this line will be:
[
[(1, 'option1'), (5, 'option1'), (3, 'option1'), (7, 'option1'), (9, 'option1'), (3, 'option1')],
[(8, 'option2'), (4, 'option2'), (5, 'option2'), (6, 'option2'), (9, 'option2'), (2, 'option2')],
[(9, 'option3'), (9, 'option3'), (1, 'option3'), (3, 'option3'), (9, 'option3'), (5, 'option3')],
[(3, 'option4'), (8, 'option4'), (3, 'option4'), (5, 'option4'), (7, 'option4'), (0, 'option4')],
[(2, 'option5'), (3, 'option5'), (4, 'option5'), (9, 'option5'), (4, 'option5'), (2, 'option5')]
]
Then we can easily transpose the matrix with the zip function (so each inner list holds one row of the data frame), sort each row by the first element of the tuple in descending order, and take the first N items:
transformed = [sorted(l, key=lambda x: x[0], reverse=True)[:N] for l in zip(*matrix)]
The list transformed will look like this (sorted is stable, so tied values keep their original column order):
[
[(9, 'option3'), (8, 'option2'), (3, 'option4')],
[(9, 'option3'), (8, 'option4'), (5, 'option1')],
[(5, 'option2'), (4, 'option5'), (3, 'option1')],
[(9, 'option5'), (7, 'option1'), (6, 'option2')],
[(9, 'option1'), (9, 'option2'), (9, 'option3')],
[(5, 'option3'), (3, 'option1'), (2, 'option2')]
]
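The steps so far can be put together in one runnable sketch. It assumes the values shown in the matrix above and uses a plain dict of lists as a stand-in for the data frame (an assumption, so that it runs without pandas); the names `names`, `row` and `pair` are only illustrative.

```python
N = 3

# Stand-in for the data frame: a plain dict of columns (assumption made so
# the sketch runs without pandas). Iterating df[name] yields the column values.
df = {
    'option1': [1, 5, 3, 7, 9, 3],
    'option2': [8, 4, 5, 6, 9, 2],
    'option3': [9, 9, 1, 3, 9, 5],
    'option4': [3, 8, 3, 5, 7, 0],
    'option5': [2, 3, 4, 9, 4, 2],
}
names = ['option' + str(i) for i in range(1, 6)]

# One list per column, each value tagged with its column name.
matrix = [[(value, name) for value in df[name]] for name in names]

# zip(*matrix) transposes columns into rows. sorted() is stable, so tied
# values keep their original column order.
transformed = [sorted(row, key=lambda pair: pair[0], reverse=True)[:N]
               for row in zip(*matrix)]

for row in transformed:
    print(row)
```

Sorting each row costs O(X log X); for a handful of columns that is negligible, so the simple sort-and-slice is probably good enough here.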
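The last step is to pair each value of the index column with its list of top tuples and print one line per selected option:
for row_id, top in zip(df['index'], transformed):
    for option in top:
        # option is a (value, column name) tuple; str() guards against non-string ids
        print(str(row_id) + ',' + option[1])
    print('')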
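If the goal is to keep the result as rows rather than print it, the same loop can collect (id, option) pairs instead. This is a minimal sketch: the ids 'a' and 'b' are hypothetical placeholders for whatever df['index'] holds, and only the first two entries of transformed are repeated here to keep it short.

```python
# Hypothetical ids standing in for df['index'] (assumption).
ids = ['a', 'b']

# First two entries of the transformed list shown earlier.
transformed = [
    [(9, 'option3'), (8, 'option2'), (3, 'option4')],
    [(9, 'option3'), (8, 'option4'), (5, 'option1')],
]

# One output row per selected option: (row id, column name).
rows = [(row_id, name)
        for row_id, top in zip(ids, transformed)
        for _value, name in top]

for row in rows:
    print(row)

# With pandas available, the list could then be turned into a data frame,
# e.g. pd.DataFrame(rows, columns=['index', 'option']).
```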