Finding top N columns for each row in data frame

盖世英雄少女心 2021-01-05 01:24

Given a data frame with one descriptive column and X numeric columns, for each row I'd like to identify the top N columns with the highest values and save them as rows in a new data frame.

5 Answers
  •  臣服心动
    2021-01-05 01:52

    Let's assume

    N = 3
    

    First, I create a matrix of the input values in which each value is paired with the name of the column (option) it came from:

    matrix = [[(j, 'option' + str(i)) for j in df['option' + str(i)]] for i in range(1,6)]
    

    The result of this line will be:

    [
     [(1, 'option1'), (5, 'option1'), (3, 'option1'), (7, 'option1'), (9, 'option1'), (3, 'option1')],
     [(8, 'option2'), (4, 'option2'), (5, 'option2'), (6, 'option2'), (9, 'option2'), (2, 'option2')],
     [(9, 'option3'), (9, 'option3'), (1, 'option3'), (3, 'option3'), (9, 'option3'), (5, 'option3')],
     [(3, 'option4'), (8, 'option4'), (3, 'option4'), (5, 'option4'), (7, 'option4'), (0, 'option4')],
     [(2, 'option5'), (3, 'option5'), (4, 'option5'), (9, 'option5'), (4, 'option5'), (2, 'option5')]
    ]
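    To check this step without the original data, here is a minimal sketch that rebuilds a sample input from the values in the listing above. It assumes a plain dict of lists can stand in for the pandas DataFrame, because df['option1'] indexes both the same way. The row labels r0 to r5 in the 'index' column are hypothetical.

```python
# Sample input reconstructed from the matrix listing; a dict of lists stands in
# for the pandas DataFrame (df['optionK'] indexing works the same way on both).
# The row labels in 'index' are hypothetical placeholders.
df = {
    'index':   ['r0', 'r1', 'r2', 'r3', 'r4', 'r5'],
    'option1': [1, 5, 3, 7, 9, 3],
    'option2': [8, 4, 5, 6, 9, 2],
    'option3': [9, 9, 1, 3, 9, 5],
    'option4': [3, 8, 3, 5, 7, 0],
    'option5': [2, 3, 4, 9, 4, 2],
}

# One inner list per option column; each value is paired with its column name.
matrix = [[(j, 'option' + str(i)) for j in df['option' + str(i)]] for i in range(1, 6)]

for column in matrix:
    print(column)
```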
    

    Then we can easily transpose the matrix with the zip function, sort each resulting row by the first element of its tuples (the value) in descending order, and take the first N items:

    transformed = [sorted(l, key=lambda x: x[0], reverse=True)[:N] for l in zip(*matrix)]
    

    The list transformed will look like this:

    [
     [(9, 'option3'), (8, 'option2'), (3, 'option4')],
     [(9, 'option3'), (8, 'option4'), (5, 'option1')],
     [(5, 'option2'), (4, 'option5'), (3, 'option1')],
     [(9, 'option5'), (7, 'option1'), (6, 'option2')],
     [(9, 'option1'), (9, 'option2'), (9, 'option3')],
     [(5, 'option3'), (3, 'option1'), (2, 'option2')]
    ]
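    The transpose-and-sort step can be reproduced on its own, starting again from the column values in the matrix listing. The sketch also shows an alternative that is not part of the answer above: heapq.nlargest(N, row, key=...). The Python docs describe it as equivalent to sorted(row, key=..., reverse=True)[:N], so it should give identical lists, ties included. The sketch asserts that.

```python
import heapq

N = 3

# Column values read off the matrix listing (a stand-in for df).
columns = {
    'option1': [1, 5, 3, 7, 9, 3],
    'option2': [8, 4, 5, 6, 9, 2],
    'option3': [9, 9, 1, 3, 9, 5],
    'option4': [3, 8, 3, 5, 7, 0],
    'option5': [2, 3, 4, 9, 4, 2],
}
matrix = [[(v, name) for v in values] for name, values in columns.items()]

# zip(*matrix) yields one tuple per data-frame row holding that row's
# (value, option) pairs; sort by value, highest first, and keep N.
# sorted() stays stable with reverse=True, so ties keep their column order.
transformed = [sorted(cells, key=lambda x: x[0], reverse=True)[:N] for cells in zip(*matrix)]

# Alternative: heapq.nlargest picks the same N pairs in the same order.
via_heap = [heapq.nlargest(N, cells, key=lambda x: x[0]) for cells in zip(*matrix)]
assert via_heap == transformed

for row in transformed:
    print(row)
```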
    

    The last step is to pair each entry of the 'index' column with its top-N list and print one id,option line per entry:

    for row_id, top in zip(df['index'], transformed):
        for _value, option in top:
            # Python 3 print; the f-string also handles non-string ids
            print(f'{row_id},{option}')
        print()  # blank line between rows
    
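    The question asks to save the result as rows of a new table rather than print it. The following minimal sketch collects (row id, option) pairs into a list. top_n_rows is a hypothetical helper name, and the option column names are passed in explicitly. The sample data is again a dict of lists standing in for a DataFrame, holding two rows from the listing above with hypothetical ids.

```python
def top_n_rows(df, n, option_columns):
    """Return (row id, option name) pairs for the n largest values of each row.

    Hypothetical helper: df is any mapping from column name to a sequence
    (iterating a pandas DataFrame column works the same way), and df['index']
    holds the row ids, as in the answer above.
    """
    # Pair every value with the name of the column it came from.
    matrix = [[(v, col) for v in df[col]] for col in option_columns]
    result = []
    # zip(*matrix) regroups the pairs row by row, aligned with the ids.
    for row_id, cells in zip(df['index'], zip(*matrix)):
        top = sorted(cells, key=lambda x: x[0], reverse=True)[:n]
        result.extend((row_id, option) for _value, option in top)
    return result


# First two rows of the sample data; the ids 'r0' and 'r1' are hypothetical.
df = {
    'index': ['r0', 'r1'],
    'option1': [1, 5],
    'option2': [8, 4],
    'option3': [9, 9],
    'option4': [3, 8],
    'option5': [2, 3],
}
option_columns = ['option' + str(i) for i in range(1, 6)]

rows = top_n_rows(df, 3, option_columns)
for row_id, option in rows:
    print(f'{row_id},{option}')
```

    If pandas is available, pandas.DataFrame(rows, columns=['index', 'option']) should turn the list into a data frame with one row per top-N entry.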
