For each row return the column name of the largest value

礼貌的吻别 2020-11-21 07:06

I have a roster of employees, and I need to know which department each of them is in most often. It is trivial to tabulate employee ID against department name, but it is trickier

8 Answers
  •  -上瘾入骨i
    2020-11-21 07:35

    Based on the above suggestions, the following data.table solution worked very fast for me:

    library(data.table)
    
    set.seed(45)
    DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))
    
    system.time(
      DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
    )
    #>    user  system elapsed 
    #>    0.15    0.06    0.21
    DT[]
    #>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
    #>       1:  7  4  1  2  3  7  6  6  6   1      V1
    #>       2:  4  6  9 10  6  2  7  7  1   3      V4
    #>       3:  3  4  9  8  9  9  8  8  6   7      V3
    #>       4:  4  8  8  9  7  5  9  2  7   1      V4
    #>       5:  4  3  9 10  2  7  9  6  6   9      V4
    #>      ---                                       
    #>  999996:  4  6 10  5  4  7  3  8  2   8      V3
    #>  999997:  8  7  6  6  3 10  2  3 10   1      V6
    #>  999998:  2  3  2  7  4  7  5  2  7   3      V4
    #>  999999:  8 10  3  2  3  4  5  1  1   4      V2
    #> 1000000: 10  4  2  6  6  2  8  4  7   4      V1
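
    To see how ties are resolved, here is a minimal sketch on a hypothetical three-row table (the name toy and its values are made up for illustration). max.col() breaks ties at random by default, so ties.method = "first" is what makes the result reproducible by always picking the leftmost column:

    ```r
    library(data.table)

    # Hypothetical toy data: row 2 has a tie between columns a and c.
    toy <- data.table(a = c(1, 5, 2), b = c(3, 1, 2), c = c(2, 5, 9))

    # max.col() returns, for each row, the index of the largest value;
    # ties.method = "first" selects the leftmost column when values tie.
    toy[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]

    print(toy$col_max)
    #> [1] "b" "a" "c"
    ```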
    

    It also has the advantage that you can always specify which columns .SD should consider by listing them in .SDcols:

    DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]
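
    A small sketch of the same idea on hypothetical data (toy, id and max_bc are illustrative names, not from the answer above). It also shows why .SDcols is useful when the table holds non-numeric columns such as an ID, which should not take part in the comparison:

    ```r
    library(data.table)

    # Hypothetical toy data with a character id column.
    toy <- data.table(id = c("x", "y"), a = c(1, 7), b = c(4, 8), c = c(9, 3))

    # Only b and c are compared; id and a are left out of .SD.
    toy[, max_bc := colnames(.SD)[max.col(.SD, ties.method = "first")],
        .SDcols = c("b", "c")]

    print(toy$max_bc)
    #> [1] "c" "b"
    ```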
    

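    To get the column name of the smallest value instead, as suggested by @lwshang, negate the input with -.SD. Because col_max and MAX2 are now character columns in DT, restrict .SD to the numeric columns with .SDcols, otherwise the unary minus fails:

    DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")], .SDcols = paste0("V", 1:10)]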
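
    As a sanity check, the negation trick can be compared against a slower row-wise which.min() on a hypothetical toy table (toy, num_cols and check are illustrative names). This is only a sketch; it assumes all compared columns are numeric:

    ```r
    library(data.table)

    # Hypothetical toy data: row 3 has a tie for the minimum between a and b.
    toy <- data.table(a = c(1, 5, 2), b = c(3, 1, 2), c = c(2, 5, 9))
    num_cols <- c("a", "b", "c")

    # Negating the values turns the row-wise minimum into the row-wise maximum.
    toy[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")],
        .SDcols = num_cols]

    # Cross-check: which.min() also returns the first minimum in each row.
    check <- num_cols[apply(as.matrix(toy[, num_cols, with = FALSE]), 1, which.min)]

    print(toy$col_min)
    #> [1] "a" "b" "a"
    ```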
    
