For each row return the column name of the largest value

前端未结

关注

 8  2281

礼貌的吻别 2020-11-21 07:06

I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier

8条回答

-上瘾入骨i (楼主)

2020-11-21 07:35

Based on the above suggestions, the following data.table solution worked very fast for me:

library(data.table)

set.seed(45)
DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))

system.time(
  DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
)
#>    user  system elapsed 
#>    0.15    0.06    0.21
DT[]
#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
#>       1:  7  4  1  2  3  7  6  6  6   1      V1
#>       2:  4  6  9 10  6  2  7  7  1   3      V4
#>       3:  3  4  9  8  9  9  8  8  6   7      V3
#>       4:  4  8  8  9  7  5  9  2  7   1      V4
#>       5:  4  3  9 10  2  7  9  6  6   9      V4
#>      ---                                       
#>  999996:  4  6 10  5  4  7  3  8  2   8      V3
#>  999997:  8  7  6  6  3 10  2  3 10   1      V6
#>  999998:  2  3  2  7  4  7  5  2  7   3      V4
#>  999999:  8 10  3  2  3  4  5  1  1   4      V2
#> 1000000: 10  4  2  6  6  2  8  4  7   4      V1

And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:

DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]

In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:

DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]

0 讨论(0)

查看其它8个回答