Subset rows corresponding to max value by group using data.table

轮回少年 2020-11-21 15:50

Assume I have a data.table containing some baseball players:

library(plyr)        # provides the baseball dataset
library(data.table)

bdt <- as.data.table(baseball)
1 answer
  • 2020-11-21 15:57

    Here's the fast data.table way:

    bdt[bdt[, .I[g == max(g)], by = id]$V1]
    

    This avoids constructing .SD, which is the bottleneck in your expressions.
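To see what the one-liner does, it can help to split it into its two steps. The sketch below uses a small made-up table (`toy` and its values are hypothetical, standing in for `bdt`); `.I` holds the row numbers of the original table, so the inner call returns, per `id`, the positions of the rows where `g` equals the group maximum, and the outer call subsets by those positions.

```r
library(data.table)

# Hypothetical toy data standing in for the baseball table.
toy <- data.table(
  id   = c("a", "a", "a", "b", "b"),
  year = c(2001, 2002, 2003, 2001, 2002),
  g    = c(10, 30, 30, 5, 7)
)

# Step 1: for each id, collect the row numbers (.I) where g equals
# the group maximum. The unnamed result column is called V1.
idx <- toy[, .I[g == max(g)], by = id]

# Step 2: subset the original table by those row numbers.
res <- toy[idx$V1]
print(res)
```

Note that ties are kept: if several rows in a group share the maximum `g`, all of them are returned.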

    Edit: Actually, the main reason the OP's code is slow is not just that it uses .SD, but the particular way it uses it: it calls [.data.table on .SD, which at the moment has a large overhead, so running that call once per group (which is what by does) accumulates a very large penalty.
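The question's own code is cut off above, so the "slow" form below is only an assumption about the kind of expression being described: one that subsets `.SD` with `[` inside `j`, so that `[.data.table` runs once per group. The sketch uses a hypothetical toy table and only checks that both forms select the same rows; it makes no timing claims.

```r
library(data.table)

# Hypothetical toy data standing in for the baseball table.
toy <- data.table(
  id   = c("a", "a", "a", "b", "b"),
  year = c(2001, 2002, 2003, 2001, 2002),
  g    = c(10, 30, 30, 5, 7)
)

# Assumed slow form: .SD[...] calls [.data.table once for every group.
slow <- toy[, .SD[g == max(g)], by = id]

# Form from the answer: compute row numbers per group, then subset once.
fast <- toy[toy[, .I[g == max(g)], by = id]$V1]

# Both approaches pick the same rows.
same_rows <- identical(slow$id, fast$id) && identical(slow$year, fast$year)
print(same_rows)
```

On a table with many groups, wrapping each expression in `system.time()` is one way to compare the two on your own data.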
