Subset by group with data.table compared to aggregate a data.table

北战南征 submitted on 2021-02-08 09:29:27

Question


This is a follow-up question to Subset by group with data.table, using the same data.table:

library(data.table)

# baseball is assumed to be the data set that ships with the plyr package
data(baseball, package = "plyr")
bdt <- as.data.table(baseball)

# Aggregating and losing information on other columns
dt1 <- bdt[ , .(max_g = max(g)), by = id]
# Aggregating and keeping information on other columns
dt2 <- bdt[bdt[, .I[g == max(g)], by = id]$V1]

Why do dt1 and dt2 differ in number of rows? Isn't dt2 supposed to give the same result, just without losing the respective information in the other columns?


Answer 1:


As @Frank pointed out:

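bdt[ , .(max_g = max(g)), by = id] returns the maximum value of g, one row per id, while

bdt[bdt[ , .I[g == max(g)], by = id]$V1] returns every row whose g equals that maximum. When several rows of the same id tie for the maximum, all of them are kept, so dt2 can have more rows than dt1.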

See What is the difference between arg max and max? for a mathematical explanation and try this slim version in R:

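library(data.table)
# baseball is assumed to be the data set that ships with the plyr package
data(baseball, package = "plyr")
bdt <- as.data.table(baseball)

# One player, sorted so the largest g values come first
dt <- bdt[id == "woodge01"][order(-g)]
# max: a single row holding the maximum
dt[ , .(max = max(g)), by = id]
# arg max: every row that attains the maximum
dt[ dt[ , .I[g == max(g)], by = id]$V1 ]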
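
The same effect can be reproduced without the baseball data. The following is a minimal sketch on a made-up table (toy, with hypothetical values) in which one id reaches its maximum g twice. It also shows which.max() as one possible way to get exactly one row per id while keeping the other columns; that variant is an addition for illustration and not part of the original answer.

```r
library(data.table)

# Hypothetical toy data: id "a" reaches its maximum g (25) in two rows.
toy <- data.table(
  id   = c("a", "a", "a", "b", "b"),
  year = c(2001, 2002, 2003, 2001, 2002),
  g    = c(10, 25, 25, 7, 3)
)

# max: one row per id, other columns are dropped
agg <- toy[ , .(max_g = max(g)), by = id]

# arg max: row numbers of every row whose g equals the group maximum
idx <- toy[ , .I[g == max(g)], by = id]$V1
all_max <- toy[idx]

# which.max() returns only the first position of the maximum,
# so ties are broken in favour of the earliest row in each group
first_max <- toy[toy[ , .I[which.max(g)], by = id]$V1]

nrow(agg)        # 2
nrow(all_max)    # 3, because id "a" has a tie
nrow(first_max)  # 2
```

If one row per id is wanted, the tie has to be broken explicitly. Sorting the table first (for example by year) decides which of the tied rows which.max() picks.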


Source: https://stackoverflow.com/questions/42005303/subset-by-group-with-data-table-compared-to-aggregate-a-data-table
