Question
This is a follow-up question to "Subset by group with data.table", using the same data.table:
library(data.table)
data(baseball, package = "plyr")  # baseball ships with the plyr package
bdt <- as.data.table(baseball)
# Aggregating and losing information on other columns
dt1 <- bdt[ , .(max_g = max(g)), by = id]
# Aggregating and keeping information on other columns
dt2 <- bdt[bdt[, .I[g == max(g)], by = id]$V1]
Why do dt1 and dt2 differ in number of rows?
Isn't dt2 supposed to give the same result, just without losing the corresponding information in the other columns?
Answer 1:
As @Frank pointed out:
bdt[ , .(max_g = max(g)), by = id]
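returns the maximum value of g for each id, one row per id, while
bdt[bdt[ , .I[g == max(g)], by = id]$V1]
returns every row that attains this maximum, so an id whose maximum occurs more than once contributes more than one row.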
See "What is the difference between arg max and max?" for a mathematical explanation, and try this reduced example in R:
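library(data.table)
data(baseball, package = "plyr")  # baseball ships with the plyr package
bdt <- as.data.table(baseball)
# Keep a single player, sorted by games played (descending)
dt <- bdt[id == "woodge01"][order(-g)]
# max: one row holding the maximum of g
dt[ , .(max = max(g)), by = id]
# arg max: every row where g equals that maximum
dt[ dt[ , .I[g == max(g)], by = id]$V1 ]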
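The same effect can be reproduced without the baseball data. The sketch below uses a small made-up table (the name toy and its values are hypothetical, chosen only so that one id has a tie at its maximum) and compares the row counts of the two approaches. As one possible way to get back to a single row per id, it also shows which.max(), which returns only the first position of the maximum.

```r
library(data.table)

# Hypothetical toy data: id "a" reaches its maximum g twice (a tie)
toy <- data.table(
  id   = c("a", "a", "a", "b", "b"),
  year = c(2001, 2002, 2003, 2001, 2002),
  g    = c(10, 30, 30, 5, 7)
)

# max: one value per group, so exactly one row per id
agg <- toy[ , .(max_g = max(g)), by = id]

# arg max: every row whose g equals the group maximum, so ties add rows
all_max <- toy[toy[ , .I[g == max(g)], by = id]$V1]

# which.max() keeps only the first maximum, giving one row per id again
first_max <- toy[toy[ , .I[which.max(g)], by = id]$V1]

nrow(agg)        # 2
nrow(all_max)    # 3
nrow(first_max)  # 2
```

Whether to keep all tied rows or only the first one depends on what the downstream analysis needs. With which.max() the other columns come from whichever tied row happens to appear first within the group.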
Source: https://stackoverflow.com/questions/42005303/subset-by-group-with-data-table-compared-to-aggregate-a-data-table