In a dataset with multiple observations for each subject I want to take a subset with only the maximum data value for each record. For example, with a following dataset:
One more base R solution:
merge(aggregate(pt ~ Subject, max, data = group), group)
Subject pt Event
1 1 5 2
2 2 17 2
3 3 5 2
Another option is slice
library(dplyr)
group %>%
group_by(Subject) %>%
slice(which.max(pt))
# Subject pt Event
# <dbl> <dbl> <dbl>
#1 1 5 2
#2 2 17 2
#3 3 5 2
The most intuitive method is to use group_by and top_n function in dplyr
group %>% group_by(Subject) %>% top_n(1, pt)
The result you get is
Source: local data frame [3 x 3]
Groups: Subject [3]
Subject pt Event
(dbl) (dbl) (dbl)
1 1 5 2
2 2 17 2
3 3 5 2
do.call(rbind, lapply(split(group,as.factor(group$Subject)), function(x) {return(x[which.max(x$pt),])}))
Using Base R
by
is a version of tapply
for data frames:
res <- by(group, group$Subject, FUN=function(df) df[which.max(df$pt),])
It returns an object of class by
so we convert it to data frame:
do.call(rbind, b)
Subject pt Event
1 1 5 2
2 2 17 2
3 3 5 2
Another data.table
option:
library(data.table)
setDT(group)
group[group[order(-pt), .I[1L], Subject]$V1]
Or another (less readable but slightly faster):
group[group[, rn := .I][order(Subject, -pt), {
rn[c(1L, 1L + which(diff(Subject)>0L))]
}]]
timing code:
library(data.table)
nr <- 1e7L
ng <- nr/4L
set.seed(0L)
DT <- data.table(Subject=sample(ng, nr, TRUE), pt=1:nr)#rnorm(nr))
DT2 <- copy(DT)
microbenchmark::microbenchmark(times=3L,
mtd0 = {a0 <- DT[DT[, .I[which.max(pt)], by=Subject]$V1]},
mtd1 = {a1 <- DT[DT[order(-pt), .I[1L], Subject]$V1]},
mtd2 = {a2 <- DT2[DT2[, rn := .I][
order(Subject, -pt), rn[c(TRUE, diff(Subject)>0L)]
]]},
mtd3 = {a3 <- unique(DT[order(Subject, -pt)], by="Subject")}
)
fsetequal(a0[order(Subject)], a1[order(Subject)])
#[1] TRUE
fsetequal(a0[order(Subject)], a2[, rn := NULL][order(Subject)])
#[1] TRUE
fsetequal(a0[order(Subject)], a3[order(Subject)])
#[1] TRUE
timings:
Unit: seconds
expr min lq mean median uq max neval
mtd0 3.256322 3.335412 3.371439 3.414502 3.428998 3.443493 3
mtd1 1.733162 1.748538 1.786033 1.763915 1.812468 1.861022 3
mtd2 1.136307 1.159606 1.207009 1.182905 1.242359 1.301814 3
mtd3 1.123064 1.166161 1.228058 1.209257 1.280554 1.351851 3