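In a dataset with multiple observations for each subject, I want to take a subset that keeps only the row with the maximum data value for each subject. For example, the data frame group used in the answers below has the columns Subject, pt, and Event, with several rows per subject.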
In base R you can use ave to get the max per group, compare it with pt, and use the resulting logical vector to subset the data.frame.
group[group$pt == ave(group$pt, group$Subject, FUN=max),]
#  Subject pt Event
#3       1  5     2
#7       2 17     2
#9       3  5     2
Or do the comparison inside the function passed to ave.
group[as.logical(ave(group$pt, group$Subject, FUN=function(x) x==max(x))),]
#group[ave(group$pt, group$Subject, FUN=function(x) x==max(x))==1,] #Variant
#  Subject pt Event
#3       1  5     2
#7       2 17     2
#9       3  5     2
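To make the two steps above easier to follow, here is a minimal sketch that prints the intermediate vectors. It assumes the same group data frame that the dplyr answer constructs; the names grp_max and keep are only illustrative.

```r
# Assumed to be the same data as the `group` data frame used in this thread
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# ave() returns a vector as long as the input, holding each row's group maximum
grp_max <- ave(group$pt, group$Subject, FUN = max)
grp_max
# [1]  5  5  5 17 17 17 17  5  5

# Comparing with pt gives the logical vector used for subsetting
keep <- group$pt == grp_max
which(keep)
# [1] 3 7 9

group[keep, ]
```

Because the comparison is against the group maximum, every row that ties for the maximum within a subject is kept.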
A dplyr solution:
library(dplyr)
ID <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)
group <- data.frame(Subject=ID, pt=Value, Event=Event)
group %>%
  group_by(Subject) %>%
  summarize(max.pt = max(pt))
This yields the following data frame:
  Subject max.pt
1       1      5
2       2     17
3       3      5
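The summarize call keeps only Subject and max.pt. If the full rows (including Event) are wanted from dplyr, one possible sketch is to filter within each group instead of summarizing. The result names all_max_rows and one_row_each are illustrative, and slice_max assumes dplyr 1.0.0 or later.

```r
library(dplyr)

group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# Keep every row whose pt equals its Subject's maximum (ties are all kept)
all_max_rows <- group %>%
  group_by(Subject) %>%
  filter(pt == max(pt)) %>%
  ungroup()

# Keep exactly one row per Subject, even if the maximum occurs more than once
one_row_each <- group %>%
  group_by(Subject) %>%
  slice_max(pt, n = 1, with_ties = FALSE) %>%
  ungroup()

all_max_rows
one_row_each
```

With this data there are no ties, so both versions return the same three rows: pt values 5, 17, and 5, each with Event 2.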
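I wasn't sure what you wanted to do about the Event column, but if you want to keep that as well, how about the following (dd is the data frame with columns ID, Value, and Event):
dd <- data.frame(ID, Value, Event)  # assumed: built from the ID, Value, and Event vectors
isIDmax <- with(dd, ave(Value, ID, FUN=function(x) seq_along(x)==which.max(x)))==1
dd[isIDmax, ]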
#   ID Value Event
# 3  1     5     2
# 7  2    17     2
# 9  3     5     2
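Here we use ave to look at the "Value" column for each "ID". Within each group we find the position of the maximal value with which.max and turn that into a logical vector we can use to subset the original data.frame. Because which.max returns only the first maximum, exactly one row is kept per ID, even when there are ties.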
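The difference between the which.max approach and a plain comparison with max only shows up when a group reaches its maximum more than once. The sketch below uses a small hypothetical data frame, tied, invented here purely for illustration.

```r
# Hypothetical data: ID 1 reaches its maximum of 5 twice (rows 1 and 3)
tied <- data.frame(ID    = c(1, 1, 1, 2, 2),
                   Value = c(5, 3, 5, 2, 8),
                   Event = c(1, 1, 2, 1, 2))

# which.max() marks only the first maximum within each ID
first_max <- with(tied, ave(Value, ID,
                            FUN = function(x) seq_along(x) == which.max(x))) == 1

# Comparing against max() marks every row that ties for the maximum
all_max <- with(tied, Value == ave(Value, ID, FUN = max))

which(first_max)
# [1] 1 5
which(all_max)
# [1] 1 3 5

tied[first_max, ]
tied[all_max, ]
```

Which behaviour is preferable depends on whether duplicate maxima should produce one row or several per ID.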
If you only want the biggest pt value for each subject, you could simply use:
pt_max <- aggregate(pt ~ Subject, group, max)  # aggregate() already returns a data.frame
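The aggregate result contains only Subject and pt. If the other columns are needed too, one option is to merge it back onto the original data. This is a sketch under the assumption that group is the data frame defined earlier in the thread; the name max_rows is illustrative.

```r
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# One row per Subject holding the maximum pt
pt_max <- aggregate(pt ~ Subject, group, max)

# merge() joins on the shared columns (Subject and pt) by default,
# so only the rows of group that hold a group maximum survive
max_rows <- merge(group, pt_max)
max_rows
```

As with the direct comparison against max, rows that tie for the maximum within a subject are all returned.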