I am trying to build a data frame that holds, for each level of a factor, the record with the maximum value. I would like a data frame with 4 rows (one for each G) with the max for X in that group and the c
library(dplyr)
Data %>%
  group_by(G) %>%
  filter(X == max(X))
If you don't want to include ties, sort each group by X in descending order and keep only its first row:
Data %>%
  group_by(G) %>%
  arrange(desc(X)) %>%
  slice(1)
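To make the difference between the two dplyr variants concrete, here is a minimal sketch on a small made-up data frame (Toy is hypothetical and not the question's data). It also uses slice_max(), which assumes dplyr 1.0.0 or later, as a possible shorthand for the arrange() plus slice(1) pattern.

```r
library(dplyr)

# Hypothetical toy data with a tie for the maximum in group 1
Toy <- data.frame(G = c(1, 1, 1, 2, 2),
                  X = c(5, 5, 3, 2, 7),
                  Y = c("a", "b", "c", "d", "e"))

# filter() keeps every row that attains the group maximum,
# so the tie in group 1 contributes two rows
with_ties <- Toy %>%
  group_by(G) %>%
  filter(X == max(X)) %>%
  ungroup()

# slice_max() with with_ties = FALSE keeps exactly one row per group
no_ties <- Toy %>%
  group_by(G) %>%
  slice_max(X, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(with_ties)  # 3 rows: two from group 1, one from group 2
nrow(no_ties)    # 2 rows: one per group
```

Which of the tied rows survives in the second variant depends on row order, so sort explicitly first if that matters.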
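library(data.table)
set.seed(1)
Data <- data.frame(X = rnorm(200), Y = rnorm(200), G = rep(c(1, 2, 3, 4), each = 50))
# setDT() converts Data to a data.table by reference;
# which.max() returns the first maximum, so ties are not duplicated
setDT(Data)[, list(X = max(X), Y = Y[which.max(X)]), by = G]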
   G        X          Y
1: 1 1.595281 -0.3309078
2: 2 2.401618  0.9510128
3: 3 2.087167  0.9160193
4: 4 2.307978 -0.3887222
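If there are many columns, listing each one inside list() gets tedious. A possible alternative, sketched here on the same simulated data, is to subset .SD (data.table's name for the rows of the current group) with which.max(). The names DT and res are introduced only for this illustration.

```r
library(data.table)

set.seed(1)
Data <- data.frame(X = rnorm(200), Y = rnorm(200), G = rep(c(1, 2, 3, 4), each = 50))

# Work on a data.table copy so the original data frame is left untouched
DT <- as.data.table(Data)

# For each group, keep the whole row where X is largest
# (which.max() returns the first maximum, so ties yield a single row)
res <- DT[, .SD[which.max(X)], by = G]
res
```

This returns one row per G with all remaining columns, at the cost of being slower than the explicit list() form on very large tables.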
You can use by() and reference the row names of the row returned by which.max():
Data[by(Data, Data$G, function(dat) rownames(dat)[which.max(dat$X)] ),]
#            X          Y G
#4    1.595281 -0.3309078 1
#61   2.401618  0.9510128 2
#147  2.087167  0.9160193 3
#171  2.307978 -0.3887222 4
(This assumes set.seed(1) for reproducibility's sake.)
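Another base R route, shown here only as a sketch of the same idea, is to split the data frame by group, take the row with the largest X from each piece, and bind the pieces back together. The name top_rows is introduced for illustration.

```r
set.seed(1)
Data <- data.frame(X = rnorm(200), Y = rnorm(200), G = rep(c(1, 2, 3, 4), each = 50))

# split() gives one data frame per level of G;
# which.max() picks the first row with the largest X in each piece
pieces <- lapply(split(Data, Data$G), function(dat) dat[which.max(dat$X), ])

# Stack the single-row pieces back into one data frame
top_rows <- do.call(rbind, pieces)
top_rows
```

Unlike the by() version, this does not rely on row names to index back into Data, so it also works when row names are not unique identifiers.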