How to select rows by group with the minimum value and containing NAs in R

悲哀的现实 2021-01-20 01:55

Here is an example. For each group in X, I want to select the whole row with the minimum Y, ignoring NAs:

set.seed(123)    
data<-data.frame(X=rep(letters[1:3], each=4),Y=sample(1:12,12),Z=sample(1:100, 12))
data[data==3]<-NA


        
4 Answers
  • 2021-01-20 02:22

    Here is a data.table approach:

    library(data.table)
    set.seed(123)    
    data<-data.frame(X=rep(letters[1:3], each=4),Y=sample(1:12,12),Z=sample(1:100, 12))
    data[data==3]<-NA
    data <- data.table(data)
    data[data[,.I[which.min(Y)], by = "X"][,V1]]
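
    The grouped lookup above depends on how which.min treats missing values, so a small base R check may help. This is only a sketch on a hypothetical vector y (not the question's data): which.min discards NA before searching but still reports the position in the original vector, whereas min returns NA unless na.rm = TRUE is given.

    ```r
    # Hypothetical vector with a missing value (not taken from the question's data)
    y <- c(7, NA, 2, 9)

    # which.min() skips NA and reports the position in the original vector
    pos <- which.min(y)

    # min() propagates NA unless na.rm = TRUE is given
    min_default <- min(y)
    min_clean <- min(y, na.rm = TRUE)

    print(pos)          # 3
    print(min_default)  # NA
    print(min_clean)    # 2
    ```

    If every value in a group is NA, which.min returns integer(0), so that group simply contributes no row.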
    
  • 2021-01-20 02:26

    This does not select the rows by index, but it returns the minimum values you want:

    library(plyr)
    ddply(data, .(X), summarise, min=min(Y, na.rm=TRUE))
    
      X min
    1 a   5
    2 b   1
    3 c   4
    

    EDIT AFTER COMMENT: To select the whole rows, you can use:

    ddply(data, .(X), function(x) arrange(x, Y)[1, ])
    
      X Y  Z
    1 a 4 68
    2 b 1  4
    3 c 2 64
    

    Or

    data$index <- 1L:nrow(data)
    i <- by(data, data$X, function(x) x$index[which.min(x$Y)] )
    data[i, ]
    
       X Y  Z index
    1  a 4 68     1
    6  b 1  4     6
    10 c 2 64    10
    
  • 2021-01-20 02:31

    Using subset for each letter may help:

    data<-data.frame(X=rep(letters[1:3], each=4),Y=sample(1:12,12))
    # keep only the rows of group "a", then take the minimum of Y
    dataA <- subset(data, X=="a")
    min(dataA$Y, na.rm=TRUE)
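
    The subset call above handles one group at a time and returns only the minimum value. One possible way to extend it to every group and keep the whole row is to split the data frame by X and pick one row per piece. This is a minimal base R sketch: the data frame df and the helper min_row are hypothetical names, and the values are made up so the result is easy to check.

    ```r
    # Hypothetical data in the same shape as the question (values are made up)
    df <- data.frame(
      X = rep(c("a", "b", "c"), each = 3),
      Y = c(4, NA, 9, 6, 1, 8, NA, 5, 2),
      Z = c(10, 20, 30, 40, 50, 60, 70, 80, 90)
    )

    # Hypothetical helper: return the row of one group with the smallest Y.
    # which.min() ignores NA, so a missing Y is never selected.
    min_row <- function(g) g[which.min(g$Y), , drop = FALSE]

    # Split by group, pick one row per group, then stack the rows again
    pieces <- lapply(split(df, df$X), min_row)
    result <- do.call(rbind, pieces)

    print(result)
    ```

    With this made-up data, the selected rows have Y equal to 4, 1 and 2 for groups a, b and c.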
    
  • 2021-01-20 02:37

    Using the data.table package, this is trivial:

    library(data.table)
    
    d <- data.table(data)
    d[, min(Y, na.rm=TRUE), by=X]
    

    You can also use plyr and its ddply function:

    library(plyr)
    
    ddply(data, .(X), summarise, min(Y, na.rm=TRUE))
    

    Or using base R:

    # the formula method drops rows with NA by default (na.action = na.omit)
    aggregate(Y ~ X, data=data, FUN=min)
    

    Based on the edits, I would use data.table for sure:

    d[, .SD[which.min(Y)], by=X]
    

    However, there are solutions using base R or other packages.
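
    One such base R solution, offered as a sketch rather than the only way to do it, compares each Y with its group minimum computed by ave. The data frame df is hypothetical. Unlike which.min, this keeps every tied row, and rows with NA in Y have to be excluded explicitly.

    ```r
    # Hypothetical data: groups in X, values with NAs in Y
    df <- data.frame(
      X = rep(c("a", "b", "c"), each = 3),
      Y = c(4, NA, 9, 6, 1, 8, NA, 5, 2),
      Z = c(10, 20, 30, 40, 50, 60, 70, 80, 90)
    )

    # Group-wise minimum of Y, repeated for every row of the group
    group_min <- ave(df$Y, df$X, FUN = function(v) min(v, na.rm = TRUE))

    # Keep rows whose Y equals the group minimum; drop rows where Y is NA
    keep <- !is.na(df$Y) & df$Y == group_min
    result <- df[keep, ]

    print(result)
    ```

    Here rows 1, 5 and 9 are kept, and the original row names are preserved, which makes it easy to trace each result back to the input.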
