Take the subsets of a data.frame with the same feature and select a single row from each subset

执念已碎 2020-12-11 12:50

Suppose I have a matrix in R as follows:

ID Value
1 10
2 5
2 8
3 15
4 7
4 9
...

What I need is a random sample where every element (ID) is represented exactly once.

3 answers
  • The idea is to reorder the rows randomly and then drop duplicate IDs, keeping the first row seen for each ID in that shuffled order.

    df <- read.table(text="ID Value
    1 10
    2 5
    2 8
    3 15
    4 7
    4 9", header=TRUE)
    
    # Shuffle the rows into a random order
    df2 <- df[sample(nrow(df)), ]
    # Keep only the first row encountered for each ID
    df2[!duplicated(df2$ID), ]
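    A minimal sketch that wraps the same shuffle-then-deduplicate idea in a reusable function. `sample_one_per_id` and its `id_col` argument are hypothetical names, and `set.seed()` is only there to make the random draw reproducible:

    ```r
    # Hypothetical helper: shuffle the rows, then keep the first row
    # seen for each value of the ID column.
    sample_one_per_id <- function(data, id_col = "ID") {
      shuffled <- data[sample(nrow(data)), , drop = FALSE]
      shuffled[!duplicated(shuffled[[id_col]]), , drop = FALSE]
    }

    df <- data.frame(ID = c(1, 2, 2, 3, 4, 4),
                     Value = c(10, 5, 8, 15, 7, 9))

    set.seed(42)  # only to make the random draw reproducible
    res <- sample_one_per_id(df)
    res[order(res$ID), ]  # sorted by ID for readability
    ```

    Which of the duplicate rows survives for IDs 2 and 4 changes from draw to draw; IDs 1 and 3 always return their only row.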
    
  • 2020-12-11 13:17

    Use tapply over the row names and draw one row name from each ID group:

    dat[tapply(rownames(dat),dat$ID,FUN=sample,1),]
    
    # One possible result (rows are drawn at random):
    #  ID Value
    #1  1    10
    #3  2     8
    #4  3    15
    #6  4     9
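    A sketch of what the inner tapply() call returns, assuming `dat` holds the question's data (built here with data.frame()): a character array with one row name per ID, which is then used to index the rows.

    ```r
    dat <- data.frame(ID = c(1, 2, 2, 3, 4, 4),
                      Value = c(10, 5, 8, 15, 7, 9))

    set.seed(1)
    # For each ID group, sample one of that group's row names.
    picked <- tapply(rownames(dat), dat$ID, FUN = sample, 1)
    print(picked)  # names are the IDs, values are the chosen row names

    # Character indices match row names, so this returns one row per ID.
    dat[picked, ]
    ```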
    

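    If your data is truly a matrix rather than a data.frame, `$` does not work on it and its rows may have no names, so group on the `ID` column by name and index by integer row position instead:

    dat[as.integer(tapply(as.character(seq_len(nrow(dat))), dat[, "ID"], FUN = sample, 1)), ]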
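    An alternative base-R sketch (not this answer's tapply approach) that works directly on integer row positions via split(), so it behaves the same on a matrix or a data.frame. It assumes a column named `ID`; `m` is a hypothetical example matrix:

    ```r
    # Hypothetical example matrix with ID and Value columns.
    m <- cbind(ID = c(1, 2, 2, 3, 4, 4),
               Value = c(10, 5, 8, 15, 7, 9))

    set.seed(7)
    # Integer row positions grouped by ID: `1` = 1, `2` = 2:3, `3` = 4, `4` = 5:6
    rows_by_id <- split(seq_len(nrow(m)), m[, "ID"])

    # Pick one position per group. Indexing i[sample.int(length(i), 1L)]
    # samples from the elements of i even when the group has only one row.
    idx <- vapply(rows_by_id, function(i) i[sample.int(length(i), 1L)],
                  integer(1))

    m[idx, , drop = FALSE]
    ```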
    

    Don't be tempted to remove the as.character(): when sample() is passed a single number n, it samples from 1:n instead of returning n. For example, sample(4, 1) can return any value from 1 to 4:

    replicate(10, sample(4,1) )
    #[1] 1 1 4 2 1 2 2 2 3 4
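    R's own `?sample` help page shows a small wrapper, `resample()`, for exactly this pitfall. A sketch of it, applied to integer row positions so that as.character() is no longer needed (`ids` is a hypothetical stand-in for the ID column):

    ```r
    # Always samples from the elements of x, never from 1:x.
    resample <- function(x, ...) x[sample.int(length(x), ...)]

    set.seed(3)
    replicate(10, sample(4, 1))    # draws from 1:4
    replicate(10, resample(4, 1))  # always 4: x is the single element 4

    ids <- c(1, 2, 2, 3, 4, 4)
    # One row position per ID; single-row groups return their own position.
    picked_rows <- tapply(seq_along(ids), ids, FUN = resample, 1)
    picked_rows
    ```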
    
  • 2020-12-11 13:24

    You can do that with dplyr like so:

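    library(dplyr)
    df %>% group_by(ID) %>% sample_n(1)
    # sample_n() is superseded since dplyr 1.0.0; the newer equivalent is:
    # df %>% group_by(ID) %>% slice_sample(n = 1)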
    