Randomly select groups (and all cases per group) in R?

孤街浪徒 2021-01-15 02:02

I have an R dataframe with two levels of data: id and year. Within groups defined by id, the years increase (entire dataset has the sa

2 answers
  • 2021-01-15 02:26
    subset(df, id %in% sample(levels(df$id), 20))
    

    This assumes your data frame is called df and that id is a factor. If id is not a factor, use unique instead of levels.
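
    To illustrate the factor versus non-factor distinction, here is a minimal sketch on a made-up toy data frame. The data, the group count of 2, and the names picked, sub_chr and sub_fct are assumptions for illustration only. The droplevels() call is an optional extra that drops the levels of ids that were not selected.

    ```r
    # Hypothetical toy data: 6 ids with 3 years each (illustrative only)
    set.seed(1)
    df <- data.frame(id = rep(c("a", "b", "c", "d", "e", "f"), each = 3),
                     year = rep(2001:2003, times = 6),
                     stringsAsFactors = FALSE)

    # id is a character vector here, so sample from unique() instead of levels()
    picked <- sample(unique(df$id), 2)
    sub_chr <- subset(df, id %in% picked)

    # Same idea once id is a factor: sample from its levels
    df$id <- factor(df$id)
    picked_f <- sample(levels(df$id), 2)
    # droplevels() removes the levels of the ids that were not selected
    sub_fct <- droplevels(subset(df, id %in% picked_f))
    ```

    One caveat: levels() also returns levels that no longer occur in the data, so on a previously filtered data frame unique(df$id) may be the safer choice.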

  • 2021-01-15 02:51

    This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code, and it could be done in one if you wanted.

    dat <- data.frame(id=paste0(LETTERS[1:8], rep(1:1250, 8)), 
       year=as.factor(as.character(sample(1990:2012, 20000, TRUE))), 
       var1=rnorm(20000), var2=rnorm(20000))
    
    #a look at the data
    head(dat)
    
    #sample 20 ids at random
    (ids <- sample(unique(dat$id), 20))
    
    #narrow your data set
    dat2 <- dat[dat$id %in% ids, ]
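
    As a sketch of the one-line variant mentioned above, the sampling and the subsetting can be combined into a single expression. The small panel below (50 ids, 5 years each) and its column values are made up for illustration. The two checks at the end confirm that 20 ids are kept and that each retains all of its rows.

    ```r
    set.seed(42)
    # Hypothetical panel: 50 ids, each observed in 5 consecutive years
    dat <- data.frame(id = rep(sprintf("id%02d", 1:50), each = 5),
                      year = rep(2008:2012, times = 50),
                      var1 = rnorm(250),
                      stringsAsFactors = FALSE)

    # One-line version: sample 20 ids and keep every row belonging to them
    dat2 <- dat[dat$id %in% sample(unique(dat$id), 20), ]

    # Sanity checks: number of ids kept, and whether each kept id has all 5 rows
    n_ids <- length(unique(dat2$id))
    complete <- all(table(dat2$id) == 5)
    ```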
    