Randomly select groups (and all cases per group) in R?


I have an R dataframe with two levels of data: id and year. Within groups defined by id, the years increase (entire dataset has the sa

2 Answers

攒了一身酷

2021-01-15 02:26
```
subset(df, id %in% sample(levels(df$id), 20))
```
That assumes your data frame is called df and that id is a factor (use unique instead of levels if it is not).
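For the non-factor case, a minimal sketch might look like the following. The data frame here is made up for illustration (50 hypothetical ids with 5 years each), and `set.seed` is only there so the draw is repeatable. One related caveat: `levels()` can include levels that no longer have any rows, in which case fewer than 20 groups may come back, so `unique()` is the safer default when in doubt.

```r
# Hypothetical example data: 50 ids, 5 years each; id is a character column.
set.seed(1)  # assumption: fixed seed so the sample is reproducible
df <- data.frame(
  id   = rep(sprintf("id%02d", 1:50), each = 5),
  year = rep(2001:2005, times = 50),
  stringsAsFactors = FALSE
)

# id is not a factor here, so draw from unique() rather than levels()
picked <- sample(unique(df$id), 20)
sub <- subset(df, id %in% picked)

length(unique(sub$id))  # number of groups kept
nrow(sub)               # all rows of each kept group are retained
```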

误落风尘

2021-01-15 02:51

This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code, and could be done in one if you wanted.

dat <- data.frame(id=paste0(LETTERS[1:8], rep(1:1250, 8)), 
   year=as.factor(as.character(sample(1990:2012, 20000, TRUE))), 
   var1=rnorm(20000), var2=rnorm(20000))

#a look at the data
head(dat)

#sample 20 id's randomly
(ids <- sample(unique(dat$id), 20))

#narrow your data set
dat2 <- dat[dat$id %in% ids, ]
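The one-line version mentioned above could be sketched roughly as follows, together with a quick check that whole groups were kept. The small data frame (100 hypothetical ids, 4 years each) and the seed are assumptions made for illustration, not part of the original question.

```r
set.seed(42)  # assumption: fixed seed so the draw is reproducible

# Hypothetical data: 100 groups with 4 yearly rows each
dat <- data.frame(
  id   = rep(paste0("g", 1:100), each = 4),
  year = rep(2009:2012, times = 100),
  var1 = rnorm(400),
  stringsAsFactors = FALSE
)

# Sample 20 ids and keep every row belonging to them, in one step
dat2 <- dat[dat$id %in% sample(unique(dat$id), 20), ]

# Sanity checks: 20 groups survive, each with all 4 of its rows
stopifnot(length(unique(dat2$id)) == 20)
stopifnot(all(table(dat2$id) == 4))
```

Keeping the sampled ids in their own variable, as in the two-line version, is handy if you later need to know which groups were drawn.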
