I have an R dataframe with two levels of data: id and year. Within groups defined by id, the years increase (entire dataset has the sa
subset(df, id %in% sample(levels(df$id), 20))
That's assuming your data frame is called df and that your id is a factor (use unique instead of levels if it's not).
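Even when id is a factor, sampling from unique() rather than levels() can matter. A factor keeps levels that no longer occur in the data (for example after an earlier subset), so sample() over levels() could return ids that have no rows. The toy data frame below (panel, picked, sampled) is hypothetical and exists only to illustrate this.

```r
# Hypothetical toy panel: 30 ids, each observed in 2001-2003
panel <- data.frame(id = factor(rep(sprintf("id%02d", 1:30), each = 3)),
                    year = rep(2001:2003, times = 30))

# Drop ids 21-30; subset() leaves the factor's levels untouched
panel <- subset(panel, !(id %in% sprintf("id%02d", 21:30)))
length(levels(panel$id))   # 30: unused levels are still there
length(unique(panel$id))   # 20: only ids that actually have rows

# Sampling from unique() can only pick ids that are present
set.seed(42)  # only for a reproducible draw
picked <- sample(unique(panel$id), 5)
sampled <- subset(panel, id %in% picked)

# Each picked id brings along all three of its years
table(droplevels(sampled$id))
```

Alternatively, calling droplevels(df$id) before levels() removes the unused levels.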
This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code, and it could be done in one if you wanted.
# 20000 rows; the 8 letters cycle against 1:1250, giving 5000 distinct ids
dat <- data.frame(id = paste0(LETTERS[1:8], rep(1:1250, 16)),
                  year = sample(1990:2012, 20000, replace = TRUE),
                  var1 = rnorm(20000), var2 = rnorm(20000))
# a look at the data
head(dat)
# sample 20 ids at random
(ids <- sample(unique(dat$id), 20))
# keep only the rows belonging to the sampled ids
dat2 <- dat[dat$id %in% ids, ]
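As a sketch of the one-line version, the sample() call can go directly inside the index. To keep the snippet runnable on its own it builds a small hypothetical stand-in, toy (100 ids, each observed in 1990-1994), instead of reusing dat; set.seed() is there only to make the draw reproducible. The checks at the end confirm that 20 ids survive and that each keeps all of its years, since whole id groups are selected rather than individual rows.

```r
# Hypothetical stand-in data: 100 ids x 5 years = 500 rows
toy <- data.frame(id = rep(paste0("id", 1:100), each = 5),
                  year = rep(1990:1994, times = 100),
                  var1 = rnorm(500))

set.seed(123)  # only for a reproducible draw
# Sample the ids and filter the rows in a single expression
toy2 <- toy[toy$id %in% sample(unique(toy$id), 20), ]

# Sanity checks: 20 distinct ids, each with all 5 of its years
length(unique(toy2$id))
all(table(toy2$id) == 5)
```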