I am trying to sample a data frame from a given data frame such that there are enough samples from each of the levels of a variable. This can be achieved by separating the d
I think what you want is to subset the data frame passed in x using sample:
ddply(data1,.(a),function(x) x[sample(nrow(x),20,replace = FALSE),])
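But, of course, you still need to take care that the size of the sample for each piece (in this case 20) is no larger than the smallest subset of your data based on the levels of a. Otherwise sample will throw an error, because with replace = FALSE it cannot draw more rows than a piece contains.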
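One way to guard against that is to cap the sample size at the number of rows in each piece. The following is only a sketch: the contents of data1 are made up for illustration, sample_rows is a hypothetical helper name, and it uses base R split/lapply so it runs without plyr loaded.

```r
# Hypothetical example data: level "c" has only 5 rows.
set.seed(1)
data1 <- data.frame(a = rep(c("a", "b", "c"), times = c(30, 25, 5)),
                    value = rnorm(60))

# Sample at most n rows from one piece, without replacement.
sample_rows <- function(x, n) {
  x[sample.int(nrow(x), min(n, nrow(x))), , drop = FALSE]
}

# Split by the levels of a, sample each piece, and bind the pieces back together.
pieces <- lapply(split(data1, data1$a), sample_rows, n = 20)
result <- do.call(rbind, pieces)

# Rows kept per level of a
table(result$a)
```

With plyr loaded, the same helper should slot into the call above as ddply(data1, .(a), sample_rows, n = 20). Small levels then contribute all of their rows instead of raising an error.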
It would seem that if you want to sample a category that has fewer than 20 rows, you'd need replace=TRUE...
This might do the trick:
ddply(data1,'a',function(x) x[sample.int(NROW(x),20,replace=TRUE),])
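Note that replace=TRUE applies to every level, so large categories can also end up with duplicated rows. If that matters, a possible variation is to sample with replacement only for the levels that are too small. This is a sketch under assumptions: data2 and sample_fixed are hypothetical names, the data is invented, and base R is used in place of ddply so it is self-contained.

```r
# Hypothetical example data: level "y" has only 8 rows.
set.seed(42)
data2 <- data.frame(a = rep(c("x", "y"), times = c(50, 8)),
                    value = seq_len(58))

# Always return n rows; fall back to replacement only when the piece is too small.
sample_fixed <- function(x, n) {
  idx <- sample.int(nrow(x), n, replace = nrow(x) < n)
  x[idx, , drop = FALSE]
}

out <- do.call(rbind, lapply(split(data2, data2$a), sample_fixed, n = 20))

# Rows kept per level of a
table(out$a)
```

Every level ends up with exactly 20 rows, and only the undersized level "y" contains repeats.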