Consider the following data:

```
  contesto      x       y perc
1      M01 81.370 255.659   22
2      M02 85.814 242.688   16
3      M03 73.204 240.526   33
```
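Adding to the answer above, you can also use the `rep` formulation with data.table, repeating the row indices rather than each column.

It is slower than @Troy's data.table answer above in relative terms (0.22 s vs 0.09 s elapsed in the benchmarks below), but the absolute difference is small, and it is still much faster than the data.frame `rep` (about 18 s). The advantage is that it is much more convenient when you have many columns to repeat: `list(x=rep(x,perc), y=rep(y,perc))` becomes cumbersome given columns x, y, z, a, b, c, d, ... It also keeps every column, including `contesto`.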
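A minimal sketch on the three-row example data, assuming the data.table package is installed. `DT`, `expanded` and `expanded_list` are illustrative names, not from the original answer. It shows that the row-index form carries every column along, while the column-list form keeps only the columns you name.

```r
library(data.table)

# The three-row example data from the question (DT is an illustrative name)
DT <- data.table(
  contesto = c("M01", "M02", "M03"),
  x        = c(81.370, 85.814, 73.204),
  y        = c(255.659, 242.688, 240.526),
  perc     = c(22L, 16L, 33L)
)

# Row-index form: repeat each row index `perc` times; all columns come along
expanded <- DT[rep(seq_len(nrow(DT)), perc)]

# Column-list form (as in @Troy's answer): each column must be listed by hand
expanded_list <- DT[, list(x = rep(x, perc), y = rep(y, perc))]

nrow(expanded)        # 71, i.e. 22 + 16 + 33
names(expanded)       # "contesto" "x" "y" "perc"
names(expanded_list)  # "x" "y"
```

Both forms repeat the rows in the same order, so `expanded$x` and `expanded_list$x` are identical; only the set of columns differs.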
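Benchmarks on a larger table, `bigtable`:

```r
# data.frame: repeat the row indices
system.time(rep.with.by <- with(bigtable, bigtable[rep(1:nrow(bigtable), perc), ]))
#   user  system elapsed
# 17.918   0.523  18.429

# data.table, column-list form (@Troy's answer): only x and y are kept
system.time(rep.with.dt <- data.table(bigtable)[, list(x = rep(x, perc), y = rep(y, perc))])
#   user  system elapsed
#  0.056   0.033   0.089

# data.table, row-index form (this answer): all columns are kept
system.time(rep.with.dt2 <- data.table(bigtable)[rep(1:nrow(bigtable), perc), ])
#   user  system elapsed
#  0.166   0.054   0.220
```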
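To reproduce the comparison, here is a sketch with a hypothetical `bigtable`. The answer does not say how its table was built, so the random 10,000-row table below (with `perc` drawn from 1 to 50) is an assumption, and timings will differ from those above. It also checks that the three approaches expand to the same number of rows with the same `x` values.

```r
library(data.table)

# HYPOTHETICAL stand-in for `bigtable`; its real construction is not shown
set.seed(1)
n <- 1e4
bigtable <- data.frame(
  contesto = sprintf("M%05d", seq_len(n)),
  x        = runif(n, 70, 90),
  y        = runif(n, 240, 260),
  perc     = sample(1:50, n, replace = TRUE)
)

# data.frame: repeat the row indices
t_df <- system.time(
  rep.with.by <- with(bigtable, bigtable[rep(seq_len(nrow(bigtable)), perc), ])
)

# data.table, column-list form: only the listed columns are kept
t_dt <- system.time(
  rep.with.dt <- data.table(bigtable)[, list(x = rep(x, perc), y = rep(y, perc))]
)

# data.table, row-index form: all columns are kept
t_dt2 <- system.time(
  rep.with.dt2 <- data.table(bigtable)[rep(seq_len(nrow(bigtable)), perc)]
)

# Compare user, system and elapsed times side by side
rbind(df_rep = t_df, dt_list = t_dt, dt_rows = t_dt2)[, 1:3]
```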