Consider the following data:
  contesto      x       y perc
1      M01 81.370 255.659   22
2      M02 85.814 242.688   16
3      M03 73.204 240.526   33
I think you're doing it right, provided your weights are the number of observations at each co-ordinate (or proportional to it). The function seems to expect all the individual observations, and there's no way to update the ggplot object dynamically if you call it on your original data set, because it has already modelled the density and contains the derived plot data.
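As a minimal sketch of that idea on the sample data above (assuming the plot is something like stat_density2d; swap in whatever geom/stat you're actually using), expanding the rows by perc looks like this:
library(ggplot2)
dat <- data.frame(contesto = c("M01", "M02", "M03"),
                  x = c(81.370, 85.814, 73.204),
                  y = c(255.659, 242.688, 240.526),
                  perc = c(22, 16, 33))
# repeat each row perc times, so the density stat sees one row per observation
dat.expanded <- dat[rep(seq_len(nrow(dat)), dat$perc), ]
nrow(dat.expanded)
#[1] 71   (22 + 16 + 33)
ggplot(dat.expanded, aes(x, y)) + stat_density2d()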
You might want to use data.table instead of with() if your real data set is large; it's about 70 times faster. See the example below for 1m co-ordinates, each repeated 1-20 times (>10m observations in total). It makes no performance difference for 660 observations, though (and with a large data set the plot will probably be your bottleneck anyway).
library(data.table)

# 1m co-ordinates, each to be repeated 1-20 times according to perc
bigtable <- data.frame(x = runif(10e5), y = runif(10e5), perc = sample(1:20, 10e5, TRUE))

# base R: index the data.frame with rep()
system.time(rep.with.by <- with(bigtable, bigtable[rep(1:nrow(bigtable), perc), ]))
#user system elapsed
#11.67 0.18 11.92
# data.table: build the repeated columns directly
system.time(rep.with.dt <- data.table(bigtable)[, list(x = rep(x, perc), y = rep(y, perc))])
#user system elapsed
#0.12 0.05 0.18
# CHECK THEY'RE THE SAME
sum(rep.with.dt$x)==sum(rep.with.by$x)
#[1] TRUE
# OUTPUT ROWS
nrow(rep.with.dt)
#[1] 10497966
Adding to the answer above, you can also use the rep formulation with data.table. It seems to be a tiny bit slower than @Troy's data.table answer above, but it's still much faster than rep on a data.frame. The advantage is that it's much more convenient if you have a lot of columns to repeat; list(x = rep(x, perc), y = rep(y, perc)) gets cumbersome once you have columns x, y, z, a, b, c, d...
Benchmarks:
system.time(rep.with.by<-with(bigtable, bigtable[rep(1:nrow(bigtable), perc),]))
# user system elapsed
# 17.918 0.523 18.429
system.time(rep.with.dt<-data.table(bigtable)[,list(x=rep(x,perc),y=rep(y,perc))])
# user system elapsed
# 0.056 0.033 0.089
# rep() on the row indices keeps every column automatically
system.time(rep.with.dt2 <- data.table(bigtable)[rep(1:nrow(bigtable), perc), ])
# user system elapsed
# 0.166 0.054 0.220
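A quick sanity check (not part of the original benchmarks): both data.table forms repeat the rows in the same order, and the row-index form keeps every column, perc included, without listing them:
identical(rep.with.dt$x, rep.with.dt2$x)
# should be TRUE
names(rep.with.dt2)
# should be "x" "y" "perc" -- all columns carried through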