weight equivalent for geom_density2d

[愿得一人] 2021-01-31 11:58

Consider the following data:

   contesto       x       y perc
1       M01  81.370 255.659   22
2       M02  85.814 242.688   16
3       M03  73.204 240.526   33
         


        
2 Answers
  • 2021-01-31 12:10

    I think you're doing it right, provided your weights are the number of observations at each coordinate (or proportional to it). The function seems to expect one row per observation, and there is no way to update the ggplot object afterwards if you call it on your original dataset, because by then it has already modelled the density and holds only derived plot data.
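
    As a minimal sketch of the row-replication idea discussed here (not necessarily the asker's exact code), the three sample rows from the question can be expanded so that each coordinate appears perc times before being handed to geom_density2d. The names dat, expanded and p are made up for illustration, and treating perc as an integer repeat count is an assumption. Three points are far too few for a meaningful contour plot; the block only shows the mechanics.

    ```r
    # Three sample rows copied from the question
    dat <- data.frame(
      contesto = c("M01", "M02", "M03"),
      x        = c(81.370, 85.814, 73.204),
      y        = c(255.659, 242.688, 240.526),
      perc     = c(22, 16, 33)
    )

    # Repeat each row index perc times, so row i appears perc[i] times
    expanded <- dat[rep(seq_len(nrow(dat)), dat$perc), ]

    # The 2D density is then estimated from the expanded rows
    # (guarded so the replication step still runs without ggplot2 installed)
    if (requireNamespace("ggplot2", quietly = TRUE)) {
      library(ggplot2)
      p <- ggplot(expanded, aes(x = x, y = y)) + geom_density2d()
      # print(p) would draw the contours
    }
    ```

    The expanded table has sum(dat$perc) rows, so a coordinate with a larger perc contributes proportionally more points to the density estimate.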

    If your real data set is large, you might want to use data.table instead of with(); it is about 70 times faster. For example, here is a comparison with 1 million coordinates, each repeated 1 to 20 times (more than 10 million observations in total). For 660 observations the difference is irrelevant, and with a large data set the plot will probably be your performance bottleneck anyway.

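    library(data.table)

    # 1e6 random coordinates, each with a repeat count (weight) between 1 and 20
    bigtable <- data.frame(x = runif(1e6), y = runif(1e6), perc = sample(1:20, 1e6, TRUE))

    # Base R: repeat each row index perc times, then subset the data frame
    system.time(rep.with.by <- with(bigtable, bigtable[rep(1:nrow(bigtable), perc), ]))
    #user  system elapsed 
    #11.67    0.18   11.92

    # data.table: repeat each column perc times
    system.time(rep.with.dt <- data.table(bigtable)[, list(x = rep(x, perc), y = rep(y, perc))])
    #user  system elapsed 
    #0.12    0.05    0.18

    # CHECK THEY'RE THE SAME
    sum(rep.with.dt$x) == sum(rep.with.by$x)
    #[1] TRUE

    # OUTPUT ROWS
    nrow(rep.with.dt)
    #[1] 10497966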
    
  • 2021-01-31 12:15

    Adding to the answer above, you can also use the row-index rep() formulation with data.table.

    It seems to be slightly slower than @Troy's data.table answer above, but still much faster than the data.frame rep(). The advantage is that it is far more convenient when you have many columns to repeat: list(x=rep(x,perc), y=rep(y,perc)) becomes cumbersome with columns x, y, z, a, b, c, d, ...

    Benchmarks:

    system.time(rep.with.by<-with(bigtable, bigtable[rep(1:nrow(bigtable), perc),]))
    # user  system elapsed 
    # 17.918   0.523  18.429 
    
    system.time(rep.with.dt<-data.table(bigtable)[,list(x=rep(x,perc),y=rep(y,perc))])
    # user  system elapsed 
    # 0.056   0.033   0.089 
    
    system.time(rep.with.dt2 <- data.table(bigtable)[rep(1:nrow(bigtable), perc),])
    # user  system elapsed 
    # 0.166   0.054   0.220 
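
    To illustrate the convenience point, here is a small sketch with a few extra columns. The table small and its columns z and label are invented for the example. Indexing by repeated row numbers carries every column along without naming each one, whereas the list() form returns only the columns that are spelled out.

    ```r
    library(data.table)

    # Hypothetical table with more columns than just x and y
    small <- data.table(
      x     = c(1, 2, 3),
      y     = c(10, 20, 30),
      z     = c(0.1, 0.2, 0.3),
      label = c("a", "b", "c"),
      perc  = c(2L, 1L, 3L)
    )

    # Row-index form: all columns are repeated without being listed
    by_index <- small[rep(seq_len(nrow(small)), perc)]

    # Column-list form: each column to keep must be written out
    by_list <- small[, list(x = rep(x, perc), y = rep(y, perc))]

    nrow(by_index)   # 6 rows, i.e. sum(perc)
    names(by_index)  # x, y, z, label, perc
    names(by_list)   # x, y
    ```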
    