weight equivalent for geom_density2d

后端未结

关注

 2  752

Consider the following data:

   contesto       x       y perc
1       M01  81.370 255.659   22
2       M02  85.814 242.688   16
3       M03  73.204 240.526   33


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  有刺的猬        
                
              
                            
                2021-01-31 12:10
              
            
            
                                                                       
I think you're doing it right, if your weights are # observations at each co-ordinate (or in proportion). The function seems to expect all the observations, and there's no way to dynamically update the ggplot object if you call it on your original dataset, because it's already modelled the density, and contains derived plot data.

You might want to use data.table instead of with() if your real data set is large, it's about 70 times faster. e.g. see here for 1m co-ords, with 1-20 repeats (>10m observations in this example). No performance relevance for 660 observations, though (and the plot will probably be your performance bottleneck with a large data set anyway).

bigtable<-data.frame(x=runif(10e5),y=runif(10e5),perc=sample(1:20,10e5,T))

system.time(rep.with.by<-with(bigtable, bigtable[rep(1:nrow(bigtable), perc),]))
#user  system elapsed 
#11.67    0.18   11.92

system.time(rep.with.dt<-data.table(bigtable)[,list(x=rep(x,perc),y=rep(y,perc))])
#user  system elapsed 
#0.12    0.05    0.18

# CHECK THEY'RE THE SAME
sum(rep.with.dt$x)==sum(rep.with.by$x)
#[1] TRUE    

# OUTPUT ROWS
nrow(rep.with.dt)
#[1] 10497966

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  予麋鹿        
                
              
                            
                2021-01-31 12:15
              
            
            
                                                                       
Adding to the answer above, you can also use the rep formulation with data.table. 

Seems to be a tiny bit slower than @Troy's data.table answer above, but still much faster than data.frame rep. The advantage is it's much more convenient if you have a lot of columns to repeat; list(x=rep(x,perc), y=rep(y,perc)) will be cumbersome given columns x,y,z,a,b,c,d...

Benchmarks:

system.time(rep.with.by<-with(bigtable, bigtable[rep(1:nrow(bigtable), perc),]))
# user  system elapsed 
# 17.918   0.523  18.429 

system.time(rep.with.dt<-data.table(bigtable)[,list(x=rep(x,perc),y=rep(y,perc))])
# user  system elapsed 
# 0.056   0.033   0.089 

system.time(rep.with.dt2 <- data.table(bigtable)[rep(1:nrow(bigtable), perc),])
# user  system elapsed 
# 0.166   0.054   0.220 

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复