Including all permutations when using data.table[,,by=…]


I have a large data.table that I am collapsing to the month level using the by argument.

There are 5 by variables, with the following numbers of levels: c(4, 3, 106, 3, 1380)
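
For a sense of scale, the size of a fully crossed result can be worked out from those level counts. This is only arithmetic on the numbers quoted above, not output from the original data; the variable names are made up for illustration.

```r
# Number of levels of each of the five grouping variables, as quoted above
n_levels <- c(4, 3, 106, 3, 1380)

# Rows a fully crossed result would have: one per combination of levels
n_combinations <- prod(n_levels)
format(n_combinations, big.mark = ",")
# [1] "5,266,080"
```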


                      
2 answers

情歌与酒, 2021-01-18 23:14

Build a cross join (Cartesian product) of the unique values of each grouping column, then join your collapsed results onto it:

dat.keys <- dat[,CJ(g1=unique(g1), g2=unique(g2), g3=unique(g3))]
setkey(datCollapsed, g1, g2, g3)
nrow(datCollapsed[dat.keys])  # effectively a left join of datCollapsed onto dat.keys
# [1] 625


Note that the missing values are NA right now, but you can easily change that to 0s if you want.
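
As a minimal sketch of the whole round trip, including the NA-to-0 step, here is the same idea on a small made-up table. The data and the name `full` are hypothetical; only the column names g1, g2, g3 and nv follow the snippet above.

```r
library(data.table)

# Hypothetical toy data: 2 x 2 x 2 = 8 possible groups, only 3 observed
dat <- data.table(
  g1 = c("a", "a", "b", "b"),
  g2 = c("x", "y", "x", "x"),
  g3 = c(1L, 1L, 2L, 2L)
)

# Ordinary aggregation returns only the observed combinations
datCollapsed <- dat[, list(nv = .N), by = list(g1, g2, g3)]

# Cross join of the unique values, then join the collapsed results onto it
dat.keys <- dat[, CJ(g1 = unique(g1), g2 = unique(g2), g3 = unique(g3))]
setkey(datCollapsed, g1, g2, g3)
full <- datCollapsed[dat.keys]

# Combinations absent from dat come back as NA; overwrite them with 0
full[is.na(nv), nv := 0L]

nrow(full)    # one row per combination
sum(full$nv)  # equals nrow(dat)
```

Updating by reference with `:=` avoids copying the table, which matters when the crossed grid is large.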
                                                                        
                                                        
            
            
              
                
失恋的感觉, 2021-01-18 23:31

I'd also go with a cross-join, but would use it in the i-slot of the original call to [.data.table:

keycols <- c("g1", "g2", "g3")                       ## Grouping columns
setkeyv(dat, keycols)                                ## Set dat's key
ii <- do.call(CJ, lapply(dat[, ..keycols], unique))  ## CJ() to form index; lapply() always returns a list
datCollapsed <- dat[ii, list(nv=.N), by=.EACHI]      ## Aggregate once per row of ii

## Check that it worked
nrow(datCollapsed)
# [1] 625
table(datCollapsed$nv)
#   0   1   2   3   4   5   6 
# 135 191 162  82  39  13   3 


This approach is known as "by-without-by", or grouping by i. In data.table versions before 1.9.4 it happened implicitly whenever i was a data.table; from 1.9.4 onward it has to be requested with by=.EACHI, as in the call above. As older versions of ?data.table documented, it is just as efficient and fast as passing the grouping instructions in via the by argument:


  Advanced: Aggregation for a subset of known groups is
  particularly efficient when passing those groups in i. When
  i is a data.table, DT[i,j] evaluates j for each row
  of i. We call this by without by or grouping by i. 
  Hence, the self join DT[data.table(unique(colA)),j] is
  identical to DT[,j,by=colA].
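
A self-contained sketch of grouping by i, run on made-up data rather than the asker's table. It assumes data.table 1.9.4 or later, where the per-row evaluation of j is requested with by = .EACHI; the toy values are hypothetical.

```r
library(data.table)

# Hypothetical toy data: 2 x 2 x 2 = 8 possible groups, only 3 observed
dat <- data.table(
  g1 = c("a", "a", "b", "b"),
  g2 = c("x", "y", "x", "x"),
  g3 = c(1L, 1L, 2L, 2L)
)

keycols <- c("g1", "g2", "g3")
setkeyv(dat, keycols)

# lapply() always returns a list, so CJ() receives one vector per column
ii <- do.call(CJ, lapply(dat[, ..keycols], unique))

# by = .EACHI evaluates j once per row of ii; rows of ii with no match get .N == 0
datCollapsed <- dat[ii, list(nv = .N), by = .EACHI]

nrow(datCollapsed)      # one row per combination in ii
table(datCollapsed$nv)  # counts of groups by size, including empty groups
```

Because unmatched combinations already get a count of 0 here, no separate NA replacement step is needed for counts; other aggregates such as sums or means may still need it.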

                                                                        
                                                        
            
            
              
                