data.table: Sum by all existing combinations in table


I have a data.table out like this (in reality it is much larger):

out <-      code weights group
        1:    2   0.387      1
        2:


                      
2 Answers

-上瘾入骨i
2021-01-15 02:35

Using CJ (cross join) you can add the missing combinations:

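library(data.table)
# Key by the grouping columns so the join below matches on code and group
setkey(out, code, group)
# CJ() builds every code/group combination; joining on it adds the missing ones as NA rows
out[CJ(code, group, unique = TRUE)
    ][, lapply(.SD, sum), by = .(code, group)
      ][is.na(weights), weights := 0]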


gives:


   code group weights
1:    1     1   0.399
2:    1     2   0.212
3:    1     3   0.474
4:    2     1   1.997
5:    2     2   0.373
6:    2     3   0.569
7:    3     1   0.000
8:    3     2   1.323
9:    3     3   0.316
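
Because the table in the question is cut off, here is a minimal, self-contained sketch of the same CJ approach. The `toy` table and its values are made up for illustration and are not the question's data; the combination code 3 / group 1 is deliberately absent.

```r
library(data.table)

# Hypothetical toy data (NOT the question's table); (code 3, group 1) is missing
toy <- data.table(
  code    = c(1, 1, 1, 2, 2, 3),
  group   = c(1, 1, 2, 1, 2, 2),
  weights = c(0.5, 0.25, 1, 2, 0.75, 1.5)
)

setkey(toy, code, group)

# Join on all code/group combinations, sum per combination,
# then turn the NA of the combination that had no rows into 0
res <- toy[CJ(code, group, unique = TRUE)
           ][, lapply(.SD, sum), by = .(code, group)
             ][is.na(weights), weights := 0]

print(res)  # six rows; the (3, 1) row has weights 0
```

The NA replacement has to come after the sum, since a combination with no rows contributes a single NA that `sum` passes through unchanged.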





Or with xtabs as @alexis_laz showed in the comments:

xtabs(weights ~ group + code, out)


which gives:


     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316



If you want this output as a long-format data frame, you can wrap the xtabs call in the melt function of the reshape2 (or data.table) package:

library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))


which gives:


> class(res)
[1] "data.frame"
> res
  group code value
1     1    1 0.399
2     2    1 0.212
3     3    1 0.474
4     1    2 1.997
5     2    2 0.373
6     3    2 0.569
7     1    3 0.000
8     2    3 1.323
9     3    3 0.316
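
If you would rather not load an extra package, base R's `as.data.frame` method for tables should give a comparable long format. This is a swapped-in alternative, not part of the original answer, and the `toy` data below is hypothetical.

```r
# Hypothetical toy data (NOT the question's table); (code 3, group 1) is missing
toy <- data.frame(
  code    = c(1, 1, 1, 2, 2, 3),
  group   = c(1, 1, 2, 1, 2, 2),
  weights = c(0.5, 0.25, 1, 2, 0.75, 1.5)
)

# xtabs sums weights per group/code cell; absent cells come out as 0
tab <- xtabs(weights ~ group + code, toy)

# One row per cell; responseName sets the name of the value column
long <- as.data.frame(tab, responseName = "weights")

print(long)
```

Note that `group` and `code` come back as factors here, so convert them if you need numeric columns.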





You could also do this with a combination of dplyr and tidyr:

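library(dplyr)
library(tidyr)
out %>%
  # add the missing code/group combinations with a weight of 0
  complete(code, group, fill = list(weights = 0)) %>%
  group_by(code, group) %>%
  # name the result so the column is not called `sum(weights)`
  summarise(weights = sum(weights))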

                                                                        
                                                        
            
            
              
                
执念已碎
2021-01-15 02:49
              
            
            
                                                                       
I had a similar problem, and CJ did not work for some reason. A relatively simple solution I ended up using is to call dcast first and then melt (similar to the xtabs solution above). This also conveniently lets you specify the fill value for the missing combinations.

sum.dt <- dcast(out, code ~ group, value.var = 'weights', 
                fun.aggregate = sum, fill = 0)
sum.dt <- melt(sum.dt, id.vars = 'code', variable.name = 'group')


This gives

> sum.dt
   code group value
1:    1     1 0.399
2:    2     1 1.997
3:    3     1 0.000
4:    1     2 0.212
5:    2     2 0.373
6:    3     2 1.322
7:    1     3 0.474
8:    2     3 0.569
9:    3     3 0.316
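
One thing to watch with this route is that melt turns the former column headers into a factor by default. A minimal sketch on hypothetical toy data (not the question's table), which keeps `group` numeric and names the value column `weights`:

```r
library(data.table)

# Hypothetical toy data (NOT the question's table); (code 3, group 1) is missing
toy <- data.table(
  code    = c(1, 1, 1, 2, 2, 3),
  group   = c(1, 1, 2, 1, 2, 2),
  weights = c(0.5, 0.25, 1, 2, 0.75, 1.5)
)

# Wide: one row per code, one column per group, missing cells filled with 0
wide <- dcast(toy, code ~ group, value.var = "weights",
              fun.aggregate = sum, fill = 0)

# Long again; variable.factor = FALSE keeps group as character instead of factor
long <- melt(wide, id.vars = "code", variable.name = "group",
             value.name = "weights", variable.factor = FALSE)

# Convert the group labels back to numbers
long[, group := as.numeric(group)]

print(long)  # six rows; the (3, 1) row has weights 0
```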

                                                                        
                                                        
            
            
              
                