data.table: Sum by all existing combinations in table

前端 未结 2 654
温柔的废话
温柔的废话 2021-01-15 02:02

I have a data.table out like this (in reality it is much larger):

out <-      code weights group
        1:    2   0.387      1
        2:            


        
相关标签:
2条回答
  • 2021-01-15 02:35

    Using CJ (cross join) you can add the missing combinations:

    library(data.table)
    setkey(out, code, group)
    out[CJ(code, group, unique = TRUE)
        ][, lapply(.SD, sum), by = .(code, group)
          ][is.na(weights), weights := 0]
    

    gives:

       code group weights
    1:    1     1   0.399
    2:    1     2   0.212
    3:    1     3   0.474
    4:    2     1   1.997
    5:    2     2   0.373
    6:    2     3   0.569
    7:    3     1   0.000
    8:    3     2   1.323
    9:    3     3   0.316
    

    Or with xtabs as @alexis_laz showed in the comments:

    xtabs(weights ~ group + code, out)
    

    which gives:

         code
    group     1     2     3
        1 0.399 1.997 0.000
        2 0.212 0.373 1.323
        3 0.474 0.569 0.316
    

    If you want to get this output in a long-form dataframe, you can wrap the xtabs code in the melt function of the reshape2 (or data.table) package:

    library(reshape2)
    res <- melt(xtabs(weights ~ group + code, out))
    

    which gives:

    > class(res)
    [1] "data.frame"
    > res
      group code value
    1     1    1 0.399
    2     2    1 0.212
    3     3    1 0.474
    4     1    2 1.997
    5     2    2 0.373
    6     3    2 0.569
    7     1    3 0.000
    8     2    3 1.323
    9     3    3 0.316
    

    You could also do this with a combination of dplyr and tidyr:

    library(dplyr)
    library(tidyr)
    out %>%
      complete(code, group, fill = list(weights=0)) %>%
      group_by(code, group) %>% 
      summarise(sum(weights))
    
    0 讨论(0)
  • 2021-01-15 02:49

    I had a similar problem, and CJ did not work for some reason. A relatively simple solution I ended up using is first calling dcast and then melt (similar to the xtable solution above)- this also conveniently lets you specify the fill value for the missing combinations.

    sum.dt <- dcast(out, code ~ group, value.var = 'weights', 
                    fun.aggregate = sum, fill = 0)
    sum.dt <- melt(sum.dt, id.vars = 'code', variable.name = 'group')
    

    This gives

    > sum.dt
       code group value
    1:    1     1 0.399
    2:    2     1 1.997
    3:    3     1 0.000
    4:    1     2 0.212
    5:    2     2 0.373
    6:    3     2 1.322
    7:    1     3 0.474
    8:    2     3 0.569
    9:    3     3 0.316
    
    0 讨论(0)
提交回复
热议问题