data.table: Sum by all existing combinations in table

温柔的废话 2021-01-15 02:02

I have a data.table out like this (in reality it is much larger):

out <-      code weights group
        1:    2   0.387      1
        2:            


        
2 Answers
  •  -上瘾入骨i
    2021-01-15 02:35

    Using CJ (cross join) you can add the missing combinations:

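    library(data.table)
    # Key the table so the cross join below matches on code and group
    setkey(out, code, group)
    # Join onto every code x group combination (absent pairs get NA weights),
    # sum per combination, then replace the NA sums of the absent pairs with 0
    out[CJ(code, group, unique = TRUE)
        ][, lapply(.SD, sum), by = .(code, group)
          ][is.na(weights), weights := 0]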
    

    gives:

       code group weights
    1:    1     1   0.399
    2:    1     2   0.212
    3:    1     3   0.474
    4:    2     1   1.997
    5:    2     2   0.373
    6:    2     3   0.569
    7:    3     1   0.000
    8:    3     2   1.323
    9:    3     3   0.316
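
For readers who want to try the cross-join idea without the original table, here is a minimal self-contained sketch. The toy values are made up for illustration (they are not the asker's data), and it uses an ad hoc `on =` join instead of `setkey`, which is an alternative to the keyed approach above rather than the answer's exact method:

```r
library(data.table)

# Hypothetical toy data: the pairs (code 2, group 2) and (code 3, group 1) never occur
out <- data.table(
  code    = c(1, 1, 2, 2, 3),
  group   = c(1, 2, 1, 1, 2),
  weights = c(0.5, 0.25, 1, 0.5, 2)
)

# Sum the weights for the combinations that actually exist
sums <- out[, .(weights = sum(weights)), by = .(code, group)]

# All code x group combinations; join the sums onto them (absent pairs become NA)
all_combos <- CJ(code = out$code, group = out$group, unique = TRUE)
res <- sums[all_combos, on = .(code, group)]

# Replace the NA sums of the absent pairs with 0
res[is.na(weights), weights := 0]
print(res)
```

Aggregating before the join keeps the joined table at one row per combination, which may matter when the real table is large.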
    

    Or with xtabs as @alexis_laz showed in the comments:

    xtabs(weights ~ group + code, out)
    

    which gives:

         code
    group     1     2     3
        1 0.399 1.997 0.000
        2 0.212 0.373 1.323
        3 0.474 0.569 0.316
    

    If you want this output as a long-format data frame, you can wrap the xtabs call in the melt function of the reshape2 (or data.table) package:

    library(reshape2)
    res <- melt(xtabs(weights ~ group + code, out))
    

    which gives:

    > class(res)
    [1] "data.frame"
    > res
      group code value
    1     1    1 0.399
    2     2    1 0.212
    3     3    1 0.474
    4     1    2 1.997
    5     2    2 0.373
    6     3    2 0.569
    7     1    3 0.000
    8     2    3 1.323
    9     3    3 0.316
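
If you would rather not depend on reshape2, base R's `as.data.frame` method for tables should give a similar long layout. This is a sketch on made-up toy data (not the asker's table); note that the `group` and `code` columns come back as factors, so convert them if you need numbers:

```r
# Hypothetical toy data: the pairs (code 2, group 2) and (code 3, group 1) never occur
out <- data.frame(
  code    = c(1, 1, 2, 2, 3),
  group   = c(1, 2, 1, 1, 2),
  weights = c(0.5, 0.25, 1, 0.5, 2)
)

# Cross-tabulate the summed weights; absent combinations are filled with 0
tab <- xtabs(weights ~ group + code, data = out)

# Convert the table to long format; responseName sets the value column's name
res <- as.data.frame(tab, responseName = "weights")
print(res)
```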
    

    You could also do this with a combination of dplyr and tidyr:

    library(dplyr)
    library(tidyr)
    out %>%
      complete(code, group, fill = list(weights = 0)) %>%
      group_by(code, group) %>%
      summarise(weights = sum(weights))
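
A variant of the same idea, sketched on made-up toy data (not the asker's table), sums first and completes afterwards. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later and returns an ungrouped result:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: the pairs (code 2, group 2) and (code 3, group 1) never occur
out <- tibble(
  code    = c(1, 1, 2, 2, 3),
  group   = c(1, 2, 1, 1, 2),
  weights = c(0.5, 0.25, 1, 0.5, 2)
)

res <- out %>%
  group_by(code, group) %>%
  # Sum the weights of the combinations that exist, then drop the grouping
  summarise(weights = sum(weights), .groups = "drop") %>%
  # Add the absent code x group combinations with a weight of 0
  complete(code, group, fill = list(weights = 0))

print(res)
```

Either order gives the same sums; completing after the summary simply means the fill step works on one row per combination.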
    
