Including all permutations when using data.table[,,by=…]

长情又很酷 2021-01-18 22:40

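I have a large data.table that I am collapsing to the month level using the by argument.

There are 5 by variables, with the following numbers of levels: c(4, 3, 106, 3, 1380). Grouping with by only returns the combinations that actually occur in the data, so empty combinations are dropped; I would like the result to include all of them.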
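With by, data.table returns a row only for the group combinations that are present in the data. The sketch below shows this on a small hypothetical table dat with two grouping columns g1 and g2 (stand-ins for the real by variables, which are not shown here), and computes how many rows a complete grid of combinations would have, including for the five level counts given above.

```r
library(data.table)

# Hypothetical toy data: the combination (g1 = "b", g2 = "y") never occurs
dat <- data.table(
  g1 = c("a", "a", "b", "a"),
  g2 = c("x", "y", "x", "x"),
  v  = c(1, 2, 3, 4)
)

# Grouping with by= returns only the combinations present in the data
byCounts <- dat[, .N, by = .(g1, g2)]
print(byCounts)   # 3 rows; ("b", "y") is absent

# A complete grid has one row per combination: the product of the level counts
nFull <- prod(sapply(dat[, .(g1, g2)], uniqueN))
print(nFull)      # 4

# For the five by variables in the question the grid is much larger
print(prod(c(4, 3, 106, 3, 1380)))   # 5266080
```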

  •  失恋的感觉
    2021-01-18 23:31

    I'd also go with a cross-join, but would use it in the i-slot of the original call to [.data.table:

    library(data.table)
    keycols <- c("g1", "g2", "g3")                       ## Grouping columns
    setkeyv(dat, keycols)                                ## Set dat's key
    ii <- do.call(CJ, lapply(dat[, ..keycols], unique))  ## CJ() to form index; lapply always yields a list
    datCollapsed <- dat[ii, list(nv = .N), by = .EACHI]  ## Aggregate once per row of ii (.N is 0 if no match)
    
    ## Check that it worked
    nrow(datCollapsed)
    # [1] 625
    table(datCollapsed$nv)
    #   0   1   2   3   4   5   6 
    # 135 191 162  82  39  13   3 
    
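Because dat is not shown in this thread, here is a self-contained sketch of the same cross-join on a small hypothetical table with two key columns. It assumes a data.table version recent enough to support the ..keycols prefix and by = .EACHI. lapply is used rather than sapply because sapply simplifies to a matrix when every key column has the same number of unique values, and do.call() then refuses it as an argument list.

```r
library(data.table)

# Hypothetical toy data; the combination (g1 = "b", g2 = "y") has no rows
dat <- data.table(
  g1 = c("a", "a", "b", "a"),
  g2 = c("x", "y", "x", "x"),
  v  = c(1, 2, 3, 4)
)

keycols <- c("g1", "g2")
setkeyv(dat, keycols)

# Every combination of the unique key values (CJ sorts and names the columns)
ii <- do.call(CJ, lapply(dat[, ..keycols], unique))

# by = .EACHI evaluates j once per row of ii; unmatched rows get .N == 0
datCollapsed <- dat[ii, list(nv = .N), by = .EACHI]
print(datCollapsed)   # 4 rows; nv is 2, 1, 1, 0
```

The empty combination survives with a count of 0, which is what grouping with by alone would drop.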

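    This approach is referred to as "by-without-by" and, as documented in ?data.table, it is just as efficient and fast as passing the grouping instructions in via the by argument. Since data.table 1.9.4, however, this grouping is no longer implicit and must be requested with by = .EACHI; the passage below is from the documentation of the earlier versions: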

    Advanced: Aggregation for a subset of known groups is particularly efficient when passing those groups in i. When i is a data.table, DT[i,j] evaluates j for each row of i. We call this by without by or grouping by i. Hence, the self join DT[data.table(unique(colA)),j] is identical to DT[,j,by=colA].
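A small check of the equivalence the quoted passage describes, written for current data.table, where grouping by i needs by = .EACHI. The table DT and its columns colA and x are hypothetical; both calls should return the same groups and totals.

```r
library(data.table)

# Hypothetical toy table with a single grouping column
DT <- data.table(colA = c("p", "q", "p", "r"), x = 1:4)
setkey(DT, colA)

# Grouping by i: j is evaluated once for each row of the i table
viaI <- DT[data.table(colA = unique(DT$colA)), list(total = sum(x)), by = .EACHI]

# Conventional grouping with by=
viaBy <- DT[, list(total = sum(x)), by = colA]

print(viaI)    # colA p, q, r with totals 4, 2, 4
print(viaBy)   # same groups and totals
```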
