Summarize the self-join index while avoiding cartesian product in R data.table

南旧 2021-01-15 07:09

With a 2-column data.table, I'd like to summarize the pairwise relationships in column 1 by summing the number of shared elements in column 2. In other words,

3 Answers
  •  被撕碎了的回忆
    2021-01-15 07:39

    If you can split your Y's into groups that don't have a large intersection of X's, you could do the computation by those groups first, resulting in a smaller intermediate table:

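    d[, grp := Y <= 3] # this particular split works best for the OP's data
    # self-join on Y within each group, count the pairs, then sum the counts across groups
    d[, .SD[.SD, on = "Y", allow.cartesian = TRUE][, .N, by = .(X, i.X)], by = grp][,
        .(N = sum(N)), by = .(X, i.X)]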
    

    The intermediate table above has only 16 rows, as opposed to 26. Unfortunately, I can't think of an easy way to create such a grouping automatically.
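
To check that splitting by group does not change the result, here is a minimal sketch on made-up data (the table below is hypothetical, not the OP's, and the `Y <= 3` split is just as arbitrary here). It compares the plain self-join against the grouped version; since every `Y` value falls in exactly one group, the per-group pair counts should add up to the same totals.

```r
library(data.table)

# Hypothetical data (not the OP's): X is the entity, Y the shared element
d <- data.table(X = c("a", "a", "b", "b", "c", "c", "c"),
                Y = c(1, 2, 1, 4, 2, 4, 5))

# Direct self-join on Y: one intermediate row per matching (X, i.X, Y)
direct <- d[d, on = "Y", allow.cartesian = TRUE][, .N, by = .(X, i.X)]

# Grouped variant: split the Y values, join within each group, then add the counts
d[, grp := Y <= 3]
grouped <- d[, .SD[.SD, on = "Y", allow.cartesian = TRUE][, .N, by = .(X, i.X)],
             by = grp][, .(N = sum(N)), by = .(X, i.X)]

# Sort both results the same way before comparing them
setkey(direct, X, i.X)
setkey(grouped, X, i.X)
all.equal(direct, grouped)
```

Note that both versions count each pair in both orders and also pair every `X` with itself, so filter on `X < i.X` afterwards if you only want distinct pairs.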
