How to sumif across two tables?

自闭症患者 2021-02-19 11:51

I have two tables that I need to do a sumif across. Table 1 contains time periods, i.e. the year and the quarter at year end (e.g. 4, 8, 12, etc.).

1 Answer
  • 2021-02-19 12:08

    Nice question. What you are basically trying to do is join on Name, Year and Quarter <= Quarter, while summing all the matched Amount values. This is possible both with non-equi joins (available in data.table since v1.9.8) and with foverlaps (though the latter will probably be sub-optimal).

    Non-Equi joins:

    x2[x1, # for each value in `x1` find all the matching values in `x2`
       .(Amount = sum(Amount)), # Sum all the matching values in `Amount`
       on = .(Name, Year, Quarter <= Quarter), # join conditions
       by = .EACHI] # Do the summing per each match in `i`
    #    Name Year Quarter Amount
    # 1: LOB1 2000       4  10000
    # 2: LOB1 2000       8  22500
    # 3: LOB1 2000      12  19500
    # 4: LOB1 2000      16  55500
    # 5: LOB1 2000      20  64500
    # 6: LOB1 2000      24  72000
    # 7: LOB1 2000      28  72000
    # 8: LOB1 2000      32  72000
    # 9: LOB1 2000      36  72000
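
    To make the join above easy to try, here is a minimal, self-contained sketch. The original x1 and x2 are not shown in the question, so the tables and values below are hypothetical; only the join itself is taken from the answer.

    ```r
    library(data.table)

    # Hypothetical data (not from the question):
    # x1 holds the periods to report on, x2 holds the individual amounts.
    x1 <- data.table(Name = "LOB1", Year = 2000L, Quarter = c(4L, 8L, 12L))
    x2 <- data.table(
      Name    = "LOB1",
      Year    = 2000L,
      Quarter = c(1L, 3L, 6L, 11L),
      Amount  = c(100, 200, 50, 25)
    )

    # For each row of x1, sum Amount over the x2 rows with the same Name and
    # Year whose Quarter is less than or equal to the Quarter in x1.
    res <- x2[x1,
              .(Amount = sum(Amount)),
              on = .(Name, Year, Quarter <= Quarter),
              by = .EACHI]

    print(res$Amount)  # 300 350 375
    ```

    Each period in this toy data has at least one matching row in x2; a period with no match at all would come back as NA, so you may want to handle that case separately.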
    

    As a side note, you can easily add Amount in place in x1 (proposed by @Frank):

    x1[, Amount := 
      x2[x1, sum(x.Amount), on = .(Name, Year, Quarter <= Quarter), by = .EACHI]$V1
    ]
    

    This might be convenient if you have more than just the three join columns in that table.
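
    As a rough illustration of that point, the sketch below runs the same update on hypothetical data where x1 carries an extra, made-up column (Note). The tables and values are assumptions for the example, not the data from the question.

    ```r
    library(data.table)

    # Hypothetical data: x1 has an extra column (Note) besides the join columns.
    x1 <- data.table(
      Name    = "LOB1",
      Year    = 2000L,
      Quarter = c(4L, 8L, 12L),
      Note    = c("first", "second", "third")
    )
    x2 <- data.table(
      Name    = "LOB1",
      Year    = 2000L,
      Quarter = c(1L, 3L, 6L, 11L),
      Amount  = c(100, 200, 50, 25)
    )

    # Add the running total to x1 by reference. The unnamed j expression is
    # returned as column V1, and x.Amount spells out that the amounts come
    # from x2 (the table being joined into).
    x1[, Amount :=
      x2[x1, sum(x.Amount), on = .(Name, Year, Quarter <= Quarter), by = .EACHI]$V1
    ]

    print(names(x1))   # "Name" "Year" "Quarter" "Note" "Amount"
    print(x1$Amount)   # 300 350 375
    ```

    Because x1 is modified in place, Note and any other existing columns stay untouched and only Amount is added.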


    foverlaps:

    You mentioned foverlaps, so in theory you could achieve the same with this function too, though I'm afraid you could easily run out of memory. With foverlaps, you first need to create a huge table in which each row of x1 is joined to every matching row of x2, and the whole thing is held in memory before it is aggregated:

    x1[, Start := 0] # Make sure that we always join starting from Q0
    x2[, Start := Quarter] # In x2 each row is the zero-length interval [Quarter, Quarter]
    setkey(x2, Name, Year, Start, Quarter) # set keys
    ## Make a huge cartesian join by overlaps and then aggregate
    foverlaps(x1, x2)[, .(Amount = sum(Amount)), by = .(Name, Year, Quarter = i.Quarter)]
    #    Name Year Quarter Amount
    # 1: LOB1 2000       4  10000
    # 2: LOB1 2000       8  22500
    # 3: LOB1 2000      12  19500
    # 4: LOB1 2000      16  55500
    # 5: LOB1 2000      20  64500
    # 6: LOB1 2000      24  72000
    # 7: LOB1 2000      28  72000
    # 8: LOB1 2000      32  72000
    # 9: LOB1 2000      36  72000
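
    To see where the memory concern comes from, here is a small sketch on hypothetical data (again, not the tables from the question) that keeps the intermediate result of foverlaps and counts its rows before aggregating.

    ```r
    library(data.table)

    # Hypothetical data (not from the question).
    x1 <- data.table(Name = "LOB1", Year = 2000L, Quarter = c(4L, 8L, 12L))
    x2 <- data.table(
      Name    = "LOB1",
      Year    = 2000L,
      Quarter = c(1L, 3L, 6L, 11L),
      Amount  = c(100, 200, 50, 25)
    )

    x1[, Start := 0L]        # each x1 row covers the interval [0, Quarter]
    x2[, Start := Quarter]   # each x2 row covers the point [Quarter, Quarter]
    setkey(x2, Name, Year, Start, Quarter)

    # Intermediate table: one row per (period, matching amount) pair.
    ov <- foverlaps(x1, x2)
    print(nrow(ov))  # 9, i.e. 2 + 3 + 4 matches for quarters 4, 8 and 12

    # Aggregate back to one row per period.
    agg <- ov[, .(Amount = sum(Amount)), by = .(Name, Year, Quarter = i.Quarter)]
    print(agg)
    ```

    Three periods and four amounts already expand to nine rows here. With long histories the intermediate table grows roughly with the product of the two tables within each Name and Year, whereas the non-equi join with by = .EACHI aggregates per row of x1 without building that table first.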
    