How to do a data.table rolling join?

死守一世寂寞 2021-02-10 01:55

I have two data tables that I'm trying to merge. One is data on company market values through time and the other is company dividend history through time. I'm trying to find o

1 answer
  •  青春惊慌失措
    2021-02-10 02:39

    Instead of a rolling join, you may want to use an overlap join with the foverlaps function of data.table:

    # create an interval in the 'companies' data.table:
    # from 90 days before to 15 days after each compDate
    companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
    # create a second date in the 'dividends' data.table,
    # so that each dividend is a zero-length interval [divDate, Date2]
    dividends[, Date2 := divDate]
    
    # set the keys for the two data.tables (the interval columns must come last)
    setkey(companies, Sedol, start, end)
    setkey(dividends, Sedol, divDate, Date2)
    
    # create a vector of column names which can be removed afterwards
    deletecols <- c("Date2","start","end")
    
    # perform the overlap join and remove the helper columns
    res <- foverlaps(companies, dividends)[, (deletecols) := NULL]
    

    The result:

    > res
         Sedol DivID    divDate   DivAmnt companyID   compDate    MktCap
     1: 7A662B    NA       <NA>        NA         6 2005-03-31  61.21061
     2: 7A662B     5 2005-06-29 0.7772631         7 2005-06-30  66.92951
     3: 7A662B     6 2005-06-30 1.1815343         7 2005-06-30  66.92951
     4: 7A662B    NA       <NA>        NA         8 2005-09-30  78.33914
     5: 7A662B    NA       <NA>        NA         9 2005-12-31  88.92473
     6: 7A662B    NA       <NA>        NA        10 2006-03-31  87.85067
     7: 91772E     2 2005-01-13 0.2964291         1 2005-03-31 105.19249
     8: 91772E     3 2005-01-29 0.8472649         1 2005-03-31 105.19249
     9: 91772E    NA       <NA>        NA         2 2005-06-30 108.74579
    10: 91772E     4 2005-10-01 1.2467408         3 2005-09-30 113.42261
    11: 91772E    NA       <NA>        NA         4 2005-12-31 120.04491
    12: 91772E    NA       <NA>        NA         5 2006-03-31 124.35588
    

    Since this answer was first written, data.table has introduced non-equi joins (v1.9.8), which can also solve this problem. With a non-equi join you only need:

    companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
    dividends[companies, on = .(Sedol, divDate >= start, divDate <= end)]
    

    to get the intended result.
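
    One thing to watch for with the non-equi join: in the default output, the column named divDate holds the window bound taken from companies, not the dividend's own date. A minimal sketch of one way to keep the real dividend date is to select it in j with the x. prefix. The toy data below is hypothetical (it is not the data from the question), and base Date arithmetic is used in place of lubridate's days() so that the block only needs data.table:

```r
library(data.table)

# Hypothetical toy data, chosen so that exactly one dividend falls in a window
companies <- data.table(
  Sedol    = c("AAA", "AAA"),
  compDate = as.Date(c("2005-03-31", "2005-06-30")),
  MktCap   = c(100, 110)
)
dividends <- data.table(
  Sedol   = c("AAA", "AAA"),
  divDate = as.Date(c("2005-06-29", "2005-12-01")),
  DivAmnt = c(0.5, 0.7)
)

# Window from 90 days before to 15 days after each compDate
companies[, `:=`(start = compDate - 90, end = compDate + 15)]

# Non-equi join; x.divDate explicitly requests the dividend's own date
# from 'dividends' (the x table) instead of the joined window bound
res <- dividends[companies,
                 on = .(Sedol, divDate >= start, divDate <= end),
                 .(Sedol, divDate = x.divDate, DivAmnt, compDate, MktCap)]
print(res)
```

    Rows of companies without a dividend in their window are kept, with NA in divDate and DivAmnt, which mirrors the NA rows in the foverlaps result above.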


    Data used (the same as in the question, but without setting the keys):

    library(data.table)
    library(lubridate)  # provides days() and months() for the date arithmetic

    set.seed(1337)
    companies <- data.table(companyID = 1:10, Sedol = rep(c("91772E", "7A662B"), each = 5),
                            compDate = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
                            MktCap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5))))
    dividends <- data.table(DivID = 1:7, Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
                            divDate = as.Date(c('2004-11-19','2005-01-13','2005-01-29','2005-10-01','2005-06-29','2005-06-30','2006-04-17')),
                            DivAmnt = rnorm(7, .8, .3))
    
