I have two data tables that I\'m trying to merge. One is data on company market values through time and the other is company dividend history through time. I\'m trying to find o
Instead of a rolling join, you may want to use an overlap join with the foverlaps
function of data.table:
# create an interval in the 'companies' datatable
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
# create a second date in the 'dividends' datatable
dividends[, Date2 := divDate]
# set the keys for the two datatable
setkey(companies, Sedol, start, end)
setkey(dividends, Sedol, divDate, Date2)
# create a vector of columnnames which can be removed afterwards
deletecols <- c("Date2","start","end")
# perform the overlap join and remove the helper columns
res <- foverlaps(companies, dividends)[, (deletecols) := NULL]
the result:
> res Sedol DivID divDate DivAmnt companyID compDate MktCap 1: 7A662B NA
NA 6 2005-03-31 61.21061 2: 7A662B 5 2005-06-29 0.7772631 7 2005-06-30 66.92951 3: 7A662B 6 2005-06-30 1.1815343 7 2005-06-30 66.92951 4: 7A662B NA NA 8 2005-09-30 78.33914 5: 7A662B NA NA 9 2005-12-31 88.92473 6: 7A662B NA NA 10 2006-03-31 87.85067 7: 91772E 2 2005-01-13 0.2964291 1 2005-03-31 105.19249 8: 91772E 3 2005-01-29 0.8472649 1 2005-03-31 105.19249 9: 91772E NA NA 2 2005-06-30 108.74579 10: 91772E 4 2005-10-01 1.2467408 3 2005-09-30 113.42261 11: 91772E NA NA 4 2005-12-31 120.04491 12: 91772E NA NA 5 2006-03-31 124.35588
In the meantime the data.table authors have introduced non-equi joins (v1.9.8). You can also use that to solve this problem. Using a non-equi join you just need:
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
dividends[companies, on = .(Sedol, divDate >= start, divDate <= end)]
to get the intended result.
Used data (the same as in the question, but without the creation of the keys):
set.seed(1337)
companies <- data.table(companyID = 1:10, Sedol = rep(c("91772E", "7A662B"), each = 5),
compDate = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
MktCap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5))))
dividends <- data.table(DivID = 1:7, Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
divDate = as.Date(c('2004-11-19','2005-01-13','2005-01-29','2005-10-01','2005-06-29','2005-06-30','2006-04-17')),
DivAmnt = rnorm(7, .8, .3))