I would like to use each date in dataframe A to select the rows in dataframe B that have a matching ID and a date within 180 days of it.
If you have big data, I would suggest using data.table's rolling join instead.
Assuming these are your data sets:
dfa <- read.table(text = "ID Date
42 '2012-07-21'
42 '2013-04-12'", header = TRUE)
dfb <- read.table(text = "ID Date
12 '2016-09-08'
35 '2008-02-02'
42 '2012-01-09'
42 '2013-03-13'", header = TRUE)
We will convert them to data.tables and convert the `Date` column to the `IDate` class:
library(data.table) #1.9.8+
setDT(dfa)[, Date := as.IDate(Date)]
setDT(dfb)[, Date := as.IDate(Date)]
Then, simply join away (you can do the rolling join both ways)
# You can perform another rolling join for `roll = -180` too
indx <- dfb[
  dfa,                # For each row in dfa, find a match in dfb
  on = .(ID, Date),   # The columns to join by
  roll = 180,         # Rolling window, can join again on -180 afterwards
  which = TRUE,       # Return the matched row indices within `dfb`
  mult = "first",     # Multiple-match handling: take only the first match
  nomatch = 0L        # Don't return unmatched indexes (NAs)
]
dfb[indx]
# ID Date
# 1: 42 2013-03-13
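The second direction mentioned in the comment can be sketched as follows. This is only a minimal sketch, assuming the same sample data as above; `indx_before`, `indx_after` and `indx_both` are names introduced here for illustration and are not part of the original code.

```r
library(data.table)

# Same sample data as above, built directly as data.tables
dfa <- data.table(ID = c(42L, 42L),
                  Date = as.IDate(c("2012-07-21", "2013-04-12")))
dfb <- data.table(ID = c(12L, 35L, 42L, 42L),
                  Date = as.IDate(c("2016-09-08", "2008-02-02",
                                    "2012-01-09", "2013-03-13")))

# roll = 180: dfb dates that fall up to 180 days BEFORE a dfa date
indx_before <- dfb[dfa, on = .(ID, Date), roll = 180,
                   which = TRUE, mult = "first", nomatch = 0L]

# roll = -180: dfb dates that fall up to 180 days AFTER a dfa date
indx_after <- dfb[dfa, on = .(ID, Date), roll = -180,
                  which = TRUE, mult = "first", nomatch = 0L]

# Combine both directions and drop duplicated row indices
indx_both <- sort(unique(c(indx_before, indx_after)))
dfb[indx_both]
```

With this sample data the backward roll finds nothing, so the combined result is still the single 2013-03-13 row. Note that each rolling join returns only the nearest `dfb` row per `dfa` row, not every row inside the window.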
An alternative way of achieving this is to use data.table's non-equi join feature on manually created Date ± 180 columns:
# Create range columns
dfa[, c("Date_m_180", "Date_p_180") := .(Date - 180L, Date + 180L)]
# Join away
indx <- dfb[dfa,
            on = .(ID, Date >= Date_m_180, Date <= Date_p_180),
            which = TRUE,
            mult = "first",
            nomatch = 0L]
dfb[indx]
# ID Date
# 1: 42 2013-03-13
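Since the question asks for any dates within the window, it may be useful to keep every match rather than only the first one. The following is a hedged sketch of that variation: it leaves `mult` at its default (all matches) and returns the matched pairs instead of row indices. The extra `dfb` row dated 2013-05-01 is hypothetical, added only so that one `dfa` date has two matches, and `res`, `dfa_Date` and `dfb_Date` are illustrative names.

```r
library(data.table)

dfa <- data.table(ID = c(42L, 42L),
                  Date = as.IDate(c("2012-07-21", "2013-04-12")))
# Sample data plus one HYPOTHETICAL extra row (42, 2013-05-01)
dfb <- data.table(ID = c(12L, 35L, 42L, 42L, 42L),
                  Date = as.IDate(c("2016-09-08", "2008-02-02", "2012-01-09",
                                    "2013-03-13", "2013-05-01")))

# Create range columns
dfa[, c("Date_m_180", "Date_p_180") := .(Date - 180L, Date + 180L)]

# Non-equi join keeping all matches; i.Date is the dfa date, x.Date the dfb date
res <- dfb[dfa,
           on = .(ID, Date >= Date_m_180, Date <= Date_p_180),
           .(ID, dfa_Date = i.Date, dfb_Date = x.Date),
           nomatch = 0L]
res
# Two rows are expected here, both paired with the dfa date 2013-04-12
```

Returning `i.Date` and `x.Date` side by side makes it easy to see which date in `dfa` each selected `dfb` row belongs to.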
Both methods should handle large data sets almost instantly.