I'm trying to do a complex non-equi join between two tables. I got inspired by a presentation at useR! 2016 (https://channel9.msdn.com/events/useR-international-R-
For "between" joins like this one, one could also use data.table::foverlaps, which joins two data.tables on ranges that overlap, instead of using non-equi joins.
Taking the same example, the following code would produce the desired outcome.
# foverlaps() tests whether two ranges overlap. Create a second column,
# dbh2, as the end point of the range, so each dbh becomes the range [dbh, dbh].
dt1[, dbh2 := dbh]
# foverlaps() requires the second argument to be keyed
setkey(dt1, sp, dbh, dbh2)
# find rows where dbh falls between dbh_min and dbh_max, and drop unnecessary
# columns afterwards
foverlaps(dt2, dt1, by.x = c("sp", "dbh_min", "dbh_max"), by.y = key(dt1),
          nomatch = 0)[, -c("dbh2", "dbh_min", "dbh_max")]
#      sp dbh gr_sp dhb_clas
#  1: SAB  10   RES        s
#  2: SAB  12   RES        s
#  3: SAB  16   RES        m
#  4: SAB  22   RES        l
#  5: EPN  12   RES        s
#  6: EPN  16   RES        m
#  7: BOP  10   DEC        s
#  8: BOP  12   DEC        s
#  9: BOP  14   DEC        s
# 10: BOP  20   DEC        m
# 11: BOP  26   DEC        l
# 12: PET  12   DEC        s
# 13: PET  16   DEC        s
# 14: PET  18   DEC        s
So I was very close. I had two problems. First, a bad installation of the data.table package (see "Data table error could not find function "."") caused an obscure error.
After fixing that, I got closer and found that:
dt1[dt2, on=.(sp=sp, dbh>=dbh_min, dbh<=dbh_max), nomatch=0]
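gave me the rows I wanted, but with a bad dbh column: in a non-equi join the join columns of the outer table are filled with the values from the table passed in i, so dbh no longer held the original measurements. Swapping the two tables: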
dt2[dt1, on=.(sp=sp, dbh_min<=dbh, dbh_max>=dbh)]
fixed the problem, leaving only one useless extra column.
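For what it's worth, the first form of the join can probably be kept as well, by asking for the original column explicitly in j with the x. prefix. The sketch below is only an illustration: the tables trees and classes are hypothetical toy data (not the dt1 and dt2 from the question), and it assumes a data.table version recent enough to support non-equi joins and the x. prefix.

```r
library(data.table)

# Hypothetical toy data, NOT the tables from the question
trees <- data.table(
  sp  = c("SAB", "SAB", "EPN"),
  dbh = c(10, 16, 12)
)
classes <- data.table(
  sp       = c("SAB", "SAB", "EPN"),
  dbh_min  = c(0, 15, 0),
  dbh_max  = c(15, 30, 15),
  dbh_clas = c("s", "m", "s")
)

# Non-equi join: for each class range, keep the trees whose dbh lies inside it.
# A bare `dbh` in the result would carry values from `classes`, so the original
# measurement is requested explicitly as x.dbh.
res <- trees[classes,
             on = .(sp, dbh >= dbh_min, dbh <= dbh_max),
             nomatch = 0,
             .(sp, dbh = x.dbh, dbh_clas)]

print(res)
```

Selecting the columns in j this way avoids both the overwritten dbh column and the leftover extra column, at the cost of listing the wanted columns by hand.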