Join two datasets based on an inequality condition

无人久伴 提交于 2019-12-08 09:00:28

问题


I have used the call below to "join" my datasets based on an inequality condition:

library(sqldf)

sqldf("select *
from dataset1 a,
dataset2 b
a.col1 <= b.col2")

However, is there a way I can do this without sqldf?

So far, I can only see merge functions that are based on simple joins on a particular common column.

Thanks!


回答1:


Non-equi (or conditional) joins were recently implemented in data.table, and available in the current development version, v1.9.7. See installation instructions here.

require(data.table) # v1.9.7+
setDT(dataset1) # convert to data.tables
setDT(dataset2)
dataset1[dataset2, on=.(col1 < col2), nomatch=0L]

For each row of dataset2, find matching row indices while joining on condition provided to the on argument, and return all columns for those matching rows.




回答2:


You could definitely do it in two steps utilizing merge.

Example (the exact details of the merge are up to you):

lessRows <- which(df1$col1 < df2$col2)
df3 <- merge(df1, df2)[lessRows, ]



回答3:


I've had that problem a few times and I think I got a solution using dplyr! It might not be the best in terms of efficiency, but it works. I'll suppose you have a constant variable in each case called 'dummy' (or alternatively, it can be another variable to join by). Moreover, I assume dataset1's columns are a_colj and those of dataset2 are b_colj:

dataset1 %>%
    inner_join(dataset2, by='dummy') %>%
    filter(a_col1 <= b_col2)


来源:https://stackoverflow.com/questions/32893022/join-two-datasets-based-on-an-inequality-condition

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!