Join two datasets based on an inequality condition

抹茶落季 2021-01-16 11:21

I have used the call below to "join" my datasets based on an inequality condition:

library(sqldf)

sqldf("select *
from dataset1 a,
dataset2 b
a.col1 <         


        
3 Answers
  • 2021-01-16 11:40

    Non-equi (or conditional) joins were recently implemented in data.table and are available in the current development version, v1.9.7. See the data.table installation instructions for how to get the development version.

    library(data.table) # v1.9.7+
    setDT(dataset1) # convert both data.frames to data.tables by reference
    setDT(dataset2)
    dataset1[dataset2, on = .(col1 < col2), nomatch = 0L] # nomatch = 0L drops rows of dataset2 with no match
    

    For each row of dataset2, this finds the rows of dataset1 that satisfy the condition given to the on argument (here col1 < col2) and returns all columns for those matching rows.
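
    To make the matching concrete, here is a minimal sketch on made-up data. The id1 and id2 columns and all values are assumptions for illustration and do not come from the question. The columns are selected explicitly with the x. and i. prefixes because, as far as I know, the join column in the default output of a non-equi join carries the values from dataset2, which can be confusing.

    ```r
    library(data.table)

    # Hypothetical toy data: only col1 and col2 come from the question.
    dataset1 <- data.table(id1 = 1:3, col1 = c(1, 5, 10))
    dataset2 <- data.table(id2 = c("a", "b"), col2 = c(4, 8))

    # For each row of dataset2, keep the rows of dataset1 with col1 < col2.
    # x.col1 refers to dataset1's column, i.col2 to dataset2's column.
    res <- dataset1[dataset2,
                    on = .(col1 < col2),
                    .(id1, col1 = x.col1, id2, col2 = i.col2),
                    nomatch = 0L]
    print(res)
    ```

    With these values, row "a" (col2 = 4) pairs only with col1 = 1, and row "b" (col2 = 8) pairs with col1 = 1 and col1 = 5, so three pairs survive.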

  • 2021-01-16 11:53

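    You can do this in two steps with merge: first build the Cartesian product of the two data frames, then keep only the rows that satisfy the inequality.

    Example (the exact details of the merge are up to you):

    crossed <- merge(df1, df2, by = NULL) # by = NULL pairs every row of df1 with every row of df2
    df3 <- crossed[crossed$col1 < crossed$col2, ] # filter on the combined rows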
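
    A minimal sketch of the two-step idea on made-up data, in base R only. The id1 and id2 columns and the values are assumptions for illustration. It relies on merge returning the Cartesian product when by has length zero, and applies the condition to the merged rows so that each comparison involves one row from each data frame.

    ```r
    # Hypothetical toy data: only col1 and col2 come from the question.
    df1 <- data.frame(id1 = 1:3, col1 = c(1, 5, 10))
    df2 <- data.frame(id2 = c("a", "b"), col2 = c(4, 8))

    # Step 1: Cartesian product, nrow(df1) * nrow(df2) rows.
    crossed <- merge(df1, df2, by = NULL)

    # Step 2: keep only the pairs that satisfy the inequality.
    df3 <- crossed[crossed$col1 < crossed$col2, ]
    print(df3)
    ```

    The intermediate table grows with the product of the two row counts, so this is best suited to small inputs.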
    
  • 2021-01-16 12:03

    I've had this problem a few times, and I think I have a solution using dplyr. It might not be the most efficient, but it works. I'll assume each dataset has a constant column called 'dummy' (alternatively, it can be any other variable you want to join by). I'll also assume dataset1's columns are named a_colj and dataset2's are named b_colj:

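    library(dplyr)

    dataset1 %>%
        inner_join(dataset2, by = 'dummy') %>% # every row pairs with every row via the constant key
        filter(a_col1 <= b_col2)               # keep only the pairs that satisfy the condition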
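
    If the datasets do not already have such a constant column, it can be added on the fly. The sketch below uses made-up data (the values are assumptions for illustration) and drops the helper column at the end. Recent dplyr versions may warn about a many-to-many relationship on the join, which is expected here.

    ```r
    library(dplyr)

    # Hypothetical toy data following the a_colj / b_colj naming above.
    dataset1 <- data.frame(a_col1 = c(1, 5, 10))
    dataset2 <- data.frame(b_col2 = c(4, 8))

    result <- dataset1 %>%
        mutate(dummy = 1) %>%                                  # constant key on the left
        inner_join(mutate(dataset2, dummy = 1), by = "dummy") %>% # constant key on the right
        filter(a_col1 <= b_col2) %>%                           # apply the inequality
        select(-dummy)                                         # drop the helper column
    print(result)
    ```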
    