Non-equi join using data.table: column missing from the output

后端 未结 1 1837
陌清茗
陌清茗 2020-11-27 16:15

I am doing a left non-equi join using data.table:

OUTPUT <- DT2[DT1, on=.(DOB, FORENAME, SURNAME, POSTCODE, START_DATE <= MONTH, EXPIRY_DA         


        
相关标签:
1条回答
  • 2020-11-27 16:42

    In data.table, joins of the form x[i] traditionally uses values from i but uses column names from x. Even though this is different from SQL which returns both, this default makes a lot of sense for equi joins since we are interested in all rows from i and if they match then both data.tables have equal values anyway, and if they don't we need to keep those unmatched values from i in result.

    But for non-equi joins, since the values might not match exactly, i.e., can fall within a range, there might be cases where we will have to return outputs similar to SQL (or identify such cases and return the result user expects, similar to the case of equi joins). This hasn't been done yet, but I've placed a provision for it at the moment, which is to refer to the columns explicitly with a x. prefix. It is not convenient, I agree. Hopefully this'll be taken care of automatically soon.

    Here's how to get your result using x. prefix.

    ans <- DT2[DT1, .(CERT_NUMBER, FORENAME, SURNAME, x.START_DATE, x.EXPIRY_DATE, ID, MONTH), 
             on=.(FORENAME, SURNAME, START_DATE <= MONTH, EXPIRY_DATE >= MONTH)]
    

    IIRC there's an issue filed on the GitHub project page about this as well.

    0 讨论(0)
提交回复
热议问题