What does < stand for in data.table joins with on=?

走了就别回头了 2020-12-06 18:11

Joining the data tables:

X <- data.table(A = 1:4, B = c(1,1,1,1)) 
#    A B
# 1: 1 1
# 2: 2 1
# 3: 3 1
# 4: 4 1

Y <- data.table(A = 4)
#    A
# 1: 4


        
2 Answers
  • 2020-12-06 18:50

    You're partially correct. The missing piece of the puzzle is that (currently) any join, including a non-equi join with <, returns a single column for the join column (A in your example). That column takes its values from the data.table on the right side of the join, in this case the values of A from Y.

    Here's an illustrated example:
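
    A minimal sketch of that behaviour, reusing the X and Y from the question; the result name `res` is only an illustrative choice, and the output assumes a data.table version where non-equi joins still return a single join column:

    ```r
    library(data.table)

    # Same tables as in the question
    X <- data.table(A = 1:4, B = c(1, 1, 1, 1))
    Y <- data.table(A = 4)

    # In on = .(A < A), the left A is X's column and the right A is Y's column:
    # keep the rows of X whose A is smaller than Y's A
    res <- X[Y, on = .(A < A)]
    print(res)

    # Rows 1 to 3 of X match (1 < 4, 2 < 4, 3 < 4), but the A column of the
    # result holds Y's value 4 on every row instead of X's own 1, 2, 3
    ```

    So the filter is applied to X's values as expected; only the values shown in the join column come from Y.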

    We're planning to change this behaviour in a future version of data.table so that both columns will be returned in the case of non-equi joins. See pull requests https://github.com/Rdatatable/data.table/pull/2706 and https://github.com/Rdatatable/data.table/pull/3093.

  • 2020-12-06 19:00

    When doing a non-equi join like X[Y, on = .(A < A)], data.table returns the A column from Y (the i-data.table) rather than the A column from X.

    To get the desired result, you could do:

    X[Y, on = .(A < A), .(A = x.A, B)]
    

    which gives:

       A B
    1: 1 1
    2: 2 1
    3: 3 1
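
    To compare both sides of the condition, a sketch along the following lines may help. The `x.` and `i.` prefixes are data.table's own way of referring to the columns of X and of Y inside j; the output names `x_A` and `y_A` and the result name `both` are made up for illustration:

    ```r
    library(data.table)

    X <- data.table(A = 1:4, B = c(1, 1, 1, 1))
    Y <- data.table(A = 4)

    # x.A is A as stored in X, i.A is A as stored in Y (the i argument)
    both <- X[Y, on = .(A < A), .(x_A = x.A, y_A = i.A, B)]
    print(both)

    # Each row pairs a matching value from X (1, 2, 3) with the value
    # from Y (4) that it was compared against
    ```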
    

    In the next release, data.table will return both A columns. See here for the discussion.
