Translating SQL joins on foreign keys to R data.table syntax

后端 未结 1 1490
孤城傲影
孤城傲影 2020-12-07 18:02

The data.table package provides many of the same table handling methods as SQL. If a table has a key, that key consists of one or more columns. But a table can\'t have more

相关标签:
1条回答
  • 2020-12-07 18:37

    Good question. Note the following (admittedly buried) in ?data.table :

    When i is a data.table, x must have a key. i is joined to x using the key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key. The match is a binary search in compiled C in O(log n) time. If i has less columns than x's key then many rows of x may match to each row of i. If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns and a binary merge of the two tables is carried out.

    So, the key here is that i doesn't have to be keyed. Only x must be keyed.

    X2 <- data.table(id = 11:15, y_id = c(14,14,11,12,12), key="id")
         id y_id
    [1,] 11   14
    [2,] 12   14
    [3,] 13   11
    [4,] 14   12
    [5,] 15   12
    Y2 <- data.table(id = 11:15, b = letters[1:5], key="id")
         id b
    [1,] 11 a
    [2,] 12 b
    [3,] 13 c
    [4,] 14 d
    [5,] 15 e
    Y2[J(X2$y_id)]  # binary search for each item of (unsorted and unkeyed) i
         id b
    [1,] 14 d
    [2,] 14 d
    [3,] 11 a
    [4,] 12 b
    [5,] 12 b
    

    or,

    Y2[SJ(X2$y_id)]  # binary merge of keyed i, see ?SJ
         id b
    [1,] 11 a
    [2,] 12 b
    [3,] 12 b
    [4,] 14 d
    [5,] 14 d
    
    identical(Y2[J(X2$y_id)], Y2[X2$y_id])
    [1] FALSE
    
    0 讨论(0)
提交回复
热议问题