The data.table package provides many of the same table handling methods as SQL. If a table has a key, that key consists of one or more columns. But a table can\'t have more
Good question. Note the following (admittedly buried) in ?data.table
:
When
i
is adata.table
,x
must have a key.i
is joined tox
using the key and the rows inx
that match are returned. An equi-join is performed between each column ini
to each column inx
's key. The match is a binary search in compiled C in O(log n) time. Ifi
has less columns thanx
's key then many rows ofx
may match to each row ofi
. Ifi
has more columns thanx
's key, the columns ofi
not involved in the join are included in the result. Ifi
also has a key, it isi
's key columns that are used to match tox
's key columns and a binary merge of the two tables is carried out.
So, the key here is that i
doesn't have to be keyed. Only x
must be keyed.
X2 <- data.table(id = 11:15, y_id = c(14,14,11,12,12), key="id")
id y_id
[1,] 11 14
[2,] 12 14
[3,] 13 11
[4,] 14 12
[5,] 15 12
Y2 <- data.table(id = 11:15, b = letters[1:5], key="id")
id b
[1,] 11 a
[2,] 12 b
[3,] 13 c
[4,] 14 d
[5,] 15 e
Y2[J(X2$y_id)] # binary search for each item of (unsorted and unkeyed) i
id b
[1,] 14 d
[2,] 14 d
[3,] 11 a
[4,] 12 b
[5,] 12 b
or,
Y2[SJ(X2$y_id)] # binary merge of keyed i, see ?SJ
id b
[1,] 11 a
[2,] 12 b
[3,] 12 b
[4,] 14 d
[5,] 14 d
identical(Y2[J(X2$y_id)], Y2[X2$y_id])
[1] FALSE