I have a data.table structure like so (except mine is really huge):
dt <- data.table(x=1:5, y=3:7, key='x')
I want to loo
Setting a key is not required, and subsetting with %in% is faster:
dt[eval(dt[, x %in% ..x])]
x y
1: 3 5
2: 4 6
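To illustrate the point that no key is needed, here is a minimal sketch on the toy table from the question. It sidesteps the name clash by giving the lookup vector a different name; `dt_nokey` and `lookup_vals` are hypothetical names chosen for this example, and the values 3 and 4 are assumed from the output shown above.

```r
library(data.table)

# Same toy table as in the question, but deliberately created without a key.
dt_nokey <- data.table(x = 1:5, y = 3:7)

# Hypothetical name for the lookup vector; it must not match any column name.
lookup_vals <- 3:4

# Plain logical subset: keep the rows whose x appears in lookup_vals.
res_nokey <- dt_nokey[x %in% lookup_vals]
print(res_nokey)
```

This only works when you are free to rename the vector; the `..x` form above is for the case where you are not.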
Benchmark against the previously posted answers:
microbenchmark(dt[eval(dt[, x %in% ..x])],
dt[J(get('x', parent.frame(3)))],
dt[eval(list(x))],
dt[eval(J(x))],
dt[eval(.(x))],
merge(dt, data.table(x)),
times = 100L)
Unit: microseconds
                             expr    min      lq     mean  median      uq    max neval
       dt[eval(dt[, x %in% ..x])]  486.1  500.60  518.529  503.70  512.65 1238.0   100
 dt[J(get("x", parent.frame(3)))]  837.3  853.25  891.424  860.00  868.30 1675.3   100
                dt[eval(list(x))]  831.8  842.70  929.521  851.95  859.85 3878.3   100
                   dt[eval(J(x))]  833.8  845.50  948.535  856.00  870.00 4599.2   100
                   dt[eval(.(x))]  828.6  846.40  871.054  851.75  859.35 1985.6   100
         merge(dt, data.table(x)) 1766.0 1804.70 1907.617 1819.95 1870.95 3123.1   100
Adding some benchmarking results, by request.
dt is a 53080731 x 5 data.table object, keyed by a numeric column with around 100 unique values, fairly evenly distributed. x is a vector containing 5 of those values.
library(microbenchmark)
> mb <- microbenchmark(
+ dt[eval(J(x))],
+ merge(dt, data.table(x)),
+ times=10
+ )
> mb
Unit: milliseconds
                     expr      min       lq    median       uq      max neval
           dt[eval(J(x))]  127.324  127.549  133.5305  154.410  159.433    10
 merge(dt, data.table(x)) 5028.349 5083.792 5129.6590 5170.451 5250.255    10
@Tyler, if you can assist me with how to use qdap::lookup() for this case with multiple columns, I can add that too.
New answer, now that I think I understand what was requested:
> X <- data.table(x=x)
> merge(dt, X)
x y
1: 3 5
2: 4 6
There is an item in the NEWS for 1.8.2 that suggests a ..() syntax will be added at some point, allowing this:
New DT[.(...)] syntax (in the style of package plyr) is identical to DT[list(...)], DT[J(...)] and DT[data.table(...)]. We plan to add ..(), too, so that .() and ..() are analogous to the file system's ./ and ../; i.e., .() evaluates within the frame of DT and ..() in the parent scope.
In the meantime, you can get x from the appropriate environment:
dt[J(get('x', envir = parent.frame(3)))]
## x y
## 1: 3 5
## 2: 4 6
or you could eval the whole call to list(x), J(x), or .(x):
dt[eval(list(x))]
dt[eval(J(x))]
dt[eval(.(x))]
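If you would rather not depend on how the outer x is looked up, one alternative is to pass the vector through a function argument with a different name. The sketch below is not from the answers above: `lookup_rows` is a hypothetical helper, the lookup values 3 and 4 are assumed from the output shown earlier, and it assumes the table has no column called `vals`. It also compares the keyed join against merge().

```r
library(data.table)

dt <- data.table(x = 1:5, y = 3:7, key = "x")
x <- 3:4  # assumed lookup values; same name as the key column

# Hypothetical helper: inside the function the vector is called `vals`,
# so it cannot be confused with column x (assuming no column is named vals).
lookup_rows <- function(tbl, vals) tbl[.(vals)]

res_join  <- lookup_rows(dt, x)            # keyed join on x
res_merge <- merge(dt, data.table(x = x))  # merge-based equivalent

# Check whether both approaches return the same y values.
print(identical(res_join$y, res_merge$y))
```

The helper only moves the name clash rather than removing it, so pick an argument name that is unlikely to appear as a column.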