Perhaps this is already answered and I missed it, but it's hard to search.
A very simple question: why is dt[,x] generally a tiny bit faster than the equivalent $ form?
With j, you are subsetting and selecting within a single call to [.data.table.
With $ (and your call), you are subsetting within [.data.table and then selecting with $.
You are in essence calling two functions rather than one, hence the negligible difference in timing.
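A minimal sketch of that difference, using a small made-up table (dt_small, rows and the other names here are hypothetical, not from the question). The j form is one call to [.data.table, while the $ form first builds an intermediate data.table and then extracts the column from it:

```r
library(data.table)

# Hypothetical toy table, not the one from the question
dt_small <- data.table(id = 1:10, var = (1:10) * 1.5)
setkey(dt_small, id)
rows <- c(3L, 7L, 9L)

# One call: subset the rows and select the column inside `[.data.table`
one_call <- dt_small[.(rows), var]

# Two calls: `[.data.table` returns an intermediate data.table,
# then `$` extracts the column from that intermediate result
intermediate <- dt_small[.(rows)]
two_calls <- intermediate$var

# Both routes should yield the same vector
print(identical(one_call, two_calls))
```

Both routes should give the same vector; the second one simply does an extra step on the way.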
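In your current example you are calling sample(1e7, 200000) each time, so each expression looks up a different random set of rows.
For the comparison to return identical results, sample once and reuse the same index: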
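library(data.table)
library(microbenchmark)
dt <- data.table(id = 1:1e7, var = rnorm(1e7))  # one value of var per id
setkey(dt, id)
ii <- sample(1e7, size = 200000)  # sample once so every expression uses the same rows
microbenchmark("in j" = dt[.(ii), var], "$" = dt[.(ii)]$var, "[[" = dt[.(ii)][["var"]],
  .subset2(dt[.(ii)], "var"), dt[.(ii)][[2]],
  dt[["var"]][ii], dt$var[ii], .subset2(dt, "var")[ii])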
Unit: milliseconds
                       expr       min        lq      mean    median        uq       max neval cld
                       in j 39.491156 40.358669 41.570057 40.860342 41.485622 70.202441   100   b
                          $ 39.957211 40.561965 41.587420 41.136836 41.634584 69.928363   100   b
                         [[ 40.046558 40.515480 42.388432 41.244444 41.750946 72.224827   100   b
 .subset2(dt[.(ii)], "var") 39.772781 40.564077 41.561271 41.111630 41.635489 69.252222   100   b
             dt[.(ii)][[2]] 40.004300 40.513669 41.682526 40.927503 41.492866 72.986995   100   b
            dt[["var"]][ii]  4.432346  4.546898  4.946219  4.623416  4.755777 31.761115   100  a
                 dt$var[ii]  4.440496  4.539502  4.668361  4.597457  4.729214  5.425125   100  a
    .subset2(dt, "var")[ii]  4.365939  4.508261  4.660435  4.598815  4.703858  6.072289   100  a
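Before reading much into the timings, it can help to confirm that the expressions really return the same values. A small sketch on a reduced table (n, dt_check, idx and the other names are hypothetical), assuming id is simply 1:n so that a keyed lookup on id and positional indexing pick the same rows:

```r
library(data.table)

set.seed(1)
n <- 1000L
dt_check <- data.table(id = seq_len(n), var = rnorm(n))
setkey(dt_check, id)
idx <- sample(n, size = 20L)  # sampled once and reused everywhere

# Result of the "in j" form, used as the reference
reference <- dt_check[.(idx), var]

# The other expressions from the benchmark, on the small table
results <- list(
  dt_check[.(idx)]$var,
  dt_check[.(idx)][["var"]],
  .subset2(dt_check[.(idx)], "var"),
  dt_check[.(idx)][[2]],
  dt_check[["var"]][idx],
  dt_check$var[idx],
  .subset2(dt_check, "var")[idx]
)

# TRUE only if every expression matches the reference exactly
all_same <- all(vapply(results, identical, logical(1), reference))
print(all_same)
```

The equivalence between the keyed and positional forms relies on id matching the row position; with any other key the last three expressions would select different rows.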