Apologies for a question that probably makes it obvious that I usually work in Python/pandas, but I'm stuck with this. How do I select a data.table
column using a string?
I'll add that if you want a bunch of columns, you may wish to use something like:
dt[ , c("id", paste0("col", 1:10)), with = FALSE]
As @Arun adds below, other options for getting multiple columns are:
dt[ , mget(c("id", paste0("col", 1:5)))]
and
dt[ , .SD, .SDcols = c("id", paste0("col", 1:5))]
In recent versions of data.table (e.g. current CRAN) you can also use the "up-a-level" notation like:
keep_cols = c('id', paste0('col', 1:5))
dt[ , ..keep_cols]
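To check that these forms really pick the same columns, here is a small sketch on a toy table. The table, its column names, and the result variable names are made up for illustration and are not from the original answer:

```r
library(data.table)

# Toy table; column names are illustrative only
dt <- data.table(id = 1:3,
                 col1 = c(0.1, 0.2, 0.3),
                 col2 = c(10, 20, 30),
                 other = letters[1:3])
keep_cols <- c("id", "col1", "col2")

via_with   <- dt[ , keep_cols, with = FALSE]     # character vector in j, with = FALSE
via_mget   <- dt[ , mget(keep_cols)]             # mget() returns a list of columns
via_sdcols <- dt[ , .SD, .SDcols = keep_cols]    # restrict .SD to the named columns
via_dotdot <- dt[ , ..keep_cols]                 # look up keep_cols one level up

# All four results should contain exactly the requested columns
print(names(via_with))
print(isTRUE(all.equal(via_with, via_dotdot)))
```

Each call returns a data.table, so which one to use is mostly a matter of taste and speed.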
For reference, mget seems to be very slow; .SDcols is fastest, with with = FALSE a close second. I personally find each of them useful and most natural in different situations.
Here's a simple benchmark:
library(data.table)
NN <- 10000L
MM <- 100L
mm <- 10L
DT = data.table(id = 1:NN)
DT[ , paste0("col", 1:MM) := lapply(integer(MM), function(x) runif(NN))]
sdcols = function(...) DT[ , .SD, .SDcols = paste0("col", sample(MM, size = mm))]
m.get = function(...) DT[ , mget(paste0("col", sample(MM, size=mm)))]
withF = function(...) DT[ , paste0("col", sample(MM, size = mm)), with = FALSE]
library(microbenchmark)
microbenchmark(times=100L, sdcols(), m.get(), withF())
# Unit: microseconds
# expr min lq mean median uq max neval cld
# sdcols() 780.201 810.4350 865.3564 827.4970 853.4875 2354.577 100 a
# m.get() 2792.293 2864.1225 3052.3872 2899.9370 3031.9260 4831.963 100 c
# withF() 897.822 927.7105 1005.3166 945.9495 981.0580 2600.445 100 b
You can assign to a column whose name is given as a string without get, by wrapping the name in parentheses:
dt[, ("col1"):=col2]
instead of:
dt[, get("col1"):=col2]
For more explanation, see: Select / assign to data.table variables which names are stored in a character vector
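As a minimal sketch of the parenthesised form, assuming a toy table with two integer columns (the table and the variable `target` are hypothetical, chosen only for this example):

```r
library(data.table)

# Toy table, for illustration only
dt <- data.table(col1 = 1:3, col2 = c(10L, 20L, 30L))

# Column name held in a variable; the parentheses make data.table
# evaluate `target` instead of creating a column literally named "target"
target <- "col1"
dt[ , (target) := col2]

# A string literal in parentheses works the same way and can create a new column
dt[ , ("col3") := col1 + col2]

print(dt)
```

Without the parentheses, `dt[ , target := col2]` would add a new column called target rather than overwrite col1.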
You can use get() as the j argument using single brackets:
library(data.table)
dt <- data.table(iris)
dt[, get("Species")]
The result:
[1] setosa setosa setosa setosa setosa setosa .....
You can also use a string directly inside the double bracket operator, like this:
dt[["Species"]]
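Both forms also work when the column name is stored in a variable, which is probably the more common case. A short sketch, where `col_name` and the result names are hypothetical names introduced just for this example:

```r
library(data.table)

dt <- data.table(iris)
col_name <- "Species"   # column name held as a string

as_vector_get <- dt[ , get(col_name)]   # plain vector (here a factor)
as_vector_dbl <- dt[[col_name]]         # same vector via the list-style accessor
as_table      <- dt[ , ..col_name]      # one-column data.table instead of a vector

print(identical(as_vector_get, as_vector_dbl))
print(class(as_table))
```

The first two return the column itself, while the `..` prefix keeps the result as a data.table, so pick whichever shape the next step needs.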