Access data.table columns with strings

予麋鹿 2020-12-15 10:46

Apologies for a question that probably makes it obvious that I usually work in Python/pandas, but I'm stuck with this. How do I select a data.table column usin

3 Answers
  • 2020-12-15 11:14

    I'll add that if you want a bunch of columns, you may wish to use something like:

    dt[ , c("id", paste0("col", 1:10)), with = FALSE]
    

    As @Arun adds below, other options for getting multiple columns are:

    dt[ , mget(c("id", paste0("col", 1:5)))]
    

    and

    dt[ , .SD, .SDcols = c("id", paste0("col", 1:5))]
    

    In recent versions of data.table (e.g. current CRAN) you can also use the "up-a-level" notation like:

    keep_cols = c('id', paste0('col', 1:5))
    dt[ , ..keep_cols]
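
To make the comparison concrete, here is a minimal sketch on a toy table checking that the four forms select the same columns. The table `dt`, its values, and the result names `res_*` are made up for illustration, and the last form assumes a data.table version recent enough to support the `..` prefix.

```r
library(data.table)

# Toy table, invented for illustration: an id column plus col1..col3
dt <- data.table(id = 1:3,
                 col1 = c(0.1, 0.2, 0.3),
                 col2 = c(1, 2, 3),
                 col3 = c(10, 20, 30))

keep_cols <- c("id", paste0("col", 1:2))

res_with   <- dt[ , keep_cols, with = FALSE]   # character vector plus with = FALSE
res_mget   <- dt[ , mget(keep_cols)]           # mget() builds a list of the named columns
res_sdcols <- dt[ , .SD, .SDcols = keep_cols]  # .SD restricted by .SDcols
res_dots   <- dt[ , ..keep_cols]               # ".." looks keep_cols up outside dt

# Each result is a data.table holding only the requested columns
names(res_with)
names(res_mget)
names(res_sdcols)
names(res_dots)
```

All four return a data.table rather than a vector, so they can be swapped for one another when selecting several columns.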
    

    For reference, in the benchmark below mget is roughly three times slower than the alternatives; .SDcols is fastest, with with = FALSE close behind. I personally find each of them useful or most natural in different situations.

    Here's a simple benchmark:

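    library(data.table)
    
    NN <- 10000L  # number of rows
    MM <- 100L    # number of columns to create
    mm <- 10L     # number of columns selected per call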
    
    DT = data.table(id = 1:NN)
    DT[ , paste0("col", 1:MM) := lapply(integer(MM), function(x) runif(NN))]
    
    sdcols = function(...) DT[ , .SD, .SDcols = paste0("col", sample(MM, size = mm))]
    m.get = function(...) DT[ , mget(paste0("col", sample(MM, size=mm)))]
    withF = function(...) DT[ , paste0("col", sample(MM, size = mm)), with = FALSE]
    
    library(microbenchmark)
    microbenchmark(times=100L, sdcols(), m.get(), withF())
    # Unit: microseconds
    #      expr      min        lq      mean    median        uq      max neval cld
    #  sdcols()  780.201  810.4350  865.3564  827.4970  853.4875 2354.577   100 a  
    #   m.get() 2792.293 2864.1225 3052.3872 2899.9370 3031.9260 4831.963   100   c
    #   withF()  897.822  927.7105 1005.3166  945.9495  981.0580 2600.445   100  b 
    
  • 2020-12-15 11:22

    You can assign to a column whose name is a string without get(), by wrapping the name in parentheses:

    dt[, ("col1"):=col2]
    

    instead of:

    dt[, get("col1"):=col2]
    

    For more explanation, see: Select / assign to data.table variables which names are stored in a character vector
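
The parenthesised form matters most when the column name is held in a variable. A small sketch, using a made-up table and a hypothetical variable `target`, of how the two spellings differ:

```r
library(data.table)

# Toy table, invented for illustration
dt <- data.table(col1 = 1:3, col2 = c(10L, 20L, 30L))

target <- "col1"             # hypothetical: column name stored as a string

dt[ , (target) := col2]      # parentheses: assigns to the column named by target, i.e. col1
dt[ , target := col2 * 2L]   # no parentheses: creates a column literally named "target"

names(dt)
```

Without the parentheses, data.table treats the left-hand side as a literal column name, which is why the second call adds a new column instead of overwriting `col1`.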

  • 2020-12-15 11:30

    You can call get() in the j argument inside single brackets:

    library(data.table)
    dt <- data.table(iris)
    dt[, get("Species")]
    

    The result:

    [1] setosa     setosa     setosa     setosa     setosa     setosa .....
    

    You can also use a string directly inside the double bracket operator, like this:

    dt[["Species"]]
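
One thing to watch is the return type: both forms above give a plain vector, while the multi-column idioms give a data.table even for a single column. A short sketch, assuming a reasonably recent data.table; the variable `col_name` is introduced here only for illustration:

```r
library(data.table)

dt <- data.table(iris)
col_name <- "Species"                    # column name held in a string

v1 <- dt[[col_name]]                     # plain vector (here a factor)
v2 <- dt[ , get(col_name)]               # also a plain vector
d1 <- dt[ , col_name, with = FALSE]      # one-column data.table
d2 <- dt[ , ..col_name]                  # one-column data.table

is.factor(v1)
identical(v1, v2)
is.data.table(d1)
names(d2)
```

Which one to pick depends on whether the next step expects a vector or a table.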
    