Referring to data.table columns by names saved in variables

借酒劲吻你 2020-11-29 21:21

data.table is a fantastic R package and I am using it in a library I am developing. So far all is going very well, except for one complication. It seems to be m

4 answers
  • 2020-11-29 22:09

    Maybe you know about this solution already?

    DT[[colname]]
    

    This is inspired by @eddi's solution in the comments below, using the OP's example:

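    library(data.table)

    set.seed(1)
    x = data.table(a = 1:10, b=rnorm(10))
    colstr="b"
    # Build the symbol `b` from the string (as.name(colstr) gives the same symbol)
    col <- eval(parse(text=paste("quote(",colstr,")",sep="")))
    x[eval(col)<0]                  # rows where b is negative
    x[eval(col)<0,c(colstr):=-100]  # overwrite b in those rows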
    
  • 2020-11-29 22:09

    eval is definitely not a recommended way to subset a data.table by a column whose name is stored in a variable. The following example shows an alternative:

    library(data.table)

    # Toy data.table example
    DT = data.table(a = c(1,2,3), b = c(4,5,6))
    
    # Saved variable
    mVar <- "a"
    
    # Subset
    DT[DT[[mVar]] < 2]
    

    eval breaks easily once the character expressions become complex, and it is generally not recommended for production code.

  • 2020-11-29 22:17

    Say you have the column name in variable x, you could do

    colname = as.name(x)
    

    You can then use colname in the subset function.
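
    A minimal sketch of how the symbol might then be used, assuming "the subset function" means the i argument of [.data.table. The table DT and its values are made up for illustration. The symbol is wrapped in eval() so that it is looked up as a column:

```r
library(data.table)

# Hypothetical example data (not from the question)
DT <- data.table(a = 1:5, b = c(-2, -1, 0, 1, 2))

x <- "b"                  # column name stored as a string
colname <- as.name(x)     # convert the string to a symbol

# eval() resolves the symbol against the columns of DT inside i
neg <- DT[eval(colname) < 0]
print(neg)
```

    Here neg holds only the rows of DT whose b value is negative.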

  • 2020-11-29 22:27

    If you are going to do complicated operations inside your j expressions, you should probably use eval and quote. One problem with that in the current version of data.table is that the environment of eval is not always processed correctly (see the question "eval and quote in data.table"; note that the answer there has been updated following an update to the package). The current fix is to add .SD to eval. As far as I can tell from a few tests I've run, this doesn't affect speed (the way that, e.g., having .SD[1] in j would).

    Interestingly, this issue only affects j; you'll be fine using eval normally in i (where .SD is not available anyway).

    The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:

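    library(data.table)

    x = data.table(dist = c(1:10), val = c(1:10))
    distcol = quote(dist)
    valcol = quote(val)
    
    # LHS: string name recovered from the quoted symbol; RHS: eval of the quoted columns
    x[eval(valcol) < 5,
      capture.output(str(distcol, give.head = FALSE)) := eval(distcol)*sum(eval(distcol, .SD))]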
    

    Note that I could get away without .SD in the first eval(distcol), but the expression breaks if I remove it from the second eval (the one inside sum).
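
    If capture.output(str(...)) feels too indirect, base R's deparse() (or as.character()) also turns a quoted symbol into its name. A small sketch with made-up data; the names y and nm are hypothetical and used only for illustration:

```r
library(data.table)

# Hypothetical example table
y <- data.table(dist = 1:10, val = 1:10)
distcol <- quote(dist)

# deparse() converts the quoted symbol back to the string "dist"
nm <- deparse(distcol)

# Wrapping nm in parentheses makes := use its value as the column name
y[val < 5, (nm) := 0L]
print(y)
```

    This only covers the assignment target; the right-hand side is kept as a constant here to keep the sketch simple.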

    Another option is to use get:

    diststr = "dist"
    valstr = "val"
    
    x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]
    