In R data.table, how do I pass variable parameters to an expression?

自闭症患者 2020-11-30 01:46

I am stuck with a small R issue with data.table. Your help is much appreciated. How do I do this:

getResult <- function(dt, expr, gby) {
  e          


        
1 answer
  • 2020-11-30 02:10

    An alternative to flodel's answer in the comments could be

    e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
    
    b <- parse(text = v2)
    
    rDT2 <- dt[, eval(e), by = eval(b)]
    
    #               b    V1
    # [1,]     setosa 250.3
    # [2,] versicolor 296.8
    # [3,]  virginica 329.4
    

    EDIT:

    And to put this into a function,

    getResult <- function(dt, expr, gby){
      return(dt[, eval(expr), by = eval(gby)])
    }
    
    (dtR <- getResult(dt = dt, expr = e, gby = b))
    # gives the same result as above
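
The snippets above rely on dt, v1 and v2 from the question, which are not shown here. The following is a minimal self-contained sketch, assuming the data is the built-in iris set and that v1 and v2 hold column names as strings; those values are assumptions, chosen because they are consistent with the sums printed above.

```r
library(data.table)

# Assumed inputs (not shown in the original post).
dt <- as.data.table(iris)
v1 <- "Sepal.Length"   # column to aggregate
v2 <- "Species"        # column to group by

# Build the j and by expressions from the strings.
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
b <- parse(text = v2)

# Same wrapper as in the answer: evaluate both expressions inside [.data.table.
getResult <- function(dt, expr, gby) {
  dt[, eval(expr), by = eval(gby)]
}

res <- getResult(dt = dt, expr = e, gby = b)
print(res)
```

The name of the grouping column in the result may differ between data.table versions, so it is safer to refer to the aggregate column (V1) than to rely on the grouping column being called b.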
    


    EDIT from Matthew: There's a subtle reason why the paste0 and eval/quote methods can be faster than get() in some cases, too. One of the reasons grouping can be fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, and data.table falls back to subsetting all the columns just in case the j expression needs them (much like when the .SD symbol is used in j, which is the problem .SDcols was added to solve). If all the columns are used anyway, it makes no difference, but if DT is, say, 1e7 x 100, then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what's supposed to happen; if it doesn't, it may be a bug. On the other hand, if many queries are being constructed dynamically and repeated, the time to paste0 and parse might become significant. It all depends. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
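
To make the all.vars point concrete, here is a small sketch. The table DT and its columns g, V1 and V2 are hypothetical, made up only for illustration; the exact verbose messages depend on the data.table version, so none are reproduced here.

```r
library(data.table)

# all.vars() reports a column only when it appears as a symbol in the expression.
j_direct <- quote(sum(V1))
j_get    <- quote(sum(get("V1")))
vars_direct <- all.vars(j_direct)  # "V1"
vars_get    <- all.vars(j_get)     # character(0): "V1" is a string here, not a symbol

# Hypothetical small table to compare the two forms of j.
DT <- data.table(g = rep(1:2, each = 3), V1 = 1:6, V2 = 6:1)

# verbose = TRUE prints diagnostic messages, which can be used to check
# which columns were detected as used by j.
r_direct <- DT[, sum(V1), by = g, verbose = TRUE]
r_get    <- DT[, sum(get("V1")), by = g, verbose = TRUE]
```

Both calls return the same sums; what differs is how much of DT may need to be subset per group, which only matters when the table is wide.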
