I am stuck with a small R issue with data.table. Your help is much appreciated. How do I do this:
getResult <- function(dt, expr, gby) {
e
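An alternative to flodel's answer in the comments could be:
# v1 and v2 are character strings naming the value column and the grouping column
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))  # unevaluated j expression
b <- parse(text = v2)                                      # unevaluated by expression
rDT2 <- dt[, eval(e), by = eval(b)]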
# b V1
# [1,] setosa 250.3
# [2,] versicolor 296.8
# [3,] virginica 329.4
EDIT:
And to put this into a function,
getResult <- function(dt, expr, gby){
return(dt[, eval(expr), by = eval(gby)])
}
(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above
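If pasting and parsing strings feels fragile, the same unevaluated call can probably be built with substitute() and as.name() instead. This is only a sketch: it assumes v1 holds the column name "Sepal.Length" (an assumption, not stated above), and it evaluates against the plain iris data frame so that it runs in base R without data.table.

```r
v1 <- "Sepal.Length"  # assumed column name, for illustration only

# Build sum(Sepal.Length, na.rm = TRUE) as a call, without paste0()/parse()
e2 <- substitute(sum(x, na.rm = TRUE), list(x = as.name(v1)))

# The parse() route yields an expression vector whose first element is that call
e1 <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))

# Evaluate the call with the data frame's columns in scope
total <- eval(e2, iris)
```

An object like e2 should be usable as expr in getResult() in place of e, since eval(expr) only needs an unevaluated call.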
EDIT from Matthew:
There's a subtle reason why the paste0 and eval/quote methods can be faster than get in some cases, too. One of the reasons grouping can be fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, and data.table falls back to subsetting all the columns just in case the j expression needs them (much like when the .SD symbol is used in j, which is the problem .SDcols was added to solve). If all the columns are used anyway then it doesn't make a difference, but if DT is, say, 1e7x100 then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what's supposed to happen, and if it doesn't then it may be a bug.
If, on the other hand, many queries are being constructed dynamically and repeated, then the time taken by paste0 and parse might come into it. It all depends, really. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
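The column-detection point can be checked directly in base R, without data.table. A minimal sketch, where the column name V1 is only illustrative:

```r
# all.vars() returns the names of the variables that appear in an expression
j_direct <- quote(sum(V1))
j_get    <- quote(sum(get("V1")))

all.vars(j_direct)  # "V1"
all.vars(j_get)     # character(0): "V1" is a string constant here, not a symbol
```

Because the second expression exposes no column names, an inspection based on all.vars has nothing to go on, which matches the fallback to all columns described above.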