I was attempting to answer this nice question about creating a non-standard evaluating function for a data.table object, doing a grouped sum. Akrun came up with a lovely answer which I'll simplify here:
akrun <- function(data, var, group){
var <- substitute(var)
group <- substitute(group)
data[, sum(eval(var)), by = group]
}
library(data.table)
mt = as.data.table(mtcars)
akrun(mt, mpg, cyl)
# group V1
# 1: 6 138.2
# 2: 4 293.3
# 3: 8 211.4
I was also working on an answer and had nearly the same solution, but with the substitute calls inline with the rest. Mine results in an error:
gregor = function(data, var, group) {
data[, sum(eval(substitute(var))), by = substitute(group)]
}
gregor(mt, mpg, cyl)
# Error in `[.data.table`(data, , sum(eval(substitute(var))), by = substitute(group)) :
# 'by' or 'keyby' must evaluate to vector or list of vectors
# (where 'list' includes data.table and data.frame which are lists, too)
On its face, my function is just Akrun's with the substitute calls moved inline. Why doesn't it work?
Note that both substitutions cause problems, as shown here:
gregor_1 = function(data, var, group) {
var = substitute(var)
data[,sum(eval(var)),
by = substitute(group)]
}
gregor_1(mt, mpg, cyl)
# Same error as above
gregor_2 = function(data, var, group) {
group = substitute(group)
data[,sum(eval(substitute(var))),
by = group]
}
gregor_2(mt, mpg, cyl)
# Error in eval(substitute(var)) : object 'mpg' not found
In substitute's documentation you can read how it decides what to substitute, and the fact that, by default, it searches the environment where it is called. If you call substitute inside the data.table frame (i.e. inside []), it won't be able to find the symbols because they are not present inside the data.table evaluation environment; they are in the environment where [ was called.
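The lookup rule can be seen without data.table at all. The following is a minimal base-R sketch; the helper names direct and nested are made up for illustration, and the new.env() call is only a stand-in for a separate evaluation environment such as the one [ uses, not a reproduction of data.table's internals.

```r
# substitute() only inspects the frame it is called from;
# it does not walk up to enclosing environments.

direct <- function(x) {
  substitute(x)                       # called in the function's own frame
}

nested <- function(x) {
  inner <- new.env()                  # stand-in for another evaluation environment
  eval(quote(substitute(x)), inner)   # x is not bound in `inner` itself
}

direct_result <- direct(a + b)   # the unevaluated call a + b
nested_result <- nested(a + b)   # just the symbol x, nothing was substituted

print(direct_result)
print(nested_result)
```

In the nested case the argument is still reachable by ordinary variable lookup, but substitute never looks past the frame it was handed, so the expression is not recovered.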
You can "invert" the order in which the functions are called in order to get the behavior you want:
library(data.table)
foo <- function(dt, group, var) {
eval(substitute(dt[, sum(var), by = group]))
}
foo(as.data.table(mtcars), cyl, mpg)
# cyl V1
# 1: 6 138.2
# 2: 4 293.3
# 3: 8 211.4
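To see why the inverted version works, it can help to look at the call that substitute builds before eval runs it. A small sketch, where show_call is a hypothetical helper and the symbols mt, cyl and mpg are never evaluated, so data.table does not even need to be loaded:

```r
# Same body as foo() above, minus the eval(): return the constructed call
# instead of running it.
show_call <- function(dt, group, var) {
  substitute(dt[, sum(var), by = group])
}

built <- show_call(mt, cyl, mpg)
print(built)   # the call with mt, mpg and cyl spliced in
```

Because the substitution happens in the function's own frame, the call that finally reaches [ already contains the column names, and nothing inside [] has to call substitute.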
It seems that substitute does not work within data.table in the way one might expect from how it works in other contexts, but you can use enexpr from the rlang package in place of substitute:
library(data.table)
library(rlang)
gregor_rlang = function(data, var, group) {
data[, sum(eval(enexpr(var))), by = .(group = eval(enexpr(group)))]
}
gregor_rlang(mt, mpg, cyl)
## group V1
## 1: 6 138.2
## 2: 4 293.3
## 3: 8 211.4
environments
The problem seems to be related to environments, as this works when we explicitly give substitute the environment it should use.
gregor_pf = function(data, val, group) {
data[, sum(eval(substitute(val, parent.env(environment())))),
by = c(deparse(substitute(group)))]
}
gregor_pf(mt, mpg, cyl)
## cyl V1
## 1: 6 138.2
## 2: 4 293.3
## 3: 8 211.4
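The effect of passing an explicit environment can also be sketched in base R. Here with_env is an illustrative name, and local() only plays the role of a nested evaluation context; it is an assumption that this mirrors the situation inside [], not a description of what data.table does internally.

```r
with_env <- function(x) {
  fn_env <- environment()        # the frame where the argument x is bound
  local({
    # Called from a nested environment, but told where to look
    substitute(x, fn_env)
  })
}

env_result <- with_env(a + b)
print(env_result)   # the unevaluated call a + b
```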
Source: https://stackoverflow.com/questions/58649510/why-does-substitute-work-in-multiple-lines-but-not-in-a-single-line