I am writing a custom aggregation function with data.table (v 1.9.6) and am struggling to pass function arguments to it. There have been similar questions on this but none deals
Here's an option using mget, as commented:
fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {
  # Sum each column in var_list by group and rename the results
  temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
             by = by_var_list, .SDcols = var_list]
  setorderv(temp, order_var_list)
  cols1 <- paste0(var_name_list, "_del")
  cols2 <- paste0(cols1, "_rel")
  # Absolute change versus the previous row
  temp[, (cols1) := lapply(mget(var_name_list), function(x) {
    x - shift(x, n = 1, type = "lag")
  })]
  # Relative change versus the previous row
  temp[, (cols2) := lapply(mget(var_name_list), function(x) {
    xshift <- shift(x, n = 1, type = "lag")
    (x - xshift) / xshift
  })]
  temp[]
}
fn_agg(dt,
       var_list = c("x", "y"),
       var_name_list = c("x_sum", "y_sum"),
       by_var_list = c("a", "b"),
       order_var_list = c("a", "b"))
#   a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
#1: a e   254   358        NA        NA            NA            NA
#2: b f   246   116        -8      -242  -0.031496063    -0.6759777
#3: c g   272   242        26       126   0.105691057     1.0862069
#4: d h   273   194         1       -48   0.003676471    -0.1983471
Instead of mget, you could also make use of data.table's .SDcols argument, as in
temp[, (cols1) := lapply(.SD, function(x) {
  x - shift(x, n = 1, type = "lag")
}), .SDcols = var_name_list]
Also, there are probably ways to improve the function by avoiding the duplicated computation of shift(x, n = 1, type = "lag"), but I only wanted to demonstrate a way to use data.table in functions.
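One possible way to avoid the repeated shift call is to compute the lagged columns once and reuse them for both the absolute and the relative change. This is only a sketch under assumptions: the small table `temp` and its values are made up for illustration, and `lagged` is a hypothetical helper name that does not appear in the function above.

```r
library(data.table)

# Made-up stand-in for the aggregated table `temp` (illustrative values only)
temp <- data.table(a = c("a", "b", "c"),
                   x_sum = c(10, 15, 12),
                   y_sum = c(4, 8, 2))
var_name_list <- c("x_sum", "y_sum")
cols1 <- paste0(var_name_list, "_del")
cols2 <- paste0(cols1, "_rel")

# Compute the lag once per column (hypothetical helper table)
lagged <- temp[, lapply(.SD, shift, n = 1L, type = "lag"),
               .SDcols = var_name_list]

# Absolute change: current value minus lagged value
temp[, (cols1) := Map(`-`, .SD, lagged), .SDcols = var_name_list]

# Relative change: absolute change divided by lagged value
temp[, (cols2) := Map(`/`, .SD, lagged), .SDcols = cols1]

print(temp)
```

Map pairs the columns by position, so this relies on `lagged` having its columns in the same order as `var_name_list`.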
Looks like a question to me :)
I prefer computing on the language over get/mget.
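As a smaller illustration of the idea, here is a minimal sketch that builds a single j expression from a character column name and evaluates it inside `[`. The table `dt_small`, the column names, and the output name `total` are assumptions made up for this example.

```r
library(data.table)

# Made-up example data (not the data from the question)
dt_small <- data.table(g = c("a", "a", "b"), x = c(1, 2, 4))
var <- "x"

# Build the call sum(x, na.rm = TRUE) from the character name
sum_call <- as.call(list(as.name("sum"), as.name(var), na.rm = TRUE))

# Wrap it in list(total = ...) so the result column gets a name
j_call <- as.call(list(as.name("list"), total = sum_call))
print(j_call)

# Evaluate the constructed expression in j, grouped by g
res <- dt_small[, eval(j_call), by = "g"]
print(res)
```

Printing the call before evaluating it is a convenient way to check that the expression is the one you would have typed by hand.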
fn_agg = function(dt, var_list, var_name_list, by_var_list, order_var_list) {
  # Step 1: .(x_sum = sum(x, na.rm = TRUE), ...)
  j_call = as.call(c(
    as.name("."),
    sapply(setNames(var_list, var_name_list), function(var) {
      as.call(list(as.name("sum"), as.name(var), na.rm=TRUE))
    }, simplify=FALSE)
  ))
  # order(a, b, ...) used in i
  order_call = as.call(c(
    as.name("order"),
    lapply(order_var_list, as.name)
  ))
  # Steps 2 and 3: `:=`(x_sum_del = ..., x_sum_del_rel = ..., ...)
  j2_call = as.call(c(
    as.name(":="),
    c(
      sapply(setNames(var_name_list, paste0(var_name_list,"_del")), function(var) {
        substitute(.var - shift(x = .var, n = 1, type = "lag"), list(.var=as.name(var)))
      }, simplify=FALSE),
      sapply(setNames(var_name_list, paste0(var_name_list,"_del_rel")), function(var) {
        substitute((.var - shift(x = .var, n = 1, type = "lag")) / (shift(x = .var, n = 1, type = "lag")), list(.var=as.name(var)))
      }, simplify=FALSE)
    )
  ))
  dt[eval(order_call), eval(j_call), by=by_var_list
     ][, eval(j2_call)
       ][]
}
ans = fn_agg(dt, var_list=c("x","y"), var_name_list=c("x_sum","y_sum"), by_var_list=c("a","b"), order_var_list=c("a","b"))
all.equal(temp2, ans)
#[1] TRUE
Some extra notes:
_del in step2 and _del_rel in step3.
If the order variables are always the same as the by variables, you can put them into the keyby argument.
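The keyby note can be sketched as follows. The table `dt_small` and its values are made up for illustration; `by_var_list` plays the same role as in the functions above.

```r
library(data.table)

# Made-up example data (not the data from the question)
dt_small <- data.table(a = c("b", "a", "b", "a"), x = 1:4)
by_var_list <- "a"

# keyby groups like by, and additionally sorts the result by the grouping
# columns and sets them as the key, so no separate ordering step is needed
res <- dt_small[, .(x_sum = sum(x)), keyby = by_var_list]
print(res)
print(key(res))
```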