R custom data.table function with multiple variable inputs

前端 未结 2 517
后悔当初
后悔当初 2021-01-14 04:44

I am writing a custom aggregation function with data.table (v 1.9.6) and struggle to pass function arguments to it. there have been similar questions on this but none deals

相关标签:
2条回答
  • 2021-01-14 05:10

    Here's an option using mget, as commented:

    fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {
    
      temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list), 
                 by = by_var_list, .SDcols = var_list]
    
      setorderv(temp, order_var_list)
    
      cols1 <- paste0(var_name_list, "_del")
      cols2 <- paste0(cols1, "_rel")
    
      temp[, (cols1) := lapply(mget(var_name_list), function(x) {
        x - shift(x, n = 1, type = "lag")
      })]
    
      temp[, (cols2) := lapply(mget(var_name_list), function(x) {
        xshift <- shift(x, n = 1, type = "lag")
        (x - xshift) / xshift
      })]
    
      temp[]
    }
    
    fn_agg(dt, 
           var_list = c("x", "y"), 
           var_name_list = c("x_sum", "y_sum"), 
           by_var_list = c("a", "b"), 
           order_var_list = c("a", "b"))
    
    #   a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
    #1: a e   254   358        NA        NA            NA            NA
    #2: b f   246   116        -8      -242  -0.031496063    -0.6759777
    #3: c g   272   242        26       126   0.105691057     1.0862069
    #4: d h   273   194         1       -48   0.003676471    -0.1983471
    

    Instead of mget, you could also make use of data.table's .SDcols argument as in

    temp[, (cols1) := lapply(.SD, function(x) {
        x - shift(x, n = 1, type = "lag")
      }), .SDcols = var_name_list]
    

    Also, there are probably ways to improve the function by avoiding duplicated computation of shift(x, n = 1, type = "lag") but I only wanted to demonstrate a way to use data.table in functions.

    0 讨论(0)
  • 2021-01-14 05:16

    Looks like a question to me :)
    I prefer computing on the language over get/mget.

    fn_agg = function(dt, var_list, var_name_list, by_var_list, order_var_list) {
        j_call = as.call(c(
            as.name("."),
            sapply(setNames(var_list, var_name_list), function(var) as.call(list(as.name("sum"), as.name(var), na.rm=TRUE)), simplify=FALSE)
        ))
        order_call = as.call(c(
            as.name("order"),
            lapply(order_var_list, as.name)
        ))
        j2_call = as.call(c(
            as.name(":="),
            c(
                sapply(setNames(var_name_list, paste0(var_name_list,"_del")), function(var) {
                    substitute(.var - shift(x = .var, n = 1, type = "lag"), list(.var=as.name(var)))
                }, simplify=FALSE),
                sapply(setNames(var_name_list, paste0(var_name_list,"_del_rel")), function(var) {
                    substitute((.var - shift(x = .var, n = 1, type = "lag")) / (shift(x = .var, n = 1, type = "lag")), list(.var=as.name(var)))
                }, simplify=FALSE)
            )
        ))
        dt[eval(order_call), eval(j_call), by=by_var_list
           ][, eval(j2_call)
             ][]
    }
    
    ans = fn_agg(dt, var_list=c("x","y"), var_name_list=c("x_sum","y_sum"), by_var_list=c("a","b"), order_var_list=c("a","b"))
    all.equal(temp2, ans)
    #[1] TRUE
    

    Some extra notes:

    1. make strict input validation as debugging issues is more difficuilt against meta programming.
    2. optimization of step2 is possible as shift is computed multiple times, easy way is just to compute _del in step2 and _del_rel in step3.
    3. if order variables is always the same as by variables you can put them into keyby argument.
    0 讨论(0)
提交回复
热议问题