How to call a function that returns multiple rows and columns in a data.table?

前端 未结 2 430
醉酒成梦
醉酒成梦 2021-01-06 08:41

I want to call a function inside a data.table that calculates a set of summary statistics like the following:

summ.stats <- function(vec) {
    list(
             


        
相关标签:
2条回答
  • 2021-01-06 08:58

    Alternatively you can modify your function as follows:

    summ.stats <- function(vec) {
        list(
            Var = names(vec),
             Min = sapply(vec, min),
             Mean = sapply(vec, mean),
             S.D. = sapply(vec, sd),
             Median = sapply(vec, median),
             Max = sapply(vec, max))
    }
    
    DT[, summ.stats(.SD), by=a] # no need for as.list(.SD) as Roger mentions
       a Var Min Mean      S.D. Median Max
    1: 1   b   1  1.5 0.7071068    1.5   2
    2: 1   c   2  3.5 2.1213203    3.5   5
    3: 2   b   1  2.5 2.1213203    2.5   4
    4: 2   c   2  2.5 0.7071068    2.5   3
    5: 3   b   3  3.5 0.7071068    3.5   4
    6: 3   c   1  2.5 2.1213203    2.5   4
    
    0 讨论(0)
  • Without explicitly reshaping to long form, you could do something like

    rbindlist(lapply(c('b','c'), function(x) data.table(var = x, DT[,summ.stats(get(x)),by=a])))
    
    
    
    #    var a Min Mean      S.D. Median Max
    # 1:   b 1   1  1.5 0.7071068    1.5   2
    # 2:   b 2   1  2.5 2.1213203    2.5   4
    # 3:   b 3   3  3.5 0.7071068    3.5   4
    # 4:   c 1   2  3.5 2.1213203    3.5   5
    # 5:   c 2   2  2.5 0.7071068    2.5   3
    # 6:   c 3   1  2.5 2.1213203    2.5   4
    

    If you reshape the data to long form

    reshape(DT, direction = 'long', 
                varying = list(value = c('b','c')), 
                times = c('b','c'))[,summ.stats(b), by = list(a, Var = time)]
    

    will work as well.


    Less efficiently you could use ldplyfrom plyr, with a slight redefinition of the function

    summ.stats2 <- function(vec) {
        data.table(
             Min = min(vec),
             Mean = mean(vec),
             S.D. = sd(vec),
             Median = median(vec),
             Max = max(vec))
    }
    library(plyr)
    DT[, ldply(lapply(.SD, summ.stats2)),by =a]
    
    0 讨论(0)
提交回复
热议问题