Apply several summary functions on several variables by group in one call

一个人的身影 2020-11-22 00:03

I have the following data frame:

x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)


        
7 Answers
  •  南方客 (OP)
    2020-11-22 00:12

    Given this in the question:

    "I could use the plyr package, but my data set is quite large and plyr is very slow (almost unusable) when the size of the dataset grows."

    then in data.table (1.9.4+) you could try:

    > DT
       id1 id2 val1 val2
    1:   a   x    1    9
    2:   a   x    2    4
    3:   a   y    3    5
    4:   a   y    4    9
    5:   b   x    1    7
    6:   b   y    4    4
    7:   b   x    3    9
    8:   b   y    2    8
    
    > DT[ , .(mean(val1), mean(val2), .N), by = .(id1, id2)]   # simplest
       id1 id2  V1  V2 N
    1:   a   x 1.5 6.5 2
    2:   a   y 3.5 7.0 2
    3:   b   x 2.0 8.0 2
    4:   b   y 3.0 6.0 2
    
    > DT[ , .(val1.m = mean(val1), val2.m = mean(val2), count = .N), by = .(id1, id2)]  # named
       id1 id2 val1.m val2.m count
    1:   a   x    1.5    6.5     2
    2:   a   y    3.5    7.0     2
    3:   b   x    2.0    8.0     2
    4:   b   y    3.0    6.0     2
    
    > DT[ , c(lapply(.SD, mean), count = .N), by = .(id1, id2)]   # mean over all columns
       id1 id2 val1 val2 count
    1:   a   x  1.5  6.5     2
    2:   a   y  3.5  7.0     2
    3:   b   x  2.0  8.0     2
    4:   b   y  3.0  6.0     2
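
    The calls above assume DT already exists and use one function (mean) plus .N. As a minimal sketch of the "several functions on several variables" case, the following rebuilds DT from the printed table and applies two statistics to each value column. The helper summarise_col and the choice of mean and sum are assumptions made for illustration; they are not part of data.table or of the original answer.

    ```r
    library(data.table)

    # Rebuild the example data from the printed DT above.
    DT <- data.table(
      id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
      id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
      val1 = c(1, 2, 3, 4, 1, 4, 3, 2),
      val2 = c(9, 4, 5, 9, 7, 4, 9, 8)
    )

    # Hypothetical helper (not a data.table function): returns a named list so
    # that each statistic ends up in its own column.
    summarise_col <- function(v) list(mean = mean(v), sum = sum(v))

    res <- DT[, {
      # One list of statistics per column in .SDcols; do.call(c, ...) flattens
      # them and produces names such as val1.mean and val1.sum.
      stats <- do.call(c, lapply(.SD, summarise_col))
      # Append the group size as a further column.
      c(stats, count = .N)
    }, by = .(id1, id2), .SDcols = c("val1", "val2")]

    print(res)
    ```

    Adding another statistic only requires extending the list returned by summarise_col, and adding another variable only requires extending .SDcols.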
    

    For timings comparing aggregate (used in the question and in all 3 other answers) with data.table, see this benchmark (the agg and agg.x cases).
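
    The benchmark referred to is not reproduced here. As a rough sketch of how one might time the two approaches on one's own machine, the following compares base aggregate with data.table on synthetic data. The data size, the seed, and the variable names df, dt, agg and res are all assumptions chosen for illustration, and no particular timings are claimed.

    ```r
    library(data.table)

    set.seed(1)   # arbitrary seed, only so the synthetic data is reproducible
    n  <- 1e5     # assumed size; increase it to make the difference more visible
    df <- data.frame(
      id1  = sample(letters[1:5], n, replace = TRUE),
      id2  = sample(c("x", "y"), n, replace = TRUE),
      val1 = runif(n),
      val2 = runif(n)
    )
    dt <- as.data.table(df)

    # Group means with base R's aggregate (formula interface).
    t_agg <- system.time(
      agg <- aggregate(cbind(val1, val2) ~ id1 + id2, data = df, FUN = mean)
    )

    # The same group means with data.table; keyby sorts the result by group.
    t_dt <- system.time(
      res <- dt[, lapply(.SD, mean), keyby = .(id1, id2)]
    )

    cat("aggregate elapsed (s): ", t_agg[["elapsed"]], "\n")
    cat("data.table elapsed (s):", t_dt[["elapsed"]], "\n")
    ```

    Both calls compute the same per-group means, so the results can be compared after sorting agg by id1 and id2.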
