Aggregate data frame while keeping original order, in a simple manner

囚心锁ツ 2021-02-15 12:38

I'm having some trouble aggregating a data frame while keeping the groups in their original order (order based on first appearance in the data frame). I've managed to get it right

4 Answers
  •  青春惊慌失措
    2021-02-15 12:55

    It's short and simple in data.table. It returns the groups in first appearance order by default.

    library(data.table)
    DT = as.data.table(orig.df)
    # Sum add.1 and add.2 within each (sel.1, sel.2) group; unnamed results become V1, V2
    DT[, list(sum(add.1), sum(add.2)), by = list(sel.1, sel.2)]
    
        sel.1 sel.2  V1  V2
     1:     5     4  96  84
     2:     2     2 175 176
     3:     1     5 384 366
     4:     2     5  95  89
     5:     4     1 174 192
     6:     2     4  82  87
     7:     5     3  91  98
     8:     3     2 189 178
     9:     1     4 170 183
    10:     1     1 100  91
    11:     3     3  81  82
    12:     5     5  83  88
    13:     2     3  90  96
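
To make the ordering behaviour easy to check without the asker's `orig.df`, here is a minimal sketch on made-up data. The table `toy` and its columns `grp` and `val` are hypothetical and only serve the illustration. It contrasts `by=`, which keeps groups in order of first appearance, with `keyby=`, which sorts the result by the grouping columns.

```r
library(data.table)

# Hypothetical toy data (not the asker's orig.df); groups appear as b, a, b, c, a
toy <- data.table(
  grp = c("b", "a", "b", "c", "a"),
  val = c(1, 2, 3, 4, 5)
)

# by= keeps groups in order of first appearance
res_by <- toy[, .(total = sum(val)), by = grp]

# keyby= sorts the result by the grouping column instead
res_keyby <- toy[, .(total = sum(val)), keyby = grp]

print(res_by)
print(res_keyby)
```

Naming the result inside `.()` (here `total`) also avoids the default `V1`, `V2` column names.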
    

    This is also fast on large data, so you won't need to rewrite the code later if speed becomes an issue. The following alternative syntax sums every non-grouping column and is the easiest way to pass in the columns to group by, as a character vector of names.

    DT[, lapply(.SD,sum), by=c("sel.1","sel.2")]
    
        sel.1 sel.2 add.1 add.2
     1:     5     4    96    84
     2:     2     2   175   176
     3:     1     5   384   366
     4:     2     5    95    89
     5:     4     1   174   192
     6:     2     4    82    87
     7:     5     3    91    98
     8:     3     2   189   178
     9:     1     4   170   183
    10:     1     1   100    91
    11:     3     3    81    82
    12:     5     5    83    88
    13:     2     3    90    96
    

    Alternatively, by may be a single comma-separated string of column names:

    DT[, lapply(.SD,sum), by="sel.1,sel.2"]
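
Because `by` accepts a character vector, the grouping columns can live in a variable, and `.SDcols` can restrict which columns `lapply(.SD, sum)` touches. The sketch below uses made-up data; `toy`, `group_cols`, `sum_cols` and the extra `note` column are hypothetical names introduced for illustration, not part of the original question.

```r
library(data.table)

# Hypothetical toy data standing in for orig.df, plus a non-numeric column to skip
toy <- data.table(
  sel.1 = c(2, 1, 2, 1),
  sel.2 = c(5, 5, 5, 4),
  add.1 = c(10, 20, 30, 40),
  add.2 = c(1, 2, 3, 4),
  note  = c("w", "x", "y", "z")
)

group_cols <- c("sel.1", "sel.2")  # columns to group by, held in a variable
sum_cols   <- c("add.1", "add.2")  # columns to sum; 'note' is left out

# .SD contains only sum_cols, so sum() is never applied to the character column
res <- toy[, lapply(.SD, sum), by = group_cols, .SDcols = sum_cols]
print(res)
```

Without `.SDcols`, `sum()` would also be applied to `note` and fail on the character values, so restricting the columns is a reasonable default when the table holds more than the columns being aggregated.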
    
