Aggregate data frame while keeping original order, in a simple manner

前端未结

关注

 4  1026

囚心锁ツ 2021-02-15 12:38

I\'m having some trouble aggregating a data frame while keeping the groups in their original order (order based on first appearance in data frame). I\'ve managed to get it right

4条回答

青春惊慌失措 (楼主)

2021-02-15 12:55

It's short and simple in data.table. It returns the groups in first appearance order by default.

require(data.table)
DT = as.data.table(orig.df)
DT[, list(sum(add.1),sum(add.2)), by=list(sel.1,sel.2)]

    sel.1 sel.2  V1  V2
 1:     5     4  96  84
 2:     2     2 175 176
 3:     1     5 384 366
 4:     2     5  95  89
 5:     4     1 174 192
 6:     2     4  82  87
 7:     5     3  91  98
 8:     3     2 189 178
 9:     1     4 170 183
10:     1     1 100  91
11:     3     3  81  82
12:     5     5  83  88
13:     2     3  90  96

And this will be fast for large data, so no need to change your code later if you do find speed issues. The following alternative syntax is the easiest way to pass in which columns to group by.

DT[, lapply(.SD,sum), by=c("sel.1","sel.2")]

    sel.1 sel.2 add.1 add.2
 1:     5     4    96    84
 2:     2     2   175   176
 3:     1     5   384   366
 4:     2     5    95    89
 5:     4     1   174   192
 6:     2     4    82    87
 7:     5     3    91    98
 8:     3     2   189   178
 9:     1     4   170   183
10:     1     1   100    91
11:     3     3    81    82
12:     5     5    83    88
13:     2     3    90    96

or, by may also be a single comma separated string of column names, too :

DT[, lapply(.SD,sum), by="sel.1,sel.2"]

0 讨论(0)

查看其它4个回答