I'm having some trouble aggregating a data frame while keeping the groups in their original order (order based on first appearance in the data frame). I've managed to get it right
It's short and simple in data.table. It returns the groups in first appearance order by default.
library(data.table)
DT = as.data.table(orig.df)
# Sum add.1 and add.2 within each (sel.1, sel.2) group
DT[, list(sum(add.1), sum(add.2)), by=list(sel.1, sel.2)]
    sel.1 sel.2  V1  V2
 1:     5     4  96  84
 2:     2     2 175 176
 3:     1     5 384 366
 4:     2     5  95  89
 5:     4     1 174 192
 6:     2     4  82  87
 7:     5     3  91  98
 8:     3     2 189 178
 9:     1     4 170 183
10:     1     1 100  91
11:     3     3  81  82
12:     5     5  83  88
13:     2     3  90  96
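The V1 and V2 column names above are the defaults data.table assigns to unnamed expressions. Here is a minimal sketch of naming the results inside list(). The small orig.df below is a made-up stand-in (the real data isn't shown here), and sum.1 / sum.2 are illustrative names only:

```r
library(data.table)

# Hypothetical stand-in for orig.df; values are invented for illustration.
orig.df <- data.frame(
  sel.1 = c(5, 2, 5, 1, 2),
  sel.2 = c(4, 2, 4, 5, 2),
  add.1 = c(10, 20, 30, 40, 50),
  add.2 = c(1, 2, 3, 4, 5)
)
DT <- as.data.table(orig.df)

# Name the aggregated columns inside list(); groups come back in
# first-appearance order: (5,4), then (2,2), then (1,5).
res <- DT[, list(sum.1 = sum(add.1), sum.2 = sum(add.2)), by = list(sel.1, sel.2)]
print(res)
```

Grouping with by (as opposed to keyby) is what preserves the first-appearance order here; keyby would sort the groups instead.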
This is also fast on large data, so there's no need to rewrite your code later if speed becomes an issue. The following alternative syntax is the easiest way to pass in the grouping columns by name, as a character vector.
DT[, lapply(.SD,sum), by=c("sel.1","sel.2")]
    sel.1 sel.2 add.1 add.2
 1:     5     4    96    84
 2:     2     2   175   176
 3:     1     5   384   366
 4:     2     5    95    89
 5:     4     1   174   192
 6:     2     4    82    87
 7:     5     3    91    98
 8:     3     2   189   178
 9:     1     4   170   183
10:     1     1   100    91
11:     3     3    81    82
12:     5     5    83    88
13:     2     3    90    96
Or, the by argument may also be a single comma-separated string of column names:
DT[, lapply(.SD,sum), by="sel.1,sel.2"]
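If the table has other columns that shouldn't be summed, .SDcols restricts .SD to the ones you name. A rough sketch, assuming made-up data with an extra note column (group.cols and sum.cols are just illustrative variable names):

```r
library(data.table)

# Hypothetical data with an extra non-numeric column that sum() can't handle.
DT <- data.table(
  sel.1 = c(3, 1, 3, 1),
  sel.2 = c(2, 2, 2, 4),
  add.1 = c(1, 2, 3, 4),
  add.2 = c(10, 20, 30, 40),
  note  = c("a", "b", "c", "d")
)

group.cols <- c("sel.1", "sel.2")  # grouping columns held in a variable
sum.cols   <- c("add.1", "add.2")  # columns to aggregate

# .SDcols limits .SD to sum.cols, so note is left out of the lapply().
res <- DT[, lapply(.SD, sum), by = group.cols, .SDcols = sum.cols]
print(res)

# The comma-separated string form of by should give the same result.
res2 <- DT[, lapply(.SD, sum), by = "sel.1,sel.2", .SDcols = sum.cols]
stopifnot(isTRUE(all.equal(res, res2)))
```

Keeping the column names in variables is handy when the grouping columns are chosen at run time.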