I'm having some trouble aggregating a data frame while keeping the groups in their original order (order based on first appearance in the data frame). I've managed to get it right
I haven't checked how this solution performs in terms of speed or memory on large datasets, but it seemed like a fairly easy way to solve the problem.
# Create dataframe
x <- c("C", "C", "A", "A", "A","B", "B")
y <- c(5, 6, 3, 2, 7, 8, 9)
df <- data.frame(x, y)
df
Original dataframe:
  x y
1 C 5
2 C 6
3 A 3
4 A 2
5 A 7
6 B 8
7 B 9
Solution:
# Add column with the original order
df$order <- seq_len(nrow(df))
# Aggregate
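# use sum for column y (the variable you want to aggregate by x)
df1 <- aggregate(y ~ x, data = df, FUN = sum)
# use mean for column 'order'; this matches first-appearance order
# as long as each group's rows are contiguous, as they are here
df2 <- aggregate(order ~ x, data = df, FUN = mean)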
# Add the mean of order values to the dataframe
df <- df1
df$order <- df2$order
# Order the dataframe according to the mean of order values
df <- df[order(df$order), ]
df
Aggregated dataframe with same order:
  x  y order
3 C 11   1.5
1 A 12   4.0
2 B 17   6.5
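If the rows of a group are not next to each other, the mean of the row positions may no longer reflect which group appeared first. A possible variation, sketched below, takes the minimum row position per group instead. The helper name aggregate_keep_order, the column name first_row and the example data are made up for illustration, and the helper assumes the columns are called x and y as above.

```r
# Hypothetical helper: sum y by x, keeping groups in order of first appearance.
# Assumes `data` has a grouping column x and a numeric column y.
aggregate_keep_order <- function(data) {
  # Row position of every row; min() per group gives the first appearance
  data$first_row <- seq_len(nrow(data))
  sums  <- aggregate(y ~ x, data = data, FUN = sum)
  first <- aggregate(first_row ~ x, data = data, FUN = min)
  # Join on x so the sums and first positions are matched by group
  out <- merge(sums, first, by = "x")
  out <- out[order(out$first_row), c("x", "y")]
  rownames(out) <- NULL
  out
}

# Made-up data where group A is interleaved with group B
interleaved <- data.frame(x = c("A", "B", "B", "A", "A"),
                          y = c(1, 2, 3, 4, 5))
res <- aggregate_keep_order(interleaved)
res
# x is A then B, y is 10 then 5
```

With the mean of the row positions, B (mean 2.5) would be sorted before A (mean 3.33) in this example, even though A appears first.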