问题
I have a df like below
> head(df)
OrderId Timestamp ErrorCode
1 3000000 1455594300434609920 NA
2 3000001 1455594300434614272 NA
3 3000000 1455594300440175104 0
4 3000001 1455594300440179712 0
5 3000002 1455594303468741120 NA
6 3000002 1455594303469326848 0
I need to collapse row in a way that output is something like below
> head(df)
OrderId Timestamp1 Timestamp2 ErrorCode Diff
3000000 1455594300434609920 1455594300440175104 0
3000001 1455594300434614272 1455594300440179712 0
3000002 1455594303468741120 1455594303469326848 0
I used df2=aggregate(Timestamp~.,df,FUN=toString)
But output is
OrderId ErrorCode Timestamp
10 3000001 0 1455594300440179712
11 3000002 0 1455594303469326848
12 3000003 0 1455594303713897984
When I dropped the ErrorCode column and used the same command, I get an expected output
> head(kf)
OrderId Timestamp
1 3000000 1455594300434609920
2 3000001 1455594300434614272
3 3000000 1455594300440175104
4 3000001 1455594300440179712
5 3000002 1455594303468741120
6 3000002 1455594303469326848
> kf2=aggregate(Timestamp~.,kf,FUN=toString)
head(kf2)
OrderId Timestamp
10 3000001 1455594300434614272, 1455594300440179712
11 3000002 1455594303468741120, 1455594303469326848
12 3000003 1455594303711330816, 1455594303713897984
How do I aggregate it in the above manner without removing ErrorCode column. There must be some little thing I am missing.
回答1:
I take it you're actually looking just to reshape your data into a wide format with separate columns for timestamp 1 and 2. One way is to first add a new column that defines the time point of the measurement and then melt and cast the data using reshape2
.
# Add an index to the data.frame
for (i in unique(df$OrderId)) {
ii <- df$OrderId == i
df$time_ind[ii] <- seq_along(ii[ii])
}
library(reshape2)
df_long <- melt(df, id.vars = c("OrderId", "time_ind"),
measure.vars = c("Timestamp", "ErrorCode"))
dcast(df_long, OrderId ~ variable + time_ind)
which will give you
OrderId Timestamp_1 Timestamp_2 ErrorCode_1 ErrorCode_2
1 3000000 1455594300434609920 1455594300440175104 <NA> 0
2 3000001 1455594300434614272 1455594300440179712 <NA> 0
3 3000002 1455594303468741120 1455594303469326848 <NA> 0
来源:https://stackoverflow.com/questions/42642091/aggregate-adjacent-rows-ignoring-certain-columns