Question
I am new to R and am trying to write a function that computes, for each customer, the cumulative sum of items ordered previously. I found a nearly fitting example on Stack Overflow, but I have not managed to adapt it to my needs.
This is the code:
Fruits <- Fruits[order(Cars$order.id), ] #sort data
Fruits$prev_Apples<-with(Fruits,
ave(
ave(Apples, customer.id, FUN=cumsum), #get running sum per customer.id
interaction(customer.id, order.id, drop=T),
FUN=max, na.rm=T) #find largest sum per index per seg
)
And this is the Fruits data.frame:
order.id customer.id Apples Peaches Pears
1001 J Car Ltd 1 0 0
1002 Som Comp 0 2 0
1005 Richardson 0 0 1
1004 J Car Ltd 1 0 0
1003 J Car Ltd 2 0 0
1006 Richardson 1 0 1
1007 Aldridge 0 0 1
1008 J Car Ltd 0 0 1
1010 Som Comp 0 1 0
1009 J Car Ltd 1 0 0
This is what I would like to obtain:
order id customer id Apples Peaches Pears Prev_Apples
1001 J Car Ltd 1 0 0 0
1002 Som Comp 0 2 0 0
1003 J Car Ltd 2 0 0 1
1004 J Car Ltd 1 0 0 3
1005 Richardson 0 0 1 0
1006 Richardson 1 0 1 0
1007 Aldridge 0 0 1 0
1008 J Car Ltd 0 0 1 4
1009 J Car Ltd 1 0 0 4
1010 Som Comp 0 1 0 0
And this is what I actually get:
order id customer id Apples Peaches Pears Prev_Apples
1001 J Car Ltd 1 0 0 1
1002 Som Comp 0 2 0 0
1003 J Car Ltd 2 0 0 3
1004 J Car Ltd 1 0 0 4
1005 Richardson 0 0 1 0
1006 Richardson 1 0 1 1
1007 Aldridge 0 0 1 0
1008 J Car Ltd 0 0 1 4
1009 J Car Ltd 1 0 0 5
1010 Som Comp 0 1 0 0
So the problem is that cumsum also includes the current order of Apples, while I would like it to include only previous orders. How should I modify the code? Any answer will be highly appreciated.
Answer 1:
Assuming the input shown reproducibly in the Note at the end, we sort Fruits (fixing the erroneous reference to Cars) and then use ave with cumsum, subtracting the current value of Apples from the cumulative sum so that the current order cancels out of the total. This gives the same answer as the one listed as expected in the question.
Fruits <- Fruits[order(Fruits$order.id), ]
transform(Fruits, Prev_Apples = ave(Apples, customer.id, FUN = cumsum) - Apples)
giving:
order.id customer.id Apples Peaches Pears Prev_Apples
1 1001 J Car Ltd 1 0 0 0
2 1002 Som Comp 0 2 0 0
5 1003 J Car Ltd 2 0 0 1
4 1004 J Car Ltd 1 0 0 3
3 1005 Richardson 0 0 1 0
6 1006 Richardson 1 0 1 0
7 1007 Aldridge 0 0 1 0
8 1008 J Car Ltd 0 0 1 4
10 1009 J Car Ltd 1 0 0 4
9 1010 Som Comp 0 1 0 0
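To see why subtracting Apples works, here is a small sketch on the J Car Ltd rows alone. The names x, running, prev and prev_shifted are only illustrative and are not part of the code above; x holds that customer's Apples in order.id order.

```r
x <- c(1, 2, 1, 0, 1)    # Apples for J Car Ltd: orders 1001, 1003, 1004, 1008, 1009
running <- cumsum(x)     # 1 3 4 4 5, still includes the current order
prev <- running - x      # 0 1 3 4 4, previous orders only

# Equivalent view: shift the running sum down by one position, starting from 0
prev_shifted <- head(c(0, running), -1)
```

Both prev and prev_shifted match the Prev_Apples values for J Car Ltd in the output above.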
Note: The input in reproducible form is assumed to be:
Fruits <- structure(list(order.id = c(1001L, 1002L, 1005L, 1004L, 1003L,
1006L, 1007L, 1008L, 1010L, 1009L), customer.id = structure(c(2L,
4L, 3L, 2L, 2L, 3L, 1L, 2L, 4L, 2L), .Label = c("Aldridge", "J Car Ltd",
"Richardson", "Som Comp"), class = "factor"), Apples = c(1L,
0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 1L), Peaches = c(0L, 2L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L), Pears = c(0L, 0L, 1L, 0L, 0L, 1L,
1L, 1L, 0L, 0L)), .Names = c("order.id", "customer.id", "Apples",
"Peaches", "Pears"), class = "data.frame", row.names = c(NA,
-10L))
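If the same running total of previous orders is wanted for every fruit column, one possible sketch is to loop over the column names. This goes beyond what the question asked; fruit_cols and the Prev_ prefix for the other columns are assumptions made for illustration, and the data is rebuilt with data.frame() only to keep the example self-contained.

```r
# Same values as the Note above, built with data.frame() for brevity
Fruits <- data.frame(
  order.id    = c(1001L, 1002L, 1005L, 1004L, 1003L, 1006L, 1007L, 1008L, 1010L, 1009L),
  customer.id = c("J Car Ltd", "Som Comp", "Richardson", "J Car Ltd", "J Car Ltd",
                  "Richardson", "Aldridge", "J Car Ltd", "Som Comp", "J Car Ltd"),
  Apples      = c(1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 1L),
  Peaches     = c(0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L),
  Pears       = c(0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# Sort by order.id so that "previous" means "earlier order"
Fruits <- Fruits[order(Fruits$order.id), ]

# Hypothetical list of columns to process
fruit_cols <- c("Apples", "Peaches", "Pears")
for (col in fruit_cols) {
  # Running sum per customer, minus the current row's own value
  Fruits[[paste0("Prev_", col)]] <-
    ave(Fruits[[col]], Fruits$customer.id, FUN = cumsum) - Fruits[[col]]
}
```

The Prev_Apples column produced this way should agree with the transform() result in the answer; the other two columns follow the same rule.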
Source: https://stackoverflow.com/questions/47637184/cumsum-excluding-current-value