I have a data set with the columns customerId, transactionDate, productId, purchaseQty loaded into a data.table. For each row, I want to calculate the sum and mean of purchaseQty for the pr
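First, we count how many transaction dates fall in the 45-day window ending on the current date (current date included). This assumes the rows are sorted by transactionDate within each customerID, because findInterval requires a sorted vector.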
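library(data.table)
setDT(df)
# n = row position within the customer minus the number of dates older than the window
df[, n := seq_len(.N) - findInterval(transactionDate - 45, transactionDate), by = .(customerID)]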
df
# productId customerID transactionDate purchaseQty n
#1: 870826 1186951 2016-03-28 162000 1
#2: 870826 1244216 2016-03-31 5000 1
#3: 870826 1244216 2016-04-08 6500 2
#4: 870826 1308671 2016-03-28 221367 1
#5: 870826 1308671 2016-03-29 83633 2
#6: 870826 1308671 2016-11-29 60500 1
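To make the counting step concrete, here is a minimal base R sketch for a single customer (the three dates of customer 1308671 above). The variable names `dates` and `outside` are only illustrative, and the dates are converted with `as.numeric` purely to keep the arithmetic explicit.

```r
# Dates for one customer, sorted ascending (findInterval needs a sorted vector).
dates <- as.numeric(as.Date(c("2016-03-28", "2016-03-29", "2016-11-29")))

# For each date d, count how many dates are <= d - 45,
# i.e. how many earlier transactions fall outside the window.
outside <- findInterval(dates - 45, dates)

# Row position minus the rows outside the window = rows inside the window.
n <- seq_along(dates) - outside

print(outside)  # 0 0 2
print(n)        # 1 2 1
```

Note that a transaction exactly 45 days before the current date counts as outside the window, so the window spans 45 calendar days including the current one.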
Next, we compute a rolling sum of purchaseQty with window size n, adapting a great answer here:
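# Rolling sum with a per-row window size, built from cumulative sums
g <- function(x, window) {
  b_pos <- seq_along(x) - window + 1  # begin position of each row's window
  cum <- cumsum(x)
  # sum of x[b_pos..i] equals cum[i] - cum[b_pos] + x[b_pos]
  cum - cum[b_pos] + x[b_pos]
}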
# Grouping by customerID keeps each window inside one customer's rows
df[, sumWindowPurchases := g(purchaseQty, n), by = .(customerID)][, n := NULL]
df
# productId customerID transactionDate purchaseQty sumWindowPurchases
#1: 870826 1186951 2016-03-28 162000 162000
#2: 870826 1244216 2016-03-31 5000 5000
#3: 870826 1244216 2016-04-08 6500 11500
#4: 870826 1308671 2016-03-28 221367 221367
#5: 870826 1308671 2016-03-29 83633 305000
#6: 870826 1308671 2016-11-29 60500 60500
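The question also asks for the mean. One possible sketch, assuming "mean" means the average purchaseQty per transaction in the window, is to divide the rolling sum by n before n is dropped. The column name `meanWindowPurchases` is my own choice, and the sample rows are retyped from the data at the end of this answer.

```r
library(data.table)

# Sample rows retyped from the data in this answer.
df <- data.table(
  productId       = 870826L,
  customerID      = c(1186951L, 1244216L, 1244216L, 1308671L, 1308671L, 1308671L),
  transactionDate = as.Date(c("2016-03-28", "2016-03-31", "2016-04-08",
                              "2016-03-28", "2016-03-29", "2016-11-29")),
  purchaseQty     = c(162000L, 5000L, 6500L, 221367L, 83633L, 60500L)
)

# Rolling sum with a per-row window size (same helper as above).
g <- function(x, window) {
  b_pos <- seq_along(x) - window + 1
  cum <- cumsum(x)
  cum - cum[b_pos] + x[b_pos]
}

# Sort so that each customer's dates are ascending.
setorder(df, customerID, transactionDate)

# Number of transactions in each row's 45-day window.
df[, n := seq_len(.N) - findInterval(transactionDate - 45, transactionDate),
   by = customerID]

# Rolling sum, then mean = sum / number of transactions in the window.
df[, sumWindowPurchases := g(purchaseQty, n), by = customerID]
df[, meanWindowPurchases := sumWindowPurchases / n]
df[, n := NULL]

print(df)
```

For the sample data, the mean differs from purchaseQty only in rows 3 and 5, where the window holds two transactions (5750 and 152500 respectively).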
Data:
df <- structure(list(productId = c(870826L, 870826L, 870826L, 870826L,
870826L, 870826L), customerID = c(1186951L, 1244216L, 1244216L,
1308671L, 1308671L, 1308671L), transactionDate = structure(c(16888,
16891, 16899, 16888, 16889, 17134), class = "Date"), purchaseQty = c(162000L,
5000L, 6500L, 221367L, 83633L, 60500L)), .Names = c("productId",
"customerID", "transactionDate", "purchaseQty"), row.names = c("1:",
"2:", "3:", "4:", "5:", "6:"), class = "data.frame")