I have some transactional records, like the following:
library(data.table)
customers <- 1:75
purchase_dates <- seq( as.Date('2016-01-01'),
I would like to know the prior transaction count and total amount within the 365 days before each transaction (i.e., from d-365 through d-1 for a transaction on date d).
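To make the window concrete, here is a brute-force sketch in base R for a single customer. The four dates and amounts are the customer 1 rows from the output shown further down; the helper in_window() and the vector names are only illustrative and are not part of the original code.

```r
# Transactions of one customer (values copied from the customer 1 rows in the post)
purch_dt  <- as.Date(c("2016-03-20", "2016-05-17", "2016-12-25", "2017-03-20"))
purch_amt <- c(69.65, 413.60, 357.18, 256.21)

# Hypothetical helper: TRUE for purchases dated d-365 through d-1
in_window <- function(d) purch_dt >= d - 365 & purch_dt < d

# Count and total of prior purchases for each transaction
ppn <- vapply(seq_along(purch_dt),
              function(i) sum(in_window(purch_dt[i])), integer(1))
ppa <- vapply(seq_along(purch_dt),
              function(i) sum(purch_amt[in_window(purch_dt[i])]), numeric(1))

data.frame(purch_dt, purch_amt, ppn, ppa)
```

This does quadratic work per customer, so it is only meant as a reference to check faster approaches against.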
I think the idiomatic way is:
df[, c("ppn", "ppa") :=
  df[.(cust_id = cust_id, d_dn = purch_dt-365, d_up = purch_dt),
     on=.(cust_id, purch_dt >= d_dn, purch_dt < d_up),
     .(.N, sum(purch_amt, na.rm=TRUE)),
     by=.EACHI][, .(N, V2)]
]
     cust_id   purch_dt purch_amt ppn    ppa
  1:       1 2016-03-20     69.65   0   0.00
  2:       1 2016-05-17    413.60   1  69.65
  3:       1 2016-12-25    357.18   2 483.25
  4:       1 2017-03-20    256.21   3 840.43
  5:       2 2016-05-26     49.14   0   0.00
 ---
494:      75 2018-01-12    381.24   2 201.04
495:      75 2018-04-01     65.83   3 582.28
496:      75 2018-06-17    170.30   4 648.11
497:      75 2018-07-22     60.49   5 818.41
498:      75 2018-10-10     66.12   4 677.86
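This is a "non-equi join": the on= clause matches rows on cust_id by equality and on purch_dt by inequality, and by=.EACHI computes the count and sum once per row of the lookup table.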
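For anyone who wants to see the join on its own before the := assignment, here is a minimal sketch. The table toy, the lookup table win, and the column names N and amt are made up for illustration; the three rows reuse values from the customer 1 output above.

```r
library(data.table)

# Hypothetical toy table: three purchases of one customer
toy <- data.table(
  cust_id   = c(1L, 1L, 1L),
  purch_dt  = as.Date(c("2016-03-20", "2016-05-17", "2016-12-25")),
  purch_amt = c(69.65, 413.60, 357.18)
)

# One lookup row per transaction: the window from d_dn (inclusive) to d_up (exclusive)
win <- toy[, .(cust_id, d_dn = purch_dt - 365, d_up = purch_dt)]

# Non-equi join: equality on cust_id, inequalities on purch_dt;
# by = .EACHI evaluates j once for every row of win
res <- toy[win,
           on = .(cust_id, purch_dt >= d_dn, purch_dt < d_up),
           .(N = .N, amt = sum(purch_amt, na.rm = TRUE)),
           by = .EACHI]

res[, .(N, amt)]
```

Rows of win with no match still appear in the result, with a count of 0 and (because of na.rm=TRUE) a sum of 0, which is why no separate zero-fill step is needed.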
Here's the Cartesian self-join with date-range filter:
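# Self-join on customer: pairs each transaction with every transaction of the same customer
df_prior <- df[df, on=.(cust_id), allow.cartesian=TRUE
  ][i.purch_dt < purch_dt &            # keep only strictly earlier purchases ...
    i.purch_dt >= purch_dt - 365       # ... that fall within the 365-day window
  ][, .(prior_purch_cnt = .N,
        prior_purch_amt = sum(i.purch_amt)),
    keyby=.(cust_id, purch_dt)]
# Join back to df so transactions with no prior purchases are kept, then zero-fill
df2 <- df_prior[df, on=.(cust_id, purch_dt)]
df2[is.na(prior_purch_cnt), `:=`(prior_purch_cnt=0L,
                                 prior_purch_amt=0)]
df2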
# cust_id   purch_dt prior_purch_cnt prior_purch_amt purch_amt
#       1 2016-03-20               0            0.00     69.65
#       1 2016-05-17               1           69.65    413.60
#       1 2016-12-25               2          483.25    357.18
#       1 2017-03-20               3          840.43    256.21
#       2 2016-05-26               0            0.00     49.14
I'm concerned that the intermediate Cartesian join could blow up in size before the date filter is applied, on datasets where customers have many prior transactions.
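To put a rough number on that concern: before filtering, the self-join on cust_id produces one row per pair of transactions of the same customer, so its size is the sum of squared per-customer counts. A small sketch, assuming a made-up table toy (the names toy, n_per_cust, est_rows and actual_rows are hypothetical):

```r
library(data.table)

# Hypothetical toy data: 3 customers with 2, 3 and 4 transactions
toy <- data.table(
  cust_id  = rep(1:3, times = c(2, 3, 4)),
  purch_dt = as.Date("2016-01-01") + c(0, 10, 0, 5, 400, 0, 1, 2, 3)
)

# Transactions per customer; sum of squares = rows in the unfiltered self-join
n_per_cust <- toy[, .N, by = cust_id]
est_rows   <- n_per_cust[, sum(as.numeric(N)^2)]

# Actual unfiltered self-join, for comparison
actual_rows <- nrow(toy[toy, on = .(cust_id), allow.cartesian = TRUE])

c(estimated = est_rows, actual = actual_rows)
```

Running the first two steps on the real df gives the intermediate row count without materialising the join, which should indicate whether the Cartesian approach is feasible for a given dataset.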