I have a data.table
with date, zipcode and purchase amounts.
library(data.table)
set.seed(88)
DT <- data.table(date = Sys.Date()-365 + sort(s
This seems to work:
DT[, new_col :=
DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1),
sum(purchaseAmount)
, by=.EACHI ]$V1
]
date zip purchaseAmount new_col
1: 2016-01-08 1150 5 5
2: 2016-01-15 3000 15 15
3: 2016-02-15 1150 16 16
4: 2016-02-20 2000 18 18
5: 2016-03-07 2000 19 19
6: 2016-03-15 2000 11 30
7: 2016-03-17 2000 6 36
8: 2016-04-02 1150 17 17
9: 2016-04-08 3000 7 7
10: 2016-04-09 3000 20 27
This uses a "non-equi" join, effectively taking each row; finding all rows that meet our criteria in the on=
expression for each row; and then summing by row (by=.EACHI
). In this case, a non-equi join is probably less efficient than some rolling-sum approach.
How it works.
To add columns to a data.table, the usual syntax is DT[, new_col := expression]
. Here, the expression actually works even outside of the DT[...]
. Try running it on its own:
DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1),
sum(purchaseAmount)
, by=.EACHI ]$V1
You can progressively simplify this until it's just the join...
DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1),
sum(purchaseAmount)
, by=.EACHI ]
# note that V1 is the default name for computed columns
DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1)]
# now we're down to just the join
The join syntax is like x[i, on=.(xcol = icol, xcol2 < icol2)]
, as documented in the doc page that opens when you type ?data.table
into an R console with the data.table package loaded.
To get started with data.table, I'd suggest reviewing the vignettes. After that, this'll probably look a lot more legible.
I didn't find any data.table solutions, this is how I got it though:
library(dplyr)
earlierPurchases <- vector()
for(i in 1:nrow(DT)) {
temp <- dplyr::filter(DT, zip == zip[i] & date < date[i])
earlierPurchases[i] <- sum(temp$purchaseAmount)
}
DT <- cbind(DT, earlierPurchases)
It worked quite fast.