Question
I have a data.table with ID, dates and values like the following one:
DT <- setDT(data.frame(ContractID= c(1,1,1,2,2), Date = c("2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"), Value = c(10,20,30,10,20)))
ContractID Date Value
1: 1 2018-02-01 10
2: 1 2018-02-20 20
3: 1 2018-03-12 30
4: 2 2018-02-01 10
5: 2 2018-02-12 20
I'd like to add a new column holding, for each row, the cumulative sum per ID over the window from one month before that row's date up to that date, as in the table below. NB: the third row is the sum of the second row and the third row itself, because 2018-03-12 minus 1 month is later than 2018-02-01, so the first row is excluded from the cumulative sum.
ContractID Date Value Cum_Sum_1M
1: 1 2018-02-01 10 10
2: 1 2018-02-20 20 30
3: 1 2018-03-12 30 50
4: 2 2018-02-01 10 10
5: 2 2018-02-12 20 30
Is there any way to achieve this using data.table?
Thank you!
Answer 1:
Using tidyverse and lubridate, we first convert Date to an actual Date object using as.Date, then group_by ContractID and, for each Date, sum the Value entries whose Date lies between one month before the current Date and the current Date.
library(tidyverse)
library(lubridate)
DT %>%
mutate(Date = as.Date(Date)) %>%
group_by(ContractID) %>%
mutate(Cum_Sum_1M = map_dbl(1:n(), ~ sum(Value[(Date >= (Date[.] - months(1))) &
(Date <= Date[.])], na.rm = TRUE)))
# A tibble: 5 x 4
# Groups: ContractID [2]
# ContractID Date Value Cum_Sum_1M
# <dbl> <date> <dbl> <dbl>
#1 1 2018-02-01 10 10
#2 1 2018-02-20 20 30
#3 1 2018-03-12 30 50
#4 2 2018-02-01 10 10
#5 2 2018-02-12 20 30
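For readers who prefer to avoid extra packages, the same windowed sum can be sketched in base R. This is only an illustration under assumptions: month_before is a hypothetical helper name (not from any library), it uses seq() with by = "-1 month" for the calendar-month step, and month-end dates may not behave the way lubridate's arithmetic does.

```r
# Sample data from the question, with Date stored as a real Date
DT <- data.frame(
  ContractID = c(1, 1, 1, 2, 2),
  Date = as.Date(c("2018-02-01", "2018-02-20", "2018-03-12",
                   "2018-02-01", "2018-02-12")),
  Value = c(10, 20, 30, 10, 20)
)

# Hypothetical helper: the date one calendar month before a single date d
month_before <- function(d) seq(d, by = "-1 month", length.out = 2)[2]

# For each row, sum Value over rows of the same contract whose Date
# falls in [Date - 1 month, Date]
DT$Cum_Sum_1M <- vapply(seq_len(nrow(DT)), function(i) {
  lower <- month_before(DT$Date[i])
  in_window <- DT$ContractID == DT$ContractID[i] &
    DT$Date >= lower & DT$Date <= DT$Date[i]
  sum(DT$Value[in_window])
}, numeric(1))

print(DT)
```

This loops over every row and scans the whole table each time, so it is only reasonable for small inputs; the join-based approaches scale better.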
Answer 2:
This is largely a rolling sum question. The froll family (for example frollsum()) would likely work, but you'd have to complete the dataset first, so that every day is present and you can say how many days to roll backwards.
Here I do a non-equi self-join instead. As data.table wants all fields to exist before the join, I have to add a column Dates_Lower = Dates - 30 so that I can write the non-equi conditions. Note that this is a fixed 30-day window rather than a calendar month. The chained step with last(Value) makes it work, though I'm never entirely certain with these self-joins...
I also convert the Date column with as.Date and rename it to Dates, since Date is already a name used by base R (the Date class).
library(data.table)
dt <- data.table(ContractID= c(1,1,1,2,2)
, Dates = as.Date(c("2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"))
, Value = c(10,20,30,10,20))
dt[dt[, .(ContractID, Dates, Dates_Lower = Dates - 30, Value)] #self-join
,on = .(ContractID = ContractID
, Dates >= Dates_Lower
, Dates <= Dates
)
, j = .(ContractID, Dates, Value)
, allow.cartesian = TRUE
][, j = .(Value = last(Value), Cum_Sum_1M = sum(Value))
,by = .(ContractID, Dates)
]
ContractID Dates Value Cum_Sum_1M
1: 1 2018-02-01 10 10
2: 1 2018-02-20 20 30
3: 1 2018-03-12 30 50
4: 2 2018-02-01 10 10
5: 2 2018-02-12 20 30
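A possible variation on the non-equi self-join above is to aggregate directly with by = .EACHI and assign the result back as a new column, which avoids the second chained step. This is a sketch, not the answer author's code: the Lower and Upper column names are my own, and it keeps the same 30-day approximation of "one month".

```r
library(data.table)

dt <- data.table(
  ContractID = c(1, 1, 1, 2, 2),
  Dates = as.Date(c("2018-02-01", "2018-02-20", "2018-03-12",
                    "2018-02-01", "2018-02-12")),
  Value = c(10, 20, 30, 10, 20)
)

# Window bounds per row (assumed names); 30 days approximates one month
dt[, `:=`(Lower = Dates - 30, Upper = Dates)]

# For each row of the inner dt, sum Value over the rows of the outer dt
# that share its ContractID and whose Dates fall in [Lower, Upper]
res <- dt[dt,
          on = .(ContractID, Dates >= Lower, Dates <= Upper),
          .(Cum_Sum_1M = sum(Value)),
          by = .EACHI]

# by = .EACHI returns one row per inner row, in the same order
dt[, Cum_Sum_1M := res$Cum_Sum_1M]
dt[, c("Lower", "Upper") := NULL]

print(dt)
```

Because the result has exactly one row per input row, the original Value column is untouched and no last(Value) step is needed.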
Answer 3:
This is another working data.table solution.
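library(data.table)
dt <- copy(DT)  # assumption: DT is the table from the question
dt[, Date := lubridate::ymd( Date ) ]
# key on both columns so each row joins only to itself,
# even when two contracts share the same date
setkey(dt, ContractID, Date)
dt[dt, Cum_Sum_1M := {
val = dt[ ContractID == i.ContractID & Date %between% c( i.Date - months(1), i.Date ), Value];
list( sum( val ) )
}, by = .EACHI ]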
Output:
# ContractID Date Value Cum_Sum_1M
# 1: 1 2018-02-01 10 10
# 2: 1 2018-02-20 20 30
# 3: 1 2018-03-12 30 50
# 4: 2 2018-02-01 10 10
# 5: 2 2018-02-12 20 30
Source: https://stackoverflow.com/questions/55973512/cumulative-sum-from-a-month-ago-until-the-current-day-for-all-the-rows