Cumulative sum from a month ago until the current day for all the rows

Submitted by 送分小仙女 on 2020-12-30 03:57:35

Question


I have a data.table with ID, dates and values like the following one:

library(data.table)

DT <- setDT(data.frame(ContractID = c(1, 1, 1, 2, 2),
                       Date = c("2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"),
                       Value = c(10, 20, 30, 10, 20)))

   ContractID       Date Value
1:          1 2018-02-01    10
2:          1 2018-02-20    20
3:          1 2018-03-12    30
4:          2 2018-02-01    10
5:          2 2018-02-12    20

I'd like to get a new column with, for each row, the cumulative sum of Value per ID from one month before that row's date up to and including that date, as in the table below. NB: the third row is the sum of the second and third rows only, because 2018-03-12 minus one month is later than 2018-02-01, so the first row is excluded from that cumulative sum.

   ContractID       Date Value Cum_Sum_1M
1:          1 2018-02-01    10         10
2:          1 2018-02-20    20         30
3:          1 2018-03-12    30         50
4:          2 2018-02-01    10         10
5:          2 2018-02-12    20         30

Is there any way to achieve this using data.table?

Thank you!


Answer 1:


Using tidyverse and lubridate, we first convert Date to an actual Date object with as.Date(), then group_by ContractID and, for each Date, sum the Value entries that fall between one month before that Date and the Date itself.

library(tidyverse)
library(lubridate)

DT %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(ContractID) %>%
  mutate(Cum_Sum_1M = map_dbl(1:n(), ~ sum(Value[(Date >= (Date[.] - months(1))) &
                                            (Date <= Date[.])], na.rm = TRUE)))


# A tibble: 5 x 4
# Groups:   ContractID [2]
#  ContractID Date       Value Cum_Sum_1M
#       <dbl> <date>     <dbl>      <dbl>
#1          1 2018-02-01    10         10
#2          1 2018-02-20    20         30
#3          1 2018-03-12    30         50
#4          2 2018-02-01    10         10
#5          2 2018-02-12    20         30



Answer 2:


This is largely a rolling-sum question. The froll*() functions (e.g. frollsum()) would likely work, but you'd have to complete the dataset first so that you can say how many days to roll backwards.
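
For completeness, here is a minimal sketch of that rolling-sum route, assuming at most one row per contract and day, a data.table version with adaptive froll* support, and a fixed 31-day window (the current day plus the 30 days before it) as an approximation of "one month"; dt and daily are names introduced here, not part of the answer's code below.

library(data.table)

dt <- data.table(ContractID = c(1, 1, 1, 2, 2),
                 Dates = as.Date(c("2018-02-01", "2018-02-20", "2018-03-12",
                                   "2018-02-01", "2018-02-12")),
                 Value = c(10, 20, 30, 10, 20))

# complete each contract to one row per calendar day; missing days get Value = 0
daily <- dt[dt[, .(Dates = seq(min(Dates), max(Dates), by = "day")), by = ContractID],
            on = .(ContractID, Dates)]
daily[is.na(Value), Value := 0]

# right-aligned adaptive rolling sum: 31-day window, or fewer days at the start of a contract
daily[, Cum_Sum_1M := frollsum(Value, n = pmin(seq_len(.N), 31L), adaptive = TRUE),
      by = ContractID]

# copy the result back onto the original rows only
dt[daily, on = .(ContractID, Dates), Cum_Sum_1M := i.Cum_Sum_1M]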

Here I do a non-equi self join. As data.table wants all fields generated before the join, I have to add a column Dates_Lower = Dates - 30 (a 30-day approximation of "one month ago") so that I can state the non-equi conditions. The chain with last(Value) makes it work, but I'm not always certain with these self-joins...

I also convert the dates with as.Date() and rename the column from Date to Dates, to avoid confusion with base R's date functions.

library(data.table)

dt <- data.table(ContractID= c(1,1,1,2,2)
                 , Dates = as.Date(c("2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"))
                 , Value = c(10,20,30,10,20))

dt[dt[, .(ContractID, Dates, Dates_Lower = Dates - 30, Value)] #self-join
   ,on = .(ContractID = ContractID
          , Dates >= Dates_Lower
          , Dates <= Dates
          )
   , j = .(ContractID, Dates, Value)
   , allow.cartesian = TRUE
   ][, j = .(Value = last(Value), Cum_Sum_1M = sum(Value))
     ,by = .(ContractID, Dates)
   ]

   ContractID      Dates Value Cum_Sum_1M
1:          1 2018-02-01    10         10
2:          1 2018-02-20    20         30
3:          1 2018-03-12    30         50
4:          2 2018-02-01    10         10
5:          2 2018-02-12    20         30
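
As a quick check on that 30-day window versus a true calendar month (using lubridate here, which this answer itself does not need): both cut-offs fall after 2018-02-01 for the third row, so on this data the result matches the months(1)-based answers.

library(lubridate)
as.Date("2018-03-12") - 30          # "2018-02-10"
as.Date("2018-03-12") - months(1)   # "2018-02-12"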



Answer 3:


This is another working data.table solution. Here dt starts from the question's DT, so the date column is Date (not Dates as in Answer 2).

library(data.table)
library(lubridate)   # months(1) below relies on lubridate's period semantics

dt <- copy(DT)       # assuming we start from the question's DT
dt[, Date := lubridate::ymd(Date)]
setkey(dt, Date)
dt[dt, Cum_Sum_1M := {
  # for each row (i.), sum all Values of the same contract dated within the previous month
  val = dt[ContractID == i.ContractID & Date %between% c(i.Date - months(1), i.Date), Value]
  list(sum(val))
}, by = .EACHI]
setkey(dt, ContractID, Date)

Output:

#    ContractID       Date Value Cum_Sum_1M
# 1:          1 2018-02-01    10         10
# 2:          1 2018-02-20    20         30
# 3:          1 2018-03-12    30         50
# 4:          2 2018-02-01    10         10
# 5:          2 2018-02-12    20         30


Source: https://stackoverflow.com/questions/55973512/cumulative-sum-from-a-month-ago-until-the-current-day-for-all-the-rows
