sum over past window-size dates per group

偶尔善良 提交于 2019-11-28 09:45:10

问题


The problem is similar to How do I do a conditional sum which only looks between certain date criteria but slightly different and the answer from that does not fit into current problem. The main difference is that the date column based on each group may not necessarily be complete (i.e., certain date may be missing)

Input:

input <- read.table(text="
2017-04-01     A     1
2017-04-02     B     2
2017-04-02     B     2
2017-04-02     C     2
2017-04-02     A     2
2017-04-03     C     3
2017-04-04     A     4
2017-04-05     B     5
2017-04-06     C     6
2017-04-07     A     7
2017-04-08     B     8
2017-04-09     C     9")
colnames(input) <- c("Date","Group","Score")

Rule: for each group at each date, looking back 3 calendar dates (include current date). calculate the sum.

Expected output:

    Date Group 3DaysSumPerGroup
    2017-04-01     A                1 #1  previous two dates are not available. partial is allowed
    2017-04-02     A                3 #2+1 both 4-01 and 4-02 are in the range
    2017-04-04     A                6 #4+2
    2017-04-07     A                7 #7
    2017-04-02     B                4 # 2+2 at the same day
    2017-04-05     B                5
    2017-04-08     B                8
    2017-04-02     C                2
    2017-04-03     C                5
    2017-04-06     C                6
    2017-04-09     C                9

I tried to use rollapply with partial=T, but result doesn't seem correct.

 input %>% 
     group_by(Group) %>% 
     arrange(Date) %>% mutate("3DaysSumPerGroup"=rollapply(data=Score,width=3,align="right",FUN=sum,partial=T,fill=NA,rm.na=T))

回答1:


Here's a (supposedly efficient) solution using the new non-equi joins and the by = .EACHI features in data.table (v1.9.8+)

library(data.table) #v1.10.4

## Convert to a proper date class, and add another column in order to define the range
setDT(input)[, c("Date", "Date2") := {
  Date = as.IDate(Date)
  Date2 = Date - 2L
  .(Date, Date2)
}]

## Run a non-equi join against the unique Date/Group combination in input
## Sum the Scores on the fly
## You can ignore the second Date column 

input[unique(input, by = c("Date", "Group")), ## This removes the dupes
      on = .(Group, Date <= Date, Date >= Date2), ## The join condition
      .(Score = sum(Score)), ## sum the scores
      keyby = .EACHI] ## Run the sum by each row in unique(input, by = c("Date", "Group"))

#     Group       Date       Date Score
#  1:     A 2017-04-01 2017-03-30     1
#  2:     A 2017-04-02 2017-03-31     3
#  3:     A 2017-04-04 2017-04-02     6
#  4:     A 2017-04-07 2017-04-05     7
#  5:     B 2017-04-02 2017-03-31     4
#  6:     B 2017-04-05 2017-04-03     5
#  7:     B 2017-04-08 2017-04-06     8
#  8:     C 2017-04-02 2017-03-31     2
#  9:     C 2017-04-03 2017-04-01     5
# 10:     C 2017-04-06 2017-04-04     6
# 11:     C 2017-04-09 2017-04-07     9


来源:https://stackoverflow.com/questions/44406040/sum-over-past-window-size-dates-per-group

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!