Linearly apportion amounts by month

借酒劲吻你 2021-01-24 16:11

Please consider the following synthetic data frame:

#Learning to enable splitting contributions spanning two months

start = c(as.Date("2013-01-01"), as.Date("


        
1 Answer
  •  时光取名叫无心
    2021-01-24 16:28

    Create a function explode that expands an interval into a data frame with one row per day, giving each day an equal share of the interval's amount. Use Map to apply explode to each interval, producing a list of data frames, one per interval. Next, rbind the data frames in the list into one big data frame, by.date, with one row per day. Finally, aggregate by.date into one row for each year/month:

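    library(zoo) # as.yearmon
    
    # Expand one interval into daily rows, each carrying an equal share of amount
    explode <- function(start, end, amount) {
       dates <- seq(start, end, "day")
       data.frame(dates, yearmon = as.yearmon(dates), amount = amount / length(dates))
    }
    # One data frame per interval, stacked into a single per-day data frame
    by.date <- do.call("rbind", Map(explode, df$start, df$end, df$amount))
    # Sum the daily shares within each year/month
    aggregate(amount ~ yearmon, by.date, sum)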
    

    Using the data in the question (assuming the occurrence of 2010 was supposed to be 2013) we get:

       yearmon    amount
    1 Jan 2013 100.00000
    2 Feb 2013  94.91525
    3 Mar 2013 105.08475
    4 Apr 2013 100.00000
    5 May 2013 100.00000
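
Since the data from the question is cut off here, the following is a minimal sketch on made-up data that checks the apportioning by hand. The data frame df and the helper explode_base are hypothetical and are not part of the original post; explode_base follows the same idea as explode above but keys months by a "YYYY-MM" string so that only base R is needed.

```r
# Hypothetical data, NOT the data from the question.
# Interval 1: 31 days, all in January            -> 310 / 31 = 10 per day
# Interval 2: 30 days, 16 in January, 14 in Feb  -> 300 / 30 = 10 per day
df <- data.frame(
  start  = as.Date(c("2013-01-01", "2013-01-16")),
  end    = as.Date(c("2013-01-31", "2013-02-14")),
  amount = c(310, 300)
)

# Same idea as explode, but using a "YYYY-MM" string as the month key
# (an assumption made here to avoid depending on zoo).
explode_base <- function(start, end, amount) {
  dates <- seq(start, end, by = "day")
  data.frame(month = format(dates, "%Y-%m"),
             amount = amount / length(dates))
}

# One row per day across all intervals, then one row per month
by.date <- do.call("rbind", Map(explode_base, df$start, df$end, df$amount))
result <- aggregate(amount ~ month, by.date, sum)
print(result)
```

With these numbers January should receive 310 + 16 * 10 = 470 and February 14 * 10 = 140, and the monthly totals should add up to the original 610.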
    

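    UPDATE: If memory is a problem, use this for explode instead. It aggregates within explode first, so its output has one row per month rather than one row per day. The rbind and final aggregate steps stay the same, since several intervals can still contribute to the same month. We have also dropped the dates column from DF, as it was only included for debugging: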

    explode <- function(start, end, amount) {
       dates <- seq(start, end, "day")
       DF <- data.frame(yearmon = as.yearmon(dates), amount = amount / length(dates))
       aggregate(amount ~ yearmon, DF, sum)
    }
    

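    UPDATE 2: Here is another approach. It uses rowsum, which is specialized for computing group sums, and keys the months by a "%Y-%m" string instead of as.yearmon. The result is a one-column matrix whose row names are the months. In my test it ran 10x faster on the data in the post.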

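    explode2 <- function(start, end, amount) {
      dates <- seq(start, end, "day")
      n <- length(dates)
      # Per-month sums for this interval; row names are "YYYY-MM" strings
      rowsum(rep(amount, n) / n, format(dates, "%Y-%m"))
    }
    by.date <- do.call("rbind", Map(explode2, df$start, df$end, df$amount))
    # Combine rows for months that appear in more than one interval
    rowsum(by.date, rownames(by.date))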
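
Because the rowsum version returns a matrix rather than a data frame, it may help to convert the result back. The sketch below repeats explode2 from above so that it runs on its own, and uses a made-up df (not the data from the question); the names totals and out are illustrative only.

```r
# Hypothetical data, NOT the data from the question
df <- data.frame(
  start  = as.Date(c("2013-01-01", "2013-01-16")),
  end    = as.Date(c("2013-01-31", "2013-02-14")),
  amount = c(310, 300)
)

# explode2 as given in the answer
explode2 <- function(start, end, amount) {
  dates <- seq(start, end, "day")
  n <- length(dates)
  rowsum(rep(amount, n) / n, format(dates, "%Y-%m"))
}

by.date <- do.call("rbind", Map(explode2, df$start, df$end, df$amount))
totals <- rowsum(by.date, rownames(by.date))  # one-column matrix, one row per month

# Turn the row names into an ordinary column
out <- data.frame(month = rownames(totals),
                  amount = unname(totals[, 1]),
                  row.names = NULL)
print(out)
```

The month column here is a plain character string; if a date-like type is needed, it could be converted afterwards, for example with zoo's as.yearmon.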
    
