Please consider the following synthetic data frame:
#Learning to enable splitting contributions spanning two months
start = c(as.Date("2013-01-01"), as.Date(
Create a function explode that expands an interval into a data frame with one row per day. Use Map to apply explode to each interval, producing a list of data frames, one per interval. Next, rbind the data frames in the list into one big data frame, by.date, which has one row per day. Finally, aggregate by.date into one row for each year/month:
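library(zoo) # as.yearmon

# Expand one interval into one row per day, spreading amount evenly over the days
explode <- function(start, end, amount) {
  dates <- seq(start, end, "day")
  data.frame(dates, yearmon = as.yearmon(dates), amount = amount / length(dates))
}

# One data frame per interval, stacked into a single per-day data frame
by.date <- do.call("rbind", Map(explode, df$start, df$end, df$amount))

# Collapse to one row per year/month
aggregate(amount ~ yearmon, by.date, sum)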
Using the data in the question (assuming the occurrence of 2010 was supposed to be 2013) we get:
   yearmon    amount
1 Jan 2013 100.00000
2 Feb 2013  94.91525
3 Mar 2013 105.08475
4 Apr 2013 100.00000
5 May 2013 100.00000
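The Feb and Mar figures show how an amount is split in proportion to the number of days falling in each month. As a worked check, the sketch below uses a single made-up interval (an assumption for illustration, not a row taken from the question): 200 spread over 2013-02-01 to 2013-03-31, which is 28 + 31 = 59 days. It uses only base R, keying months with format(dates, "%Y-%m") rather than as.yearmon.

```r
# Hypothetical interval: 59 days, amount 200 (not the question's data).
start  <- as.Date("2013-02-01")
end    <- as.Date("2013-03-31")
amount <- 200

dates <- seq(start, end, "day")      # 28 days in Feb, 31 days in Mar
daily <- amount / length(dates)      # equal share for every day

# Sum the daily shares within each month.
by.month <- tapply(rep(daily, length(dates)), format(dates, "%Y-%m"), sum)
print(round(by.month, 5))
#   2013-02   2013-03
#  94.91525 105.08475
```

Feb receives 200 * 28 / 59 and Mar receives 200 * 31 / 59, so the two shares add back up to 200.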
UPDATE: If memory is a problem, use this for explode instead. It aggregates within explode first so that its output is smaller. Also, we have eliminated the dates column in DF, as it was only included for debugging:
explode <- function(start, end, amount) {
  dates <- seq(start, end, "day")
  DF <- data.frame(yearmon = as.yearmon(dates), amount = amount / length(dates))
  aggregate(amount ~ yearmon, DF, sum)
}
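Because this version returns one row per month for each interval, two intervals that touch the same month still produce separate rows after rbind, so the final aggregate step is presumably still needed. A minimal sketch of that, using a made-up two-row df (hypothetical data, not the question's) and, to keep it free of the zoo dependency, format(dates, "%Y-%m") swapped in for as.yearmon. The names explode_month, pieces and result are illustrative only.

```r
# Hypothetical data: both intervals overlap January 2013.
df <- data.frame(
  start  = as.Date(c("2013-01-01", "2013-01-16")),
  end    = as.Date(c("2013-01-31", "2013-02-14")),
  amount = c(310, 300)
)

# Same shape as the memory-saving explode, but keyed by a "%Y-%m" string
# instead of as.yearmon (a substitution made for this sketch only).
explode_month <- function(start, end, amount) {
  dates <- seq(start, end, "day")
  DF <- data.frame(month = format(dates, "%Y-%m"), amount = amount / length(dates))
  aggregate(amount ~ month, DF, sum)
}

# Three rows: 2013-01 appears twice, once per interval.
pieces <- do.call("rbind", Map(explode_month, df$start, df$end, df$amount))

# The second aggregation merges the duplicate months.
result <- aggregate(amount ~ month, pieces, sum)
print(result)
```

Here both intervals work out to 10 per day, so January collects 310 + 160 = 470 and February 140.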
UPDATE 2: Here is another attempt. It uses rowsum, which is specialized for aggregating sums. This one ran 10x faster on the data in the post in my test.
# Return a one-column matrix of monthly sums with "YYYY-MM" row names
explode2 <- function(start, end, amount) {
  dates <- seq(start, end, "day")
  n <- length(dates)
  rowsum(rep(amount, n) / n, format(dates, "%Y-%m"))
}

by.date <- do.call("rbind", Map(explode2, df$start, df$end, df$amount))
# Combine rows that share a month across intervals
rowsum(by.date, rownames(by.date))
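Note that rowsum returns a one-column matrix whose row names carry the "%Y-%m" keys, not a data frame. If a data frame is wanted, the matrix can be converted afterwards. The sketch below runs the rowsum approach on a made-up df (hypothetical data, not the question's), converts the result, and checks that the split preserves the total amount. The names totals and monthly and the output column names are illustrative.

```r
# Hypothetical data: the second interval straddles March and April.
df <- data.frame(
  start  = as.Date(c("2013-03-01", "2013-03-22")),
  end    = as.Date(c("2013-03-31", "2013-04-10")),
  amount = c(62, 100)
)

explode2 <- function(start, end, amount) {
  dates <- seq(start, end, "day")
  n <- length(dates)
  rowsum(rep(amount, n) / n, format(dates, "%Y-%m"))
}

by.date <- do.call("rbind", Map(explode2, df$start, df$end, df$amount))
totals  <- rowsum(by.date, rownames(by.date))

# Turn the one-column matrix into a data frame.
monthly <- data.frame(month = rownames(totals), amount = totals[, 1],
                      row.names = NULL)
print(monthly)

# Sanity check: the monthly amounts add back up to the original total.
stopifnot(isTRUE(all.equal(sum(monthly$amount), sum(df$amount))))
```

With this data, March receives 62 + 50 = 112 and April receives 50.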