I am sure this is straightforward, but I just can't seem to get it to work. I have a data frame that represents daily totals. I simply want to sum the totals by week, retai
Here is a solution that reads in the data, aggregates it by week, and then fills in missing weeks with zero, all in three lines of code. read.zoo reads the data in, assuming a header and a comma as the field separator. It converts the first column to Date class and then transforms each date to the following Friday (a date that is already a Friday is left unchanged). The nextfri function that does this transformation is taken from the zoo-quickref vignette in the zoo package. (If you want the week to end on a different day, just replace 5 with another day number.) The read.zoo command also aggregates all points that have the same index (remember that each date has been moved to the Friday that ends its week, so all points in the same week now share that Friday as their index). The next command creates a zero-width zoo object that has the weeks from the first to the last and merges it with the output of the read, using fill = 0 so that the filled-in weeks get that value.
Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45"
library(zoo)
nextfri <- function(x) 7 * ceiling(as.numeric(x - 5 + 4)/7) + as.Date(5 - 4)
z <- read.zoo(textConnection(Lines), header = TRUE, sep = ",",
FUN = as.Date, FUN2 = nextfri, aggregate = sum)
merge(z, zoo(, seq(min(time(z)), max(time(z)), 7)), fill = 0)
We used textConnection(Lines) above to make the example self-contained, so that you can copy it and paste it straight into your session. In practice, textConnection(Lines) would be replaced with the name of your file, e.g. "myfile.csv".
For the input above the output would be the following zoo object:
2009-04-03 2009-04-10 2009-04-17
       360          0        255
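To see what nextfri does on its own, here is a minimal sketch in base R. It assumes nothing beyond the formula above; the explicit origin argument is an addition so that the snippet does not depend on zoo being loaded, and the three sample dates are picked only for illustration.

```r
# Same formula as nextfri above, with an explicit origin (an assumption added
# here so the snippet runs in base R without zoo loaded).
nextfri <- function(x) {
  7 * ceiling(as.numeric(x - 5 + 4) / 7) + as.Date(5 - 4, origin = "1970-01-01")
}

# A Wednesday, a Friday, and a Saturday (illustrative dates).
d <- as.Date(c("2009-04-01", "2009-04-03", "2009-04-04"))

fridays <- nextfri(d)
print(fridays)

# wday counts days from Sunday = 0, so Friday is 5.
print(as.POSIXlt(fridays)$wday)
```

The Wednesday and the Friday both map to 2009-04-03, while the Saturday rolls over to 2009-04-10, which is why all observations in the same week end up sharing one index value.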
There are three vignettes that come with the zoo package that you might want to read.
A solution with the lubridate library:
library(lubridate)
Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45
2009-05-15,45"
df <- read.csv(textConnection(Lines))
If you don't need 0 for missing weeks it's simple:
weeks <- week(df$date)
sums <- tapply(df$amt, weeks, sum)
# 14  15  16  20
#360 210  45  45
To put zeros for missing weeks:
span <- min(weeks):max(weeks)
out <- array(0, dim = length(span), dimnames = list(span))
out[dimnames(sums)[[1]]] <- sums
# 14  15  16  17  18  19  20
#360 210  45   0   0   0  45
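As an alternative sketch of the zero-filling step (not the method used above), the week numbers can be turned into a factor whose levels cover the whole span, so that tapply returns one cell per week. The weeks and amt vectors below are hard-coded from the output shown above rather than computed with lubridate, purely to keep the snippet self-contained.

```r
# Week numbers and amounts copied from the example above (hard-coded here as
# an assumption, instead of calling lubridate::week on the dates).
weeks <- c(14, 14, 14, 15, 15, 16, 20)
amt   <- c(45, 150, 165, 165, 45, 45, 45)

# Every week from the first to the last, including the empty ones.
span <- min(weeks):max(weeks)

# With explicit factor levels, tapply yields NA for weeks with no rows.
sums <- tapply(amt, factor(weeks, levels = span), sum)

# Replace the empty weeks with zero.
sums[is.na(sums)] <- 0
print(sums)
```

This avoids allocating a separate array and indexing into it by name, at the cost of an extra pass to replace the NA cells.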