Aggregate Weekly Data in R

无人及你 2021-01-01 06:23

I am sure this is straightforward, but I just can't seem to get it to work. I have a data frame that represents daily totals. I simply want to sum the totals by week, retai

2 Answers
  • 2021-01-01 07:06

    Here is a solution that reads in the data, aggregates it by week, and then fills in missing weeks with zero, all in 3 lines of code. read.zoo reads the data assuming a header and a comma as the field separator. It converts the first column to Date class and then maps each date to the following Friday (a date that is already a Friday is left unchanged). The nextfri function that does this transformation is taken from the zoo-quickref vignette in the zoo package. (If you want the week to end on a different day, just replace 5 with another day number.)

    The read.zoo command also aggregates all points that have the same index. Because every date has been moved to the Friday that ends its week, all points in the same week now share that Friday as their index. The next command creates a zero-width zoo object whose index runs weekly from the first to the last Friday and merges it with the output of read.zoo, using fill = 0 so that the weeks that were missing get the value 0.

    Lines <- "date,amt
    2009-04-01,45
    2009-04-02,150
    2009-04-03,165
    2009-04-13,165
    2009-04-14,45
    2009-04-15,45"
    library(zoo)
    nextfri <- function(x) 7 * ceiling(as.numeric(x - 5 + 4)/7) + as.Date(5 - 4)
    z <- read.zoo(textConnection(Lines), header = TRUE, sep = ",", 
        FUN = as.Date, FUN2 = nextfri, aggregate = sum)
    merge(z, zoo(, seq(min(time(z)), max(time(z)), 7)), fill = 0)
    

    We used textConnection(Lines) above to make the example self-contained, so that you can copy it and paste it straight into your session. In practice, textConnection(Lines) would be replaced with the name of your file, e.g. "myfile.csv".

    For the input above the output would be the following zoo object:

    2009-04-03 2009-04-10 2009-04-17 
           360          0        255
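
    To see what nextfri is doing, here is a small base R sketch. It relies only on the fact that Date values count days from 1970-01-01, which was a Thursday, so day 1 is a Friday. The origin argument is spelled out so the snippet runs without zoo loaded, and next_dow is a hypothetical generalisation (not from the vignette) that takes the week-ending day as a number, 0 = Sunday through 6 = Saturday.

```r
# Same arithmetic as nextfri above, with the origin made explicit
nextfri <- function(x) {
  7 * ceiling(as.numeric(x - 5 + 4) / 7) + as.Date(5 - 4, origin = "1970-01-01")
}

# Hypothetical generalisation: dow is the day the week ends on
# (0 = Sunday, ..., 5 = Friday, 6 = Saturday)
next_dow <- function(x, dow) {
  7 * ceiling(as.numeric(x - dow + 4) / 7) + as.Date(dow - 4, origin = "1970-01-01")
}

d <- as.Date(c("2009-04-01", "2009-04-03", "2009-04-04"))  # Wed, Fri, Sat
nextfri(d)      # Wednesday and Friday map to 2009-04-03, Saturday to 2009-04-10
next_dow(d, 0)  # week ending on Sunday: 2009-04-05 for all three dates
```

    Because of the ceiling, a date that already falls on the target day is returned unchanged rather than pushed a week ahead.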
    

    There are three vignettes that come with the zoo package that you might want to read.
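
    If you would rather not depend on zoo, the same three steps (map each date to its week-ending Friday, sum, fill the gaps) can be sketched in base R. This is an alternative written for illustration, not the zoo approach above; the column name week_end and the object names weekly, all_weeks and out are made up.

```r
df <- read.csv(text = "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45", stringsAsFactors = FALSE)
df$date <- as.Date(df$date)

# Same formula as nextfri above, with an explicit origin so zoo is not needed
nextfri <- function(x) {
  7 * ceiling(as.numeric(x - 5 + 4) / 7) + as.Date(5 - 4, origin = "1970-01-01")
}

# Step 1: label every row with the Friday that ends its week
df$week_end <- nextfri(df$date)

# Step 2: sum the amounts within each week
weekly <- aggregate(amt ~ week_end, data = df, FUN = sum)

# Step 3: build every Friday in the range and left-join, so missing weeks appear
all_weeks <- data.frame(
  week_end = seq(min(weekly$week_end), max(weekly$week_end), by = 7)
)
out <- merge(all_weeks, weekly, all.x = TRUE)
out$amt[is.na(out$amt)] <- 0  # weeks with no data get 0
out
```

    The result is a plain data frame with one row per week, which may be easier to pass on to other code than a zoo object.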

  • 2021-01-01 07:19

    A solution with the lubridate library:

    library(lubridate)
    Lines <- "date,amt
    2009-04-01,45
    2009-04-02,150
    2009-04-03,165
    2009-04-13,165
    2009-04-14,45
    2009-04-15,45
    2009-05-15,45"
    df <- read.csv(textConnection(Lines))
    df$date <- as.Date(df$date)  # make sure the column is a Date, not character or factor
    

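    If you don't need 0 for missing weeks it's simple (note that week() counts 7-day blocks starting from January 1, not calendar weeks, and the exact week numbers you get can vary with the lubridate version):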

    weeks <- week(df$date)
    sums <- tapply(df$amt, weeks, sum)
    # 14  15  16  20 
    #360 210  45  45 
    

    To put zeros for missing weeks:

    span <- min(weeks):max(weeks)
    out <- array(0, dim = length(span), dimnames = list(span))
    out[dimnames(sums)[[1]]] <- sums
    # 14  15  16  17  18  19  20 
    #360 210  45   0   0   0  45 
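
    A possible variation on the zero-filling step, sketched in base R: if the week numbers are turned into a factor whose levels cover the whole span, tapply creates the empty weeks itself and they only need to be set to 0. The week numbers and amounts below are hard-coded from the example output above rather than recomputed with week().

```r
# Week numbers and amounts copied from the example above (assumed, not recomputed)
weeks <- c(14, 14, 14, 15, 15, 16, 20)
amt   <- c(45, 150, 165, 165, 45, 45, 45)

span <- min(weeks):max(weeks)

# Levels cover every week in the span, so weeks without data show up as NA
sums <- tapply(amt, factor(weeks, levels = span), sum)
sums[is.na(sums)] <- 0  # replace the empty weeks with 0
sums
```

    This avoids building the output array by hand, at the cost of an extra pass to replace the NA values.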
    