dplyr, lubridate : how to aggregate a dataframe by week?

后端 未结 4 881
时光取名叫无心
时光取名叫无心 2020-11-29 03:54

Consider the following example

library(tidyverse)
library(lubridate)
time <- seq(from =ymd(\"2014-02-24\"),to= ymd(\"2014-03-20\"), by=\"days\")
set.seed(         


        
相关标签:
4条回答
  • 2020-11-29 04:02

    In the tidyverse,

    df2 %>% group_by(week = week(time)) %>% summarise(value = mean(values))
    
    ## # A tibble: 5 × 2
    ##    week    value
    ##   <dbl>    <dbl>
    ## 1     8 37.50000
    ## 2     9 38.57143
    ## 3    10 38.57143
    ## 4    11 36.42857
    ## 5    12 45.00000
    

    Or use isoweek instead:

    df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))
    
    ## # A tibble: 4 × 2
    ##    week    value
    ##   <int>    <dbl>
    ## 1     9 37.14286
    ## 2    10 40.71429
    ## 3    11 35.00000
    ## 4    12 42.50000
    

    Or cut.Date:

    df2 %>% group_by(week = cut(time, "week")) %>% summarise(value = mean(values))
    
    ## # A tibble: 4 × 2
    ##         week    value
    ##       <fctr>    <dbl>
    ## 1 2014-02-24 37.14286
    ## 2 2014-03-03 40.71429
    ## 3 2014-03-10 35.00000
    ## 4 2014-03-17 42.50000
    

    which you can tell to start on Sunday, if you prefer:

    df2 %>% group_by(week = cut(time, "week", start.on.monday = FALSE)) %>% 
        summarise(value = mean(values))
    
    ## # A tibble: 4 × 2
    ##         week    value
    ##       <fctr>    <dbl>
    ## 1 2014-02-23 37.50000
    ## 2 2014-03-02 40.00000
    ## 3 2014-03-09 33.57143
    ## 4 2014-03-16 44.00000
    

    If you want to shift to, say, Tuesday start, add one to your dates:

    df2 %>% group_by(week = cut(time + 1, "week")) %>% summarise(value = mean(values))
    
    ## # A tibble: 4 × 2
    ##         week    value
    ##       <fctr>    <dbl>
    ## 1 2014-02-24 37.50000
    ## 2 2014-03-03 40.00000
    ## 3 2014-03-10 33.57143
    ## 4 2014-03-17 44.00000
    

    Labels will be off, though. If using cut, consider the implications of its include.lowest and right parameters, documented at ?cut.

    0 讨论(0)
  • 2020-11-29 04:02

    why not straight up use floor_date and an integer to adjust the start date of the week?

    library(lubridate)
    time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
    
    set.seed(123)
    
    values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)  
    df2 <- data_frame(time, values)
    df2 <- df2 %>% mutate(day_of_week = weekdays(time))
    
    # week wednesday to tuesday
    df2 %>% group_by(Week = floor_date(time-3, unit="week")) %>% 
      summarize(WeeklyAveDist=mean(values), mean(values), min_date = min(time), max_date = max(time)) %>% mutate(weekdays(min_date), weekdays(max_date)))
    
            Week WeeklyAveDist mean.values.   min_date   max_date
    1 2014-02-16      37.50000     37.50000 2014-02-24 2014-02-25
    2 2014-02-23      38.57143     38.57143 2014-02-26 2014-03-04
    3 2014-03-02      38.57143     38.57143 2014-03-05 2014-03-11
    4 2014-03-09      36.42857     36.42857 2014-03-12 2014-03-18
    5 2014-03-16      45.00000     45.00000 2014-03-19 2014-03-20
      weekdays.min_date. weekdays.max_date.
    1             Monday            Tuesday
    2          Wednesday            Tuesday
    3          Wednesday            Tuesday
    4          Wednesday            Tuesday
    5          Wednesday           Thursday
    
    
    # Week Thursday to Wednesday
    df2 %>% group_by(Week = floor_date(time-4, unit="week")) %>% 
      summarize(WeeklyAveDist=mean(values), mean(values), min_date = min(time), max_date = max(time)) %>% mutate(weekdays(min_date), weekdays(max_date)))
    
            Week WeeklyAveDist mean.values.   min_date   max_date
    1 2014-02-16      35.00000     35.00000 2014-02-24 2014-02-26
    2 2014-02-23      39.28571     39.28571 2014-02-27 2014-03-05
    3 2014-03-02      37.14286     37.14286 2014-03-06 2014-03-12
    4 2014-03-09      40.00000     40.00000 2014-03-13 2014-03-19
    5 2014-03-16      40.00000     40.00000 2014-03-20 2014-03-20
      weekdays.min_date. weekdays.max_date.
    1             Monday          Wednesday
    2           Thursday          Wednesday
    3           Thursday          Wednesday
    4           Thursday          Wednesday
    5           Thursday           Thursday
    
    0 讨论(0)
  • 2020-11-29 04:10

    Just this once, after some research, I actually think I came up with a better solution that

    • gives the correct aggregation
    • gives the correct labels

    Example below for weeks starting on a thursday. The weeks will be labeled by their first day a given cycle.

    library(tidyverse)
    library(lubridate)
    options(tibble.print_min = 30)
    
    time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
    set.seed(123)
    values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
    df2 <- data_frame(time, values)
    
    df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
                          day_of_week = wday(time, label = FALSE))
    
    df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
                          tmp_1 = (as.integer(day_of_week) - 5),
                          tmp_2 = ((as.integer(day_of_week) - 5) %% 7))
    

    which gives

    > df2
    # A tibble: 25 × 7
             time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
           <date>  <dbl>             <ord>       <dbl>         <date> <dbl> <dbl>
    1  2014-02-24     30               Mon           2     2014-02-20    -3     4
    2  2014-02-25     45              Tues           3     2014-02-20    -2     5
    3  2014-02-26     30               Wed           4     2014-02-20    -1     6
    4  2014-02-27     50             Thurs           5     2014-02-27     0     0
    5  2014-02-28     50               Fri           6     2014-02-27     1     1
    6  2014-03-01     20               Sat           7     2014-02-27     2     2
    7  2014-03-02     35               Sun           1     2014-02-27    -4     3
    8  2014-03-03     50               Mon           2     2014-02-27    -3     4
    9  2014-03-04     35              Tues           3     2014-02-27    -2     5
    10 2014-03-05     35               Wed           4     2014-02-27    -1     6
    11 2014-03-06     50             Thurs           5     2014-03-06     0     0
    12 2014-03-07     35               Fri           6     2014-03-06     1     1
    13 2014-03-08     40               Sat           7     2014-03-06     2     2
    14 2014-03-09     40               Sun           1     2014-03-06    -4     3
    15 2014-03-10     20               Mon           2     2014-03-06    -3     4
    16 2014-03-11     50              Tues           3     2014-03-06    -2     5
    17 2014-03-12     25               Wed           4     2014-03-06    -1     6
    18 2014-03-13     20             Thurs           5     2014-03-13     0     0
    19 2014-03-14     30               Fri           6     2014-03-13     1     1
    20 2014-03-15     50               Sat           7     2014-03-13     2     2
    21 2014-03-16     50               Sun           1     2014-03-13    -4     3
    22 2014-03-17     40               Mon           2     2014-03-13    -3     4
    23 2014-03-18     40              Tues           3     2014-03-13    -2     5
    24 2014-03-19     50               Wed           4     2014-03-13    -1     6
    25 2014-03-20     40             Thurs           5     2014-03-20     0     0
    
    0 讨论(0)
  • 2020-11-29 04:26
    aggregate(df2$values,by=list(week(df2$time)),mean)
    
      Group.1        x
    1       8 30.00000
    2       9 40.00000
    3      10 36.42857
    4      11 37.85714
    5      12 43.33333
    

    This uses the week function of lubridate and gives the week number of the week in the year.

    To control which day of the week is the starting day just refer to this thread on that topic:

    Changing lubridate function to start on Monday rather than Sunday

    The solution from that thread by nograpes suggests that if you want a custom version of the week() function using an arbitrary day of the week as the beginning of the week that you just construct it from base R like this:

    start.of.week <- function(date)
      date - (setNames(c(6,0:5),0:6) [strftime(date,'%w')])
    
    end.of.week <- function(date)
      date + (setNames(c(0,6:1),0:6) [strftime(date,'%w')])
    
    start.of.week(as.Date(c('2014-01-05','2014-10-02','2014-09-22','2014-09-27')))
    # "2013-12-30" "2014-09-29" "2014-09-22" "2014-09-22"
    end.of.week(as.Date(c('2014-01-05','2014-10-02','2014-09-22','2014-09-27')))
    # "2014-01-05" "2014-10-05" "2014-09-28" "2014-09-28"
    

    In the future lubridate will have this option for an arbitrary start day for weeks, but Hadley hasn't got around to adding it yet (https://github.com/hadley/lubridate/issues/257).

    0 讨论(0)
提交回复
热议问题