Aggregate daily level data to weekly level in R

前端未结

关注

 5  653

I have a huge dataset similar to the following reproducible sample data.

   Interval    value
1  2012-06-10   552
2  2012-06-11  4850
3  2012-06-12  4642
4


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  遇见更好的自我        
                
              
                            
                2021-01-03 17:20
              
            
            
                                                                       
If you are using a data frame, you can easily do this with the tidyquant package. Use the tq_transmute function, which applies a mutation and returns a new data frame. Select the "value" column and apply the xts function apply.weekly. The additional argument FUN = sum will get the aggregate by week.





library(tidyquant)

df
#> # A tibble: 31 x 2
#>      Interval value
#>        <date> <int>
#>  1 2012-06-10   552
#>  2 2012-06-11  4850
#>  3 2012-06-12  4642
#>  4 2012-06-13  4132
#>  5 2012-06-14  4190
#>  6 2012-06-15  4186
#>  7 2012-06-16  1139
#>  8 2012-06-17   490
#>  9 2012-06-18  5156
#> 10 2012-06-19  4430
#> # ... with 21 more rows

df %>%
    tq_transmute(select     = value,
                 mutate_fun = apply.weekly,
                 FUN        = sum)
#> # A tibble: 6 x 2
#>     Interval value
#>       <date> <int>
#> 1 2012-06-10   552
#> 2 2012-06-17 23629
#> 3 2012-06-24 23872
#> 4 2012-07-01 23667
#> 5 2012-07-08 23552
#> 6 2012-07-10 10902

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  你的背包        
                
              
                            
                2021-01-03 17:25
              
            
            
                                                                       
When you say "aggregate" the values, you mean take their sum? Let's say your data frame is d and assuming d$Interval is of class Date, you can try

# if d$Interval is not of class Date d$Interval <- as.Date(d$Interval)
formatdate <- function(date)
    paste0("Week ", (as.numeric(format(date, "%d")) - 1) + 1,
        ", ", format(date, "%b %Y"))
# change "sum" to your required function
aggregate(d$value, by = list(formatdate(d$Interval)), sum)
#            Group.1        x
# 1 Week 1, Jul 2012 3725.667
# 2 Week 2, Jul 2012 3199.500
# 3 Week 2, Jun 2012 3544.000
# 4 Week 3, Jun 2012 3434.000
# 5 Week 4, Jun 2012 3333.143
# 6 Week 5, Jun 2012 3158.667

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  没有蜡笔的小新        
                
              
                            
                2021-01-03 17:27
              
            
            
                                                                       
If you mean the sum of of ‘value’ by week I think the easiest way to do it is to convert the data into a xts object as GSee suggested:

data <- as.xts(data$value,order.by=as.Date(data$interval))
weekly <- apply.weekly(data,sum)

            [,1]
2012-06-10   552
2012-06-17 23629
2012-06-24 23872
2012-07-01 23667
2012-07-08 23552
2012-07-10 10902


I leave the formatting of the output as an exercise for you :-)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轮回少年        
                
              
                            
                2021-01-03 17:30
              
            
            
                                                                       
I just came across this old question because it was used as a dupe target.
Unfortunately, all the upvoted answers (except the one by konvas  and a now deleted one) present solutions for aggregating the data by week of the year while the OP has requested to aggregate by week of the month.
The definition of week of the year and week of the month is ambiguous as discussed here, here, and here.
However, the OP has indicated that he wants to count the days 1 to 7 of each month as week 1 of the month, days 8 to 14 as week 2 of the month, etc. Note that week 5 is a stub for most of the months consisting of only 2 or 3 days (except for the month of February if no leap year).
Having prepared the ground, here is a data.table solution for this kind of aggregation:
library(data.table)
DT[, .(value = sum(value)), 
       by = .(Interval = sprintf("Week %i, %s", 
                                 (mday(Interval) - 1L) %/% 7L + 1L, 
                                 format(Interval, "%b %Y")))]


           Interval value
1: Week 2, Jun 2012 18366
2: Week 3, Jun 2012 24104
3: Week 4, Jun 2012 23348
4: Week 5, Jun 2012  5204
5: Week 1, Jul 2012 23579
6: Week 2, Jul 2012 11573


We can verify that we have picked the correct intervals by
DT[, .(value = sum(value),
       date_range = toString(range(Interval))), 
   by = .(Week = sprintf("Week %i, %s", 
                             (mday(Interval) -1L) %/% 7L + 1L, 
                             format(Interval, "%b %Y")))]


               Week value             date_range
1: Week 2, Jun 2012 18366 2012-06-10, 2012-06-14
2: Week 3, Jun 2012 24104 2012-06-15, 2012-06-21
3: Week 4, Jun 2012 23348 2012-06-22, 2012-06-28
4: Week 5, Jun 2012  5204 2012-06-29, 2012-06-30
5: Week 1, Jul 2012 23579 2012-07-01, 2012-07-07
6: Week 2, Jul 2012 11573 2012-07-08, 2012-07-10


which is in line with OP's specification.
Data
library(data.table)
DT <- fread(
  "rn   Interval    value
  1  2012-06-10   552
  2  2012-06-11  4850
  3  2012-06-12  4642
  4  2012-06-13  4132
  5  2012-06-14  4190
  6  2012-06-15  4186
  7  2012-06-16  1139
  8  2012-06-17   490
  9  2012-06-18  5156
  10 2012-06-19  4430
  11 2012-06-20  4447
  12 2012-06-21  4256
  13 2012-06-22  3856
  14 2012-06-23  1163
  15 2012-06-24   564
  16 2012-06-25  4866
  17 2012-06-26  4421
  18 2012-06-27  4206
  19 2012-06-28  4272
  20 2012-06-29  3993
  21 2012-06-30  1211
  22 2012-07-01   698
  23 2012-07-02  5770
  24 2012-07-03  5103
  25 2012-07-04   775
  26 2012-07-05  5140
  27 2012-07-06  4868
  28 2012-07-07  1225
  29 2012-07-08   671
  30 2012-07-09  5726
  31 2012-07-10  5176", drop = 1L)
DT[, Interval := as.Date(Interval)]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  萌比男神i        
                
              
                            
                2021-01-03 17:36
              
            
            
                                                                       
If you were to use week from lubridate, you would only get five weeks to pass to by. Assume dat is your data,

> library(lubridate)
> do.call(rbind, by(dat$value, week(dat$Interval), summary))
#    Min. 1st Qu. Median Mean 3rd Qu. Max.
# 24  552    4146   4188 3759    4529 4850
# 25  490    2498   4256 3396    4438 5156
# 26  564    2578   4206 3355    4346 4866
# 27  698     993   4868 3366    5122 5770
# 28  671    1086   3200 3200    5314 5726


This shows a summary for the 24th through 28th week of the year. Similarly, we can get the means with aggregate with

> aggregate(value~week(Interval), data = dat, mean)
#   week(Interval)    value
# 1             24 3758.667
# 2             25 3396.286
# 3             26 3355.000
# 4             27 3366.429
# 5             28 3199.500

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复