Create a time interval of 15 minutes from minutely data in R?

抹茶落季 2020-12-03 03:47

I have some data which is formatted in the following way:

time     count 
00:00    17
00:01    62
00:02    41

So I have from 00:00 to 23:5

2 Answers
  • 2020-12-03 04:21

    The cut approach is handy but slow with large data frames. The following approach is approximately 1,000x faster than the cut approach (tested with 400k records.)

      #     Function: Truncate (floor) POSIXct to time interval (specified in seconds)
      #       Author: Stephen McDaniel @ PowerTrip Analytics
      #        Date : 2017MAY
      #    Copyright: (C) 2017 by Freakalytics, LLC
      #      License: MIT
    
      floor_datetime <- function(date_var, floor_seconds = 60, 
            origin = "1970-01-01") { # defaults to minute rounding
         if(!inherits(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
         # No separate NA check: NA values stay NA through the arithmetic, and
         # if(is.na(date_var)) fails for vectors with more than one element.
         as.POSIXct(floor(as.numeric(date_var) / floor_seconds) * floor_seconds,
            origin = origin)
      }
    

    Sample output:

    test <- data.frame(good = as.POSIXct(Sys.time()), 
       bad1 = as.Date(Sys.time()),
       bad2 = as.POSIXct(NA))
    
    test$good_15 <- floor_datetime(test$good, 15 * 60)
    test$bad1_15 <- floor_datetime(test$bad1, 15 * 60)
    Error in floor_datetime(test$bad1, 15 * 60) : 
      Please pass in a POSIXct variable
    test$bad2_15 <- floor_datetime(test$bad2, 15 * 60)
    
    test
    
                            good       bad1 bad2             good_15 bad2_15
        1 2017-05-06 13:55:34.48 2017-05-06 <NA> 2017-05-06 13:45:00    <NA>
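
The sample above only floors a single timestamp. As a minimal sketch of how the same floor arithmetic could feed a 15-minute aggregation, the block below inlines the calculation instead of calling floor_datetime, so it runs on its own. The data frame dat, the constant counts, and the UTC time zone are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical data: 100 one-minute readings, each with a count of 1.
# A fixed time zone keeps the result reproducible across machines.
start <- as.POSIXct("2016-05-01 00:00:00", tz = "UTC")
dat <- data.frame(time = start + 60 * (0:99), count = rep(1L, 100))

# Floor each timestamp to the start of its 15-minute (900-second) bin,
# using the same arithmetic as floor_datetime.
bin_seconds <- 15 * 60
dat$by15 <- as.POSIXct(floor(as.numeric(dat$time) / bin_seconds) * bin_seconds,
                       origin = "1970-01-01", tz = "UTC")

# Sum the counts within each bin.
dat_summary <- aggregate(count ~ by15, data = dat, FUN = sum)
```

Because the bins are computed from seconds since the epoch, they align to clock quarter-hours in UTC regardless of when the first observation falls.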
    
  • 2020-12-03 04:22

    For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.

    First, create some fake data:

    set.seed(4984)
    dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                     count=sample(1:50, 100, replace=TRUE))
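
The question's data stores time as "HH:MM" strings rather than POSIXct. One possible way to bridge that gap, sketched under the assumption that all rows belong to a single day, is to paste a placeholder date onto each string before parsing. The data frame raw, the placeholder date, the UTC time zone, and the last two rows are hypothetical; only the first three rows come from the question.

```r
# Hypothetical data in the question's layout: "HH:MM" strings plus a count.
raw <- data.frame(time = c("00:00", "00:01", "00:02", "00:15", "00:16"),
                  count = c(17, 62, 41, 5, 8),
                  stringsAsFactors = FALSE)

# Attach an arbitrary placeholder date so the strings parse as date-times.
raw$time <- as.POSIXct(paste("2016-05-01", raw$time),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")

# Group into 15-minute bins and sum, as in the base R approach.
raw$by15 <- cut(raw$time, breaks = "15 min")
raw_summary <- aggregate(count ~ by15, data = raw, FUN = sum)
```

Here the first three rows fall in the bin starting at 00:00 and the last two in the bin starting at 00:15.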
    

    Base R

    cut the data into 15 minute groups:

    dat$by15 = cut(dat$time, breaks="15 min")
    
                       time count                by15
    1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
    2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
    3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
    ...
    98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
    99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
    100 2016-05-01 01:39:00    37 2016-05-01 01:30:00
    

    Now aggregate by the new grouping column, using sum as the aggregation function:

    dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
    
                     by15 count
    1 2016-05-01 00:00:00   312
    2 2016-05-01 00:15:00   395
    3 2016-05-01 00:30:00   341
    4 2016-05-01 00:45:00   318
    5 2016-05-01 01:00:00   349
    6 2016-05-01 01:15:00   397
    7 2016-05-01 01:30:00   341
    

    dplyr

    library(dplyr)
    
    dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
      summarise(count=sum(count))
    

    data.table

    library(data.table)
    
    dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]
    

    UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the endpoint of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is because cut returns a factor and this just converts it back to date-time so that we can do math on it.

    If you want the end point to the nearest minute before the next interval (instead of the nearest second), you could do as.POSIXct(as.character(dat$by15)) + 60*14.

    If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
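
The three endpoint calculations described in this update can be sketched as follows. To keep the example independent of how cut labels its levels, it assumes the interval starts are already POSIXct; the vector starts and the UTC time zone are hypothetical. Note that diff on date-times chooses its units automatically, so the sketch converts the gaps to seconds explicitly before subtracting one.

```r
# Hypothetical interval starts, already converted to POSIXct.
starts <- as.POSIXct("2016-05-01 00:00:00", tz = "UTC") + 60 * 15 * (0:3)

# End of each interval to the nearest second (15 minutes minus one second).
end_sec <- starts + 60 * 15 - 1

# End of each interval to the nearest minute before the next interval.
end_min <- starts + 60 * 14

# When the interval width is unknown, infer it from the gaps between starts.
# as.numeric(..., units = "secs") forces seconds whatever units diff() picked.
width <- max(as.numeric(diff(starts), units = "secs"))
end_inferred <- starts + width - 1
```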
