I have some data which is formatted in the following way:
time count
00:00 17
00:01 62
00:02 41
So I have times from 00:00 to 23:59, and I want to group the counts into 15-minute intervals.
The cut approach is handy but slow with large data frames. The following approach is approximately 1,000x faster than the cut approach (tested with 400k records).
# Function: Truncate (floor) POSIXct to time interval (specified in seconds)
# Author: Stephen McDaniel @ PowerTrip Analytics
# Date : 2017MAY
# Copyright: (C) 2017 by Freakalytics, LLC
# License: MIT
floor_datetime <- function(date_var, floor_seconds = 60,
                           origin = "1970-01-01") { # defaults to minute rounding
  if (!is(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
  # Floor the underlying numeric (seconds-since-origin) value down to the
  # nearest multiple of floor_seconds; NA inputs propagate to NA outputs
  as.POSIXct(floor(as.numeric(date_var) / floor_seconds) * floor_seconds,
             origin = origin)
}
Sample output:
test <- data.frame(good = as.POSIXct(Sys.time()),
                   bad1 = as.Date(Sys.time()),
                   bad2 = as.POSIXct(NA))
test$good_15 <- floor_datetime(test$good, 15 * 60)
test$bad1_15 <- floor_datetime(test$bad1, 15 * 60)
Error in floor_datetime(test$bad1, 15 * 60) : 
  Please pass in a POSIXct variable
test$bad2_15 <- floor_datetime(test$bad2, 15 * 60)
test
                    good       bad1 bad2             good_15 bad2_15
1 2017-05-06 13:55:34.48 2017-05-06 <NA> 2017-05-06 13:45:00    <NA>
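Applied to the kind of data in the question, a minimal sketch of putting this together (assuming a data frame dat with a POSIXct time column and a count column, like the fake data further down):

# Hypothetical usage: floor each timestamp to its 15-minute (900-second) bucket,
# then sum the counts within each bucket
dat$by15 <- floor_datetime(dat$time, 15 * 60)
dat.summary <- aggregate(count ~ by15, FUN = sum, data = dat)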
For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.
First, create some fake data:
set.seed(4984)
dat = data.frame(time = seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by = 60),
                 count = sample(1:50, 100, replace = TRUE))
Base R
cut the data into 15-minute groups:
dat$by15 = cut(dat$time, breaks="15 min")
                   time count                by15
1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
...
98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
100 2016-05-01 01:39:00    37 2016-05-01 01:30:00
Now aggregate by the new grouping column, using sum as the aggregation function:
dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
                 by15 count
1 2016-05-01 00:00:00   312
2 2016-05-01 00:15:00   395
3 2016-05-01 00:30:00   341
4 2016-05-01 00:45:00   318
5 2016-05-01 01:00:00   349
6 2016-05-01 01:15:00   397
7 2016-05-01 01:30:00   341
dplyr
library(dplyr)
dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
summarise(count=sum(count))
data.table
library(data.table)
dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]
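One small wrinkle worth noting: in the data.table version the grouping column comes out named after the call (cut here). A sketch of renaming it to match the other versions, using data.table's setnames:

# The by= expression yields a column named "cut"; rename it to by15
setnames(dat.summary, "cut", "by15")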
UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the end point of the grouping interval is 15 minutes minus one second after the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is needed because cut returns a factor, and this converts it back to date-time so that we can do math on it.
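For example, a minimal sketch adding the end points to the base-R summary (end15 is just an illustrative column name):

# cut returns a factor, so convert its labels back to POSIXct first,
# then add 15 minutes minus one second (POSIXct is denominated in seconds)
dat.summary$end15 <- as.POSIXct(as.character(dat.summary$by15)) + 60*15 - 1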
If you want the end point to be the nearest minute before the next interval (instead of the nearest second), you could use as.POSIXct(as.character(dat$by15)) + 60*14.
If you don't know the break interval (for example, because you chose the number of breaks and let R pick the interval), you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
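As a sketch of that last case (hypothetical: here cut is given a number of breaks rather than an interval, and as.numeric(..., units = "secs") is used to make the seconds conversion explicit; grp and grp_end are illustrative column names):

# Let cut pick the interval by asking for a number of breaks
dat$grp <- cut(dat$time, breaks = 10)

# Recover the break interval from consecutive group start times,
# then compute each group's end point (one second before the next break)
starts <- as.POSIXct(as.character(dat$grp))
width  <- max(unique(diff(starts)))                  # a difftime
dat$grp_end <- starts + as.numeric(width, units = "secs") - 1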