R: Efficiently subsetting dataframe based on time of day

邮差的信 提交于 2019-11-30 22:31:24

Say that you have your target time t0 on the same scale as pTime: seconds since epoch. Then t0 - pTime = (difference in the number of days since epoch between the two) + (difference in remaining seconds). Taking t0 - pTime %% (num. seconds per day) will leave us with the difference in seconds in clock arithmetic (wrapped around if the difference is negative). This suggests the following function:

SecondsPerDay <- 24 * 60 * 60
within <- function(d, t0Sec, wMin) {
  diff <- (d$pTime - t0Sec) %% SecondsPerDay
  wSec <- 60 * wMin
  return(d[diff < wSec | diff > (SecondsPerDay - wSec), ])
}
G. Grothendieck

1) If DF is the data frame shown in the question then create a zoo object from it as you have done and split it into days giving zs. Then lapply your function f to each successive set of w points in each component (i.e. in each day). For example, if you want to apply your function to 2 hours of data at a time and your data is regularly spaced 5 minute data then w = 24 (since there are 24 five minute periods in two hours). In such a case f would be passed 24 rows of data as a matrix each time its called. Also align has been set to "right" below but it can alternately be set to align="center" and the condition giving ix can be changed to double sided, etc. For more on rollapply see: ?rollapply

library(zoo)
z <- zoo(DF[-2], as.POSIXct(DF[,1], origin = "1970-01-01"))
w <- 3 # replace this with 24 to handle two hours at a time with five min data
f <- function(x) {
            tt <- x[, 1]
            ix <- tt[w] - tt <= w * 5 * 60 # RHS converts w to seconds
            x <- x[ix, -1]
            sum(x) # replace sum with your function
    }
out <- rollapply(z, w, f, by.column = FALSE, align = "right")

Using the data frame in the question we get this:

> out
$`2008-05-30`
2008-05-30 02:00:00 2008-05-30 02:05:00 2008-05-30 02:10:00 2008-05-30 02:15:00 
          -66.04703           -83.92148           -95.93558          -100.24924 
2008-05-30 02:20:00 2008-05-30 02:25:00 2008-05-30 02:30:00 2008-05-30 02:35:00 
         -108.15038          -121.24519          -134.39873          -140.28436 

By the way, be sure to read this post .

2) This could alternately be done as the following where w and f are as above:

n <- nrow(DF)
m <- as.matrix(DF[-2])
sapply(w:n, function(i) { m <- m[seq(length = w, to = i), ]; f(m) })

Replace the sapply with lapply if needed. Also this may seem shorter than the first solution but its not much different once you add the code to define f and w (which appear in the first but not the second).

If there are no holes during the day and only holes between days then these solutions could be simplified.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!