R: Efficiently subsetting dataframe based on time of day

挽巷 2021-01-06 07:43

I have a large (150,000x7) dataframe that I intend to use for back-testing and real-time analysis of a financial market. The data represents the condition of an investment v

2 answers
  • 2021-01-06 08:09

    1) If DF is the data frame shown in the question, create a zoo object from it as you have done and split it into days, giving zs. Then lapply your function f to each successive set of w points in each component (i.e. in each day). For example, to apply your function to 2 hours of data at a time with regularly spaced 5-minute data, use w = 24 (there are 24 five-minute periods in two hours). In that case f is passed 24 rows of data as a matrix each time it is called. align is set to "right" below, but it can alternatively be set to align = "center", with the condition defining ix changed to a two-sided one, etc. For more on rollapply see ?rollapply

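    library(zoo)
    z <- zoo(DF[-2], as.POSIXct(DF[,1], origin = "1970-01-01"))
    # one zoo object per calendar day; as.Date() on POSIXct uses UTC unless tz= is given
    zs <- split(z, as.Date(time(z)))
    w <- 3 # replace this with 24 to handle two hours at a time with five min data
    f <- function(x) {
      tt <- x[, 1]                   # first column holds the time in seconds
      ix <- tt[w] - tt <= w * 5 * 60 # RHS converts w to seconds
      x <- x[ix, -1]                 # drop rows before a gap and the time column
      sum(x) # replace sum with your function
    }
    out <- lapply(zs, rollapply, w, f, by.column = FALSE, align = "right")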
    

    Using the data frame in the question we get this:

    > out
    $`2008-05-30`
    2008-05-30 02:00:00 2008-05-30 02:05:00 2008-05-30 02:10:00 2008-05-30 02:15:00 
              -66.04703           -83.92148           -95.93558          -100.24924 
    2008-05-30 02:20:00 2008-05-30 02:25:00 2008-05-30 02:30:00 2008-05-30 02:35:00 
             -108.15038          -121.24519          -134.39873          -140.28436 
    

    By the way, be sure to read this post.
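To see what the ix condition inside f does, here is a small base R sketch on made-up timestamps (in seconds), assuming 5-minute data and w = 3. The helper name keep is hypothetical; it applies the same test as ix, so rows that are more than w * 5 * 60 seconds older than the last row of the window are dropped.

```r
w <- 3

# Same test as ix in f: TRUE for rows close enough in time to the window's last row
keep <- function(tt) tt[w] - tt <= w * 5 * 60

regular <- keep(c(0, 300, 600))    # regular 5-minute spacing
gapped  <- keep(c(0, 3600, 3900))  # first point is 65 minutes before the last

print(regular)  # TRUE TRUE TRUE
print(gapped)   # FALSE TRUE TRUE
```

So a window that straddles a hole in the data is reduced to the rows after the hole before your function sees it.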

    2) This could alternately be done as the following where w and f are as above:

    n <- nrow(DF)
    m <- as.matrix(DF[-2])
    sapply(w:n, function(i) f(m[seq(to = i, length.out = w), ])) # rows i-w+1..i
    

    Replace the sapply with lapply if needed. This may look shorter than the first solution, but it's not much different once you add the code that defines f and w (which appears in the first but not the second).

    If there are no holes during the day and only holes between days then these solutions could be simplified.
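One possible reading of that simplification, sketched in base R: if every day is gap-free, the time check inside f can be dropped and the window simply rolled within each day. The data frame DF2, its value column, and the helpers g and roll_day below are hypothetical, and days are assumed to be UTC calendar days.

```r
# Hypothetical data: two days of regular 5-minute observations, gap only between days
day1 <- as.numeric(as.POSIXct("2008-05-30 02:00:00", tz = "UTC")) + seq(0, by = 300, length.out = 6)
day2 <- as.numeric(as.POSIXct("2008-06-02 02:00:00", tz = "UTC")) + seq(0, by = 300, length.out = 6)
DF2 <- data.frame(pTime = c(day1, day2), value = 1:12)

w <- 3
g <- function(x) sum(x)  # replace sum with your function; no time check needed

# Label each row with its calendar day (UTC assumed)
days <- format(as.POSIXct(DF2$pTime, origin = "1970-01-01", tz = "UTC"), "%Y-%m-%d")

# Right-aligned rolling window of w rows within a single day
roll_day <- function(d) {
  n <- nrow(d)
  if (n < w) return(numeric(0))
  sapply(w:n, function(i) g(d$value[seq(to = i, length.out = w)]))
}

out2 <- lapply(split(DF2, days), roll_day)
print(out2)
```

Because the split happens first, no window ever spans two days, which is the only place a hole could occur under this assumption.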

  • 2021-01-06 08:16

    Say that you have your target time t0 on the same scale as pTime: seconds since the epoch. Then pTime - t0 = (difference in whole days between the two, in seconds) + (difference in the remaining seconds). Taking (pTime - t0) %% (number of seconds per day) leaves the difference in seconds in clock arithmetic (wrapped around if the difference is negative). A row is therefore within w minutes of the target time of day when this value is below w minutes or above a full day minus w minutes. This suggests the following function:

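    SecondsPerDay <- 24 * 60 * 60
    # d: data frame with pTime in seconds since epoch; t0Sec: target time; wMin: window in minutes
    within <- function(d, t0Sec, wMin) {
      # offset from the target time of day, wrapped into [0, SecondsPerDay)
      diff <- (d$pTime - t0Sec) %% SecondsPerDay
      wSec <- 60 * wMin
      # keep rows less than wSec after the target, or less than wSec before it
      return(d[diff < wSec | diff > (SecondsPerDay - wSec), ])
    }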
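A small usage sketch in base R, on made-up data. It restates the same logic under a hypothetical name, withinTimeOfDay, because a function called within masks base::within; the data frame, timestamps and UTC time zone are assumptions for illustration only.

```r
SecondsPerDay <- 24 * 60 * 60

# Same logic as the function above, under a hypothetical name
withinTimeOfDay <- function(d, t0Sec, wMin) {
  delta <- (d$pTime - t0Sec) %% SecondsPerDay
  wSec <- 60 * wMin
  d[delta < wSec | delta > (SecondsPerDay - wSec), , drop = FALSE]
}

# Hypothetical data: three days with observations at 09:20, 09:30, 09:45 and 15:00 UTC
dayStart <- as.numeric(as.POSIXct("2008-05-30 00:00:00", tz = "UTC")) + (0:2) * SecondsPerDay
offsets <- c(9 * 3600 + 20 * 60, 9 * 3600 + 30 * 60, 9 * 3600 + 45 * 60, 15 * 3600)
d <- data.frame(pTime = as.vector(outer(offsets, dayStart, "+")))

# Target 09:30 (taken from the middle day), window of 15 minutes either side
t0 <- dayStart[2] + 9 * 3600 + 30 * 60
hits <- withinTimeOfDay(d, t0, 15)
hitTimes <- format(as.POSIXct(hits$pTime, origin = "1970-01-01", tz = "UTC"), "%H:%M")
print(hitTimes)
```

The 09:20 and 09:30 rows are returned for all three days, not only the day t0 falls on. The 09:45 rows are excluded because the comparisons are strict, so a row exactly wMin minutes away is not kept.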
    