Average number of seconds between two time observations

Question

I have an irregular time index from an xts object. I need to find the average number of seconds between two consecutive time observations. This is my sample data:

dput(tt)
structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95, 
1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05, 
1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA, 
NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378, 
1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9, 
231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05, 
232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231, 
230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45, 
231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85, 
231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25, 
231.15, 231, 231, 231, 231.05, NA), .Dim = c(60L, 2L), .indexCLASS = c("POSIXct", 
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459482299, 
1459482301, 1459482302, 1459482303, 1459482304, 1459482305, 1459482306, 
1459482307, 1459482309, 1459482310, 1459482311, 1459482312, 1459482314, 
1459482315, 1459482316, 1459482317, 1459482318, 1459482319, 1459482320, 
1459482321, 1459482322, 1459482323, 1459482324, 1459482326, 1459482328, 
1459482329, 1459482330, 1459482331, 1459482332, 1459482336, 1459482337, 
1459482338, 1459482339, 1459482342, 1459482344, 1459482346, 1459482347, 
1459482348, 1459482349, 1459482590, 1459482591, 1459482594, 1459482595, 
1459482596, 1459482597, 1459482598, 1459482599, 1459482602, 1459482603, 
1459482604, 1459482609, 1459482610, 1459482611, 1459482612, 1459482613, 
1459482618, 1459482619, 1459482620, 1459482622, 1459482628), tzone = "Asia/Calcutta", tclass = c("POSIXct", 
"POSIXt")), .Dimnames = list(NULL, c("A", "B")), class = c("xts", 
"zoo"))

This is my attempt:

difftime(index(tt),index(lag.xts(tt, k=1)), units=c("auto"))
Time differences in secs
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
attr(,"tclass")
[1] "POSIXct" "POSIXt"
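The zeros come from lag.xts. It shifts the values down by k rows but leaves the index unchanged, so index(tt) and index(lag.xts(tt, k = 1)) are the same vector. The sketch below uses hypothetical toy data (the series x is illustrative, not the poster's tt) to show this. It also shows that differencing the index itself gives the gaps between observations:

```r
library(xts)

# Hypothetical toy data with an irregular index (not the poster's tt)
idx <- .POSIXct(1459482299 + c(0, 2, 3, 7, 10), tz = "Asia/Calcutta")
x <- xts(c(1, NA, 3, NA, 5), order.by = idx)

# lag.xts shifts the values down one row but keeps the same index ...
lagged <- lag.xts(x, k = 1)
all(index(lagged) == index(x))   # TRUE

# ... so differencing the two indexes gives all zeros
difftime(index(x), index(lagged), units = "secs")

# Differencing the index itself gives the gaps between observations, in seconds
gaps <- diff(as.numeric(index(x)))   # 2 1 4 3
mean(gaps)                           # 2.5
```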

Any help is highly appreciated.

Edit:

Based on the answers, I have written the following code. It is meant to calculate the mean number of seconds between observations for A and B on each day.

However, the code uses the index of the whole tt object rather than the non-NA observations of A or B, so the results for A and B are the same.

fun.time= function(x) mean(diff(time(x)))
df.time<-do.call(rbind, lapply(split(tt, "days"), FUN=function (x) {do.call(cbind, lapply(as.list(x), fun.time))})) 


dput(df.time)
structure(c(5.57627118644068, 5.57627118644068), .Dim = 1:2, .Dimnames = list(
    NULL, c("A", "B")))
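A likely cause is that as.list(x) returns every row of each column, NAs included, so time(x) is the same full index for both A and B. Below is a minimal sketch of one possible fix. It assumes the goal is to ignore each column's NA rows, and it drops them with na.omit before differencing. The toy series x and the helper per_col are hypothetical and exist only for illustration. This fun.time is a modified version of the one above.

```r
library(xts)

# Hypothetical toy data (not the poster's tt): NAs sit in different rows of A and B
idx <- .POSIXct(1459482299 + c(0, 2, 3, 7, 10), tz = "Asia/Calcutta")
x <- xts(cbind(A = c(1, NA, 3, NA, 5), B = c(1, 2, NA, 4, 5)), order.by = idx)

# Mean gap in seconds between the non-NA observations of a single column
fun.time <- function(col) mean(diff(as.numeric(time(na.omit(col)))))

# Hypothetical helper: apply fun.time to each column of one day's data
per_col <- function(d) vapply(colnames(d), function(nm) fun.time(d[, nm]), numeric(1))

# One row per day, one column per series
df.time <- do.call(rbind, lapply(split(x, f = "days"), per_col))
df.time   # A = 5, B = 3.333333
```

If a column has fewer than two non-NA observations on a given day, mean(diff(...)) returns NaN for that day.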

Answer 1:

First, create some test data that spans more than one day. Using tt from the question, we create tt2.

The remaining code calculates the differences between successive times, by column and by day, after removing that column's NA values. We lapply over the columns, and within each column we aggregate by date. To do that, we remove the NAs of the current column and then construct a zoo object whose values are the numeric seconds and whose index is the formatted times. Using numeric values avoids dealing with difftime objects and their unpredictable units. Using format causes as.Date to take the date relative to the tzone attribute of the data; otherwise, as.Date would take the date relative to GMT, which is not what is wanted (based on comments from the poster). The result is a zoo object, but it is easy to apply as.xts to it if an xts output is important.
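Both pitfalls mentioned above can be seen with base R alone. This sketch uses hypothetical timestamps built from the first index value in tt. It passes an explicit tz = "UTC" for the UTC comparison, so the result does not depend on as.Date's default time zone in a particular R version.

```r
# Hypothetical timestamps exactly one hour apart
times <- .POSIXct(1459482299 + c(0, 3600, 7200), tz = "Asia/Calcutta")

# diff() on POSIXct returns a difftime whose units are picked from the smallest gap
d <- diff(times)
units(d)                        # "hours"
as.numeric(mean(d))             # 1 (hours, not seconds)

# Converting to numeric first always yields seconds
mean(diff(as.numeric(times)))   # 3600

# 00:30 on 2016-04-01 in Asia/Calcutta is still 2016-03-31 in UTC
t0 <- .POSIXct(1459450800, tz = "Asia/Calcutta")
as.Date(format(t0))             # 2016-04-01: the date in the data's own time zone
as.Date(t0, tz = "UTC")         # 2016-03-31: the date in UTC
```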

library(xts)

# test input
tt2 <- tt
time(tt2) <- time(tt) + seq(1, 24*60*60, length = 60)

do.call(cbind, lapply(tt2, function(x) {
  times <- time(na.omit(x))
  aggregate(zoo(as.numeric(times), format(times)), as.Date, function(x) mean(diff(x)))
}))

giving the following zoo series:

                A      B
2016-04-01 3029.0 1648.9
2016-04-02 5416.1 1633.0

Note 1: If we only need the mean differences by column, not by date, we can drop the aggregate and reduce the code to the single line below. Here we go back to using tt as the test input, since that is sufficient to illustrate this case. Again, we convert the index to numeric to avoid the unpredictability of difftime's output units.

sapply(tt, function(x) mean(diff(as.numeric(time(na.omit(x))))))

giving this named numeric vector:

      A       B 
14.9545  6.2115

Note 2: With the development version of zoo, this could be simplified. That version of zoo has a coredata argument to aggregate.zoo; if it is set to FALSE, the entire zoo object, not just its coredata part, is passed to the function. In the code below, we define two functions. The first takes the mean difference after NA removal. The second converts the index to character and then to Date, which has the effect of using the tzone attribute of the input as its time zone (or the local time zone). Then we apply aggregate.zoo by date over each column and cbind the resulting list back together:

library(xts)

mean_diff_time <- function(x) mean(diff(as.numeric(time(na.omit(x)))))
dates <- function(x) as.Date(format(x))

do.call("cbind", lapply(as.zoo(tt2), aggregate, dates, mean_diff_time, coredata = FALSE))
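Here is a small check of the Note 2 approach on a hypothetical two-day toy series. It assumes an installed zoo version whose aggregate.zoo accepts coredata = FALSE. The series z is illustrative only, and the columns are extracted by name instead of relying on lapply over the zoo object.

```r
library(zoo)

# Hypothetical two-day toy series (not the poster's tt2); NAs differ between A and B
idx <- .POSIXct(1459482299 + c(0, 2, 3, 7, 10, 86400, 86405, 86406), tz = "Asia/Calcutta")
z <- zoo(cbind(A = c(1, NA, 3, NA, 5, 6, NA, 8),
               B = c(1, 2, NA, 4, 5, 6, 7, NA)), idx)

mean_diff_time <- function(x) mean(diff(as.numeric(time(na.omit(x)))))
dates <- function(x) as.Date(format(x))

# coredata = FALSE passes each day's zoo sub-series (index included) to mean_diff_time
cols <- setNames(colnames(z), colnames(z))
res <- do.call("cbind", lapply(cols, function(nm)
  aggregate(z[, nm], dates, mean_diff_time, coredata = FALSE)))
res   # 2016-04-01: A = 5, B = 3.333333; 2016-04-02: A = 6, B = 5
```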

Update: I have rearranged the presentation.

Answer 2:

To supplement the answer by G. Grothendieck, you can also use mean(diff(index(tt))) to return the result as a difftime object:

> mean(diff(index(tt)))
Time difference of 5.576271 secs

or simply mean(diff(.index(tt))) to get the result as numeric:

> mean(diff(.index(tt)))
[1] 5.576271

EDIT:

> lapply(tt, function(x){mean(diff(.index(x[!is.na(x)])))})
$A
[1] 14.95455

$B
[1] 6.211538
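This sketch makes the contrast in this answer concrete on hypothetical toy data (not tt). Calling .index() on the whole object uses every row. The per-column version drops each column's NA rows first. Rows are selected with which() on the coredata, a small variation on x[!is.na(x)].

```r
library(xts)

# Hypothetical toy data: NAs sit in different rows of A and B
idx <- .POSIXct(1459482299 + c(0, 2, 3, 7, 10), tz = "Asia/Calcutta")
x <- xts(cbind(A = c(1, NA, 3, NA, 5), B = c(1, 2, NA, 4, 5)), order.by = idx)

# .index() returns the index as plain numeric seconds since the epoch
mean(diff(.index(x)))   # 2.5: uses every row, regardless of NAs

# Per column, keeping only that column's non-NA rows
gaps <- sapply(colnames(x), function(nm) {
  col <- x[, nm]
  mean(diff(.index(col[which(!is.na(coredata(col)))])))
})
gaps   # A = 5, B = 3.333333
```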

Source: https://stackoverflow.com/questions/39999744/average-number-of-seconds-between-two-time-observations

Tags

xts

difftime