Question
I have an irregular time index from an xts object. I need to find the average number of seconds between two consecutive time observations. This is my sample data:
dput(tt)
structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95,
1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05,
1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA,
NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378,
1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9,
231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05,
232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231,
230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45,
231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85,
231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25,
231.15, 231, 231, 231, 231.05, NA), .Dim = c(60L, 2L), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459482299,
1459482301, 1459482302, 1459482303, 1459482304, 1459482305, 1459482306,
1459482307, 1459482309, 1459482310, 1459482311, 1459482312, 1459482314,
1459482315, 1459482316, 1459482317, 1459482318, 1459482319, 1459482320,
1459482321, 1459482322, 1459482323, 1459482324, 1459482326, 1459482328,
1459482329, 1459482330, 1459482331, 1459482332, 1459482336, 1459482337,
1459482338, 1459482339, 1459482342, 1459482344, 1459482346, 1459482347,
1459482348, 1459482349, 1459482590, 1459482591, 1459482594, 1459482595,
1459482596, 1459482597, 1459482598, 1459482599, 1459482602, 1459482603,
1459482604, 1459482609, 1459482610, 1459482611, 1459482612, 1459482613,
1459482618, 1459482619, 1459482620, 1459482622, 1459482628), tzone = "Asia/Calcutta", tclass = c("POSIXct",
"POSIXt")), .Dimnames = list(NULL, c("A", "B")), class = c("xts",
"zoo"))
This is my attempt:
difftime(index(tt),index(lag.xts(tt, k=1)), units=c("auto"))
Time differences in secs
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
attr(,"tclass")
[1] "POSIXct" "POSIXt"
Any help is highly appreciated.
Edit:
Based on the answers, I have written the following code. It is meant to calculate the mean number of seconds between observations for A and B for each day.
But the code uses the index of tt instead of the non-NA observations of A or B, so the results for A and B are the same.
fun.time= function(x) mean(diff(time(x)))
df.time<-do.call(rbind, lapply(split(tt, "days"), FUN=function (x) {do.call(cbind, lapply(as.list(x), fun.time))}))
dput(df.time)
structure(c(5.57627118644068, 5.57627118644068), .Dim = 1:2, .Dimnames = list(
NULL, c("A", "B")))
Answer 1:
First create some test data that spans more than one day. Using tt from the question, we create tt2.
The remaining code calculates the differences between successive times by column and by day, after removing that column's NA values. We lapply over the columns and, within each column, aggregate by date. To do that we remove the NAs of the current column and then construct a zoo object whose values are the times as numeric seconds and whose index is the formatted times. Using numeric values avoids dealing with difftime objects and their unpredictable units, while format causes as.Date to take the date relative to the tzone attribute of the data; otherwise, as.Date would take the date relative to GMT, which is not what is wanted (based on the poster's comments below). This returns a zoo object, but it is easy enough to apply as.xts to the result if xts output is important.
library(xts)
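# test input: spread the 60 timestamps over roughly one extra day so tt2 covers two dates
tt2 <- tt
time(tt2) <- time(tt) + seq(1, 24*60*60, length.out = 60)
# for each column: drop its NAs, then average the successive time gaps (in seconds) per date
do.call(cbind, lapply(tt2, function(x) {
  times <- time(na.omit(x))
  aggregate(zoo(as.numeric(times), format(times)), as.Date, function(x) mean(diff(x)))
}))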
giving the following zoo series:
A B
2016-04-01 3029.0 1648.9
2016-04-02 5416.1 1633.0
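The effect of format on the date grouping can be seen in isolation with a single timestamp. This is a minimal sketch in base R; the timestamp and the variable names are made up for illustration, and tz = "UTC" is passed explicitly rather than relying on any default.

```r
# Hypothetical timestamp: 02:00 on 2 April in India, which is 20:30 on 1 April in UTC
x <- as.POSIXct("2016-04-02 02:00:00", tz = "Asia/Calcutta")

# Date taken relative to UTC: the observation lands on the previous day
date_utc <- as.Date(x, tz = "UTC")

# format() renders the time in the object's own time zone,
# so the date is taken relative to Asia/Calcutta
date_local <- as.Date(format(x))

print(date_utc)
print(date_local)
```

Grouping by date_utc would assign early-morning observations in India to the previous day, which is why the answer formats the times before calling as.Date.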
Note 1: If we only need the mean differences by column and not by date, we can drop the aggregate, reducing the code to the single line below. Here we revert to using tt as the test input since that is sufficient to illustrate this case. Note that we again convert the index to numeric to avoid the unpredictability of difftime's output units.
sapply(tt, function(x) mean(diff(as.numeric(time(na.omit(x))))))
giving this named numeric vector:
A B
14.9545 6.2115
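To illustrate the point about difftime units, here is a small base R sketch. The two timestamp vectors are hypothetical and chosen only so that the gaps differ in size; they are not taken from tt.

```r
# Hypothetical timestamps, given as seconds since the epoch
close_times  <- as.POSIXct(c(0, 30, 90), origin = "1970-01-01", tz = "UTC")
sparse_times <- as.POSIXct(c(0, 120, 7320), origin = "1970-01-01", tz = "UTC")

# diff() on POSIXct returns a difftime whose unit is chosen automatically,
# so it can change from one data set to the next
units_close  <- units(diff(close_times))
units_sparse <- units(diff(sparse_times))
print(c(units_close, units_sparse))

# Converting the times to numeric first always gives gaps in seconds
gaps_sparse <- diff(as.numeric(sparse_times))
mean_sparse <- mean(gaps_sparse)
print(mean_sparse)
```

If a difftime result is still wanted, as.numeric(d, units = "secs") converts it to seconds explicitly.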
Note 2: With the development version of zoo this could be simplified. That version adds a coredata argument to aggregate.zoo; if it is set to FALSE, the entire zoo object is passed to the function and not just the coredata part. In the code below, we define one function that takes the mean difference after NA removal and another that converts the index to character and then to Date, which has the effect of using the tzone attribute of the input for its time zone (or the local time zone). Then we apply aggregate.zoo by date over each column and cbind the resulting list back together again:
library(xts)
mean_diff_time <- function(x) mean(diff(as.numeric(time(na.omit(x)))))
dates <- function(x) as.Date(format(x))
do.call("cbind", lapply(as.zoo(tt2), aggregate, dates, mean_diff_time, coredata = FALSE))
Update: Have rearranged presentation.
Answer 2:
To supplement the answer by G.Grothendieck, you can also use mean(diff(index(tt))) to return the result as a difftime object:
> mean(diff(index(tt)))
Time difference of 5.576271 secs
or simply mean(diff(.index(tt))) to get the result as numeric:
> mean(diff(.index(tt)))
[1] 5.576271
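For context on why the attempt in the question returned all zeros: lag.xts shifts the values, not the timestamps, so both indexes passed to difftime were identical. The sketch below shows this on a tiny series; it assumes the xts package is installed, and the series x is hypothetical rather than taken from tt.

```r
library(xts)

# Hypothetical series with gaps of 1, 2 and 5 seconds
idx <- as.POSIXct(c(0, 1, 3, 8), origin = "1970-01-01", tz = "UTC")
x <- xts(1:4, order.by = idx)

# lag.xts() moves the values along the index; the timestamps stay where they are,
# so subtracting one index from the other gives zero everywhere
same_index <- all(index(x) == index(lag.xts(x, k = 1)))
print(same_index)

# diff() on the raw numeric index gives the gaps in seconds
gaps <- diff(.index(x))
mean_gap <- mean(gaps)
print(mean_gap)
```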
EDIT: To get the mean per column, drop each column's NA values before taking the differences:
> lapply(tt, function(x){mean(diff(.index(x[!is.na(x)])))})
$A
[1] 14.95455
$B
[1] 6.211538
Source: https://stackoverflow.com/questions/39999744/average-number-of-seconds-between-two-time-observations