Question
I am trying to interpolate the meter_value column of this data. The full CSV is here: https://drive.google.com/open?id=18cwtw-chAB-FqqCesXZJ-6NB6eHFJlgQ
localminute,dataid,meter_value
2015-10-03 09:51:53,6578,157806
2015-10-13 13:41:49,6578,158086
:
:
2016-01-17 16:00:33,6578,164544 #end of meter_value data for ID=6578
Based on what @G. Grothendieck suggested, I tried the code below and got an error at z.interpolated (the step that merges the data).
D6578z <- read.csv.zoo("test_6578.csv")[,2]
D6578zd <- to.daily(D6578z)[,4]
#Warning messages:
#1: In zoo(xx, order.by = index(x), ...) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
#2: In zoo(rval, index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
test_6578t <- time(D6578zd)
plot(D6578zd,type="p",xaxt="n", pch=19, col="blue",cex=1.5)
diff(test_6578t)
t.daily6578 <- seq(from =min(test_6578t),to=max(test_6578t),by="1 day")
dummy6578 <- zoo(,t.daily6578)
z.interpolated <- merge(D6578zd,dummy6578,all=TRUE)
#Error in merge.zoo(D6578zd, dummy6578, all = TRUE) : series cannot be merged with non-unique index entries in a series
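One possible way around the non-unique index error, sketched on made-up readings rather than the real CSV, is to collapse the series to one value per day with aggregate before merging it with the empty daily grid. The timestamps and values below are assumptions chosen only for illustration.

```r
library(zoo)

# Made-up readings (assumption): two on the same day, then a gap of several days
tt <- as.POSIXct(c("2015-10-03 09:51:53", "2015-10-03 18:20:00",
                   "2015-10-06 13:41:49"), tz = "UTC")
z <- zoo(c(100, 104, 110), tt)

# Collapse to one value per calendar day, keeping the last reading of each day,
# so the daily index has no duplicates
zd <- aggregate(z, as.Date(time(z), tz = "UTC"), function(x) tail(x, 1))

# Empty daily grid spanning the data, then merge and interpolate linearly
grid <- zoo(, seq(start(zd), end(zd), by = "day"))
zi <- na.approx(merge(zd, grid, all = TRUE))
zi
```

Because the index of zd is unique, merge no longer complains, and na.approx fills the days that had no reading.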
The R code for interpolating at one-hour intervals, provided by @G. Grothendieck, is shown below.
Hi @G. Grothendieck, thanks for the solution code. I have some questions about it that I would like to clarify.
line1: to.hour <- function(x) as.POSIXct(trunc(as.POSIXct(x, origin = "1970-01-01"), "hour"))
line2: z <- read.csv.zoo("test_6578.csv", FUN = to.hour, aggregate = function(x) tail(x, 1))
line3: zz <- na.approx(as.zoo(as.ts(z)))
line4: time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01")
In line1, why is there an as.POSIXct before trunc(as.POSIXct(x, origin = "1970-01-01"))? I understand that the trunc function rounds the datetime value.
In line2, what does "FUN = to.hour, aggregate = function(x) tail(x, 1)" do? I could not understand what tail(x, 1) is. When I exported z to a CSV file, I observed that only the dataid and meter_value columns are generated when read.csv.zoo is used.
In line3, I understand that zz holds the interpolated data, but I do not fully understand the code "na.approx(as.zoo(as.ts(z)))". Since z is already a zoo series after read.csv.zoo, why do we still have to use as.zoo and as.ts in the na.approx line? What is the difference between a zoo series and a zooreg series?
In line4, is time(zz) the index of zz?
Thanks in advance for your explanation.
I was able to plot the interpolated data at one-hour intervals.
Answer 1:
Read the file in using read.csv.zoo, converting the index to Date class and aggregating duplicate dates such that the last one is used. Then convert to ts and back to zoo, which will fill in empty days with NAs. Now use na.approx to fill in the NA values. Since ts cannot represent Date class, the resulting series will have numbers representing dates, so convert them back.
library(zoo)
z <- read.csv.zoo("test_6578.csv", FUN = as.Date, aggregate = function(x) tail(x, 1))
zz <- na.approx(as.zoo(as.ts(z)))
time(zz) <- as.Date(time(zz))
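To see what each step does, here is a minimal sketch of the same pipeline on a tiny inline CSV. The rows are invented for illustration (they are not from test_6578.csv), and read.zoo with text= is used in place of read.csv.zoo only so that the example needs no file.

```r
library(zoo)

# Invented sample (assumption): two readings on 2015-10-03, none on 10-05 or 10-06
csv <- "localminute,meter_value
2015-10-03 09:51:53,100
2015-10-03 18:20:00,104
2015-10-04 08:00:00,106
2015-10-07 13:41:49,112"

# FUN = as.Date drops the time of day; aggregate keeps the last value per date,
# so the two readings on 2015-10-03 collapse to the single value 104
z <- read.zoo(text = csv, header = TRUE, sep = ",",
              FUN = as.Date, aggregate = function(x) tail(x, 1))

# The round trip through ts makes the series regular: missing days appear as NA
zr <- as.zoo(as.ts(z))

# Interpolate the NAs, then turn the numeric index back into Dates
zz <- na.approx(zr)
time(zz) <- as.Date(time(zz))
zz
```

Here tail(x, 1) simply returns the last element of the values that share a date, which is why only one row per day survives the read step.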
In the comments there was a claim that there are holes in the output, but that is not the case. The difference between successive times is identically 1 and there are no NAs.
table(diff(time(zz)))
## 1
## 106
any(is.na(zz))
## [1] FALSE
any(is.na(time(zz)))
## [1] FALSE
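The same regularity check can also be expressed with is.regular from zoo, which helps illustrate how a plain zoo series differs from a zooreg series. This is a small sketch on invented values, not on the real meter data.

```r
library(zoo)

# Invented daily series (assumption) with a two-day hole after the second point
z <- zoo(c(104, 106, 112), as.Date("2015-10-03") + c(0, 1, 4))

is.regular(z)                 # TRUE: all gaps are multiples of one day
is.regular(z, strict = TRUE)  # FALSE: the gaps are not all equal

# Converting to ts and back yields a zooreg series on a complete daily grid,
# with NA where the original series had no observation
zr <- as.zoo(as.ts(z))
class(zr)
is.regular(zr, strict = TRUE) # TRUE: every time point on the grid is present
```

In other words, a zooreg series carries a fixed frequency, whereas a plain zoo series can have arbitrary spacing between its index values.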
Here is an example of doing this for one hour instead of one day differences.
to.hour <- function(x) as.POSIXct(trunc(as.POSIXct(x, origin = "1970-01-01"), "hour"))
z <- read.csv.zoo("test_6578.csv", FUN = to.hour, aggregate = function(x) tail(x, 1))
zz <- na.approx(as.zoo(as.ts(z)))
time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01")
plot(zz[, 2], type = "p", pch = ".")
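On why to.hour wraps trunc in as.POSIXct: in base R, trunc applied to a date-time returns a POSIXlt object, so the outer as.POSIXct converts it back to a class that works well as a zoo index. The sketch below uses a single timestamp and an explicit UTC time zone, both chosen as assumptions for illustration.

```r
# One sample timestamp (assumption), fixed to UTC so the result is reproducible
x <- as.POSIXct("2015-10-03 09:51:53", tz = "UTC")

# trunc drops the minutes and seconds (it rounds down, not up) and returns POSIXlt
tr <- trunc(x, "hours")
class(tr)

# Convert back to POSIXct, as to.hour does
h <- as.POSIXct(tr)
format(h, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# For comparison, round goes to the nearest hour instead
r <- as.POSIXct(round(x, "hours"))
format(r, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

After as.ts, the hourly index is stored as seconds since 1970-01-01, which is why the last line of the answer's code converts it back with as.POSIXct and an origin.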
Source: https://stackoverflow.com/questions/52795960/interpolate-data-for-irregular-time-series