Imputing missing values using ARIMA model

Question

I am trying to impute missing values in a time series with an ARIMA model in R. I tried the code below, without success.

library(forecast)  # provides auto.arima()

x <- AirPassengers
x[90:100] <- NA
fit <- auto.arima(x)
fitted(fit)[90:100]  ## this is giving me NAs
plot(x)
lines(fitted(fit), col="red")

The fitted model is not imputing the missing values. Any idea on how this is done?

Answer 1:

fitted gives in-sample one-step forecasts, and as you saw they are NA wherever the observation itself is missing, so they cannot fill the gap. The "right" way to do what you want is via a Kalman smoother. A rough approximation, good enough for most purposes, is to average the forward and backward forecasts over the missing section. Like this:

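library(forecast)

x <- AirPassengers
x[90:100] <- NA
fit <- auto.arima(x)

# The gap 90:100 holds 11 values, so forecast 11 steps from each side
# Forward forecasts from the data before the gap, reusing the fitted model
fit1 <- forecast(Arima(x[1:89], model=fit), h=11)
# Backward forecasts: reverse the data after the gap and forecast from it
fit2 <- forecast(Arima(rev(x[101:144]), model=fit), h=11)

plot(x)
# Average the two forecasts; rev() puts the backward ones in time order
lines(ts(0.5*c(fit1$mean+rev(fit2$mean)),
  start=time(x)[90], frequency=12), col="red")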
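To store the result rather than only plot it, the averaged forecasts can be written back into the series. The following is a minimal sketch of that step, assuming the forecast package is installed; the names gap, fwd, bwd and x_imputed are illustrative and not part of the answer above.

library(forecast)

x <- AirPassengers
gap <- 90:100                 # positions of the missing block
x[gap] <- NA
fit <- auto.arima(x)

h <- length(gap)              # one forecast per missing value
before <- x[1:(min(gap) - 1)]
after  <- x[(max(gap) + 1):length(x)]

# Forward forecasts run from the last value before the gap
fwd <- forecast(Arima(before, model = fit), h = h)$mean
# Backward forecasts run from the reversed data after the gap
bwd <- forecast(Arima(rev(after), model = fit), h = h)$mean

# rev() aligns the backward forecasts with time order before averaging
x_imputed <- x
x_imputed[gap] <- 0.5 * (as.numeric(fwd) + rev(as.numeric(bwd)))

x_imputed keeps the ts attributes of x, so it can be plotted or modelled like the original series.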

Answer 2:

As Rob said, a Kalman smoother is usually the "better" solution.

This can be done, for example, with the imputeTS package (disclaimer: I maintain the package): https://cran.r-project.org/web/packages/imputeTS/index.html

library("imputeTS")
x <- AirPassengers
x[90:100] <- NA
# imputeTS 3.0 and later name this na_kalman(); older releases used na.kalman()
x <- na_kalman(x, model = "auto.arima")

Internally, the imputeTS package runs a Kalman smoother on the state space representation of the ARIMA model obtained by auto.arima.

Even if the theoretical background is not easy to understand, it usually gives very good results :)
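For readers who want to see the idea without the package, here is a rough base R sketch of Kalman smoothing on an ARIMA state space model. It is not the imputeTS source code. It assumes a fixed seasonal ARIMA(0,1,1)(0,1,1)[12] in place of whatever order auto.arima would pick, and the names gap, ks, smoothed and x_imputed are illustrative.

x <- AirPassengers
gap <- 90:100
x[gap] <- NA

# Assumed model order (not chosen by auto.arima); method = "ML" handles NAs exactly
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12),
             method = "ML")

# fit$model is the state space form stored by arima()
# KalmanSmooth() returns the smoothed state vector for every time point
ks <- KalmanSmooth(x, fit$model)

# The observation vector Z maps each smoothed state to a value of the series
smoothed <- drop(ks$smooth %*% fit$model$Z)

# Replace only the missing block; observed values stay untouched
x_imputed <- x
x_imputed[gap] <- smoothed[gap]

Unlike the forward and backward forecast average, the smoother uses the data on both sides of the gap in a single pass.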

Source: https://stackoverflow.com/questions/30584423/imputing-missing-values-using-arima-model

Tags

time-series

missing-data