Question
I'm trying to predict a simple lagged time series regression with the dyn library in R. This question was a helpful starting point, but I'm getting some weird behaviour that I'm hoping someone can explain.
Here's a minimal working example.
library(dyn)
# Initial data
y.orig <- arima.sim(model=list(ar=c(.9)),n=10)
x1.orig <- rnorm(10)
data <- cbind(y=y.orig, x1=x1.orig)
# This model, with a single lag term, predicts from t=2
mod1 <- dyn$lm(y ~ lag(y, -1), data)
y.new <- window(y.orig, end=end(y.orig) + c(5,0), extend=TRUE)
newdata1 <- cbind(y=y.new)
predict(mod1, newdata1)
# This one, with a lag plus another predictor, predicts from t=1 on
mod2 <- dyn$lm(y ~ lag(y, -1) + x1, data)
y.new <- window(y.orig, end=end(y.orig) + c(5,0), extend=TRUE)
x1.new <- c(x1.orig, rnorm(5))
newdata2 <- cbind(y=y.new, x1=x1.new)
predict(mod2, newdata2)
Why is there a difference between the two? Can anyone suggest how to predict my mod1 using dyn? Thanks in advance.
Answer 1:
Both mod1 and mod2 start predicting at t=2. The prediction vector for mod2 starts at t=1, but its first value is NA. As for why one starts at 2 and the other at 1, note that predict merges the variables on the right-hand side of the formula. In the case of mod1, lag(y, -1) starts at t=2 since y starts at t=1. In the case of mod2, merging lag(y, -1) and x1 gives a series that starts at t=1 (since x1 starts at t=1). Try this, which does not involve dyn:
> start(with(as.list(newdata1), merge.zoo(lag(y, -1))))
[1] 2
> start(with(as.list(newdata2), merge.zoo(lag(y, -1), x1)))
[1] 1
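The same alignment effect can be reproduced without the objects from the question. The following is a minimal sketch using only zoo; the series values and the names y.lag, alone and merged are made up for illustration and are not part of the original code.

```r
library(zoo)

# Illustrative values only; any two series on the same index 1..5 would do
y  <- zoo(c(1.2, 0.8, 1.5, 1.1, 0.9), order.by = 1:5)
x1 <- zoo(c(0.3, -0.1, 0.4, 0.0, 0.2), order.by = 1:5)

# lag(y, -1) pairs each time t with y at t-1, so no value exists at t=1
y.lag <- lag(y, -1)

alone  <- merge(y.lag)       # only the lagged series
merged <- merge(y.lag, x1)   # outer join with a series that starts at t=1

start(alone)    # first time index of the lagged series on its own
start(merged)   # first time index once x1 is merged in

# First row of the merged series: y.lag is missing at t=1
merged[1, ]
```

Merging in x1 pulls the start back to the first time point, but the lagged column is NA there, which is why the extra leading entry in a prediction would be NA rather than a real forecast.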
If we wanted predict(mod1, newdata1) to start at t=1, we could add our own Intercept column and remove the default intercept to avoid duplication. That forces it to start at 1, since the RHS now has a series that starts at 1:
data.b <- cbind(y=y.orig, x1=x1.orig, Intercept = 1)
mod.b <- dyn$lm(y ~ Intercept + lag(y, -1) - 1, data.b)
newdata.b <- cbind(Intercept = 1, y = y.new)
predict(mod.b, newdata.b)
Regarding the second question, if you want to predict mod1 then use fitted(mod1).
There seems to be a third question lurking about how it all works, so maybe this clarifies it. All dyn does is align the time series in the formula; lm and predict can then be run as usual. For example, if we create an aligned model frame using dyn$model.frame, then everything else can be done using just ordinary lm and ordinary predict, and dyn is not involved from that point onwards. Below, mod1a is similar to mod1 from the question except that it runs an ordinary lm on the aligned model frame. If you understand the mod1a lm and its predict, then mod1 and its predict are similar.
## mod1 and mod1a are similar
# from code in the question
mod1 <- dyn$lm(y ~ lag(y, -1), data = data)
mod1
# redo it using a plain lm by applying dyn to model.frame
mf <- dyn$model.frame(y ~ lag(y, -1), data = data)
mod1a <- lm(y ~ `lag(y, -1)`, mf)
mod1a
## the two predicts below are similar
# the 1 ensures it's an mts rather than a ts but is otherwise not used
newdata1 <- cbind(y=y.new, 1)
predict(mod1, newdata1)
newdata1a <- cbind(1, `lag(y, -1)` = lag(y.new, -1))
predict(mod1a, newdata1a)
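To make the "align first, then run ordinary lm" idea concrete without dyn at all, here is a rough sketch in base R. It swaps dyn$model.frame for stats::ts.intersect, which is not what dyn does internally, only a stand-in for the alignment step; the seed and the names df, y.lag1, fit and next.y are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
y <- arima.sim(model = list(ar = 0.9), n = 10)

# Align y with its own first lag; only times where both exist are kept (t = 2..10)
df <- ts.intersect(y = y, y.lag1 = stats::lag(y, -1), dframe = TRUE)

# From here on it is an ordinary lm on an ordinary data frame
fit <- lm(y ~ y.lag1, data = df)

# One-step-ahead prediction for t = 11: its lagged value is the last observed y
next.y <- predict(fit, newdata = data.frame(y.lag1 = as.numeric(tail(y, 1))))
next.y
```

Once the lagged column exists as a regular variable, nothing time-series specific is needed for fitting or predicting; the only care required is to supply the right lagged value in newdata.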
Source: https://stackoverflow.com/questions/11215868/r-predicting-simple-dyn-model-with-one-lag-term