Question
I have a vector Y containing future returns and a vector X containing current returns. The last element of Y is NA, because the last current return is also the very end of the available series.
X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)
Other <- c(5500, 222, 523, 3677)
lm(Y ~ X + Other)
I want to make sure that the last element of each series is not included in the regression. I read the na.action documentation, but it is not clear to me whether this is the default behaviour.
For cor(), is this the correct solution to exclude X[4] and Y[4] from the calculation?
cor(X, Y, use = "pairwise.complete.obs")
Answer 1:
The factory-fresh default for lm is to disregard observations containing NA values. Since this could be overridden using global options, you might want to explicitly set na.action to na.omit:
> summary(lm(Y ~ X + Other, na.action=na.omit))
Call:
lm(formula = Y ~ X + Other, na.action = na.omit)
[snip]
(1 observation deleted due to missingness)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
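To confirm this without reading the summary output, one option is to inspect the fitted object directly. The following is a minimal sketch using the data from the question; the variable name fit is only illustrative.

```r
X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)
Other <- c(5500, 222, 523, 3677)

# Session-wide setting that lm falls back on when na.action is not given
getOption("na.action")

# Set na.action explicitly so the result does not depend on global options
fit <- lm(Y ~ X + Other, na.action = na.omit)

# Number of observations actually used in the fit
nobs(fit)

# Index of the row(s) dropped because of missing values
fit$na.action
```

Here nobs(fit) reports 3 and fit$na.action points at row 4, so the fourth row is left out of the regression for X and Other as well as Y.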
As to your second question, cor(X,Y,use='pairwise.complete.obs') is correct. Since there are only two variables, cor(X,Y,use='complete.obs') would also produce the expected result.
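A quick way to check this is to compare both settings against the correlation computed on the first three elements by hand. This is a small sketch on the question's data; the names r_pairwise, r_complete and r_manual are just for illustration.

```r
X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)

# Both settings drop the pair (X[4], Y[4]) because Y[4] is NA
r_pairwise <- cor(X, Y, use = "pairwise.complete.obs")
r_complete <- cor(X, Y, use = "complete.obs")

# Reference value: drop the fourth element by hand
r_manual <- cor(X[1:3], Y[1:3])

# Both comparisons should return TRUE
all.equal(r_pairwise, r_complete)
all.equal(r_pairwise, r_manual)
```

The two settings only differ when cor is given more than two variables: 'complete.obs' then drops every row containing any NA, while 'pairwise.complete.obs' drops rows separately for each pair of variables.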
Source: https://stackoverflow.com/questions/8448019/prevent-na-from-being-used-in-a-lm-regresion