Question
Suppose I want to fit a linear regression model with a degree-two (orthogonal) polynomial and then predict the response. Here is the code for the first model (m1):
x=1:100
y=-2+3*x-5*x^2+rnorm(100)
m1=lm(y~poly(x,2))
prd.1=predict(m1,newdata=data.frame(x=105:110))
Now let's fit the same model, but instead of using poly(x,2) directly, I will use its columns:
m2=lm(y~poly(x,2)[,1]+poly(x,2)[,2])
prd.2=predict(m2,newdata=data.frame(x=105:110))
Let's look at the summaries of m1 and m2.
> summary(m1)
Call:
lm(formula = y ~ poly(x, 2))
Residuals:
Min 1Q Median 3Q Max
-2.50347 -0.48752 -0.07085 0.53624 2.96516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.677e+04 9.912e-02 -169168 <2e-16 ***
poly(x, 2)1 -1.449e+05 9.912e-01 -146195 <2e-16 ***
poly(x, 2)2 -3.726e+04 9.912e-01 -37588 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9912 on 97 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.139e+10 on 2 and 97 DF, p-value: < 2.2e-16
> summary(m2)
Call:
lm(formula = y ~ poly(x, 2)[, 1] + poly(x, 2)[, 2])
Residuals:
Min 1Q Median 3Q Max
-2.50347 -0.48752 -0.07085 0.53624 2.96516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.677e+04 9.912e-02 -169168 <2e-16 ***
poly(x, 2)[, 1] -1.449e+05 9.912e-01 -146195 <2e-16 ***
poly(x, 2)[, 2] -3.726e+04 9.912e-01 -37588 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9912 on 97 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.139e+10 on 2 and 97 DF, p-value: < 2.2e-16
So m1 and m2 are essentially the same. Now let's look at the predictions prd.1 and prd.2:
> prd.1
1 2 3 4 5 6
-54811.60 -55863.58 -56925.56 -57997.54 -59079.52 -60171.50
> prd.2
1 2 3 4 5 6
49505.92 39256.72 16812.28 -17827.42 -64662.35 -123692.53
Q1: Why is prd.2 so different from prd.1?
Q2: How can I obtain prd.1 using the model m2?
Answer 1:
m1 is the right way to do this. m2 is entering a whole world of pain...

To do predictions from m2, the model needs to know it was fitted to an orthogonal set of basis functions, so that it uses the same basis functions for the extrapolated new data values. Compare poly(1:10,2)[,2] with poly(1:12,2)[,2] - the first ten values are not the same. If you fit the model explicitly with poly(x,2), then predict understands all that and does the right thing.
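For instance, a quick check along these lines (a minimal illustrative sketch, not part of the original post) shows the mismatch:

p10 = poly(1:10, 2)
p12 = poly(1:12, 2)
# the orthogonal basis is recomputed for each range, so the first ten
# entries of the degree-2 column differ between the two calls
cbind(p10[, 2], p12[1:10, 2])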
What you have to do is make sure your predicted locations are transformed using the same set of basis functions as were used to create the model in the first place. You can use predict.poly for this (note I call my explanatory variables x1 and x2 so that it's easy to match the names up):
px = poly(x,2)
x1 = px[,1]
x2 = px[,2]
m3 = lm(y~x1+x2)
newx = 90:110
pnew = predict(px,newx) # px is the previous poly object, so this calls predict.poly
prd.3 = predict(m3, newdata=data.frame(x1=pnew[,1],x2=pnew[,2]))
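As a sanity check (a sketch, assuming m1, newx, and prd.3 from the code above are still in the workspace), prd.3 should now agree with what the correctly specified model m1 gives at the same locations, up to floating-point noise:

# compare against predictions from m1 at the same new x values
prd.1b = predict(m1, newdata = data.frame(x = newx))
all.equal(unname(prd.1b), unname(prd.3))   # should be TRUE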
Source: https://stackoverflow.com/questions/13896578/polynomial-regression-nonsense-predictions