Question
I have a simple data set to which I applied a simple linear regression model. Now I would like to use fixed effects to improve the model's predictions. I know that I could also create dummy variables, but my real dataset consists of more years and has more variables, so I would like to avoid making dummies.
My data and code are similar to this:
data <- read.table(header = TRUE,
stringsAsFactors = FALSE,
text="CompanyNumber ResponseVariable Year ExplanatoryVariable1 ExplanatoryVariable2
1 2.5 2000 1 2
1 4 2001 3 1
1 3 2002 5 7
2 1 2000 3 2
2 2.4 2001 0 4
2 6 2002 2 9
3 10 2000 8 3")
library(lfe)
library(caret)
fe <- getfe(felm(data = data, ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 | Year))
fe
lm.1<-lm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2, data=data)
prediction<- predict(lm.1, data)
prediction
check_model=postResample(pred = prediction, obs = data$ResponseVariable)
check_model
For my real dataset I will make predictions on my test set, but for simplicity I just use the training set here as well.
I would like to make a prediction with the help of the fixed effects that I found, but the result does not seem to line up with the fixed effects correctly. Does anyone know how to use fe$effect?
prediction_fe<- predict(lm.1, data) + fe$effect
Answer 1:
Here are a few extra comments on your setup and the models that you are running.
The primary model you are fitting is
lm.1<-lm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2, data=data)
which yields
> lm.1
Call:
lm(formula = ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2,
data = data)
Coefficients:
(Intercept) ExplanatoryVariable1 ExplanatoryVariable2
0.8901 0.7857 0.1923
When you run the predict function on this model you get
> predict(lm.1)
1 2 3 4 5 6 7
2.060385 3.439410 6.164590 3.631718 1.659333 4.192205 7.752359
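These values can be reproduced by hand from the fitted coefficients. The following is a minimal sketch in base R, assuming the same toy data as in the question; the name `manual` is only illustrative.

```r
# Toy data copied from the question
data <- read.table(header = TRUE,
                   stringsAsFactors = FALSE,
                   text = "CompanyNumber ResponseVariable Year ExplanatoryVariable1 ExplanatoryVariable2
1 2.5 2000 1 2
1 4 2001 3 1
1 3 2002 5 7
2 1 2000 3 2
2 2.4 2001 0 4
2 6 2002 2 9
3 10 2000 8 3")

lm.1 <- lm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2, data = data)
b <- coef(lm.1)

# intercept + slope1 * x1 + slope2 * x2, evaluated for every row
manual <- b[["(Intercept)"]] +
  b[["ExplanatoryVariable1"]] * data$ExplanatoryVariable1 +
  b[["ExplanatoryVariable2"]] * data$ExplanatoryVariable2

# TRUE if the hand computation agrees with predict()
all.equal(manual, unname(predict(lm.1)))
```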
That corresponds to computing (for observation 1) 0.8901 + 1*0.7857 + 2*0.1923 ≈ 2.06, so the estimated coefficients (the fixed effects of this model) are used in the prediction. The felm model is slightly more complicated because it "factors out" the year component. The model fit is shown here:
> felm(data = data, ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 | Year)
ExplanatoryVariable1 ExplanatoryVariable2
0.9726 1.3262
This corresponds to "correcting for", or conditioning on, Year, so you get the same coefficients for the explanatory variables if you fit
> lm(data = data, ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 + factor(Year))
Call:
lm(formula = ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 +
factor(Year), data = data)
Coefficients:
(Intercept) ExplanatoryVariable1 ExplanatoryVariable2 factor(Year)2001
-2.4848 0.9726 1.3262 0.9105
factor(Year)2002
-7.0286
and then just throw away all but the coefficients for the explanatory variables. Thus, you cannot go from the fixed effects extracted from felm to the predictions (since you are lacking the intercept and all the year effects); you can only see the effect sizes.
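To make this concrete, here is a minimal sketch in base R (it does not use lfe) that fits the equivalent dummy-variable model and rebuilds its predictions by hand. The names `lm.fe`, `year_effect`, `manual` and `slopes_only` are illustrative and not part of the original code. It suggests that the two slopes alone do not reproduce the fitted values, while adding the intercept and the year offsets does.

```r
# Toy data copied from the question
data <- read.table(header = TRUE,
                   stringsAsFactors = FALSE,
                   text = "CompanyNumber ResponseVariable Year ExplanatoryVariable1 ExplanatoryVariable2
1 2.5 2000 1 2
1 4 2001 3 1
1 3 2002 5 7
2 1 2000 3 2
2 2.4 2001 0 4
2 6 2002 2 9
3 10 2000 8 3")

# Same slopes as the felm fit, but the intercept and year effects are kept
lm.fe <- lm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 + factor(Year),
            data = data)
b <- coef(lm.fe)

# Year offsets relative to the reference year 2000 (whose offset is 0)
year_effect <- c("2000" = 0,
                 "2001" = b[["factor(Year)2001"]],
                 "2002" = b[["factor(Year)2002"]])

# Contribution of the two explanatory variables only
slopes_only <- b[["ExplanatoryVariable1"]] * data$ExplanatoryVariable1 +
  b[["ExplanatoryVariable2"]] * data$ExplanatoryVariable2

# Full prediction: intercept + slopes + the offset of each row's year
manual <- b[["(Intercept)"]] + slopes_only +
  unname(year_effect[as.character(data$Year)])

all.equal(manual, unname(predict(lm.fe)))               # TRUE
isTRUE(all.equal(slopes_only, unname(predict(lm.fe))))  # FALSE
```

For a separate test set, `predict(lm.fe, newdata = test)` does the same bookkeeping automatically, provided every year in the test set also appeared in the training data.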
Hope this helps.
Source: https://stackoverflow.com/questions/45286538/prediction-using-fixed-effects