Question
I'm trying to build a predictive model in which I run a separate pooled regression for each year (using data from previous years), so that the coefficients can vary over time. (This may not make much sense for the sample data provided, but it is done in practice for my sample.)
Here is what I have come up with so far. I adapted my code to a reproducible example using the Grunfeld data from the plm package.
The data is a panel indexed by firm and year:
> head(Grunfeld)
  firm year   inv  value capital
1    1 1935 317.6 3078.5     2.8
2    1 1936 391.8 4661.7    52.6
3    1 1937 410.6 5387.1   156.9
4    1 1938 257.7 2792.2   209.2
5    1 1939 330.8 4313.2   203.4
6    1 1940 461.2 4643.9   207.2
and here is my code:
library(plm)
data("Grunfeld", package="plm")

# Store each subset regression in myregression
myregression <- list()
count <- 1

## pooled regression in each year t,
## with subset data of the previous six years (t-5)
for(t in 1940:1950){
  myregression[[count]] <- plm(inv ~ value + capital,
                               subset(Grunfeld, year<=t & year>=t-5),
                               index=c("firm","year"))
  # Name each regression based on the year range included in the data subset
  names(myregression)[[count]] = paste0("Year_",t)
  count <- count+1
}
## Prediction
#######################
# Alternative 1: Loop
Forecast <- list()
count <- 1
for(t in 1940:1950){
  Forecast[[count]] <- predict(myregression[[count]], subset(Grunfeld, year==t))
  ## Name each Prediction based on the year t:
  names(Forecast)[[count]] = paste0("Year_",t)
  count <- count+1
}
Unfortunately, my code does not work and I get the following error:
Error in crossprod(beta, t(X)) : non-conformable arguments
Ideally I would like to store my predictions/forecasts in Grunfeld$Forecast, in the same structure as the original Grunfeld data. However, I have had a lot of difficulty working with lists and often failed to index them correctly and store the results in a vector next to the original data. This is crucial because my own sample has a lot of missing data (NAs), so I can only use the predict function on a limited subset. How can I arrange the data in the desired way?
And is this the right approach to obtain forecasts conditional on the year, with varying slopes, stored in the same layout as the original data, or are there more efficient ways I'm unaware of?
Answer 1:
Note that you are not estimating a pooled regression. plm, by default, estimates a within model. A quick summary of the first regression reveals this. See e.g. summary(myregression[[1]]), whose first lines read:
Oneway (individual) effect Within Model
Call:
plm(formula = inv ~ value + capital, data = subset(Grunfeld,
year <= t & year >= t - 5), index = c("firm", "year"))
...
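A quick way to see the difference between the two estimators is to compare their coefficient vectors for the same data window. The sketch below assumes the plm package is installed; the names win_data, within_fit and pooled_fit are only illustrative. The within model reports slopes only, while the pooling model also reports an intercept, which is consistent with the "non-conformable arguments" error when the within coefficients are multiplied with a design matrix that includes an intercept column.

```r
library(plm)
data("Grunfeld", package = "plm")

# Same six-year window as the first regression in the question (1935-1940)
win_data <- subset(Grunfeld, year <= 1940 & year >= 1935)

# Default estimator: model = "within" (firm fixed effects, no intercept reported)
within_fit <- plm(inv ~ value + capital, data = win_data,
                  index = c("firm", "year"))

# Explicit pooled OLS
pooled_fit <- plm(inv ~ value + capital, data = win_data,
                  index = c("firm", "year"), model = "pooling")

names(coef(within_fit))  # slopes only
names(coef(pooled_fit))  # intercept plus slopes
```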
Since you talk about a pooled regression, try the following code, which sets model="pooling" explicitly. I took the liberty of making it a bit shorter:
# Start from empty lists; elements are named by year, so no counter is needed
myregression <- list()
Forecast <- list()
for(t in 1940:1950){
  myregression[[as.character(t)]] <- plm(inv ~ value + capital,
                                         subset(Grunfeld, year<=t & year>=t-5),
                                         index=c("firm","year"), model="pooling")
}
for(t in 1940:1950){
  Forecast[[as.character(t)]] <- predict(myregression[[as.character(t)]],
                                         subset(Grunfeld, year==t))
}
This gives you your predicted values without error messages.
I can't comment on your last question about whether or not this is the right statistical approach, but I hope that the R-related question is settled.
To respond to your comment, try the following, which stores the forecasts in a new column of Grunfeld:
Grunfeld$forc <- NA
for(t in 1940:1950){
  Grunfeld[which(Grunfeld$year==t), "forc"] <-
    predict(myregression[[as.character(t)]], subset(Grunfeld, year==t))
}
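For data with missing values, a base-R variant of the same idea may be easier to reason about. This is only a sketch under stated assumptions: it uses lm, whose OLS coefficient estimates coincide with those of a pooling model, and a made-up toy panel (the data frame panel and its values are hypothetical, not the Grunfeld data). Rows with a missing regressor are dropped when fitting and receive NA as their forecast, so the forecast column stays aligned with the original rows.

```r
# Toy panel standing in for the real data (hypothetical values)
set.seed(1)
panel <- expand.grid(year = 1935:1950, firm = 1:5)
panel$value   <- runif(nrow(panel), 100, 1000)
panel$capital <- runif(nrow(panel), 10, 300)
panel$inv     <- 5 + 0.1 * panel$value + 0.2 * panel$capital +
                 rnorm(nrow(panel), sd = 10)
panel$value[c(7, 40)] <- NA   # mimic missing regressors

panel$forc <- NA_real_
for (t in 1940:1950) {
  # Six-year estimation window t-5, ..., t, as in the question
  train <- panel[panel$year >= t - 5 & panel$year <= t, ]
  fit   <- lm(inv ~ value + capital, data = train)  # incomplete rows are dropped
  rows  <- which(panel$year == t)
  # predict.lm returns one value per row of newdata, NA where a regressor is NA
  panel$forc[rows] <- predict(fit, newdata = panel[rows, ])
}

head(panel[panel$year >= 1940, ])
```

Years before 1940 keep NA in forc because no regression is estimated for them.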
Source: https://stackoverflow.com/questions/24849234/model-prediction-for-pooled-regression-model-in-panel-data