Model Prediction for pooled regression model in panel data

Submitted by 旧巷老猫 on 2019-12-11 09:07:59

Question


I'm trying to build a predictive model in which I run a separate pooled regression for each year (based on previous years), so that the coefficients can vary over time. (This might not make sense for the sample data provided, but it is done in practice for my sample.)

Here is what I have come up with so far. I adapted my code to a reproducible sample from the plm package.

The data is a panel indexed by firm and year:

> head(Grunfeld)
  firm year   inv  value capital
1    1 1935 317.6 3078.5     2.8
2    1 1936 391.8 4661.7    52.6
3    1 1937 410.6 5387.1   156.9
4    1 1938 257.7 2792.2   209.2
5    1 1939 330.8 4313.2   203.4
6    1 1940 461.2 4643.9   207.2

and here is my code:

library(plm)
data("Grunfeld", package="plm")

# Store each subset regression in myregression
myregression <- list()
count <- 1

## Pooled regression for each year t,
## estimated on the six-year window t-5, ..., t

for(t in 1940:1950){
  myregression[[count]] <- plm(inv ~ value + capital,
                              subset(Grunfeld, year<=t & year>=t-5),
                              index=c("firm","year"))
  # Name each regression after the last year t of its data window
  names(myregression)[[count]] = paste0("Year_",t)
  count <- count+1
}


## Prediction
#######################
# Alternative 1: Loop

Forecast <- list()
count <- 1
for(t in 1940:1950){
  Forecast[[count]] <- predict(myregression[[count]], subset(Grunfeld, year==t))
  ## Name each prediction based on the year t:
  names(Forecast)[[count]] = paste0("Year_",t)
  count <- count+1
}

Unfortunately, my code does not work and I get the following error:

Error in crossprod(beta, t(X)) : non-conformable arguments

Ideally, I would like to store my predictions in Grunfeld$Forecast, in the same structure as the original Grunfeld data. However, I have had a lot of difficulty working with lists: I often failed to index them correctly and to store the results in a vector next to the original data. This is crucial because my own sample has a lot of missing data (NAs), so I can only use the predict function on a limited subset. How can I arrange the data in the desired way?

And is this the right approach to obtain forecasts conditional on the year, with varying slopes, stored in the same layout as the original data, or are there more efficient ways I'm unaware of?


Answer 1:


Note that you are not estimating a pooled regression. By default, plm estimates a within (fixed effects) model. A quick summary of the first regression reveals this. See e.g. summary(myregression[[1]]), whose first lines read:

Oneway (individual) effect Within Model

Call:
plm(formula = inv ~ value + capital, data = subset(Grunfeld, 
    year <= t & year >= t - 5), index = c("firm", "year"))

...
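
A plausible reading of the error message, offered as an assumption and not something confirmed in this thread: the within model reports no intercept, so coef() returns two slopes, while a design matrix built from the formula inv ~ value + capital for the new data has three columns (intercept, value, capital). Multiplying the two with crossprod(beta, t(X)) then fails. The base R sketch below reproduces only that dimension mismatch. The first two rows of value and capital are taken from head(Grunfeld); the coefficient values are hypothetical.

```r
# Design matrix for two new observations: intercept, value, capital.
X <- cbind(intercept = 1,
           value     = c(3078.5, 4661.7),
           capital   = c(2.8, 52.6))

# Hypothetical coefficients, for illustration only.
beta_within  <- c(value = 0.1, capital = 0.3)                   # no intercept
beta_pooling <- c(intercept = -40, value = 0.1, capital = 0.3)  # with intercept

# Two coefficients against three columns: the dimensions do not match,
# so crossprod() stops with an error; we capture its message.
res_within <- tryCatch(crossprod(beta_within, t(X)),
                       error = function(e) conditionMessage(e))

# Three coefficients against three columns: one fitted value per row.
res_pooling <- crossprod(beta_pooling, t(X))

print(res_within)
print(dim(res_pooling))
```

With model="pooling" the intercept is part of the coefficient vector, which would explain why the lengths line up in that case.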

Since you want a pooled regression, try the following code, which sets model="pooling" explicitly. I took the liberty of making it a bit shorter:

library(plm)
data("Grunfeld", package="plm")

# Start from empty lists so entries are indexed by year only
myregression <- list()
Forecast <- list()

for(t in 1940:1950){
  myregression[[as.character(t)]] <- plm(inv ~ value + capital,
                                         subset(Grunfeld, year<=t & year>=t-5),
                                         index=c("firm","year"), model="pooling")
}
for(t in 1940:1950){
  Forecast[[as.character(t)]] <- predict(myregression[[as.character(t)]],
                                         subset(Grunfeld, year==t))
}

This gives you your predicted values without error messages.

I can't comment on your last question about whether or not this is the right statistical approach, but I hope that the R-related question is settled.

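To respond to your comment, try the following, which writes each year's predictions into a new column of Grunfeld:

Grunfeld$forc <- NA_real_

for(t in 1940:1950){
  # Fill only the rows belonging to year t
  Grunfeld[which(Grunfeld$year == t), "forc"] <-
               predict(myregression[[as.character(t)]], subset(Grunfeld, year==t))
}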
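
The question also mentions missing values. As a minimal sketch of the same write-back pattern with NAs present, the example below uses base R's lm() as a stand-in for the pooled model (pooled OLS with no panel-specific adjustments) on a made-up toy panel. The data frame toy and all its numbers are hypothetical and are not the Grunfeld data. Whether predict() on a plm object treats NAs the same way is not confirmed here.

```r
# Toy panel (made-up numbers): 3 firms, years 1935-1942.
set.seed(1)
toy <- expand.grid(year = 1935:1942, firm = 1:3)
toy$value   <- 100 + 10 * toy$firm + (toy$year - 1935) + rnorm(nrow(toy))
toy$capital <- 50 + 5 * toy$firm + rnorm(nrow(toy))
toy$inv     <- 2 + 0.1 * toy$value + 0.3 * toy$capital + rnorm(nrow(toy))

# Simulate one missing regressor value.
toy$value[toy$firm == 2 & toy$year == 1941] <- NA

# Forecast column, aligned row by row with the original data.
toy$forc <- NA_real_

for (t in 1940:1942) {
  # Six-year estimation window t-5, ..., t; lm() drops incomplete rows.
  win <- subset(toy, year <= t & year >= t - 5)
  fit <- lm(inv ~ value + capital, data = win)

  # predict.lm() returns NA for rows whose regressors are missing,
  # so the result has one entry per row of year t.
  rows <- which(toy$year == t)
  toy$forc[rows] <- predict(fit, newdata = toy[rows, ])
}

print(toy[toy$year >= 1940, c("firm", "year", "inv", "forc")])
```

Because the assignment targets row positions, years without a fitted model and rows with missing regressors simply stay NA, and the column keeps the layout of the original data.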


Source: https://stackoverflow.com/questions/24849234/model-prediction-for-pooled-regression-model-in-panel-data
