Fixed effects in R: plm vs lm + factor()

I'm trying to run a fixed-effects regression model in R. I want to control for heterogeneity in variables C and D (neither is a time variable).

I tried the following two approaches:

1) Using the plm package gives me the following error message:

library(plm)

formula = Y ~ A + B + C + D

reg = plm(formula, data = data, index = c('C', 'D'), model = 'within')

Error in pdim.default(index[[1]], index[[2]]) : duplicate couples (time-id)

I also tried first creating a panel using

data_p = pdata.frame(data,index=c('C','D'))

But I have repeated observations in both columns.

2) Using factor() and lm works well:

formula = Y ~ A + B + factor(C) + factor(D)
reg = lm(formula, data= data)

What is the difference between the two methods? Why is plm not working for me? Is it because one of the indices should be time?

Rodrigo Remedio

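That error is saying you have repeated id-time pairs formed by variables C and D. plm treats the first index variable as the individual and the second as the time dimension, and it requires each (individual, time) pair to occur at most once.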
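Before calling plm, you can check which rows cause the error. The sketch below uses made-up toy data (the values are hypothetical) and only base R: it flags every row whose (C, D) pair has already appeared and counts how often each pair occurs.

```r
# Hypothetical toy data: C plays the "individual" role and D the "time" role,
# as in index = c('C', 'D').
data <- data.frame(
  C = c("a", "a", "a", "b", "b"),
  D = c(1, 1, 2, 1, 2),
  Y = c(1.0, 2.0, 3.0, 4.0, 5.0)
)

# TRUE for every row whose (C, D) pair already appeared in an earlier row.
dup <- duplicated(data[, c("C", "D")])
print(data[dup, ])

# Rows per (C, D) pair; counts above 1 are the "duplicate couples".
pair_counts <- table(paste(data$C, data$D, sep = "_"))
print(pair_counts[pair_counts > 1])
```

If either print shows anything, C and D alone cannot serve as the (individual, time) index.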

Suppose you have a third variable F that, together with C, distinguishes each individual from the others (or whatever your first dimension is). Then, with dplyr, you can create a unique index, say id:

library(dplyr)
data$id <- data %>% group_by(C, F) %>% group_indices()

Then the index argument in plm becomes index = c('id', 'D').
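If you prefer to avoid dplyr, a base-R equivalent is sketched below, using `interaction()` to give each observed (C, F) combination its own integer. The toy data and the extra column F are hypothetical; the point is that (C, D) contains duplicates while (id, D) does not.

```r
# Hypothetical toy data: C alone repeats within a D value,
# but the pair (C, F) identifies each individual.
data <- data.frame(
  C = c("a", "a", "a", "a", "b", "b"),
  F = c("x", "y", "x", "y", "x", "x"),
  D = c(1, 1, 2, 2, 1, 2),
  Y = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
)

# (C, D) contains a duplicate pair, so it cannot be the panel index.
print(anyDuplicated(data[, c("C", "D")]) > 0)

# One integer per observed (C, F) combination; drop = TRUE discards
# combinations that never occur in the data.
data$id <- as.integer(interaction(data$C, data$F, drop = TRUE))

# Each (id, D) pair is now unique, as index = c('id', 'D') requires.
print(anyDuplicated(data[, c("id", "D")]) == 0)
print(data)
```

The integer labels may differ from those produced by `group_indices()`, but only their uniqueness matters for the index.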

The lm + factor() approach is only a solution if you have distinct observations. If that is not the case, it will not properly weight the result within each id; that is, the fixed effect is not properly identified.
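To see what the two approaches compute in the simplest one-way case, the sketch below simulates a small panel (all numbers are made up) and compares the slope on A from lm with one dummy per group against the slope from the within transformation, i.e. regressing group-demeaned Y on group-demeaned A. This demeaning is what a one-way within estimator such as plm(..., model = 'within') does; it does not cover the two-index setup from the question.

```r
# Simulated panel (hypothetical): 4 groups, 5 observations each.
set.seed(1)
g <- factor(rep(1:4, each = 5))
alpha <- c(-2, 0, 1, 3)[as.integer(g)]  # group-specific intercepts
A <- rnorm(20) + alpha                  # regressor correlated with the effects
Y <- 0.5 * A + alpha + rnorm(20, sd = 0.1)

# Dummy-variable (LSDV) regression: one dummy per group via factor().
b_lsdv <- unname(coef(lm(Y ~ A + factor(g)))["A"])

# Within transformation: subtract group means, then regress without intercept.
Y_w <- Y - ave(Y, g)
A_w <- A - ave(A, g)
b_within <- unname(coef(lm(Y_w ~ A_w - 1))["A_w"])

print(c(lsdv = b_lsdv, within = b_within))
print(all.equal(b_lsdv, b_within))
```

In this one-way setting the two slopes agree up to floating-point error (a consequence of the Frisch-Waugh-Lovell theorem). The practical difference is that lm + factor() estimates every dummy coefficient explicitly, while the within transformation removes the group means instead.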

Source: https://stackoverflow.com/questions/39563595/fixed-effects-in-r-plm-vs-lm-factor
