Why is a factor not included in a first-differences model?

北战南征 submitted on 2021-01-07 01:24:26

Question


Consider the following data:

library(plm)
data("EmplUK", package="plm")
df1<-EmplUK
df1 <- cbind(df1,"Trend" = as.numeric(as.factor(unlist(df1[, 2])))) 
> head(df1)
  firm year sector   emp    wage capital   output Trend
1    1 1977      7 5.041 13.1516  0.5894  95.7072     2
2    1 1978      7 5.600 12.3018  0.6318  97.3569     3
3    1 1979      7 5.015 12.8395  0.6771  99.6083     4
4    1 1980      7 4.715 13.8039  0.6171 100.5501     5
5    1 1981      7 4.093 14.2897  0.5076  99.5581     6
6    1 1982      7 3.166 14.8681  0.4229  98.6151     7

I want to run a first-difference panel regression:

> plm(capital~wage+output+Trend,data=df1, model = 'fd')

Model Formula: capital ~ wage + output + Trend

Coefficients:
(Intercept)        wage      output 
  0.0111227  -0.0014415   0.0110732

My question is: why is 'Trend' not included in my plm model? And is there any way to include it?


Answer 1:


For calculating first differences, plm uses the first two columns, "firm" and "year", as the individual and time indexes. This can be shown by dropping them:

plm(capital ~ wage + output + Trend, 
    data=df1[-which(names(df1) %in% c("firm", "year"))],
    model='fd')  ## throws a warning
# Model Formula: capital ~ wage + output + Trend
# 
# Coefficients:
#   (Intercept)        wage      output       Trend 
#      0.165677    0.076483    0.038369    0.261935 

As we can see, "Trend" now appears (of course the result is wrong, because plm no longer has the correct panel indexes).

You can see the reason by looking at the correlation matrix of your data.

round(cor(df1))
#         firm year sector emp wage capital output Trend
# firm       1    0      0   0    0       0      0     0
# year       0    1      0   0    0       0     -1     1
# sector     0    0      1   0    0       0      0     0
# emp        0    0      0   1    0       1      0     0
# wage       0    0      0   0    1       0      0     0
# capital    0    0      0   1    0       1      0     0
# output     0   -1      0   0    0       0      1    -1
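# Trend      0    1      0   0    0       0     -1     1

"Trend" and "year" are perfectly correlated, i.e. you're experiencing multicollinearity. Because "Trend" rises by exactly one from each year to the next, its first difference is the constant 1, which is perfectly collinear with the intercept of the first-difference model.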

with(df1, cor(Trend, year))
# [1] 1
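
To see why a linear trend vanishes under first differencing, here is a minimal base R sketch. It uses a small made-up panel (the `toy` data frame, the `fd` helper and the `x`/`y` columns are hypothetical, not part of EmplUK or plm) and differences each variable within firm by hand, which only approximates what plm does internally:

```r
set.seed(1)

# Hypothetical toy panel: 3 firms observed over 5 consecutive years
toy <- expand.grid(year = 1977:1981, firm = 1:3)
toy$Trend <- as.numeric(as.factor(toy$year))
toy$x <- rnorm(nrow(toy))
toy$y <- rnorm(nrow(toy))

# Hand-rolled first difference within each firm (illustrative helper)
fd <- function(v) unlist(tapply(v, toy$firm, diff), use.names = FALSE)
d <- data.frame(dy = fd(toy$y), dx = fd(toy$x), dTrend = fd(toy$Trend))

# The differenced trend is the same constant in every row
unique(d$dTrend)
# [1] 1

# With an intercept in the model, the constant dTrend cannot be estimated
fit <- lm(dy ~ dx + dTrend, data = d)
is.na(coef(fit)["dTrend"])
```

The differenced trend is just a column of ones, so it carries the same information as the intercept column.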

With lm, such coefficients are reported as NA, for example:

r <- lm(capital ~ wage + output + factor(year) + factor(firm) + Trend, 
   data=df1)$coefficients
r[-grep("year|firm", names(r))]
# (Intercept)        wage      output       Trend 
# -2.62878756  0.03206621  0.02363581          NA 

whereas plm drops them.
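
As for including the trend: one way to read the first-difference output is that the intercept already plays the role of the trend coefficient. The sketch below illustrates this in base R on made-up data (the `toy` panel, the `fd` helper and the simulated coefficients 0.5 and 0.2 are assumptions for illustration only). Dropping the intercept and keeping the differenced trend gives the same estimate as keeping the intercept and dropping the trend:

```r
set.seed(42)

# Hypothetical toy panel with a known linear trend in y
toy <- expand.grid(year = 1977:1981, firm = 1:3)
toy$Trend <- as.numeric(as.factor(toy$year))
toy$x <- rnorm(nrow(toy))
toy$y <- 0.5 * toy$x + 0.2 * toy$Trend + rnorm(nrow(toy), sd = 0.1)

# Hand-rolled first difference within each firm (illustrative helper)
fd <- function(v) unlist(tapply(v, toy$firm, diff), use.names = FALSE)
d <- data.frame(dy = fd(toy$y), dx = fd(toy$x), dTrend = fd(toy$Trend))

# Model A: intercept, no trend.  Model B: trend, no intercept.
with_intercept <- lm(dy ~ dx, data = d)
with_trend     <- lm(dy ~ dx + dTrend - 1, data = d)

# The intercept of A and the dTrend coefficient of B coincide
all.equal(unname(coef(with_intercept)["(Intercept)"]),
          unname(coef(with_trend)["dTrend"]))
```

In plm, the analogous step would presumably be to remove the intercept from the formula, e.g. `capital ~ wage + output + Trend - 1` with `model = 'fd'`; this is untested here, so treat it as an assumption and check it against the plm documentation.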



Source: https://stackoverflow.com/questions/65324590/why-factor-is-not-included-in-first-differences-model
