问题
my problem is this: I get NA
where I should get some values in the computation of robust standard errors.
I am trying to do a fixed effect panel regression with cluster-robust standard errors. For this, I follow Arai (2011) who on p. 3 follows Stock/ Watson (2006) (later published in Econometrica, for those who have access). I would like to correct the degrees of freedom by (M/(M-1)*(N-1)/(N-K)
against downward bias as my number of clusters is finite and I have unbalanced data.
Similar problems have been posted before [1, 2] on StackOverflow and related problems [3] on CrossValidated.
Arai (and the answer in the 1st link) uses the following code for functions (I provide my data below with some further comment):
gcenter <- function(df1,group) {
variables <- paste(
rep("C", ncol(df1)), colnames(df1), sep=".")
copydf <- df1
for (i in 1:ncol(df1)) {
copydf[,i] <- df1[,i] - ave(df1[,i], group,FUN=mean)}
colnames(copydf) <- variables
return(cbind(df1,copydf))}
# 1-way adjusting for clusters
clx <- function(fm, dfcw, cluster){
# R-codes (www.r-project.org) for computing
# clustered-standard errors. Mahmood Arai, Jan 26, 2008.
# The arguments of the function are:
# fitted model, cluster1 and cluster2
# You need to install libraries `sandwich' and `lmtest'
# reweighting the var-cov matrix for the within model
library(sandwich);library(lmtest)
M <- length(unique(cluster))
N <- length(cluster)
K <- fm$rank
dfc <- (M/(M-1))*((N-1)/(N-K))
uj <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)*dfcw
coeftest(fm, vcovCL) }
,where the gcenter
computes deviations from the mean (fixed effect). I then continue and do the regression with DS_CODE
being my cluster variable (I have named my data 'data').
centerdata <- gcenter(data, data$DS_CODE)
datalm <- lm(C.L1.retE1M ~ C.MCAP_SEC + C.Impact_change + C.Mom + C.BM + C.PD + C.CashGen + C.NITA + C.PE + C.PEdummy + factor(DS_CODE), data=centerdata)
M <- length(unique(data$DS_CODE))
dfcw <- datalm$df / (datalm$df - (M-1))
and want to calculate
clx(datalm, dfcw, data$DS_CODE)
However, when I want to compute uj (see formula clx
above) for the variance, I get only at the beginning some values for my regressors, then lots of zeros. If this input uj is used for the variance, only NAs
result.
My data
Since my data may be of special structure and I can't figure out the problem, I post the entire thing as a link from Hotmail. The reason is that with other data (taken from Arai (2011)) my problem does not occur. Sorry in advance for the mess but I'd be very grateful if you could have a look at it nevertheless. The file is a 5mb .txt file containing purely data.
回答1:
After some time playing around, it works for me and gives me:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.5099e-16 5.2381e-16 0.8610 0.389254
C.MCAP_SEC -5.9769e-07 1.2677e-07 -4.7149 2.425e-06 ***
C.Impact_change -5.3908e-04 7.5601e-05 -7.1306 1.014e-12 ***
C.Mom 3.7560e-04 3.3378e-03 0.1125 0.910406
C.BM -1.6438e-04 1.7368e-05 -9.4645 < 2.2e-16 ***
C.PD 6.2153e-02 3.8766e-02 1.6033 0.108885
C.CashGen -2.7876e-04 1.4031e-02 -0.0199 0.984149
C.NITA -8.1792e-02 3.2153e-02 -2.5438 0.010969 *
C.PE -6.6170e-06 4.0138e-06 -1.6485 0.099248 .
C.PEdummy 1.3143e-02 4.8864e-03 2.6897 0.007154 **
factor(DS_CODE)130324 -5.2497e-16 5.2683e-16 -0.9965 0.319028
factor(DS_CODE)130409 -4.0276e-16 5.2384e-16 -0.7689 0.441986
factor(DS_CODE)130775 -4.4113e-16 5.2424e-16 -0.8415 0.400089
...
This leaves us with the question why it doesn't for you. I guess it has something to do with the format of your data. Is everything numeric? I converted the column classes and it looks like that for me:
str(dat)
'data.frame': 48251 obs. of 12 variables:
$ DS_CODE : chr "902172" "902172" "902172" "902172" ...
$ DNEW : num 2e+05 2e+05 2e+05 2e+05 2e+05 ...
$ MCAP_SEC : num 78122 71421 81907 80010 82462 ...
$ NITA : num 0.135 0.135 0.135 0.135 0.135 ...
$ CashGen : num 0.198 0.198 0.198 0.198 0.198 ...
$ BM : num 0.1074 0.1108 0.097 0.0968 0.0899 ...
$ PE : num 57 55.3 63.1 63.2 68 ...
$ PEdummy : num 0 0 0 0 0 0 0 0 0 0 ...
$ L1.retE1M : num -0.72492 0.13177 0.00122 0.07214 -0.07332 ...
$ Mom : num 0 0 0 0 0 ...
$ PD : num 5.41e-54 1.51e-66 3.16e-80 2.87e-79 4.39e-89 ...
$ Impact_change: num 0 -10.59 -10.43 0.7 -6.97 ...
What does str(data)
return for you?
回答2:
The plm
package can estimate clustered SEs for panel regressions. The original data is no longer available, so here's an example using dummy data.
require(foreign)
require(plm)
require(lmtest)
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")
fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year'))
##Arellano clustered by *group* SEs
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC0"))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680 0.066939 0.4434 0.6575
x 1.034833 0.050540 20.4755 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If you're using lm
models (instead of plm
), then the multiwayvcov
package may help.
library("lmtest")
library("multiwayvcov")
data(petersen)
m1 <- lm(y ~ x, data = petersen)
> coeftest(m1, vcov=function(x) cluster.vcov(x, petersen[ , c("firmid")],
df_correction=FALSE))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680 0.066939 0.4434 0.6575
x 1.034833 0.050540 20.4755 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For more details see:
- Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R.
See also:
- Double clustered standard errors for panel data
来源:https://stackoverflow.com/questions/10691341/panel-data-regression-robust-standard-errors