panel-data

Panel data regression: Robust standard errors

安稳与你 submitted on 2019-12-03 13:27:45
Question: My problem is this: I get NA where I should get values in the computation of robust standard errors. I am trying to run a fixed-effects panel regression with cluster-robust standard errors. For this, I follow Arai (2011), who on p. 3 follows Stock/Watson (2006) (later published in Econometrica, for those who have access). I would like to correct the degrees of freedom by (M/(M-1))*((N-1)/(N-K)) against downward bias, as my number of clusters is finite and I have unbalanced data. Similar
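The correction factor quoted in the question is plain arithmetic, so it can be checked by hand before it is applied to a covariance matrix. The following is a minimal sketch, assuming M is the number of clusters, N the number of observations and K the number of estimated coefficients (the question does not define them); the function name is hypothetical and does not belong to any package.

```python
def cluster_dof_correction(M: int, N: int, K: int) -> float:
    """Return (M / (M - 1)) * ((N - 1) / (N - K)).

    Assumed meaning of the symbols (not stated in the question):
    M = number of clusters, N = number of observations,
    K = number of estimated coefficients.
    """
    if M < 2:
        raise ValueError("need at least two clusters")
    if N <= K:
        raise ValueError("need more observations than coefficients")
    return (M / (M - 1)) * ((N - 1) / (N - K))


# Hypothetical sizes, for illustration only.
factor = cluster_dof_correction(M=50, N=1000, K=5)
```

The factor multiplies the estimated covariance matrix, so standard errors scale by its square root. It is always greater than 1 when K > 1, and it shrinks towards 1 as M and N grow.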

How to deal with NA in a panel data regression?

我的未来我决定 submitted on 2019-12-03 09:47:55
Question: I am trying to predict fitted values over data containing NAs, based on a model generated by plm. Here's some sample code:
    require(plm)
    test.data <- data.frame(id=c(1,1,2,2,3), time=c(1,2,1,2,1), y=c(1,3,5,10,8), x=c(1, NA, 3,4,5))
    model <- plm(y ~ x, data=test.data, index=c("id", "time"), model="pooling", na.action=na.exclude)
    yhat <- predict(model, test.data, na.action=na.pass)
    test.data$yhat <- yhat
When I run the last line I get an error stating that the replacement has 4 rows while

Difference in Differences in Python + Pandas

馋奶兔 submitted on 2019-12-03 07:32:57
I'm trying to perform a Difference in Differences analysis (with panel data and fixed effects) using Python and Pandas. I have no background in Economics and I'm just trying to filter the data and run the method that I was told to use. However, as far as I could learn, the basic diff-in-diffs model looks like this: I.e., I am dealing with a multivariable model. Here is a simple example in R: https://thetarzan.wordpress.com/2011/06/20/differences-in-differences-estimation-in-r-and-stata/ As can be seen, the regression takes as input one dependent variable and three sets
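For orientation, the textbook two-group, two-period case can be computed directly from four group means: the change in the treated group minus the change in the control group. This equals the coefficient on the interaction term in a regression of the outcome on a treatment dummy, a post-period dummy and their product. The sketch below is an illustration under stated assumptions, not the asker's model: the toy records and function names are hypothetical, it contains no fixed effects, and it uses plain Python so it runs without dependencies.

```python
from statistics import mean

# Hypothetical toy records: (treated, post, outcome).
rows = [
    (0, 0, 10.0), (0, 0, 12.0),   # control group, before
    (0, 1, 13.0), (0, 1, 15.0),   # control group, after
    (1, 0, 11.0), (1, 0, 13.0),   # treated group, before
    (1, 1, 19.0), (1, 1, 21.0),   # treated group, after
]


def group_mean(rows, treated, post):
    """Mean outcome for one (treated, post) cell."""
    return mean(y for t, p, y in rows if t == treated and p == post)


def did_estimate(rows):
    """(treated after - treated before) - (control after - control before)."""
    treated_change = group_mean(rows, 1, 1) - group_mean(rows, 1, 0)
    control_change = group_mean(rows, 0, 1) - group_mean(rows, 0, 0)
    return treated_change - control_change


estimate = did_estimate(rows)
```

With a pandas DataFrame, the same four cell means could be obtained by grouping on the two dummy columns. A regression with the interaction term is the usual route once covariates or fixed effects are needed.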

Efficient calculation of var-covar matrix in R

雨燕双飞 submitted on 2019-12-03 02:01:06
I'm looking for efficiency gains in calculating the (auto)covariance matrix from individual measurements over time t with t, t-1, etc. In the data matrix, each row represents an individual and each column represents monthly measurements (the columns are in time order). The data look similar to the following (although with some more covariance).
    # simulate data
    set.seed(1)
    periods <- 70L
    ind <- 90000L
    mat <- sapply(rep(ind, periods), rnorm)
Below is the (ugly) code I came up with to get the covariance matrix for measurements/lagged measurements. It takes almost 4 seconds to run. I'm sure that by

How to deal with NA in a panel data regression?

[亡魂溺海] submitted on 2019-12-03 00:21:44
I am trying to predict fitted values over data containing NAs, based on a model generated by plm. Here's some sample code:
    require(plm)
    test.data <- data.frame(id=c(1,1,2,2,3), time=c(1,2,1,2,1), y=c(1,3,5,10,8), x=c(1, NA, 3,4,5))
    model <- plm(y ~ x, data=test.data, index=c("id", "time"), model="pooling", na.action=na.exclude)
    yhat <- predict(model, test.data, na.action=na.pass)
    test.data$yhat <- yhat
When I run the last line I get an error stating that the replacement has 4 rows while data has 5 rows. I have no idea how to get predict to return a vector of length 5... If instead of

R packages effects & plm : “error in contrasts” when trying to plot marginal effects

*爱你&永不变心* submitted on 2019-11-30 09:28:45
Question: After reading this answer on error in contrasts and reviewing my data, I am still stuck with a problem when trying to combine packages 'plm' and 'effects'. This might be impossible, as John Fox does not discuss this possibility in his effects document (link is not allowed apparently - Google: "john fox effects package" if you want to have a look). So if it is indeed impossible, please tell me. I am running a simple regression on a reduced data set:
    library(plm); library(effects)
    shortdata<-plm

R packages effects & plm : “error in contrasts” when trying to plot marginal effects

孤街醉人 submitted on 2019-11-29 15:47:42
After reading this answer on error in contrasts and reviewing my data, I am still stuck with a problem when trying to combine packages 'plm' and 'effects'. This might be impossible, as John Fox does not discuss this possibility in his effects document (link is not allowed apparently - Google: "john fox effects package" if you want to have a look). So if it is indeed impossible, please tell me. I am running a simple regression on a reduced data set:
    library(plm); library(effects)
    shortdata<-plm.data(shortdata,index=c("ID","Year"))
    MESS<-plm(paci_to_t ~ paco_to_t + cddom + cddom2,data=shortdata

Fama MacBeth standard errors in R

一曲冷凌霜 submitted on 2019-11-29 14:56:55
Question: Does anyone know if there is a package that would run Fama-MacBeth regressions in R and calculate the standard errors? I am aware of the sandwich package and its ability to estimate Newey-West standard errors, as well as providing functions for clustering. However, I have not seen anything with respect to Fama-MacBeth.
Answer 1: The plm package can estimate Fama-MacBeth regressions and SEs.
    require(foreign)
    require(plm)
    require(lmtest)
    test <- read.dta("http://www.kellogg.northwestern.edu/faculty

Double clustered standard errors for panel data

帅比萌擦擦* submitted on 2019-11-28 07:48:34
I have a panel data set in R (time and cross section) and would like to compute standard errors that are clustered by two dimensions, because my residuals are correlated both ways. Googling around I found http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/ which provides a function to do this. It seems a bit ad-hoc so I wanted to know if there is a package that has been tested and does this? I know sandwich does HAC standard errors, but it doesn't do double clustering (i.e. along two dimensions). Frank Harrell's package rms (which used to be named Design ) has a function

Create lagged variable in unbalanced panel data in R

邮差的信 submitted on 2019-11-28 07:00:37
I'd like to create a variable containing the value of a variable in the previous year within a group.
      id date value
    1  1 1992   4.1
    2  1   NA   4.5
    3  1 1991   3.3
    4  1 1990   5.3
    5  1 1994   3.0
    6  2 1992   3.2
    7  2 1991   5.2
value_lagged should be missing when the previous year is missing within a group, either because it is the first date within a group (as in rows 4 and 7) or because there are year gaps in the data (as in row 5). Also, value_lagged should be missing when the current time is missing (as in row 2). This gives:
      id date value value_lagged
    1  1 1992   4.1          3.3
    2  1   NA   4.5           NA
    3  1 1991   3.3          5.3
    4  1 1990   5.3
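The rule described in the question amounts to a keyed lookup: for each row, fetch the value stored under (id, date - 1) and return missing when that key does not exist or the row's own date is missing. The sketch below shows that logic in plain Python rather than R, purely as an illustration. The function name is hypothetical, `None` stands in for NA, and (id, date) pairs are assumed to be unique.

```python
# Sample rows copied from the question; None stands in for NA.
rows = [
    {"id": 1, "date": 1992, "value": 4.1},
    {"id": 1, "date": None, "value": 4.5},
    {"id": 1, "date": 1991, "value": 3.3},
    {"id": 1, "date": 1990, "value": 5.3},
    {"id": 1, "date": 1994, "value": 3.0},
    {"id": 2, "date": 1992, "value": 3.2},
    {"id": 2, "date": 1991, "value": 5.2},
]


def add_lagged(rows):
    """Return copies of rows with value_lagged taken from (id, date - 1)."""
    # Index values by (id, date); rows with a missing date cannot be looked up.
    lookup = {
        (r["id"], r["date"]): r["value"]
        for r in rows
        if r["date"] is not None
    }
    result = []
    for r in rows:
        if r["date"] is None:
            lagged = None  # current time missing
        else:
            # .get returns None for a first year or a gap in the years
            lagged = lookup.get((r["id"], r["date"] - 1))
        result.append({**r, "value_lagged": lagged})
    return result


lagged_rows = add_lagged(rows)
```

Because the lookup uses the calendar year rather than row position, the input does not need to be sorted and gaps are handled without special cases. A positional shift within each group would instead need a check that the previous row is exactly one year earlier.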