问题
I'm trying to estimate a logistic unit fixed effects model for panel data using R. My dependent variable is binary and measured daily over two years for 13 locations. The goal of this model is to predict the value of y for a particular day and location based on x.
zero <- seq(from=0, to=1, by=1)
ids = dplyr::data_frame(location=seq(from=1, to=13, by=1))
dates = dplyr::data_frame(date = seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by="days"))
data = merge(dates, ids)
data$y <- sample(zero, size=9503, replace=TRUE)
data$x <- sample(zero, size=9503, replace=TRUE)
While surveying the available packages to do so, I've read a number of ways to (apparently) do this, but I'm not confident I've understood the differences between packages and approaches.
From what I have read so far, glm()
, survival::clogit()
and pglm::pglm()
can be used to do this, but I'm wondering if there are substantial differences between the packages and what those might be.
Here are the calls I've used:
fixed <- glm(y ~ x + factor(location), data=data)
fixed <- clogit(y ~ x + strata(location), data=data)
One of the reasons for this insecurity is the error I get when using pglm (also see this question) that pglm
can't use the "within" model:
fixed <- pglm(y ~ x, data=data, index=c("location", "date"), model="within", family=binomial("logit"))
.
What distinguishes the "within" model of pglm from the approaches in glm()
and clogit()
and which of the three would be the correct one to take here when trying to predict y for a given date and unit?
回答1:
I don't see that you have defined a proper hypothesis to test within the context of what you are calling "panel data", but as far as getting glm
to give estimates for logistic coefficients within strata it can be accomplished by adding family="binomial" and stratifying by your "unit" variable:
> fixed <- glm(y ~ x + strata(unit), data=data, family="binomial")
> fixed
Call: glm(formula = y ~ x + strata(unit), family = "binomial", data = data)
Coefficients:
(Intercept) x strata(unit)unit=2 strata(unit)unit=3
0.10287 -0.05910 -0.08302 -0.03020
strata(unit)unit=4 strata(unit)unit=5 strata(unit)unit=6 strata(unit)unit=7
-0.06876 -0.05042 -0.10200 -0.09871
strata(unit)unit=8 strata(unit)unit=9 strata(unit)unit=10 strata(unit)unit=11
-0.09702 0.02742 -0.13246 -0.04816
strata(unit)unit=12 strata(unit)unit=13
-0.11449 -0.16986
Degrees of Freedom: 9502 Total (i.e. Null); 9489 Residual
Null Deviance: 13170
Residual Deviance: 13170 AIC: 13190
That will not take into account any date-ordering, which is what I would have expected to be the interest. But as I said above, there doesn't yet appear to be a hypothesis that is premised on any sequential ordering.
This would create a fixed effects model that included a spline relationship of date
to probability of y-event. I chose to center the date rather than leaving it as a very large integer:
library(splines)
fixed <- glm(y ~ x + ns(scale(date),3) + factor(unit), data=data, family="binomial")
fixed
#----------------------
Call: glm(formula = y ~ x + ns(scale(date), 3) + factor(unit), family = "binomial",
data = data)
Coefficients:
(Intercept) x ns(scale(date), 3)1 ns(scale(date), 3)2
0.13389 -0.05904 0.04431 -0.10727
ns(scale(date), 3)3 factor(unit)2 factor(unit)3 factor(unit)4
-0.03224 -0.08302 -0.03020 -0.06877
factor(unit)5 factor(unit)6 factor(unit)7 factor(unit)8
-0.05042 -0.10201 -0.09872 -0.09702
factor(unit)9 factor(unit)10 factor(unit)11 factor(unit)12
0.02742 -0.13246 -0.04816 -0.11450
factor(unit)13
-0.16987
Degrees of Freedom: 9502 Total (i.e. Null); 9486 Residual
Null Deviance: 13170
Residual Deviance: 13160 AIC: 13200
来源:https://stackoverflow.com/questions/45216895/logistic-unit-fixed-effect-model-in-r