Logistic Unit Fixed Effect Model in R

问题

I'm trying to estimate a logistic unit fixed effects model for panel data using R. My dependent variable is binary and measured daily over two years for 13 locations. The goal of this model is to predict the value of y for a particular day and location based on x.

zero <- seq(from=0, to=1, by=1)
ids = dplyr::data_frame(location=seq(from=1, to=13, by=1))
dates = dplyr::data_frame(date = seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by="days"))
data = merge(dates, ids)
data$y <- sample(zero, size=9503, replace=TRUE)
data$x <- sample(zero, size=9503, replace=TRUE)

While surveying the available packages to do so, I've read a number of ways to (apparently) do this, but I'm not confident I've understood the differences between packages and approaches.

From what I have read so far, glm(), survival::clogit() and pglm::pglm() can be used to do this, but I'm wondering if there are substantial differences between the packages and what those might be. Here are the calls I've used: fixed <- glm(y ~ x + factor(location), data=data) fixed <- clogit(y ~ x + strata(location), data=data)

One of the reasons for this insecurity is the error I get when using pglm (also see this question) that pglm can't use the "within" model: fixed <- pglm(y ~ x, data=data, index=c("location", "date"), model="within", family=binomial("logit")).

What distinguishes the "within" model of pglm from the approaches in glm() and clogit() and which of the three would be the correct one to take here when trying to predict y for a given date and unit?

回答1:

I don't see that you have defined a proper hypothesis to test within the context of what you are calling "panel data", but as far as getting glm to give estimates for logistic coefficients within strata it can be accomplished by adding family="binomial" and stratifying by your "unit" variable:

> fixed <- glm(y ~ x + strata(unit), data=data, family="binomial")
> fixed

Call:  glm(formula = y ~ x + strata(unit), family = "binomial", data = data)

Coefficients:
        (Intercept)                    x   strata(unit)unit=2   strata(unit)unit=3  
            0.10287             -0.05910             -0.08302             -0.03020  
 strata(unit)unit=4   strata(unit)unit=5   strata(unit)unit=6   strata(unit)unit=7  
           -0.06876             -0.05042             -0.10200             -0.09871  
 strata(unit)unit=8   strata(unit)unit=9  strata(unit)unit=10  strata(unit)unit=11  
           -0.09702              0.02742             -0.13246             -0.04816  
strata(unit)unit=12  strata(unit)unit=13  
           -0.11449             -0.16986  

Degrees of Freedom: 9502 Total (i.e. Null);  9489 Residual
Null Deviance:      13170 
Residual Deviance: 13170    AIC: 13190

That will not take into account any date-ordering, which is what I would have expected to be the interest. But as I said above, there doesn't yet appear to be a hypothesis that is premised on any sequential ordering.

This would create a fixed effects model that included a spline relationship of date to probability of y-event. I chose to center the date rather than leaving it as a very large integer:

library(splines)
fixed <- glm(y ~ x + ns(scale(date),3) + factor(unit), data=data, family="binomial")
fixed
#----------------------
Call:  glm(formula = y ~ x + ns(scale(date), 3) + factor(unit), family = "binomial", 
    data = data)

Coefficients:
        (Intercept)                    x  ns(scale(date), 3)1  ns(scale(date), 3)2  
            0.13389             -0.05904              0.04431             -0.10727  
ns(scale(date), 3)3        factor(unit)2        factor(unit)3        factor(unit)4  
           -0.03224             -0.08302             -0.03020             -0.06877  
      factor(unit)5        factor(unit)6        factor(unit)7        factor(unit)8  
           -0.05042             -0.10201             -0.09872             -0.09702  
      factor(unit)9       factor(unit)10       factor(unit)11       factor(unit)12  
            0.02742             -0.13246             -0.04816             -0.11450  
     factor(unit)13  
           -0.16987  

Degrees of Freedom: 9502 Total (i.e. Null);  9486 Residual
Null Deviance:      13170 
Residual Deviance: 13160    AIC: 13200

来源：https://stackoverflow.com/questions/45216895/logistic-unit-fixed-effect-model-in-r

标签

logistic-regression

panel-data