Multinomial logistic multilevel models in R

甜味超标 2021-01-29 21:57

Problem: I need to estimate a set of multinomial logistic multilevel models and can't find an appropriate R package. What is the best R package to estimate such models?

6 answers
  • 2021-01-29 22:05

    An older question, but I think a viable option that has recently emerged is brms, which uses the Bayesian Stan program to actually run the model. For example, if you want to run a multinomial logistic regression on the iris data:

    library(brms)

    # Multinomial (categorical) logistic regression of Species on the four measurements
    b1 <- brm(Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width,
              data = iris, family = "categorical",
              prior = set_prior("normal(0, 8)"))
    

    And to get an ordinal regression -- not appropriate for iris, of course -- you'd switch the family="categorical" to family="acat" (or cratio or sratio, depending on the type of ordinal regression you want) and make sure that the dependent variable is ordered.
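    A minimal sketch of the ordinal variant, assuming a simulated, hypothetical outcome `rating` (iris has no ordered response). The part that always runs only shows how to build an ordered factor; the `brm()` call is wrapped in `interactive()` and `requireNamespace("brms")` checks because it compiles a Stan model.

```r
# Hypothetical ordinal outcome with four ordered levels, simulated from a latent logistic scale
set.seed(1)
n <- 200
x <- rnorm(n)
latent <- 0.8 * x + rlogis(n)
rating <- cut(latent, breaks = c(-Inf, -1, 0, 1, Inf),
              labels = c("low", "mid", "high", "top"),
              ordered_result = TRUE)   # brms ordinal families need an ordered response
d_ord <- data.frame(rating, x)

# Only fit interactively: brm() compiles Stan code, which takes a while
if (interactive() && requireNamespace("brms", quietly = TRUE)) {
  # Adjacent-category model; swap in brms::cratio() or brms::sratio() as needed
  fit_acat <- brms::brm(rating ~ x, data = d_ord, family = brms::acat())
}
```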

    Clarification per Raphael's comment: this brm call compiles your formula and arguments into Stan code. Stan translates that into C++ and builds it with your system's C++ compiler, which is therefore required. On a Mac, for example, you may need to install the free Developer Tools to get a C++ compiler; on Windows, Rtools provides one. Linux usually has a C++ compiler installed by default.

    Clarification per Qaswed's comment: brms easily handles multilevel models as well, using the R formula syntax (1 | groupvar) to add a group-level (random) intercept, (1 + foo | groupvar) to add a random intercept and slope, etc.
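    A sketch of the multilevel categorical model, assuming simulated data with a hypothetical three-category outcome `y`, a predictor `foo`, and 30 levels of `groupvar` (names chosen to match the comment above). Only the data simulation runs unconditionally; the two `brm()` fits are guarded because each compiles a Stan model.

```r
set.seed(42)
n_groups <- 30
n_per <- 20
groupvar <- factor(rep(seq_len(n_groups), each = n_per))
g <- as.integer(groupvar)
foo <- rnorm(n_groups * n_per)

# Hypothetical group-level intercept deviations for B and C (A is the reference)
u_B <- rnorm(n_groups, sd = 0.5)
u_C <- rnorm(n_groups, sd = 0.5)
eta_B <- -0.2 + 0.7 * foo + u_B[g]
eta_C <- -0.5 - 0.4 * foo + u_C[g]
denom <- 1 + exp(eta_B) + exp(eta_C)
probs <- cbind(1 / denom, exp(eta_B) / denom, exp(eta_C) / denom)

# Draw one category per row from its multinomial probabilities
y <- factor(apply(probs, 1, function(p) sample(c("A", "B", "C"), 1, prob = p)),
            levels = c("A", "B", "C"))
d_ml <- data.frame(y, foo, groupvar)

if (interactive() && requireNamespace("brms", quietly = TRUE)) {
  # Random intercept for each group
  fit_int <- brms::brm(y ~ foo + (1 | groupvar), data = d_ml,
                       family = brms::categorical())
  # Random intercept and random slope of foo for each group
  fit_slope <- brms::brm(y ~ foo + (1 + foo | groupvar), data = d_ml,
                         family = brms::categorical())
}
```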

  • 2021-01-29 22:07

    I would recommend using the package "mlogit".

  • 2021-01-29 22:13

    I'm puzzled that this technique is described as "standard" and "equivalent", though it might well be a good practical solution. (Guess I'd better check out the Allison and Dobson & Barnett references.) For the simple multinomial case (no clusters, repeated measures, etc.), Begg and Gray (1984) propose using k-1 binomial logits against a reference category as an approximation (though a good one) to the full-blown multinomial logit in many cases. They demonstrate some loss of efficiency when using a single reference category, though it's small for cases where a single high-frequency baseline category is used as the reference. Agresti (2002: p. 274) provides an example where there is a small increase in standard errors even when the baseline category constitutes over 70% of 219 cases in a five-category example.

    Maybe it's no big deal, but I don't see how the approximation would get any better by adding a second layer of randomness.
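    To make the comparison concrete, here is a minimal single-level sketch on simulated data (all values hypothetical): the full multinomial logit is fitted with `nnet::multinom()` and the Begg and Gray approximation with one `glm()` binomial logit per contrast against reference category A. The last line prints the ratio of Begg and Gray to multinomial standard errors; values above 1 would reflect the efficiency loss discussed above, though in a single simulated sample the ratios are noisy.

```r
library(nnet)   # multinom(); nnet ships with R as a recommended package

set.seed(123)
n <- 2000
x <- rnorm(n)
eta_B <- -0.5 + 0.8 * x          # hypothetical true log-odds of B vs A
eta_C <- -1.0 - 0.6 * x          # hypothetical true log-odds of C vs A
denom <- 1 + exp(eta_B) + exp(eta_C)
p <- cbind(1 / denom, exp(eta_B) / denom, exp(eta_C) / denom)
y <- factor(apply(p, 1, function(pr) sample(c("A", "B", "C"), 1, prob = pr)),
            levels = c("A", "B", "C"))
d <- data.frame(y, x)

# Full multinomial logit: both contrasts estimated jointly on all cases
full <- multinom(y ~ x, data = d, Hess = TRUE, trace = FALSE)
coef_full <- coef(full)                      # rows B, C; columns (Intercept), x
se_full <- summary(full)$standard.errors

# Begg & Gray: a separate binary logit per contrast, each on its own subset
fit_binary <- function(cat, data, ref = "A") {
  sub <- data[data$y %in% c(ref, cat), ]
  sub$resp <- as.integer(sub$y == cat)
  glm(resp ~ x, family = binomial, data = sub)
}
bg <- lapply(c(B = "B", C = "C"), fit_binary, data = d)
coef_bg <- t(sapply(bg, coef))
se_bg <- t(sapply(bg, function(m) sqrt(diag(vcov(m)))))

print(round(coef_full - coef_bg, 3))   # differences in point estimates
print(round(se_bg / se_full, 3))       # SE ratios, Begg & Gray / multinomial
```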

    References
    Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken, NJ: Wiley.

    Begg, C. B., & Gray, R. (1984). Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika, 71(1), 11–18.

  • 2021-01-29 22:16

    I am dealing with the same issue, and one possible solution I found is to resort to the Poisson (log-linear/count) equivalent of the multinomial logistic model, as described in this mailing list, these nice slides, or in Agresti (2013: 353-356). Thus, it should be possible to use the glmer(... family=poisson) function from the package lme4 with some aggregation of the data (counts per combination of covariate pattern and response category).
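    A minimal single-level sketch of that equivalence on simulated data, assuming a hypothetical three-category migration outcome and a three-level age-group factor: after aggregating to counts per age group and outcome, a Poisson `glm()` with the full `agegrp * mig` interaction reproduces the `nnet::multinom()` coefficients, while the `agegrp` main effects only absorb each age group's total. The multilevel `glmer()` version is not shown here.

```r
library(nnet)   # multinom()

set.seed(7)
n <- 1500
agegrp <- factor(sample(c("young", "middle", "old"), n, replace = TRUE),
                 levels = c("young", "middle", "old"))
# Hypothetical migration outcome: A = none, B = internal, C = international
lp_B <- c(young = 0.2, middle = -0.3, old = -1.0)[as.character(agegrp)]
lp_C <- c(young = -0.8, middle = -1.2, old = -2.0)[as.character(agegrp)]
denom <- 1 + exp(lp_B) + exp(lp_C)
pr <- cbind(1 / denom, exp(lp_B) / denom, exp(lp_C) / denom)
mig <- factor(apply(pr, 1, function(p) sample(c("A", "B", "C"), 1, prob = p)),
              levels = c("A", "B", "C"))
d <- data.frame(mig, agegrp)

# 1) Multinomial logit (tighter tolerance so the comparison is sharp)
fit_mn <- multinom(mig ~ agegrp, data = d, trace = FALSE, reltol = 1e-10)

# 2) Aggregate to one count per (agegrp, mig) cell and fit a Poisson log-linear model.
#    agegrp main effects are nuisance terms fixing each row total;
#    mig and agegrp:mig terms correspond to the multinomial logit coefficients.
counts <- as.data.frame(table(agegrp = d$agegrp, mig = d$mig))
fit_pois <- glm(Freq ~ agegrp * mig, family = poisson, data = counts)

cp <- coef(fit_pois)
pois_as_mn <- rbind(
  B = cp[c("migB", "agegrpmiddle:migB", "agegrpold:migB")],
  C = cp[c("migC", "agegrpmiddle:migC", "agegrpold:migC")]
)
colnames(pois_as_mn) <- colnames(coef(fit_mn))
print(round(coef(fit_mn) - pois_as_mn, 4))   # differences should be close to 0
```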

    Reference:
    Agresti, A. (2013). Categorical data analysis (3rd ed.). Hoboken, NJ: Wiley.

  • 2021-01-29 22:22

    There are generally two ways of fitting multinomial models of a categorical variable with J categories: (1) simultaneously estimating J-1 contrasts; (2) estimating a separate logit model for each contrast.

    Do these two methods produce the same results? No, but the results are often similar.

    Which method is better? Simultaneous fitting is more precise (see below for an explanation of why).

    Why would someone use separate logit models then? (1) The lme4 package has no routine for simultaneously fitting multinomial models, and I am not aware of any other multilevel R package that can do this. So separate logit models seem to be the only practical solution at present if someone wants to estimate multilevel multinomial models in R. (2) As some prominent statisticians have argued (Begg and Gray, 1984; Allison, 1984, p. 46-47), separate logit models are much more flexible, as they permit independent specification of the model equation for each contrast.

    Is it legitimate to use separate logit models? Yes, with some disclaimers. This method is called the “Begg and Gray Approximation”. Begg and Gray (1984, p. 16) showed that this “individualized method is highly efficient”. However, there is some efficiency loss and the Begg and Gray Approximation produces larger standard errors (Agresti 2002, p. 274). As such, it is more difficult to obtain significant results with this method and the results can be considered conservative. This efficiency loss is smallest when the reference category is large (Begg and Gray, 1984; Agresti 2002). R packages that employ the Begg and Gray Approximation (not multilevel) include mlogitBMA (Sevcikova and Raftery, 2012).
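    A sketch of the separate-logit route, assuming simulated clustered data with hypothetical variables `age`, `female`, and `group`. The helper `fit_contrasts()` is hypothetical (not from any package): it fits one binary logit per contrast against reference category A, lets each contrast have its own formula, and takes the fitting function as an argument. With `glm()` this is the single-level Begg and Gray approximation; passing `lme4::glmer()` with a `(1 | group)` term gives the multilevel version described above. The `glmer()` part runs only if lme4 is installed.

```r
set.seed(2024)
n_groups <- 40
n_per <- 25
n <- n_groups * n_per
group <- factor(rep(seq_len(n_groups), each = n_per))
g <- as.integer(group)
age <- rnorm(n)
female <- rbinom(n, 1, 0.5)
u_B <- rnorm(n_groups, sd = 0.6)   # hypothetical group effects, B vs A
u_C <- rnorm(n_groups, sd = 0.6)   # hypothetical group effects, C vs A
eta_B <- -0.3 + 0.6 * age + u_B[g]
eta_C <- -1.0 - 0.5 * age + 0.4 * female + u_C[g]
denom <- 1 + exp(eta_B) + exp(eta_C)
p <- cbind(1 / denom, exp(eta_B) / denom, exp(eta_C) / denom)
y <- factor(apply(p, 1, function(pr) sample(c("A", "B", "C"), 1, prob = pr)),
            levels = c("A", "B", "C"))
d <- data.frame(y, age, female, group)

# Hypothetical helper: one binary logit per contrast "cat vs ref", fitted on the
# cases in those two categories only. 'formulas' is a named list with response
# 'resp'; 'fitter' is any function(formula, data) returning a binary model.
fit_contrasts <- function(data, formulas, ref = "A",
                          fitter = function(f, dat) glm(f, family = binomial, data = dat)) {
  fits <- lapply(names(formulas), function(cat) {
    sub <- data[data$y %in% c(ref, cat), ]
    sub$resp <- as.integer(sub$y == cat)
    fitter(formulas[[cat]], sub)
  })
  setNames(fits, names(formulas))
}

# Each contrast may use a different specification
single <- fit_contrasts(d, list(B = resp ~ age, C = resp ~ age + female))
print(lapply(single, coef))

if (requireNamespace("lme4", quietly = TRUE)) {
  multi <- fit_contrasts(
    d,
    list(B = resp ~ age + (1 | group), C = resp ~ age + female + (1 | group)),
    fitter = function(f, dat) lme4::glmer(f, family = binomial, data = dat)
  )
  print(lapply(multi, lme4::fixef))
}
```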


    Why is a series of individual logit models imprecise? In my initial example we have a variable (migration) that can have three values A (no migration), B (internal migration), C (international migration). With only one predictor variable x (age), multinomial models are parameterized as a series of binomial contrasts as follows (Long and Cheng, 2004 p. 277):

    Eq. 1:  Ln(Pr(B|x)/Pr(A|x)) = b0,B|A + b1,B|A (x) 
    Eq. 2:  Ln(Pr(C|x)/Pr(A|x)) = b0,C|A + b1,C|A (x)
    Eq. 3:  Ln(Pr(B|x)/Pr(C|x)) = b0,B|C + b1,B|C (x)
    

    For these contrasts the following equations must hold:

    Eq. 4: Ln(Pr(B|x)/Pr(A|x)) - Ln(Pr(C|x)/Pr(A|x)) = Ln(Pr(B|x)/Pr(C|x))
    Eq. 5: b0,B|A - b0,C|A = b0,B|C
    Eq. 6: b1,B|A - b1,C|A = b1,B|C
    

    The problem is that, for separately fitted logits, these equations (Eq. 4-6) will in practice not hold exactly for the estimated coefficients, because each contrast is estimated on a slightly different sample: only cases from the two contrasted groups are used and cases from the third group are omitted. Programs that estimate the multinomial contrasts simultaneously make sure that Eq. 4-6 hold (Long and Cheng, 2004 p. 277). I don’t know exactly how this “simultaneous” model solving works – maybe someone can provide an explanation? Software packages that do simultaneous fitting of multilevel multinomial models include MLwiN (Steele 2013, p. 4) and Stata (xlmlogit command, Pope, 2014).
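    On how simultaneous fitting works: a multinomial logit has only J-1 free contrasts (here B|A and C|A), estimated together by maximizing a single likelihood over all cases, and any other contrast such as B|C is their difference, so Eq. 5-6 hold by construction. The sketch below, on simulated single-level data (all values hypothetical), checks this with `nnet::multinom()` and compares it with three separately fitted `glm()` logits, for which Eq. 5-6 hold only approximately.

```r
library(nnet)   # multinom()

set.seed(99)
n <- 1000
x <- rnorm(n)
eta_B <- 0.2 + 0.5 * x       # hypothetical true log-odds of B vs A
eta_C <- -0.4 - 0.7 * x      # hypothetical true log-odds of C vs A
denom <- 1 + exp(eta_B) + exp(eta_C)
p <- cbind(1 / denom, exp(eta_B) / denom, exp(eta_C) / denom)
y <- factor(apply(p, 1, function(pr) sample(c("A", "B", "C"), 1, prob = pr)),
            levels = c("A", "B", "C"))
d <- data.frame(y, x)

# Separate binary logit for "num vs den", using only cases in those two groups
logit_pair <- function(data, num, den) {
  sub <- data[data$y %in% c(num, den), ]
  sub$resp <- as.integer(sub$y == num)
  coef(glm(resp ~ x, family = binomial, data = sub))
}
b_BA <- logit_pair(d, "B", "A")
b_CA <- logit_pair(d, "C", "A")
b_BC <- logit_pair(d, "B", "C")
gap_separate <- (b_BA - b_CA) - b_BC   # Eq. 5-6 residuals: small but not zero

# Simultaneous fit with A as reference; B|C implied by the two fitted contrasts
cf_A <- coef(multinom(y ~ x, data = d, trace = FALSE, reltol = 1e-12))
implied_BC <- cf_A["B", ] - cf_A["C", ]

# Refit with C as reference to estimate B|C directly from the same joint model
d$y_C <- relevel(d$y, ref = "C")
cf_C <- coef(multinom(y_C ~ x, data = d, trace = FALSE, reltol = 1e-12))
gap_joint <- implied_BC - cf_C["B", ]  # zero up to optimizer tolerance

print(round(gap_separate, 4))
print(round(gap_joint, 6))
```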


    References:

    Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons.

    Allison, P. D. (1984). Event history analysis. Thousand Oaks, CA: Sage Publications.

    Begg, C. B., & Gray, R. (1984). Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika, 71(1), 11-18.

    Long, S. J., & Cheng, S. (2004). Regression models for categorical outcomes. In M. Hardy & A. Bryman (Eds.), Handbook of data analysis (pp. 258-285). London: SAGE Publications, Ltd.

    Pope, R. (2014). In the spotlight: Meet Stata's new xlmlogit command. Stata News, 29(2), 2-3.

    Sevcikova, H., & Raftery, A. (2012). Estimation of multinomial logit model using the Begg & Gray approximation.

    Steele, F. (2013). Module 10: Single-level and multilevel models for nominal responses concepts. Bristol, UK: Centre for Multilevel Modelling.

  • 2021-01-29 22:23

    Here's an implementation (not my own). I'd just work off this code. Plus, this way you'll really know what's going on under the hood.

    http://www.nhsilbert.net/docs/rcode/multilevel_multinomial_logistic_regression.R
