mixed-models with two random effects - statsmodels

Posted by 别等时光非礼了梦想 on 2020-06-13 06:31:58

Question


import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.drop(38)

In R I would do:

lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=df)

which in R gives me:

Random effects:
 Groups   Name        Variance Std.Dev.
 scenario (Intercept)  219     14.80   
 subject  (Intercept) 4015     63.36   
 Residual              646     25.42   
Fixed effects:
            Estimate Std. Error t value
(Intercept)  202.588     26.754   7.572
attitudepol  -19.695      5.585  -3.527

I tried to do the same with statsmodels:

model = smf.mixedlm("frequency ~ attitude", data=df, groups=df[["subject","scenario"]]).fit()

But model.summary() gives me a different output:

      Mixed Linear Model Regression Results
=======================================================
Model:            MixedLM Dependent Variable: frequency
No. Observations: 83      Method:             REML     
No. Groups:       2       Scale:              0.0000   
Min. group size:  1       Likelihood:         inf      
Max. group size:  1       Converged:          Yes      
Mean group size:  1.0                                  
-------------------------------------------------------
                  Coef.  Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept        204.500                               
attitude[T.pol]    8.800                               
groups RE          0.000                               
=======================================================

Answer 1:


The code below reproduces the R results. Since this is a crossed model with no independent groups, you need to put everyone in the same group and specify the random effects using variance components.

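import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.dropna()  # drop the row with a missing value (leaves 83 observations)
df["group"] = 1   # a single group containing every observation

# One variance component per crossed factor; re_formula="0" removes the
# random intercept for the single dummy group.
vcf = {"scenario": "0 + C(scenario)", "subject": "0 + C(subject)"}
model = sm.MixedLM.from_formula("frequency ~ attitude", groups="group",
                                vc_formula=vcf, re_formula="0", data=df)
result = model.fit()
print(result.summary())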

Here are the results:

            Mixed Linear Model Regression Results
==============================================================
Model:               MixedLM   Dependent Variable:   frequency
No. Observations:    83        Method:               REML     
No. Groups:          1         Scale:                646.0163 
Min. group size:     83        Likelihood:           -396.7268
Max. group size:     83        Converged:            Yes      
Mean group size:     83.0                                     
--------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025  0.975]
--------------------------------------------------------------
Intercept        202.588   26.754  7.572 0.000 150.152 255.025
attitude[T.pol]  -19.695    5.585 -3.526 0.000 -30.641  -8.748
scenario Var     218.991    6.476                             
subject Var     4014.616  104.614                             
==============================================================
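
As a quick check against the lmer output in the question, the variances reported by statsmodels can be converted to standard deviations by hand. The sketch below hard-codes the three variance estimates copied from the summary above (it does not refit the model) and takes their square roots; the dictionary names are only illustrative.

```python
import math

# Variance estimates copied from the statsmodels summary above
# (hard-coded for illustration, not refitted here).
variances = {
    "scenario": 218.991,
    "subject": 4014.616,
    "residual": 646.0163,  # reported as "Scale" in the summary header
}

# lmer reports both Variance and Std.Dev.; Std.Dev. is the square root.
std_devs = {name: round(math.sqrt(v), 2) for name, v in variances.items()}

for name, sd in std_devs.items():
    print(f"{name:<9} variance={variances[name]:>9.3f}  std.dev={sd:.2f}")
```

The resulting standard deviations (14.80, 63.36 and 25.42) agree with the Std.Dev. column of the lmer output. Note that the Std.Err. column next to the variance components in the statsmodels table is a different quantity from lmer's Std.Dev. column.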



Answer 2:


The only way I could think of to semi-reproduce this is to simply concatenate your groups.

df["grp"] = df["subject"].astype(str) + df["scenario"].astype(str)
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df["grp"]).fit()

model.summary()
Out[87]: 
<class 'statsmodels.iolib.summary2.Summary'>
"""
            Mixed Linear Model Regression Results
==============================================================
Model:               MixedLM   Dependent Variable:   frequency
No. Observations:    83        Method:               REML     
No. Groups:          42        Scale:                615.6961 
Min. group size:     1         Likelihood:           -430.8261
Max. group size:     2         Converged:            Yes      
Mean group size:     2.0                                      
--------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025  0.975]
--------------------------------------------------------------
Intercept        202.588   10.078 20.102 0.000 182.836 222.340
attitude[T.pol]  -19.618    5.476 -3.582 0.000 -30.350  -8.885
groups RE       3650.021   50.224                             
==============================================================

"""
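
One way to see why this only semi-reproduces the lmer fit: concatenating the two labels produces one group per subject-scenario pair, so the model estimates a single random intercept for each pair rather than separate crossed intercepts for subject and scenario. The sketch below rebuilds only the label structure, under the assumption (consistent with the 42 groups and 83 observations in the summary) of 6 subjects, 7 scenarios, 2 attitudes and one dropped row. The subject labels are placeholders, not read from the CSV.

```python
from collections import Counter
from itertools import product

# Placeholder labels (assumption): 6 subjects x 7 scenarios x 2 attitudes.
subjects = [f"S{i}" for i in range(1, 7)]
scenarios = range(1, 8)
attitudes = ["inf", "pol"]

rows = list(product(subjects, scenarios, attitudes))
rows.pop(38)  # mimic the single dropped observation (which row is arbitrary here)

# Same construction as df["subject"].astype(str) + df["scenario"].astype(str)
group_sizes = Counter(f"{subj}{scen}" for subj, scen, _ in rows)

print(len(rows))         # number of observations
print(len(group_sizes))  # number of distinct concatenated groups
print(min(group_sizes.values()), max(group_sizes.values()))
```

This gives 83 observations in 42 groups of size 1 or 2, matching the header of the summary above.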



Answer 3:


The lmer equivalent of your smf.mixedlm model would be something like this:

lmer(frequency ~ attitude + (1 + attitude|subject) + (1 + attitude|scenario), data = df)

Explanation of the terms:

  1. A global intercept (you can disable the global intercept with frequency ~ 0 + attitude + ...)
  2. A global slope for the fixed effect attitude.
  3. A random intercept for subject (i.e. for each level of subject you get a deviation from the global intercept), and the deviation from the fixed effect slope for attitude within each level of subject, allowing for correlation between random intercept and slope.
  4. The equivalent random intercept and slope terms for scenario.

Note: if you want the random intercept and slope to vary independently (i.e. to enforce a zero correlation between them), replace (1 + attitude|subject) with (1|subject) + (0 + attitude|subject), and similarly for scenario.
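
To make the terms concrete, here is a minimal simulation of the kind of data-generating model that a random intercept and slope for subject describes, using only the standard library. All numeric values are made up for illustration, and the scenario term is left out for brevity. The intercept and slope deviations are drawn independently, which corresponds to the zero-correlation variant in the note above.

```python
import random

random.seed(0)

# Made-up values, for illustration only.
GLOBAL_INTERCEPT = 200.0   # term 1: global intercept
GLOBAL_SLOPE = -20.0       # term 2: global slope for attitude (pol vs. inf)
SD_INTERCEPT = 60.0        # spread of per-subject intercept deviations
SD_SLOPE = 5.0             # spread of per-subject slope deviations
SD_RESIDUAL = 25.0         # observation-level noise


def simulate(n_subjects=6, n_trials=7):
    """Return rows of (subject, attitude, frequency)."""
    rows = []
    for s in range(n_subjects):
        # term 3: each subject deviates from the global intercept and slope
        u0 = random.gauss(0.0, SD_INTERCEPT)
        u1 = random.gauss(0.0, SD_SLOPE)
        for _ in range(n_trials):
            for is_polite in (0, 1):
                mean = (GLOBAL_INTERCEPT + u0) + (GLOBAL_SLOPE + u1) * is_polite
                y = mean + random.gauss(0.0, SD_RESIDUAL)
                rows.append((f"S{s + 1}", "pol" if is_polite else "inf", y))
    return rows


data = simulate()
print(len(data), data[0])
```

Adding the scenario term would mean drawing a second pair of deviations per scenario and adding them to the same mean.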



Source: https://stackoverflow.com/questions/50052421/mixed-models-with-two-random-effects-statsmodels
