Question
import pandas as pd
import statsmodels.formula.api as smf
df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.drop(38)
In R I would do:
lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=df)
which in R gives me:
Random effects:
Groups Name Variance Std.Dev.
scenario (Intercept) 219 14.80
subject (Intercept) 4015 63.36
Residual 646 25.42
Fixed effects:
Estimate Std. Error t value
(Intercept) 202.588 26.754 7.572
attitudepol -19.695 5.585 -3.527
I tried to do the same with statsmodels:
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df[["subject","scenario"]]).fit()
But model.summary() gives me a different output:
Mixed Linear Model Regression Results
=======================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 2 Scale: 0.0000
Min. group size: 1 Likelihood: inf
Max. group size: 1 Converged: Yes
Mean group size: 1.0
-------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept 204.500
attitude[T.pol] 8.800
groups RE 0.000
=======================================================
Answer 1:
The code below reproduces the R results. Since this is a crossed model with no independent groups, you need to put everyone in the same group and specify the random effects using variance components.
import pandas as pd
import statsmodels.api as sm
df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
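# Drop rows with missing values (83 observations remain).
df = df.dropna()
# Crossed random effects: put every observation in one group...
df["group"] = 1
# ...and give each crossed factor its own variance component.
vcf = {"scenario": "0 + C(scenario)", "subject": "0 + C(subject)"}
model = sm.MixedLM.from_formula("frequency ~ attitude", groups="group",
                                vc_formula=vcf, re_formula="0", data=df)
result = model.fit()
print(result.summary())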
Here are the results:
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 1 Scale: 646.0163
Min. group size: 83 Likelihood: -396.7268
Max. group size: 83 Converged: Yes
Mean group size: 83.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept 202.588 26.754 7.572 0.000 150.152 255.025
attitude[T.pol] -19.695 5.585 -3.526 0.000 -30.641 -8.748
scenario Var 218.991 6.476
subject Var 4014.616 104.614
==============================================================
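As a quick check on the match with R: lmer prints a standard deviation next to each random-effect variance, while the statsmodels summary shows variances only (the `scenario Var` and `subject Var` rows, plus `Scale` for the residual). A minimal sketch, using nothing but the numbers printed above, that takes square roots so the two outputs can be compared directly:

```python
import math

# Variance estimates copied from the statsmodels summary above.
variances = {
    "scenario": 218.991,
    "subject": 4014.616,
    "residual": 646.0163,  # reported as "Scale" in the summary
}

# lmer reports Std.Dev. = sqrt(Variance); compute the same quantity here.
std_devs = {name: math.sqrt(var) for name, var in variances.items()}

for name, sd in std_devs.items():
    print(f"{name}: variance={variances[name]:.3f}, std.dev={sd:.2f}")
```

Rounded to two decimals, these agree with the Std.Dev. column of the R output in the question (14.80, 63.36 and 25.42).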
Answer 2:
The only way I could think of to semi-reproduce this is to simply concatenate your groups.
df["grp"] = df["subject"].astype(str) + df["scenario"].astype(str)
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df["grp"]).fit()
model.summary()
Out[87]:
<class 'statsmodels.iolib.summary2.Summary'>
"""
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 42 Scale: 615.6961
Min. group size: 1 Likelihood: -430.8261
Max. group size: 2 Converged: Yes
Mean group size: 2.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept 202.588 10.078 20.102 0.000 182.836 222.340
attitude[T.pol] -19.618 5.476 -3.582 0.000 -30.350 -8.885
groups RE 3650.021 50.224
==============================================================
"""
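To see what the concatenated grouping variable does, here is a minimal sketch on a synthetic balanced layout. The labels `S1`..`S6` and the 6 x 7 x 2 design are assumptions for illustration, chosen to be consistent with the 42 groups of at most two rows reported above; the real data has one row fewer.

```python
from collections import Counter
from itertools import product

# Hypothetical labels: 6 subjects, 7 scenarios, 2 attitude levels, fully crossed.
subjects = [f"S{i}" for i in range(1, 7)]
scenarios = range(1, 8)
attitudes = ["inf", "pol"]

rows = list(product(subjects, scenarios, attitudes))

# Same construction as df["subject"].astype(str) + df["scenario"].astype(str).
group_sizes = Counter(f"{subject}{scenario}" for subject, scenario, _ in rows)

print(len(rows))                  # number of observations
print(len(group_sizes))           # number of concatenated groups
print(max(group_sizes.values()))  # largest group size
```

Each group is a single subject-scenario cell, so the model fits one random intercept per cell (roughly `(1 | subject:scenario)` in lmer notation) instead of separate variance components for subject and scenario. That would explain why it only approximates the R fit.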
Answer 3:
The lmer equivalent of your smf.mixedlm model would be something like this:
lmer(frequency ~ attitude + (1 + attitude|subject) + (1 + attitude|scenario), data = df)
Explanation of the terms:
- A global intercept (you can disable the global intercept with frequency ~ 0 + attitude + ...).
- A global slope for the fixed effect attitude.
- A random intercept for subject (i.e. for each level of subject you get a deviation from the global intercept), and a deviation from the fixed-effect slope for attitude within each level of subject, allowing for correlation between random intercept and slope.
- The equivalent random intercept and slope terms for scenario.
Note: if you want the random intercept and slope to be uncorrelated (i.e. to enforce a zero correlation between them), you would replace (1 + attitude|subject) with (1|subject) + (0 + attitude|subject), and similarly for scenario.
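The two variants differ in the covariance matrix assumed for the random intercept and slope. A minimal sketch with made-up numbers (the standard deviations and the correlation below are hypothetical, not estimates from this data, and `random_effects_cov` is an illustrative helper, not a library function):

```python
def random_effects_cov(sd_intercept, sd_slope, corr):
    """Return the 2x2 covariance matrix of (intercept, slope) as nested lists."""
    cov = corr * sd_intercept * sd_slope
    return [
        [sd_intercept ** 2, cov],
        [cov, sd_slope ** 2],
    ]

# Correlated structure: the off-diagonal term is a free parameter.
correlated = random_effects_cov(sd_intercept=60.0, sd_slope=10.0, corr=-0.5)

# Zero-correlation structure: the off-diagonal term is constrained to 0.
uncorrelated = random_effects_cov(sd_intercept=60.0, sd_slope=10.0, corr=0.0)

print(correlated)
print(uncorrelated)
```

The zero-correlation form has one fewer parameter to estimate per grouping factor, which can matter with as few levels as this data set has.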
Source: https://stackoverflow.com/questions/50052421/mixed-models-with-two-random-effects-statsmodels