问题
I'm currently struggling with how to report, following APA-6 recommendations, the output of rstanarm::stan_lmer()
.
First, I'll fit a mixed model within the frequentist approach, then will try to do the same using the bayesian framework.
Here's the reproducible code to get the data:
library(tidyverse)
library(neuropsychology)
library(rstanarm)
library(lmerTest)
df <- neuropsychology::personality %>%
select(Study_Level, Sex, Negative_Affect) %>%
mutate(Study_Level=as.factor(Study_Level),
Negative_Affect=scale(Negative_Affect)) # I understood that scaling variables is important
Now, let's fit a linear mixed model in the "traditional" way to test the impact of Sex (male/female) on Negative Affect (negative mood) with the study level (years of education) as random factor.
fit <- lmer(Negative_Affect ~ Sex + (1|Study_Level), df)
summary(fit)
The output is the following:
Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of
freedom [lmerMod]
Formula: Negative_Affect ~ Sex + (1 | Study_Level)
Data: df
REML criterion at convergence: 3709
Scaled residuals:
Min 1Q Median 3Q Max
-2.58199 -0.72973 0.02254 0.68668 2.92841
Random effects:
Groups Name Variance Std.Dev.
Study_Level (Intercept) 0.04096 0.2024
Residual 0.94555 0.9724
Number of obs: 1327, groups: Study_Level, 8
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.01564 0.08908 4.70000 0.176 0.868
SexM -0.46667 0.06607 1321.20000 -7.064 2.62e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
SexM -0.149
To report it, I would say that "we fitted a linear mixed model with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Within this model, the male level led to a significant decrease of negative affect (beta = -0.47, t(1321)=-7.06, p < .001).
Is that correct?
Then, let's try to fit the model within a bayesian framework using rstanarm
:
fitB <- stan_lmer(Negative_Affect ~ Sex + (1|Study_Level),
data=df,
prior=normal(location=0, scale=1),
prior_intercept=normal(location=0, scale=1),
prior_PD=F)
print(fitB, digits=2)
This returns:
stan_lmer
family: gaussian [identity]
formula: Negative_Affect ~ Sex + (1 | Study_Level)
------
Estimates:
Median MAD_SD
(Intercept) 0.02 0.10
SexM -0.47 0.07
sigma 0.97 0.02
Error terms:
Groups Name Std.Dev.
Study_Level (Intercept) 0.278
Residual 0.973
Num. levels: Study_Level 8
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.00 0.04
------
For info on the priors used see help('prior_summary.stanreg').
I think than median
is the median of the posterior distribution of the coefficient and mad_sd
the equivalent of standart deviation. These parameters are close to the beta and standart error of the frequentist model, which is reassuring. However, I do not know how to formalize and put the output in words.
Moreover, if I do the summary of the model (summary(fitB, probs=c(.025, .975), digits=2)
), I get other features of the posterior distribution:
...
Estimates:
mean sd 2.5% 97.5%
(Intercept) 0.02 0.11 -0.19 0.23
SexM -0.47 0.07 -0.59 -0.34
...
Is something like the following good?
"we fitted a linear mixed model within the bayesian framework with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Priors for the coefficient and the intercept were set to normal (mean=0, sd=1). Within this model, the features of the posterior distribution of the coefficient associated with the male level suggest a decrease of negative affect (mean = -0.47, sd = 0.11, 95% CI[-0.59, -0.34]).
Thanks for your help.
回答1:
The following is personal opinion that may or may not be acceptable to a psychology journal.
To report it, I would say that "we fitted a linear mixed model with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Within this model, the male level led to a significant decrease of negative affect (beta = -0.47, t(1321)=-7.06, p < .001).
Is that correct?
That is considered correct from a frequentist perspective.
The key concepts from a Bayesian perspective are that (conditional on the model, of course)
- There is a 0.5 probability that the true effect is less than the posterior median and a 0.5 probability that the true effect is greater than the posterior median. Frequentists tend to see a posterior median as being like a numerical optimum.
- The
posterior_interval
function yields credible intervals around the median with a default probability of 0.9 (although a lower number produces more accurate estimates of the bounds). So, you can legitimately say that there is a probability of 0.9 that the true effect is between those bounds. Frequentists tend to see confidence intervals as being like credible intervals. - The
as.data.frame
function will give you access to the raw draws, somean(as.data.frame(fitB)$male > 0)
yields the probability that the expected difference in the outcome between men and women in the same study is positive. Frequentists tend to see these probabilities as being like p-values.
For a Bayesian approach, I would say
We fit a linear model using Markov Chain Monte Carlo with negative affect as the outcome variable, sex as predictor and the intercept was allowed to vary by study level.
And then talk about the estimates using the three concepts above.
来源:https://stackoverflow.com/questions/44693812/how-to-report-with-apa-style-a-bayesian-linear-mixed-models-using-rstanarm