Question
A simple question, I hope.
I have an experimental design where I measure some response (let's say blood pressure) from two groups: a control group and an affected group, where both are given three treatments: t1, t2, t3. The data are not paired in any sense.
Here is some example data:
# 5 observations per cell: control means are 10 under t1, t2 and t3;
# affected means are 7, 5 and 10 under t1, t2 and t3
set.seed(1)
df <- data.frame(response = c(rnorm(5,10,1),rnorm(5,10,1),rnorm(5,10,1),
                              rnorm(5,7,1),rnorm(5,5,1),rnorm(5,10,1)),
                 group = as.factor(c(rep("control",15),rep("affected",15))),
                 treatment = as.factor(rep(c(rep("t1",5),rep("t2",5),rep("t3",5)),2)))
What I am interested in is quantifying the effect that each treatment has on the affected group relative to the control group. How would I model this, say using a linear model (for example lm in R)?
Am I wrong in thinking that:
lm(response ~ 0 + treatment * group, data = df)
which is equivalent to:
lm(response ~ 0 + treatment + group + treatment:group, data = df)
is not what I need? I think that in this model the treatment:group interaction terms are relative to the mean over all baseline group and baseline treatment measurements.
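One way to check what each coefficient refers to is to look at the design matrix that lm builds from the formula. The sketch below is only an illustration; it assumes R's default treatment contrasts, under which affected (first alphabetically) is the baseline group and t1 the baseline treatment, and it prints one design-matrix row per treatment/group cell (the names X and first_in_cell are hypothetical).

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(response = c(rnorm(5, 10, 1), rnorm(5, 10, 1), rnorm(5, 10, 1),
                              rnorm(5, 7, 1),  rnorm(5, 5, 1),  rnorm(5, 10, 1)),
                 group = as.factor(c(rep("control", 15), rep("affected", 15))),
                 treatment = as.factor(rep(c(rep("t1", 5), rep("t2", 5), rep("t3", 5)), 2)))

# Design matrix of the no-intercept interaction model
X <- model.matrix(~ 0 + treatment * group, data = df)
print(colnames(X))

# Keep the first row of each treatment/group cell: a 1 marks every
# coefficient that is summed to give that cell's fitted mean
first_in_cell <- !duplicated(df[, c("treatment", "group")])
print(cbind(df[first_in_cell, c("treatment", "group")], X[first_in_cell, ]))
```

For example, the control/t2 row has 1s under treatmentt2, groupcontrol and treatmentt2:groupcontrol, so that interaction term is the extra shift on top of the additive treatmentt2 + groupcontrol prediction for that one cell.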
I therefore thought that this model:
lm(response ~ 0 + treatment:group, data = df)
is what I need, but it estimates a separate coefficient for every combination of treatment and group: treatmentt1:groupcontrol, treatmentt1:groupaffected, treatmentt2:groupcontrol, treatmentt2:groupaffected, treatmentt3:groupcontrol and treatmentt3:groupaffected.
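With only the treatment:group term and no intercept, this is a cell-means parameterization, so each of the six coefficients should equal the observed mean of one treatment/group cell. A small sketch to confirm this on the example data (fit_cells and cell_means are illustrative names):

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(response = c(rnorm(5, 10, 1), rnorm(5, 10, 1), rnorm(5, 10, 1),
                              rnorm(5, 7, 1),  rnorm(5, 5, 1),  rnorm(5, 10, 1)),
                 group = as.factor(c(rep("control", 15), rep("affected", 15))),
                 treatment = as.factor(rep(c(rep("t1", 5), rep("t2", 5), rep("t3", 5)), 2)))

fit_cells <- lm(response ~ 0 + treatment:group, data = df)

# Observed mean of each cell (rows = treatment, columns = group)
cell_means <- with(df, tapply(response, list(treatment, group), mean))

# Line each coefficient up with the matching cell mean
for (tr in levels(df$treatment)) {
  for (g in levels(df$group)) {
    name <- paste0("treatment", tr, ":group", g)
    cat(sprintf("%-26s coef = %6.3f  cell mean = %6.3f\n",
                name, coef(fit_cells)[[name]], cell_means[tr, g]))
  }
}
```

Because every coefficient is an absolute cell mean, none of them is directly the affected-versus-control difference; that difference has to be computed by subtracting pairs of coefficients.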
So perhaps this model:
lm(response ~ 0 + treatment + treatment:group, data = df)
is the correct one?
However, in addition to estimating a term for each combination of treatment and groupaffected, it also estimates an effect for each treatment. I'm not sure what baseline each of the treatment:groupaffected interaction terms is compared to in this model.
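If the goal is one coefficient per treatment for "affected relative to control", one option, offered here only as a sketch, is to make control the reference level with relevel() and then fit this nested formula. Under R's default contrasts the treatment coefficients should then be the control-group means, and each treatmentX:groupaffected coefficient the affected-minus-control difference within treatment X (fit_diff and cell_means are illustrative names):

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(response = c(rnorm(5, 10, 1), rnorm(5, 10, 1), rnorm(5, 10, 1),
                              rnorm(5, 7, 1),  rnorm(5, 5, 1),  rnorm(5, 10, 1)),
                 group = as.factor(c(rep("control", 15), rep("affected", 15))),
                 treatment = as.factor(rep(c(rep("t1", 5), rep("t2", 5), rep("t3", 5)), 2)))

# Make control the baseline group (by default "affected" is first alphabetically)
df$group <- relevel(df$group, ref = "control")

fit_diff <- lm(response ~ 0 + treatment + treatment:group, data = df)
print(coef(fit_diff))
# 95% intervals; these use the residual variance pooled over all six cells
print(confint(fit_diff))

# Compare with the raw affected - control difference in each treatment
cell_means <- with(df, tapply(response, list(treatment, group), mean))
print(cell_means[, "affected"] - cell_means[, "control"])
```

Because the intervals rely on a single pooled residual variance, they will generally differ somewhat from separate per-treatment Welch t-test intervals.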
Help would be appreciated.
Also, suppose I ran a fourth treatment that is actually a combination of two of the treatments, say t1+t3, and I don't know whether their combined effect will be additive/subtractive or synergistic. Is there any way to incorporate this into the model?
Answer 1:
A significant interaction term tells you that the difference between groups depends on treatment, that is, that the difference between affected and control is not the same for t1, t2 and t3.
I would keep the intercept in the model, though:
lm(response ~ group + treatment + group:treatment, data=df)
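To test whether the interaction is significant as a whole, a common approach is to compare the additive model with the interaction model using an F-test via anova(). A minimal sketch on the example data (fit_add, fit_int and cmp are illustrative names):

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(response = c(rnorm(5, 10, 1), rnorm(5, 10, 1), rnorm(5, 10, 1),
                              rnorm(5, 7, 1),  rnorm(5, 5, 1),  rnorm(5, 10, 1)),
                 group = as.factor(c(rep("control", 15), rep("affected", 15))),
                 treatment = as.factor(rep(c(rep("t1", 5), rep("t2", 5), rep("t3", 5)), 2)))

fit_add <- lm(response ~ group + treatment, data = df)
fit_int <- lm(response ~ group + treatment + group:treatment, data = df)

# F-test of the two group:treatment terms jointly (2 numerator df)
cmp <- anova(fit_add, fit_int)
print(cmp)
```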
After finding a significant interaction term, I would use t-tests to investigate further and to help with interpretation.
As can be seen in the resulting plot, the interaction is driven by the larger effect of t2 relative to the other treatments.
library(ggplot2)

set.seed(1)
df <- data.frame(response = c(rnorm(5,10,1),rnorm(5,10,1),rnorm(5,10,1),rnorm(5,7,1),rnorm(5,5,1),rnorm(5,10,1)),
                 group = as.factor(c(rep("control",15),rep("affected",15))),
                 treatment = as.factor(rep(c(rep("t1",5),rep("t2",5),rep("t3",5)),2)))

# Welch t-tests of affected vs control within each treatment, with 95% confidence intervals.
# "affected" is the first factor level, so each interval is for mean(affected) - mean(control).
tests <- lapply(levels(df$treatment),
                function(tr) t.test(response ~ group, data = df[df$treatment == tr, ]))
names(tests) <- levels(df$treatment)
tests

# Collect the lower and upper 95% C.I. bounds into a data frame for plotting
ci_plot <- data.frame(treatment = names(tests),
                      lci = sapply(tests, function(tt) tt$conf.int[1]),
                      uci = sapply(tests, function(tt) tt$conf.int[2]))

# plot 95% C.I.
ggplot(ci_plot, aes(x = treatment)) +
  geom_errorbar(aes(ymin = lci, ymax = uci), width = 0.5) +
  xlab("Treatment") +
  ylab("Change in mean relative to control (95% C.I.)") +
  theme_bw() +
  theme(panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        axis.text.x = element_text(angle = 90, hjust = 1))
Answer 2:
Your first specification is fine.
lm(response ~ 0 + treatment * group, data = df)
Call:
lm(formula = response ~ 0 + treatment * group, data = df)
Coefficients:
               treatmentt1               treatmentt2               treatmentt3
                     7.460                     5.081                     9.651
              groupcontrol  treatmentt2:groupcontrol  treatmentt3:groupcontrol
                     2.670                     2.384                    -2.283
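The first coefficient, 7.460, is the mean response of participants who are affected and treated with t1. Going from left to right, the second coefficient, 5.081, is the mean response of affected participants treated with t2, and likewise 9.651 for t3. groupcontrol, 2.670, is the control-minus-affected difference under t1, and each interaction term is how much that difference changes under t2 or t3.
So, for example, the mean response of control participants treated with t2 is 5.081 + 2.670 + 2.384 = 10.135 (treatmentt2 + groupcontrol + treatmentt2:groupcontrol).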
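The arithmetic can be checked directly: because the model is saturated, summing the relevant coefficients should reproduce the observed cell means exactly. A small sketch (fit0, b, control_t2 and diff_t2 are illustrative names) that also shows the control-versus-affected difference under t2 is groupcontrol + treatmentt2:groupcontrol:

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(response = c(rnorm(5, 10, 1), rnorm(5, 10, 1), rnorm(5, 10, 1),
                              rnorm(5, 7, 1),  rnorm(5, 5, 1),  rnorm(5, 10, 1)),
                 group = as.factor(c(rep("control", 15), rep("affected", 15))),
                 treatment = as.factor(rep(c(rep("t1", 5), rep("t2", 5), rep("t3", 5)), 2)))

fit0 <- lm(response ~ 0 + treatment * group, data = df)
b <- coef(fit0)
cell_means <- with(df, tapply(response, list(treatment, group), mean))

# Mean response of control participants treated with t2
control_t2 <- b[["treatmentt2"]] + b[["groupcontrol"]] + b[["treatmentt2:groupcontrol"]]
# Control minus affected difference under t2
diff_t2 <- b[["groupcontrol"]] + b[["treatmentt2:groupcontrol"]]

print(c(from_coefs = control_t2, observed = cell_means["t2", "control"]))
print(c(from_coefs = diff_t2,
        observed = cell_means["t2", "control"] - cell_means["t2", "affected"]))
```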
If I were doing this analysis, I would keep the intercept.
Call:
lm(formula = response ~ treatment * group, data = df)
Coefficients:
               (Intercept)               treatmentt2               treatmentt3
                     7.460                    -2.378                     2.192
              groupcontrol  treatmentt2:groupcontrol  treatmentt3:groupcontrol
                     2.670                     2.384                    -2.283
Now the second coefficient, going from left to right, is the difference in mean response between affected participants treated with t2 and affected participants treated with t1. To see this, notice that 7.460 + (-2.378) = 5.082, which matches the second coefficient of the first specification (5.081) up to rounding. I like this approach because it makes the relative effects easier to interpret.
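The two specifications are the same model written with different parameters, which can be checked by comparing their coefficients and fitted values. A minimal sketch (fit0, fit1, b0 and b1 are illustrative names):

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(response = c(rnorm(5, 10, 1), rnorm(5, 10, 1), rnorm(5, 10, 1),
                              rnorm(5, 7, 1),  rnorm(5, 5, 1),  rnorm(5, 10, 1)),
                 group = as.factor(c(rep("control", 15), rep("affected", 15))),
                 treatment = as.factor(rep(c(rep("t1", 5), rep("t2", 5), rep("t3", 5)), 2)))

fit0 <- lm(response ~ 0 + treatment * group, data = df)  # no intercept
fit1 <- lm(response ~ treatment * group, data = df)      # with intercept
b0 <- coef(fit0)
b1 <- coef(fit1)

# Intercept + treatmentt2 in fit1 recovers the t2/affected mean from fit0
print(c(b1[["(Intercept)"]] + b1[["treatmentt2"]], b0[["treatmentt2"]]))
# groupcontrol and the two interaction coefficients are identical in both fits
print(cbind(no_intercept = b0[4:6], intercept = b1[4:6]))
# Both parameterizations give the same fitted values
print(all.equal(fitted(fit0), fitted(fit1)))
```

Only the treatment coefficients change meaning between the two fits; the group and interaction coefficients, and therefore the affected-versus-control comparisons, are unaffected by keeping or dropping the intercept.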
That all being said, @MrFlick is right: this is a question for Cross Validated.
Source: https://stackoverflow.com/questions/33644110/interpreting-interactions-in-a-regression-model