Question
I have a dataset for which each row is one visit to a store by a salesperson and the fields include "outlet" (store ID), "devices" (how many electronic devices the salesperson sold) and "weekday" (the day of the week on which the salesperson was in the store).
I want to work out whether one weekday is better than the others for sales, so instead of comparing all the days of the week to a reference day such as Monday, I want to compare each of them to the mean of all the days of the week. I am using lmerTest::lmer (lme4::lmer with estimated p-values) for this.
I have tried the following code:
data$weekday <- factor(data$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
contrasts(data$weekday) = contr.sum(7)
summary(lmerTest::lmer(data=data, devices~weekday + (1|outlet)))
which gives:
Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)
(Intercept)   4.3681     0.6024  12.4472   7.251 8.24e-06 ***
weekday1     -1.0585     0.5129 145.7337  -2.064  0.04080 *
weekday2     -0.2830     0.4958 142.3214  -0.571  0.56913
weekday3      1.1884     0.4907 140.5545   2.422  0.01671 *
weekday4      0.1100     0.5025 145.1407   0.219  0.82707
weekday5      1.3589     0.5135 143.8204   2.646  0.00904 **
weekday6     -0.1629     0.5020 143.1605  -0.325  0.74600
However, all seven weekdays are present in the dataset, yet only six appear in the output (one is missing), and the weekday levels in the dataset are stored as "Monday", "Tuesday", "Wednesday" etc., not as "weekday1", "weekday2" etc.
Why is there one weekday missing and how do I know which one this is? Does this compare each weekday to the mean or is it doing something else? (And if so how do I change the contrasts to compare all levels to the mean of all levels?)
Answer 1:
The problem is that with sum contrasts, you can't compare all groups to the overall mean because the comparisons aren't independent. If you know the grand mean G and the means of days 1-6, then the mean of day 7 can be calculated from the values you already have. So basically, you can't do it using contrasts alone - you'd need a post-hoc test of some kind.
With the standard treatment contrasts, you still only make six comparisons (1-2, 1-3, 1-4, 1-5, 1-6, 1-7), and the usual question is "hey, where did 1 go?" The answer there is that it is the intercept. Here, you have G-1, G-2, G-3, G-4, G-5, G-6 and then lose G-7.
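The dependence described above can be written out explicitly. This is a sketch in notation that is not in the original answer: \mu_i is the mean of weekday i, \alpha_i is its deviation from G, and G is taken to be the unweighted mean of the seven weekday means, which is what the intercept estimates under contr.sum. Note that the fitted coefficients are level minus G, not G minus level.

```latex
\begin{align}
\mu_i &= G + \alpha_i, \qquad G = \frac{1}{7}\sum_{i=1}^{7}\mu_i
  \quad\Longrightarrow\quad \sum_{i=1}^{7}\alpha_i = 0, \\
\alpha_7 &= -\sum_{i=1}^{6}\alpha_i .
\end{align}
```

Applied to the estimates printed in the question, the missing Sunday deviation would be -(-1.0585 - 0.2830 + 1.1884 + 0.1100 + 1.3589 - 0.1629) = -1.1529. The point estimate is therefore recoverable, but the output gives no standard error or p-value for it.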
Answer 2:
You need to explicitly suppress the intercept, so that the model returns one coefficient per weekday:
devices ~ -1 + weekday + (1|outlet)
or
devices ~ 0 + weekday + (1|outlet)
It's not particularly clear from the output, but when you use sum-to-zero contrasts with the intercept included, the first parameter is (level 1 - mean), the second is (level 2 - mean), etc., so the comparison that's missing is the last level: "Sunday vs. mean".
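# Simulated example: 10 observations per weekday, no true weekday effect
set.seed(101)
w <- c("Monday", "Tuesday", "Wednesday", "Thursday",
       "Friday", "Saturday", "Sunday")
dd <- data.frame(w = factor(rep(w, 10), levels = w), y = rnorm(70))
# Intercept kept: (Intercept) plus w1..w6, the Monday..Saturday deviations
m0 <- lm(y ~ w, dd, contrasts = list(w = contr.sum))
# Intercept suppressed: one coefficient per weekday (wMonday..wSunday)
m1 <- lm(y ~ w - 1, dd, contrasts = list(w = contr.sum))
coef(m0)
coef(m1)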
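To see how the two fits relate, here is a minimal sketch that extends the simulated example. It rests on one assumption about base R that is worth checking on your own data: when the intercept is removed, R codes the first factor with one indicator column per level, so the coefficients of m1 are the seven weekday means rather than deviations from the mean. The names dev6, sunday_dev, level_means, all_dev, w_rev and m2 are illustrative and not part of the original answer. The last step is one possible workaround, not necessarily what the answerer had in mind: refitting with the level order reversed makes Monday the dropped level, so Sunday's deviation gets its own standard error and p-value.

```r
set.seed(101)
w <- c("Monday", "Tuesday", "Wednesday", "Thursday",
       "Friday", "Saturday", "Sunday")
dd <- data.frame(w = factor(rep(w, 10), levels = w), y = rnorm(70))

# Sum-to-zero fit with the intercept kept: w1..w6 are the deviations of
# Monday..Saturday from the unweighted mean of the seven level means.
m0 <- lm(y ~ w, dd, contrasts = list(w = contr.sum))
dev6 <- coef(m0)[-1]
sunday_dev <- -sum(dev6)   # implied by the sum-to-zero constraint

# Intercept removed: the seven coefficients are the level means.
m1 <- lm(y ~ w - 1, dd, contrasts = list(w = contr.sum))
level_means <- coef(m1)
all_dev <- level_means - mean(level_means)   # all seven deviations

# Refit with the level order reversed (Sunday first, Monday last), so
# Monday is now the dropped level and Sunday is reported as w_rev1.
dd$w_rev <- factor(dd$w, levels = rev(w))
m2 <- lm(y ~ w_rev, dd, contrasts = list(w_rev = contr.sum))
sunday_row <- coef(summary(m2))["w_rev1", ]

print(round(all_dev, 4))
print(sunday_row)
```

The same reordering idea should carry over to the lmerTest::lmer model in the question by changing the levels of weekday before setting the contrasts, though that has not been run here against the original data.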
Source: https://stackoverflow.com/questions/59250992/how-to-change-contrasts-to-compare-with-mean-of-all-levels-rather-than-reference