问题
I am trying to follow the tutorial by Datanovia for Two-way repeated measures ANOVA.
A quick overview of my dataset:
I have measured the number of different bacterial species in 12 samplingsunits over time. I have 16 time points and 2 groups. I have organised my data as a tibble called "richness";
# A tibble: 190 x 4
id selection.group Day value
<fct> <fct> <fct> <dbl>
1 KRH1 KR 2 111.
2 KRH2 KR 2 141.
3 KRH3 KR 2 110.
4 KRH1 KR 4 126
5 KRH2 KR 4 144
6 KRH3 KR 4 135.
7 KRH1 KR 6 115.
8 KRH2 KR 6 113.
9 KRH3 KR 6 107.
10 KRH1 KR 8 119.
The id refers to each sampling unit, and the selection group is of two factors (KR and RK).
richness <- tibble(
id = factor(c("KRH1", "KRH3", "KRH2", "RKH2", "RKH1", "RKH3")),
selection.group = factor(c("KR", "KR", "KR", "RK", "RK", "RK")),
Day = factor(c(2,2,4,2,4,4)),
value = c(111, 110, 144, 92, 85, 69)) # subset of original data
My tibble appears to be in an identical format as the one in the tutorial;
> str(selfesteem2)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 72 obs. of 4 variables:
$ id : Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ treatment: Factor w/ 2 levels "ctr","Diet": 1 1 1 1 1 1 1 1 1 1 ...
$ time : Factor w/ 3 levels "t1","t2","t3": 1 1 1 1 1 1 1 1 1 1 ...
$ score : num 83 97 93 92 77 72 92 92 95 92 ..
Before I can run the repeated measures ANOVA I must check for normality in my data. I copied the framework proposed in the tutorial.
#my code
richness %>%
group_by(selection.group, Day) %>%
shapiro_test(value)
#tutorial code
selfesteem2 %>%
group_by(treatment, time) %>%
shapiro_test(score)
But get the error message "Error: Column variable
is unknown" when I try to run the code. Does anyone know why this happens?
I tried to continue without insurance that my data is normally distributed and tried to run the ANOVA
res.aov <- rstatix::anova_test(
data = richness, dv = value, wid = id,
within = c(selection.group, Day)
)
But get this error message; Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
I have checked for NA values with any(is.na(richness))
which returns FALSE. I have also checked table(richness$selection.group, richness$Day)
to be sure my setup is correct
2 4 6 8 12 16 20 24 28 29 30 32 36 40 44 50
KR 6 6 6 6 6 6 6 6 6 6 6 5 6 6 6 6
RK 6 6 6 6 6 5 6 6 6 6 6 6 6 6 6 6
And the setup appears correct. I would be very grateful for tips on solving this.
Best regards Madeleine
Below is a subset of my dataset in a reproducible format:
library(tidyverse)
library(rstatix)
library(tibble)
richness_subset = data.frame(
id = c("KRH1", "KRH3", "KRH2", "RKH2", "RKH1", "RKH3"),
selection.group = c("KR", "KR", "KR", "RK", "RK", "RK"),
Day = c(2,2,4,2,4,4),
value = c(111, 110, 144, 92, 85, 69))
richness_subset$Day = factor(richness$Day)
richness_subset$selection.group = factor(richness$selection.group)
richness_subset$id = factor(richness$id)
richness_subset = tibble::as_tibble(richness_subset)
richness_subset %>%
group_by(selection.group, Day) %>%
shapiro_test(value)
# gives Error: Column `variable` is unknown
res.aov <- rstatix::anova_test(
data = richness, dv = value, wid = id,
within = c(selection.group, Day)
)
# gives Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
# 0 (non-NA) cases
回答1:
I create something like the design of your data:
set.seed(111)
richness = data.frame(id=rep(c("KRH1","KRH2","KRH3"),6),
selection.group=rep(c("KR","RK"),each=9),
Day=rep(c(2,4,6),each=3,times=2),value=rpois(18,100))
richness$Day = factor(richness$Day)
richness$id = factor(richness$id)
First, shapiro_test, there's a bug in the script and the value you wanna test cannot be named "value":
# gives error Error: Column `variable` is unknown
richness %>% shapiro_test(value)
#works
richness %>% mutate(X = value) %>% shapiro_test(X)
# A tibble: 1 x 3
variable statistic p
<chr> <dbl> <dbl>
1 X 0.950 0.422
1 X 0.963 0.843
Second, for the anova, this works for me.
rstatix::anova_test(
data = richness, dv = value, wid = id,
within = c(selection.group, Day)
)
In my example every term can be estimated.. What I suspect is that one of your terms is a linear combination of the other. Using my example,
set.seed(111)
richness =
data.frame(id=rep(c("KRH1","KRH2","KRH3","KRH4","KRH5","KRH6"),3),
selection.group=rep(c("KR","RK"),each=9),
Day=rep(c(2,4,6),each=3,times=2),value=rpois(18,100))
richness$Day = factor(richness$Day)
richness$id = factor(richness$id)
rstatix::anova_test(
data = richness, dv = value, wid = id,
within = c(selection.group, Day)
)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
Gives the exact same error. This can be checked using:
lm(value~id+Day:selection.group,data=richness)
Call:
lm(formula = value ~ id + Day:selection.group, data = richness)
Coefficients:
(Intercept) id1 id2
101.667 -3.000 -6.000
id3 id4 id5
-6.000 1.889 11.556
Day2:selection.groupKR Day4:selection.groupKR Day6:selection.groupKR
1.667 -12.000 9.333
Day2:selection.groupRK Day4:selection.groupRK Day6:selection.groupRK
-1.667 NA NA
The Day4:selection.groupRK and Day6:selection.groupRK are not estimateable because they are covered by a linear combination of factors before.
回答2:
The solution for running the Shapiro_test proposed above worked.
And I figured out I have some linear combination by running lm(value~id+Day:selection.group,data=richness)
. However, I don't understand why? I know I have data points for each group (see graph). Where does this linear combination come from?
Repeated measure ANOVA appears so appropriate for me as I am following sampling units over time.
来源:https://stackoverflow.com/questions/60046993/unable-to-run-two-way-repeated-measures-anova-0-non-na-cases