Comparing two groups with multiple variables using ANOVA or another testing method in R


遇见更好的自我 2021-01-25 07:00

Working on my master thesis right now. I have 2 groups: Showering as usual and Cold shower group. Variables are age, gender, weight, psychological wellbeing, physiological wellb

1 answer

花落未央 (original poster)

2021-01-25 07:50

Example data:

set.seed(100)
data1 = data.frame(
  Code = sample(letters, 100, replace = TRUE),
  Gruppe = sample(1:2, 100, replace = TRUE),
  matrix(rpois(100 * 11, 100), nrow = 100))
colnames(data1)[-c(1:2)] = c("StudentBasel", "Alter", "Grösse",
  "WHO1W", "WHO4W", "WHO8W", "WHO12W", "FEW1W", "FEW4W", "FEW8W", "FEW12W")

You can select the columns you want to test:

test_columns = c("WHO4W","WHO8W","WHO12W")

For example, to test weeks 4, 8 and 12 of the WHO series together, use select() to keep only Gruppe and the columns you want to test, then pivot them:

library(tidyr)
library(dplyr)
library(broom)

data1 %>% 
  select(Gruppe, all_of(test_columns)) %>% 
  pivot_longer(-Gruppe)

# A tibble: 300 x 3
   Gruppe name   value
    <int> <chr>  <int>
 1      2 WHO4W     97
 2      2 WHO8W     91
 3      2 WHO12W    93
 4      1 WHO4W     99
 5      1 WHO8W    103
 6      1 WHO12W    92
 7      2 WHO4W     91
 8      2 WHO8W    111
 9      2 WHO12W   120
10      1 WHO4W    119
# … with 290 more rows

In the step above, each weekly value is paired with its corresponding Gruppe, so every measurement gets its own row. This is called pivoting a table into long format.
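To see what the pivot does on something small enough to read at a glance, here is a minimal sketch. The data frame `toy` and its values are made up for illustration and are not part of the example data.

```r
library(tidyr)

# Hypothetical toy data: two participants, two weekly measurements each
toy <- data.frame(
  Gruppe = c(1, 2),
  WHO4W  = c(10, 20),
  WHO8W  = c(11, 21)
)

# Every column except Gruppe is stacked into name/value pairs,
# so 2 rows x 2 measurement columns become 4 rows
toy_long <- pivot_longer(toy, -Gruppe)
print(toy_long)
```

Each original row contributes one long row per measurement column, and Gruppe is repeated alongside each of them.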

What you want is a test for Gruppe within every variable. You can do this by grouping by variable first (group_by) and then calling aov inside do(), which runs aov once per group:

result = data1 %>% 
  select(Gruppe, all_of(test_columns)) %>% 
  pivot_longer(-Gruppe) %>% 
  group_by(name) %>% 
  do(tidy(aov(value ~ Gruppe, data = .)))
result

# A tibble: 6 x 7
# Groups:   name [3]
  name   term         df    sumsq meansq statistic p.value
  <chr>  <chr>     <dbl>    <dbl>  <dbl>     <dbl>   <dbl>
1 WHO12W Gruppe        1   131.   131.      1.25     0.266
2 WHO12W Residuals    98 10247.   105.     NA       NA    
3 WHO4W  Gruppe        1   111.   111.      1.01     0.316
4 WHO4W  Residuals    98 10740.   110.     NA       NA    
5 WHO8W  Gruppe        1     1.63   1.63    0.0169   0.897
6 WHO8W  Residuals    98  9428.    96.2    NA       NA

Now we keep only the rows whose term is Gruppe, since we are not interested in the residuals:

result %>% filter(term=="Gruppe")
# A tibble: 3 x 7
# Groups:   name [3]
  name   term      df  sumsq meansq statistic p.value
  <chr>  <chr>  <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
1 WHO12W Gruppe     1 131.   131.      1.25     0.266
2 WHO4W  Gruppe     1 111.   111.      1.01     0.316
3 WHO8W  Gruppe     1   1.63   1.63    0.0169   0.897
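Since there are only two groups, each of these per-variable anovas is equivalent to a two-sample t-test with pooled variance. The sketch below checks this on made-up data (the data frame `toy` is hypothetical, not the example data above). Note that t.test() uses the Welch test by default, so var.equal = TRUE has to be set to get the match.

```r
set.seed(1)

# Hypothetical data: one outcome measured in two groups of 20
toy <- data.frame(
  Gruppe = factor(rep(1:2, each = 20)),
  value  = rpois(40, 100)
)

# One-way anova with a two-level factor
aov_tab <- summary(aov(value ~ Gruppe, data = toy))[[1]]
f_stat  <- aov_tab[["F value"]][1]
p_aov   <- aov_tab[["Pr(>F)"]][1]

# Student's t-test with pooled variance (not the default Welch test)
t_fit <- t.test(value ~ Gruppe, data = toy, var.equal = TRUE)

# The F statistic equals the squared t statistic, and the p-values coincide
print(all.equal(f_stat, unname(t_fit$statistic)^2))
print(all.equal(p_aov, t_fit$p.value))
```

So for a two-group design you can report either test. The Welch version is an option if you do not want to assume equal variances, but it will no longer match the anova exactly.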

I suggest the per-variable approach above because it is easier to explain to people what you have done (you cannot just say "I did an anova") and easier to interpret. You can instead fit one big aov and do a post hoc test, but please read up on and understand what anova is doing before applying this:

# pivot to long format as before
aov_df = data1 %>% 
  select(Gruppe, all_of(test_columns)) %>% 
  pivot_longer(-Gruppe)
# now we have a subgroup for every measurement, e.g. group 1 + wk4, group 2 + wk4 and so on
aov_df$subgroup = paste0(aov_df$name, aov_df$Gruppe)

# one-way anova over all subgroups, followed by Tukey's post hoc test
result = TukeyHSD(aov(value ~ subgroup, data = aov_df))
# the below are the meaningful comparisons you need:
result$subgroup[c("WHO12W2-WHO12W1","WHO4W2-WHO4W1","WHO8W2-WHO8W1"),]
                      diff       lwr      upr     p adj
WHO12W2-WHO12W1  2.2938808 -3.560239 8.148000 0.8711455
WHO4W2-WHO4W1    2.1151369 -3.738983 7.969256 0.9052955
WHO8W2-WHO8W1   -0.2560386 -6.110158 5.598081 0.9999956
