Comparing two groups with multiple variables using ANOVA or another testing method in R

遇见更好的自我 2021-01-25 07:00

I am working on my master's thesis right now. I have two groups: a showering-as-usual group and a cold-shower group. The variables are age, gender, weight, psychological wellbeing, physiological wellb

1 answer
  •  花落未央
    2021-01-25 07:50

    Example data:

    set.seed(100)
    data1 = data.frame(
    Code =sample(letters,100,replace=TRUE),
    Gruppe=sample(1:2,100,replace=TRUE),
    matrix(rpois(100*11,100),nrow=100)) 
    colnames(data1)[-c(1:2)] = c("StudentBasel","Alter","Grösse",
    "WHO1W","WHO4W","WHO8W","WHO12W","FEW1W","FEW4W","FEW8W","FEW12W") 
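
    One caveat about the example data: Gruppe is stored as the integers 1 and 2, so aov treats it as a numeric covariate. With exactly two groups the F test comes out the same either way, but with three or more groups it would not, so converting to a factor is the safer habit. A minimal sketch on a hypothetical data frame `toy` (three groups, not the data from the question) showing the difference in degrees of freedom:

    ```r
    # Hypothetical toy data: three groups coded 1, 2, 3
    set.seed(1)
    toy = data.frame(
      Gruppe = rep(1:3, each = 10),
      value  = rnorm(30)
    )

    # Numeric coding: aov fits a single slope, so Gruppe gets 1 df
    fit_numeric = aov(value ~ Gruppe, data = toy)

    # Factor coding: one mean per group, so Gruppe gets 2 df
    toy$Gruppe_f = factor(toy$Gruppe)
    fit_factor = aov(value ~ Gruppe_f, data = toy)

    df_numeric = summary(fit_numeric)[[1]]$Df[1]
    df_factor  = summary(fit_factor)[[1]]$Df[1]
    c(numeric = df_numeric, factor = df_factor)
    ```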
    

    You can select the columns you want to test:

    test_columns = c("WHO4W","WHO8W","WHO12W")
    

    For example, to test weeks 4, 8 and 12 of the WHO series together, use select to keep only Gruppe and the columns you want to test, then pivot the result:

    library(tidyr)
    library(dplyr)
    library(broom)
    
    data1 %>% 
    select(Gruppe, all_of(test_columns)) %>%
    pivot_longer(-Gruppe)
    
    # A tibble: 300 x 3
       Gruppe name   value
        <int> <chr>  <int>
     1      2 WHO4W     97
     2      2 WHO8W     91
     3      2 WHO12W    93
     4      1 WHO4W     99
     5      1 WHO8W    103
     6      1 WHO12W    92
     7      2 WHO4W     91
     8      2 WHO8W    111
     9      2 WHO12W   120
    10      1 WHO4W    119
    # … with 290 more rows
    

    The step above stacks the weekly measurements into a single value column, with each value kept next to its Gruppe. This is called pivoting a table into long format.
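
    To make the reshaping concrete, here is a small sketch on a hypothetical two-row table (`wide` and its values are made up for illustration), showing how each wide row becomes one long row per measured column:

    ```r
    library(tidyr)

    # Hypothetical wide table: one row per person, one column per week
    wide = data.frame(
      Gruppe = c(1, 2),
      WHO4W  = c(10, 20),
      WHO8W  = c(11, 21)
    )

    # Every column except Gruppe is stacked into name/value pairs
    long = pivot_longer(wide, -Gruppe)
    long
    ```

    Two rows with two measured columns give four rows in `long`, with the former column names in `name`.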

    What you want is a test of Gruppe within every variable. Group the long table by variable first (group_by), then run aov inside do(), which applies it to each group separately; tidy() from broom turns each fit into a data frame:

    result = data1 %>% 
    select(Gruppe, all_of(test_columns)) %>%
    pivot_longer(-Gruppe) %>% 
    group_by(name) %>% 
    do(tidy(aov(value ~ Gruppe,data=.))) 
    
    # A tibble: 6 x 7
    # Groups:   name [3]
      name   term         df    sumsq meansq statistic p.value
      <chr>  <chr>     <dbl>    <dbl>  <dbl>     <dbl>   <dbl>
    1 WHO12W Gruppe        1   131.   131.      1.25     0.266
    2 WHO12W Residuals    98 10247.   105.     NA       NA    
    3 WHO4W  Gruppe        1   111.   111.      1.01     0.316
    4 WHO4W  Residuals    98 10740.   110.     NA       NA    
    5 WHO8W  Gruppe        1     1.63   1.63    0.0169   0.897
    6 WHO8W  Residuals    98  9428.    96.2    NA       NA    
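
    In current dplyr releases do() is marked as superseded. A sketch of the same per-variable test with group_modify(), assuming a reasonably recent dplyr that provides it; the example data are regenerated so the block runs on its own, and `result2` is just an illustrative name:

    ```r
    library(dplyr)
    library(tidyr)
    library(broom)

    set.seed(100)
    data1 = data.frame(
      Code   = sample(letters, 100, replace = TRUE),
      Gruppe = sample(1:2, 100, replace = TRUE),
      matrix(rpois(100 * 11, 100), nrow = 100)
    )
    colnames(data1)[-c(1:2)] = c("StudentBasel", "Alter", "Grösse",
      "WHO1W", "WHO4W", "WHO8W", "WHO12W", "FEW1W", "FEW4W", "FEW8W", "FEW12W")
    test_columns = c("WHO4W", "WHO8W", "WHO12W")

    result2 = data1 %>%
      select(Gruppe, all_of(test_columns)) %>%
      pivot_longer(-Gruppe) %>%
      group_by(name) %>%
      # .x is the data of one group; the returned tibble is bound back to `name`
      group_modify(~ tidy(aov(value ~ Gruppe, data = .x))) %>%
      ungroup()
    result2
    ```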
    

    Now keep only the rows for the Gruppe term; the residual rows are not of interest:

    result %>% filter(term=="Gruppe")
    # A tibble: 3 x 7
    # Groups:   name [3]
      name   term      df  sumsq meansq statistic p.value
      <chr>  <chr>  <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
    1 WHO12W Gruppe     1 131.   131.      1.25     0.266
    2 WHO4W  Gruppe     1 111.   111.      1.01     0.316
    3 WHO8W  Gruppe     1   1.63   1.63    0.0169   0.897
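
    With only two groups, each of these one-way ANOVAs is equivalent to a two-sample t-test with pooled variance: the F statistic is the square of the t statistic and the p-values agree. A small sketch on simulated data (the data frame `d` is hypothetical) that checks this. Note that t.test defaults to the Welch test, so var.equal = TRUE is needed for the match:

    ```r
    set.seed(1)
    d = data.frame(
      Gruppe = factor(rep(1:2, each = 20)),
      value  = rnorm(40, mean = 100, sd = 10)
    )

    # One-way ANOVA with a two-level factor
    a = summary(aov(value ~ Gruppe, data = d))[[1]]
    f_stat = a[["F value"]][1]
    p_aov  = a[["Pr(>F)"]][1]

    # Student's t-test; var.equal = TRUE pools the variance as ANOVA does
    tt = t.test(value ~ Gruppe, data = d, var.equal = TRUE)

    all.equal(f_stat, unname(tt$statistic)^2)  # F equals t squared
    all.equal(p_aov, tt$p.value)               # same p-value
    ```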
    

    I suggest the per-variable tests above because it is easier to explain to people what you have done (you cannot just say "I did an ANOVA") and the results are easier to interpret. You can instead fit one big aov and follow it with a post hoc test, but please read up on what ANOVA is doing before applying this:

    # pivot to long format as before
    aov_df = data1 %>% 
    select(Gruppe, all_of(test_columns)) %>% 
    pivot_longer(-Gruppe)
    # build one subgroup per variable and group, e.g. WHO4W + group 1, WHO4W + group 2, and so on
    aov_df$subgroup = paste0(aov_df$name,aov_df$Gruppe)
    
    result = TukeyHSD(aov(value ~ subgroup,data=aov_df))
    # only the group 2 vs group 1 comparisons within the same week are meaningful:
    result$subgroup[c("WHO12W2-WHO12W1","WHO4W2-WHO4W1","WHO8W2-WHO8W1"),]
                          diff       lwr      upr     p adj
    WHO12W2-WHO12W1  2.2938808 -3.560239 8.148000 0.8711455
    WHO4W2-WHO4W1    2.1151369 -3.738983 7.969256 0.9052955
    WHO8W2-WHO8W1   -0.2560386 -6.110158 5.598081 0.9999956
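
    TukeyHSD here adjusts for all 15 pairwise comparisons among the six subgroups, although only three of them are of interest. One alternative, sketched under the assumption that the three per-variable tests are the only ones being reported, is to adjust just those p-values with p.adjust() from base R. The values are copied from the filtered per-variable output earlier in this answer, and Holm is only one of the available methods:

    ```r
    # p-values for the Gruppe term, copied from the per-variable ANOVA output
    p_raw = c(WHO12W = 0.266, WHO4W = 0.316, WHO8W = 0.897)

    # Holm step-down adjustment across the three tests
    p_holm = p.adjust(p_raw, method = "holm")
    p_holm
    ```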
    
