How to apply the wilcox.test to a whole dataframe in R?

再見小時候 2020-12-31 17:17

I have a data frame with one grouping factor (the first column) with multiple levels (more than two) and several columns with data. I want to apply the wilcox.test

3 Answers
  • 2020-12-31 17:58

    The pairwise.wilcox.test function seems like it would be useful here; perhaps like this?

    out <- lapply(2:6, function(x) pairwise.wilcox.test(d[[x]], d$group))
    names(out) <- names(d)[2:6]
    out
    

    If you just want the p-values, you can go through and extract those and make a matrix.

    sapply(out, function(x) {
        p <- x$p.value
        n <- outer(rownames(p), colnames(p), paste, sep='v')
        p <- as.vector(p)
        names(p) <- n
        p
    })
    ##         var1      var2      var3 var4      var5
    ## 2v1 0.5414627 0.8205958 0.4851572    1 1.0000000
    ## 3v1 0.1778222 0.3479835 1.0000000    1 1.0000000
    ## 2v2        NA        NA        NA   NA        NA
    ## 3v2 0.5414627 0.3479835 0.3784941    1 0.6919826
    

    Also note that pairwise.wilcox.test adjusts for multiple comparisons using the Holm method by default; if you'd rather do something different, look at the p.adjust.method argument.
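
    As a rough illustration of switching the adjustment, the sketch below runs the same pairwise tests with the default Holm correction, with Bonferroni, and with no correction. The data frame `d` here is made up for the example (an assumption, not the asker's data).

    ```r
    # Made-up example data (an assumption): one grouping column with three
    # levels and two numeric columns.
    set.seed(1)
    d <- data.frame(
      group = rep(1:3, each = 8),
      var1  = rnorm(24),
      var2  = rnorm(24)
    )

    # Default behaviour: Holm adjustment.
    holm <- pairwise.wilcox.test(d$var1, d$group)

    # Same tests with Bonferroni adjustment, and with no adjustment at all.
    bonf <- pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "bonferroni")
    raw  <- pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "none")

    holm$p.adjust.method  # which method was applied
    bonf$p.value          # lower-triangular matrix of adjusted p-values
    raw$p.value           # unadjusted p-values
    ```

    The valid method names are listed in p.adjust.methods.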

  • 2020-12-31 18:17

    Updating my answer to work across columns

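    test.fun <- function(dat, col) {
    
      # all unordered pairs of group labels, one pair per column of c1
      # (as.character keeps this working when group is a factor)
      c1 <- combn(as.character(unique(dat$group)), 2)
    
      # run one Wilcoxon rank-sum test per pair on the chosen column
      sigs <- list()
      for (i in seq_len(ncol(c1))) {
        sigs[[i]] <- wilcox.test(
                       dat[dat$group == c1[1,i], col],
                       dat[dat$group == c1[2,i], col]
                     )
      }
      names(sigs) <- paste("Group", c1[1,], "by Group", c1[2,])
    
      # collect the W statistic and p-value of each test into a data frame
      tests <- data.frame(Test=names(sigs),
                          W=unlist(lapply(sigs, function(x) x$statistic)),
                          p=unlist(lapply(sigs, function(x) x$p.value)),
                          row.names=NULL)
    
      return(tests)
    }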
    
    
    tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x))
    names(tests) <- colnames(dat)[-1]
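    # tests <- do.call(rbind, tests)  # optionally combine the list into one data.frame
    
    # Timing. Note that rep() evaluates its argument only once and then repeats
    # the result, so this measures a single pass over all columns, not 10000 runs
    # (use replicate() to repeat the computation itself).
    system.time(
      rep(
       tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x)),10000
      )
    )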
    
    #   user  system elapsed 
    #  0.056   0.000   0.053 
    

    And the result:

    tests
    
    $var1
                    Test  W          p
    1 Group 1 by Group 2 28 0.36596737
    2 Group 1 by Group 3 39 0.05927406
    3 Group 2 by Group 3 38 0.27073136
    
    $var2
                    Test    W         p
    1 Group 1 by Group 2 19.0 0.8205958
    2 Group 1 by Group 3 36.5 0.1159945
    3 Group 2 by Group 3 40.5 0.1522726
    
    $var3
                    Test    W         p
    1 Group 1 by Group 2 13.0 0.2425786
    2 Group 1 by Group 3 23.5 1.0000000
    3 Group 2 by Group 3 41.0 0.1261647
    
    $var4
                    Test  W         p
    1 Group 1 by Group 2 26 0.4323470
    2 Group 1 by Group 3 30 0.3729664
    3 Group 2 by Group 3 29 0.9479518
    
    $var5
                    Test    W         p
    1 Group 1 by Group 2 24.0 0.7100968
    2 Group 1 by Group 3 19.0 0.5324295
    3 Group 2 by Group 3 17.5 0.2306609
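
    If a single table is more convenient than a list, the per-variable results can be stacked and the p-values adjusted across all tests. This is only a sketch: `dat` is made-up data, and `pairwise_tests` is a hypothetical helper that returns the same Test/W/p columns as shown above. The Holm adjustment is an optional extra, since plain wilcox.test p-values are unadjusted.

    ```r
    # Made-up data standing in for `dat` (an assumption).
    set.seed(42)
    dat <- data.frame(group = rep(1:3, each = 8),
                      var1 = rnorm(24), var2 = rnorm(24))

    # Hypothetical helper: all pairwise Wilcoxon tests for one column.
    pairwise_tests <- function(dat, col) {
      pairs <- combn(as.character(unique(dat$group)), 2)
      res <- lapply(seq_len(ncol(pairs)), function(i) {
        wt <- wilcox.test(dat[dat$group == pairs[1, i], col],
                          dat[dat$group == pairs[2, i], col])
        data.frame(Test = paste("Group", pairs[1, i], "by Group", pairs[2, i]),
                   W = unname(wt$statistic), p = wt$p.value)
      })
      do.call(rbind, res)
    }

    vars  <- colnames(dat)[-1]
    tests <- lapply(vars, function(v) pairwise_tests(dat, v))
    names(tests) <- vars

    # Stack the per-variable tables, recording which variable each row came from.
    all_tests <- do.call(rbind,
                         Map(function(v, tab) cbind(Variable = v, tab), vars, tests))
    rownames(all_tests) <- NULL

    # Holm adjustment across every test in the table.
    all_tests$p.holm <- p.adjust(all_tests$p, method = "holm")
    all_tests
    ```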
    
  • 2020-12-31 18:19

    You can loop over the columns using apply and then pass the columns to whatever test you want to use using an anonymous function, like so (assuming the data frame is named df):

    apply(df[-1],2,function(x) kruskal.test(x,df$group))
    

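    Note: I used the Kruskal-Wallis test because it handles more than two groups. With only two groups, the Wilcoxon test works the same way, provided you call it through the formula interface, wilcox.test(x ~ df$group), since wilcox.test(x, df$group) would treat the group labels as a second sample.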
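
    For the two-group case, a minimal sketch might look like the following. The data frame `df2` is made up for the example (an assumption), and lapply is used instead of apply so the columns are not first coerced to a matrix.

    ```r
    # Made-up two-group data (an assumption, not the asker's data).
    set.seed(7)
    df2 <- data.frame(group = rep(c("a", "b"), each = 10),
                      var1 = rnorm(20), var2 = rnorm(20))

    # One Wilcoxon rank-sum test per data column; the formula interface
    # splits the values of x by group.
    res <- lapply(df2[-1], function(x) wilcox.test(x ~ df2$group))

    # Collect the p-values into a named numeric vector.
    pvals <- sapply(res, function(r) r$p.value)
    pvals
    ```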

    If you do want to do pairwise Wilcoxon tests on all variables, here's a two-liner that loops over all columns and all pairs of groups and returns the results as a nested list:

    group.pairs <- combn(unique(df$group),2,simplify=FALSE)
    # this loops over the 2nd margin - the columns - of df and makes each column
    # available as x
    apply(df[-1], 2, function(x)
                 # this loops over the list of group pairs and makes each such pair
                 # available as a length-2 vector y
                 lapply(group.pairs, function(y)
                        # compare the values of x in the first group with those in the second
                        wilcox.test(x[df$group == y[1]], x[df$group == y[2]])))
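
    A self-contained sketch of the pairwise approach on made-up data, where each test compares one column's values in the two groups of a pair and the p-values are then collected into a matrix. The data and the row labels are assumptions chosen for illustration.

    ```r
    # Made-up data (an assumption): three groups, two numeric columns.
    set.seed(3)
    df <- data.frame(group = rep(1:3, each = 8),
                     var1 = rnorm(24), var2 = rnorm(24))

    group.pairs <- combn(unique(df$group), 2, simplify = FALSE)

    # For each data column, test every pair of groups against each other.
    res <- lapply(df[-1], function(x)
      lapply(group.pairs, function(y)
        wilcox.test(x[df$group == y[1]], x[df$group == y[2]])))

    # Reduce the nested list to a matrix: one row per pair, one column per variable.
    pvals <- sapply(res, function(tests) sapply(tests, function(w) w$p.value))
    rownames(pvals) <- sapply(group.pairs, paste, collapse = " vs ")
    pvals
    ```

    These p-values are unadjusted; p.adjust can be applied to them if a multiple-comparison correction is wanted.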
    