How to apply the wilcox.test to a whole dataframe in R?

I have a data frame with one grouping factor (the first column) with multiple levels (more than two) and several columns with data. I want to apply the wilcox.test

3 answers

星月不相逢

2020-12-31 17:58

The pairwise.wilcox.test function seems like it would be useful here; perhaps like this?

```
out <- lapply(2:6, function(x) pairwise.wilcox.test(d[[x]], d$group))
names(out) <- names(d)[2:6]
out
```
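
The snippet above assumes a data frame `d` with a `group` column and numeric data in columns 2 to 6. A minimal reproducible sketch on simulated data might look like this. The column names `var1` to `var5` and the random values are assumptions for illustration, not the asker's data. Looping over column names instead of hard-coded indices keeps it working if the number of columns changes:

```r
# Hypothetical example data: 3 groups of 8 observations, 5 numeric columns
set.seed(1)
d <- data.frame(group = factor(rep(1:3, each = 8)))
for (v in paste0("var", 1:5)) d[[v]] <- rnorm(nrow(d), mean = as.integer(d$group))

# One pairwise.wilcox.test per data column (every column except 'group')
vars <- setdiff(names(d), "group")
out <- lapply(vars, function(v) pairwise.wilcox.test(d[[v]], d$group))
names(out) <- vars

out$var1  # Holm-adjusted p-values for all pairs of groups on var1
```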

If you only want the p-values, you can extract them from each result and combine them into a matrix:

```
sapply(out, function(x) {
    p <- x$p.value
    n <- outer(rownames(p), colnames(p), paste, sep='v')
    p <- as.vector(p)
    names(p) <- n
    p
})
##         var1      var2      var3 var4      var5
## 2v1 0.5414627 0.8205958 0.4851572    1 1.0000000
## 3v1 0.1778222 0.3479835 1.0000000    1 1.0000000
## 2v2        NA        NA        NA   NA        NA
## 3v2 0.5414627 0.3479835 0.3784941    1 0.6919826
```
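
If a long table is easier to work with than the wide matrix above (it also drops the empty `NA` cell for the non-existent 2v2 comparison), the p-value matrices can be stacked with base R's `as.table()`. This is an alternative sketch, again on simulated data with assumed column names:

```r
set.seed(1)
d <- data.frame(group = factor(rep(1:3, each = 8)))
for (v in paste0("var", 1:5)) d[[v]] <- rnorm(nrow(d), mean = as.integer(d$group))
vars <- setdiff(names(d), "group")
out <- lapply(vars, function(v) pairwise.wilcox.test(d[[v]], d$group))
names(out) <- vars

# Turn each p-value matrix into rows of (group1, group2, p), dropping NA cells
long <- do.call(rbind, lapply(names(out), function(v) {
  tab <- as.data.frame(as.table(out[[v]]$p.value), stringsAsFactors = FALSE)
  names(tab) <- c("group1", "group2", "p")
  tab <- tab[!is.na(tab$p), ]
  cbind(variable = v, tab)
}))
rownames(long) <- NULL
long
```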

Also note that pairwise.wilcox.test adjusts for multiple comparisons using the Holm method by default; if you'd rather use a different correction (or none), set the p.adjust.method argument (see ?p.adjust for the available methods).
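
To see what the adjustment does, one can compare the default Holm output with `p.adjust.method = "none"`, whose entries should match plain two-sample `wilcox.test()` calls. A small sketch on one simulated variable (`x` and `g` are made-up names):

```r
set.seed(2)
g <- factor(rep(c("a", "b", "c"), each = 10))
x <- rnorm(30, mean = as.integer(g))

raw  <- pairwise.wilcox.test(x, g, p.adjust.method = "none")$p.value
holm <- pairwise.wilcox.test(x, g)$p.value                       # default: "holm"
bh   <- pairwise.wilcox.test(x, g, p.adjust.method = "BH")$p.value

# The unadjusted b-vs-a entry is just an ordinary Wilcoxon rank-sum test
direct <- wilcox.test(x[g == "b"], x[g == "a"])$p.value
all.equal(raw["b", "a"], direct)

# Adjusted p-values are never smaller than the raw ones
holm >= raw
```

Both Holm and Benjamini-Hochberg ("BH") only ever inflate the raw p-values; Holm controls the family-wise error rate, BH the false discovery rate.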

星月不相逢

2020-12-31 18:17

Updating my answer to work across columns

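```
# Run a Wilcoxon rank-sum test for every pair of groups on one column
test.fun <- function(dat, col) {
  # each column of c1 is one pair of group labels
  # (as.character avoids surprises when 'group' is a factor)
  c1 <- combn(as.character(unique(dat$group)), 2)
  sigs <- list()
  for (i in seq_len(ncol(c1))) {
    sigs[[i]] <- wilcox.test(
      dat[[col]][dat$group == c1[1, i]],
      dat[[col]][dat$group == c1[2, i]]
    )
  }
  names(sigs) <- paste("Group", c1[1, ], "by Group", c1[2, ])

  # collect the statistic and p-value of each test into a data frame
  tests <- data.frame(Test = names(sigs),
                      W = unlist(lapply(sigs, function(x) x$statistic)),
                      p = unlist(lapply(sigs, function(x) x$p.value)),
                      row.names = NULL)
  return(tests)
}
```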


```
# Run test.fun on every column except the grouping column
tests <- lapply(colnames(dat)[-1], function(x) test.fun(dat, x))
names(tests) <- colnames(dat)[-1]
# Optionally, do.call(rbind, tests) combines the list into a single data frame
```
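
Stacking the per-variable tables with `do.call(rbind, ...)` loses the information about which variable each row came from. A compact, self-contained sketch of the same approach that keeps a `variable` column could look like this. The data and the helper name `pair_tests` are hypothetical:

```r
set.seed(1)
dat <- data.frame(group = factor(rep(1:3, each = 8)))
for (v in paste0("var", 1:5)) dat[[v]] <- rnorm(nrow(dat), mean = as.integer(dat$group))

# Hypothetical helper: all pairwise Wilcoxon tests for one column
pair_tests <- function(dat, col) {
  pairs <- combn(levels(dat$group), 2, simplify = FALSE)
  rows <- lapply(pairs, function(pr) {
    res <- wilcox.test(dat[[col]][dat$group == pr[1]],
                       dat[[col]][dat$group == pr[2]])
    data.frame(variable = col,
               Test = paste("Group", pr[1], "by Group", pr[2]),
               W = unname(res$statistic),
               p = res$p.value)
  })
  do.call(rbind, rows)
}

# One data frame with 3 pairs x 5 variables = 15 rows
all_tests <- do.call(rbind, lapply(setdiff(names(dat), "group"),
                                   function(v) pair_tests(dat, v)))
all_tests
```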

```
# A single pass over all columns is fast. Note that rep() evaluates its first
# argument only once, so this times one pass, not 10000 passes:
system.time(
  rep(
    tests <- lapply(colnames(dat)[-1], function(x) test.fun(dat, x)), 10000
  )
)

#   user  system elapsed
#  0.056   0.000   0.053
```
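
A quick way to check how often an expression is actually evaluated inside `rep()` versus `replicate()` is a call counter; `count_call` below is a hypothetical helper written only for this demonstration. `rep()` evaluates its argument once and copies the value, while `replicate()` re-evaluates the expression on every iteration, so only the latter is suitable for timing repeated runs:

```r
calls <- 0
count_call <- function() {
  calls <<- calls + 1   # record every evaluation
  calls
}

invisible(rep(count_call(), 5))
calls_after_rep <- calls          # the expression ran once

calls <- 0
invisible(replicate(5, count_call()))
calls_after_replicate <- calls    # the expression ran five times

c(rep = calls_after_rep, replicate = calls_after_replicate)
```

Timing repeated runs of the code above would then look like `system.time(replicate(100, lapply(colnames(dat)[-1], function(x) test.fun(dat, x))))`.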

And the result:

```
tests

$var1
                Test  W          p
1 Group 1 by Group 2 28 0.36596737
2 Group 1 by Group 3 39 0.05927406
3 Group 2 by Group 3 38 0.27073136

$var2
                Test    W         p
1 Group 1 by Group 2 19.0 0.8205958
2 Group 1 by Group 3 36.5 0.1159945
3 Group 2 by Group 3 40.5 0.1522726

$var3
                Test    W         p
1 Group 1 by Group 2 13.0 0.2425786
2 Group 1 by Group 3 23.5 1.0000000
3 Group 2 by Group 3 41.0 0.1261647

$var4
                Test  W         p
1 Group 1 by Group 2 26 0.4323470
2 Group 1 by Group 3 30 0.3729664
3 Group 2 by Group 3 29 0.9479518

$var5
                Test    W         p
1 Group 1 by Group 2 24.0 0.7100968
2 Group 1 by Group 3 19.0 0.5324295
3 Group 2 by Group 3 17.5 0.2306609
```
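
Unlike pairwise.wilcox.test, this approach reports unadjusted p-values. A sketch that adds a Holm-adjusted column to each per-variable table with base R's `p.adjust()` is below. The `tests` list is re-entered by hand from the var1 and var2 tables printed above, purely for illustration; the adjusted values reproduce the Holm-adjusted p-values in the first answer's matrix (var1: 0.5414627, 0.1778222, 0.5414627), so the two answers agree once the same correction is applied:

```r
pairs <- c("Group 1 by Group 2", "Group 1 by Group 3", "Group 2 by Group 3")

# Two of the per-variable tables printed above, re-entered by hand
tests <- list(
  var1 = data.frame(Test = pairs, W = c(28, 39, 38),
                    p = c(0.36596737, 0.05927406, 0.27073136)),
  var2 = data.frame(Test = pairs, W = c(19.0, 36.5, 40.5),
                    p = c(0.8205958, 0.1159945, 0.1522726))
)

# Holm correction within each variable, matching pairwise.wilcox.test's default
tests <- lapply(tests, function(tab) {
  tab$p.holm <- p.adjust(tab$p, method = "holm")
  tab
})
tests$var1
```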

轮回少年

2020-12-31 18:19
You can loop over the columns with apply and pass each column to whichever test you want through an anonymous function, like so (assuming the data frame is named df and the grouping column is named group):
```
apply(df[-1],2,function(x) kruskal.test(x,df$group))
```
Note: I used the Kruskal-Wallis test because it handles more than two groups. If there were only two groups, the Wilcoxon test could be used the same way through its formula interface, e.g. function(x) wilcox.test(x ~ df$group).
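
Each element of that apply result is a full `htest` object. If only the p-values are needed, a sketch using `vapply()` over the columns returns a named numeric vector (simulated data; `df` and the column names are assumptions). Iterating over the data frame directly, rather than with `apply()`, also avoids converting it to a matrix first:

```r
set.seed(3)
df <- data.frame(group = factor(rep(c("A", "B", "C"), each = 8)))
for (v in paste0("var", 1:5)) df[[v]] <- rnorm(nrow(df), mean = as.integer(df$group))

# One Kruskal-Wallis test per data column; keep only the p-value
kw_p <- vapply(df[-1], function(x) kruskal.test(x, df$group)$p.value,
               numeric(1))
kw_p

# Optionally correct across variables
p.adjust(kw_p, method = "holm")
```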

If you do want to do pairwise Wilcoxon tests on all variables, here's a two-liner that will loop through all columns and all pairs and return the results as a list:
```
group.pairs <- combn(unique(df$group),2,simplify=FALSE)
# this loops over the 2nd margin - the columns - of df and makes each column
# available as x
apply(df[-1], 2, function(x)
             # this loops over the list of group pairs and makes each such pair
             # available as a length-2 vector y
             lapply(group.pairs, function(y)
                    # compare x in the first group of the pair with x in the second
                    wilcox.test(x[df$group == y[1]], x[df$group == y[2]])))
```
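
The nested list of test objects is awkward to read. A sketch that runs the same column-by-pair loop but keeps only the p-values, giving one row per group pair and one column per variable, could look like this (simulated data and assumed column names; p-values are unadjusted):

```r
set.seed(3)
df <- data.frame(group = factor(rep(c("A", "B", "C"), each = 8)))
for (v in paste0("var", 1:5)) df[[v]] <- rnorm(nrow(df), mean = as.integer(df$group))

group.pairs <- combn(levels(df$group), 2, simplify = FALSE)
pair.names  <- vapply(group.pairs, paste, character(1), collapse = " vs ")

# Rows: group pairs; columns: variables; cells: unadjusted Wilcoxon p-values
p.mat <- sapply(df[-1], function(x)
  vapply(group.pairs, function(y)
    wilcox.test(x[df$group == y[1]], x[df$group == y[2]])$p.value,
    numeric(1)))
rownames(p.mat) <- pair.names
p.mat
```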