I have a data frame with one grouping factor (the first column) with multiple levels (more than two) and several columns with data. I want to apply the wilcox.test
The pairwise.wilcox.test function seems like it would be useful here; perhaps like this?
out <- lapply(2:6, function(x) pairwise.wilcox.test(d[[x]], d$group))
names(out) <- names(d)[2:6]
out
If you just want the p-values, you can go through and extract those and make a matrix.
sapply(out, function(x) {
p <- x$p.value
n <- outer(rownames(p), colnames(p), paste, sep='v')
p <- as.vector(p)
names(p) <- n
p
})
## var1 var2 var3 var4 var5
## 2v1 0.5414627 0.8205958 0.4851572 1 1.0000000
## 3v1 0.1778222 0.3479835 1.0000000 1 1.0000000
## 2v2 NA NA NA NA NA
## 3v2 0.5414627 0.3479835 0.3784941 1 0.6919826
Also note that pairwise.wilcox.test adjusts for multiple comparisons using the Holm method by default; if you'd rather do something different, look at the p.adjust.method argument.
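As a rough sketch of switching the adjustment, the snippet below passes p.adjust.method explicitly. The data frame here is simulated purely for illustration (the names d, group and var1 mirror the code above, but the values are made up), so the numbers will not match the output shown earlier.

```r
set.seed(1)
# Hypothetical data: 3 groups of 8 observations, one measurement column
d <- data.frame(group = factor(rep(1:3, each = 8)),
                var1  = rnorm(24))

# Holm adjustment (the default) versus no adjustment at all
holm <- pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "holm")
raw  <- pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "none")

holm$p.value  # lower-triangular matrix of adjusted p-values
raw$p.value   # same layout, unadjusted
```

Any method listed in p.adjust.methods (for example "bonferroni" or "BH") can be used in the same way.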
Updating my answer to work across columns
test.fun <- function(dat, col) {
c1 <- combn(unique(dat$group),2)
sigs <- list()
for(i in 1:ncol(c1)) {
sigs[[i]] <- wilcox.test(
dat[dat$group == c1[1,i],col],
dat[dat$group == c1[2,i],col]
)
}
names(sigs) <- paste("Group",c1[1,],"by Group",c1[2,])
tests <- data.frame(Test=names(sigs),
W=unlist(lapply(sigs,function(x) x$statistic)),
p=unlist(lapply(sigs,function(x) x$p.value)),row.names=NULL)
return(tests)
}
tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x))
names(tests) <- colnames(dat)[-1]
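# tests <- do.call(rbind, tests)  # optionally combine the list into one data.frame
# Timing check. Note that rep() evaluates its first argument only once and then
# repeats the result, so the figures below reflect a single pass rather than
# 10000 runs; use replicate(10000, expr) to time repeated evaluation.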
system.time(
rep(
tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x)),10000
)
)
# user system elapsed
# 0.056 0.000 0.053
And the result:
tests
$var1
Test W p
1 Group 1 by Group 2 28 0.36596737
2 Group 1 by Group 3 39 0.05927406
3 Group 2 by Group 3 38 0.27073136
$var2
Test W p
1 Group 1 by Group 2 19.0 0.8205958
2 Group 1 by Group 3 36.5 0.1159945
3 Group 2 by Group 3 40.5 0.1522726
$var3
Test W p
1 Group 1 by Group 2 13.0 0.2425786
2 Group 1 by Group 3 23.5 1.0000000
3 Group 2 by Group 3 41.0 0.1261647
$var4
Test W p
1 Group 1 by Group 2 26 0.4323470
2 Group 1 by Group 3 30 0.3729664
3 Group 2 by Group 3 29 0.9479518
$var5
Test W p
1 Group 1 by Group 2 24.0 0.7100968
2 Group 1 by Group 3 19.0 0.5324295
3 Group 2 by Group 3 17.5 0.2306609
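Keep in mind that these p-values come straight from wilcox.test and are not corrected for multiple comparisons. A minimal sketch of adjusting them afterwards with p.adjust, using the three var1 p-values printed above (the names p_raw and p_holm are just for illustration):

```r
# Unadjusted var1 p-values, copied from the output above
p_raw <- c("Group 1 by Group 2" = 0.36596737,
           "Group 1 by Group 3" = 0.05927406,
           "Group 2 by Group 3" = 0.27073136)

# Holm correction across the three pairwise comparisons
p_holm <- p.adjust(p_raw, method = "holm")
round(p_holm, 4)
# 0.5415 0.1778 0.5415
```

These agree with the Holm-adjusted var1 values that pairwise.wilcox.test reported earlier in the thread (0.5414627, 0.1778222, 0.5414627), so the two approaches differ only in whether the adjustment is applied.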
You can loop over the columns using apply and then pass each column to whatever test you want via an anonymous function, like so (assuming the data frame is named df):
apply(df[-1],2,function(x) kruskal.test(x,df$group))
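Note: I used the Kruskal-Wallis test because that works on multiple groups. With only two groups, the Wilcoxon test could be used the same way through its formula interface, e.g. wilcox.test(x ~ df$group); calling wilcox.test(x, df$group) would instead treat the group labels as a second sample.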
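For what it's worth, here is a small self-contained sketch of the apply call together with one way to pull out just the p-values. The data frame is simulated for illustration only, and kw and kw_p are hypothetical names.

```r
set.seed(42)
# Hypothetical data: grouping factor first, then two numeric columns
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 6)),
                 var1  = rnorm(18),
                 var2  = rnorm(18))

# One Kruskal-Wallis test per data column; returns a named list of htest objects
kw <- apply(df[-1], 2, function(x) kruskal.test(x, df$group))

# Extract the p-values into a named numeric vector
kw_p <- sapply(kw, function(res) res$p.value)
kw_p
```

Because apply converts df[-1] to a matrix first, this assumes every column after the first is numeric.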
If you do want to do pairwise Wilcoxon tests on all variables, here's a two-liner that will loop through all columns and all pairs and return the results as a list:
group.pairs <- combn(unique(df$group),2,simplify=FALSE)
# this loops over the 2nd margin - the columns - of df and makes each column
# available as x
apply(df[-1], 2, function(x)
# this loops over the list of group pairs and makes each such pair
# available as an integer vector y
lapply(group.pairs, function(y)
# compare the values of x in the first group of the pair with those in the second
wilcox.test(x[df$group == y[1]], x[df$group == y[2]])))
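If only the p-values are needed, the nested list can be collapsed into a matrix. The following is a sketch under a few assumptions: the data are simulated, p_mat is a hypothetical name, and the group labels are converted with as.character() so that the pairs are plain strings regardless of how combn handles factors.

```r
set.seed(7)
# Hypothetical data: 3 groups of 6 observations, two numeric columns
df <- data.frame(group = factor(rep(1:3, each = 6)),
                 var1  = rnorm(18),
                 var2  = rnorm(18))

# All pairs of group labels, as character vectors of length 2
group.pairs <- combn(as.character(unique(df$group)), 2, simplify = FALSE)

# Rows: group pairs; columns: variables; cells: unadjusted Wilcoxon p-values
p_mat <- sapply(df[-1], function(x)
  sapply(group.pairs, function(y)
    wilcox.test(x[df$group == y[1]], x[df$group == y[2]])$p.value))
rownames(p_mat) <- sapply(group.pairs, paste, collapse = " vs ")
p_mat
```

The p-values here are unadjusted; each column can be passed through p.adjust if a multiple-comparison correction is wanted.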