A matrix version of cor.test()

后端未结

关注

 5  1097

Cor.test() takes vectors x and y as arguments, but I have an entire matrix of data that I want to test, pairwise. Cor() t

相关标签:

5条回答

一生所求

2020-11-27 03:55
corr.test in the psych package is designed to do this:
```
library("psych")
data(sat.act)
corr.test(sat.act)
```
As noted in the comments, to replicate the p-values from the base cor.test() function over the entire matrix, then you need to turn off adjustment of the p-values for multiple comparisons (the default is to use Holm's method of adjustment):
```
 corr.test(sat.act, adjust = "none")
```
[But be careful when interpreting those results!]
0 讨论(0)
发布评论:

提交评论
- 加载中...

一向

2020-11-27 04:00

If you're strictly after the pvalues in a matrix format from cor.test here's a solution shamelessly stolen from Vincent (LINK):

cor.test.p <- function(x){
    FUN <- function(x, y) cor.test(x, y)[["p.value"]]
    z <- outer(
      colnames(x), 
      colnames(x), 
      Vectorize(function(i,j) FUN(x[,i], x[,j]))
    )
    dimnames(z) <- list(colnames(x), colnames(x))
    z
}

cor.test.p(mtcars)

Note: Tommy also provides a faster solution though less easy to impliment. Oh and no for loops :)

Edit I have a function v_outer in my qdapTools package that makes this task pretty easy:

library(qdapTools)
(out <- v_outer(mtcars, function(x, y) cor.test(x, y)[["p.value"]]))
print(out, digits=4)  # for more digits

0 讨论(0)

感情败类

2020-11-27 04:00

"The accepted solution (corr.test function in the psych package) works, but is extremely slow for large matrices."

If you use ci=FALSE, then the speed is much faster. By default, confidence intervals are found. However, this leads to a slight slowdown of speed. So, for just the rs, ts and ps, set ci=FALSE.

0 讨论(0)
发布评论:

提交评论
- 加载中...
离开以前

2020-11-27 04:11

Probably the easiest way is to use the rcorr() from Hmisc. It will only take a matrix, so use rcorr(as.matrix(x)) if your data is in a data.frame. It will return you a list with: 1) matrix of r pairwise, 2) matrix of pairwise n, 3) matrix of p values for the r's. It automatically ignores missing data.

Ideally, a function of this kind should take data.frames too and also output confidence intervals in line with the 'New Statistics'.

0 讨论(0)
发布评论:

提交评论
- 加载中...

长发绾君心

2020-11-27 04:11

The accepted solution (corr.test function in the psych package) works, but is extremely slow for large matrices. I was working with a gene expression matrix (~20,000 by ~1,000) correlated to a drug sensitivity matrix (~1,000 by ~500) and I had to stop it because it was taking forever.

I took some code from the psych package and used the cor() function directly instead and got much better results:

# find (pairwise complete) correlation matrix between two matrices x and y
# compare to corr.test(x, y, adjust = "none")
n <- t(!is.na(x)) %*% (!is.na(y)) # same as count.pairwise(x,y) from psych package
r <- cor(x, y, use = "pairwise.complete.obs") # MUCH MUCH faster than corr.test()
cor2pvalue = function(r, n) {
  t <- (r*sqrt(n-2))/sqrt(1-r^2)
  p <- 2*(1 - pt(abs(t),(n-2)))
  se <- sqrt((1-r*r)/(n-2))
  out <- list(r, n, t, p, se)
  names(out) <- c("r", "n", "t", "p", "se")
  return(out)
}
# get a list with matrices of correlation, pvalues, standard error, etc.
result = cor2pvalue(r,n)

Even with two 100 x 200 matrices, the difference was staggering. A second or two versus 45 seconds.

> system.time(test_func(x,y))
   user  system elapsed 
  0.308   2.452   0.130 
> system.time(corr.test(x,y, adjust = "none"))
   user  system elapsed 
 45.004   3.276  45.814

0 讨论(0)