Cor.test()
takes vectors x
and y
as arguments, but I have an entire matrix of data that I want to test, pairwise. Cor()
t
corr.test
in the psych
package is designed to do this:
library("psych")
data(sat.act)
corr.test(sat.act)
As noted in the comments, to replicate the p-values from the base cor.test()
function over the entire matrix, then you need to turn off adjustment of the p-values for multiple comparisons (the default is to use Holm's method of adjustment):
corr.test(sat.act, adjust = "none")
[But be careful when interpreting those results!]
If you're strictly after the pvalues in a matrix format from cor.test
here's a solution shamelessly stolen from Vincent (LINK):
cor.test.p <- function(x){
FUN <- function(x, y) cor.test(x, y)[["p.value"]]
z <- outer(
colnames(x),
colnames(x),
Vectorize(function(i,j) FUN(x[,i], x[,j]))
)
dimnames(z) <- list(colnames(x), colnames(x))
z
}
cor.test.p(mtcars)
Note: Tommy also provides a faster solution though less easy to impliment. Oh and no for loops :)
Edit I have a function v_outer
in my qdapTools
package that makes this task pretty easy:
library(qdapTools)
(out <- v_outer(mtcars, function(x, y) cor.test(x, y)[["p.value"]]))
print(out, digits=4) # for more digits
"The accepted solution (corr.test
function in the psych package) works, but is extremely slow for large matrices."
If you use ci=FALSE
, then the speed is much faster.
By default, confidence intervals are found. However, this leads to a slight slowdown of speed. So, for just the rs
, ts
and ps
, set ci=FALSE
.
Probably the easiest way is to use the rcorr()
from Hmisc. It will only take a matrix, so use rcorr(as.matrix(x))
if your data is in a data.frame. It will return you a list with: 1) matrix of r pairwise, 2) matrix of pairwise n, 3) matrix of p values for the r's. It automatically ignores missing data.
Ideally, a function of this kind should take data.frames too and also output confidence intervals in line with the 'New Statistics'.
The accepted solution (corr.test function in the psych package) works, but is extremely slow for large matrices. I was working with a gene expression matrix (~20,000 by ~1,000) correlated to a drug sensitivity matrix (~1,000 by ~500) and I had to stop it because it was taking forever.
I took some code from the psych package and used the cor() function directly instead and got much better results:
# find (pairwise complete) correlation matrix between two matrices x and y
# compare to corr.test(x, y, adjust = "none")
n <- t(!is.na(x)) %*% (!is.na(y)) # same as count.pairwise(x,y) from psych package
r <- cor(x, y, use = "pairwise.complete.obs") # MUCH MUCH faster than corr.test()
cor2pvalue = function(r, n) {
t <- (r*sqrt(n-2))/sqrt(1-r^2)
p <- 2*(1 - pt(abs(t),(n-2)))
se <- sqrt((1-r*r)/(n-2))
out <- list(r, n, t, p, se)
names(out) <- c("r", "n", "t", "p", "se")
return(out)
}
# get a list with matrices of correlation, pvalues, standard error, etc.
result = cor2pvalue(r,n)
Even with two 100 x 200 matrices, the difference was staggering. A second or two versus 45 seconds.
> system.time(test_func(x,y))
user system elapsed
0.308 2.452 0.130
> system.time(corr.test(x,y, adjust = "none"))
user system elapsed
45.004 3.276 45.814