Is there an R function that applies a function to each pair of columns?

小鲜肉 2020-11-27 05:35

I often need to apply a function to each pair of columns in a data frame or matrix and return the results in a matrix. Currently I always write a loop to do this. For instance, to mak

4 Answers
  • 2020-11-27 05:54

    About 92% of the time is spent in cor.test.default and the routines it calls, so it's hopeless to try to get faster results by simply rewriting Papply. The only real saving comes from computing just the entries above (or below) the diagonal, assuming your function is symmetric in x and y.

    > M <- matrix(rnorm(100*300),300,100)
    > Rprof(); junk <- Papply(M,function(x,y) cor.test( x, y)$p.value); Rprof(NULL)
    > summaryRprof()
    $by.self
                     self.time self.pct total.time total.pct
    cor.test.default      4.36    29.54      13.56     91.87
    # ... snip ...
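
    The saving mentioned above (evaluating each unordered pair only once) could look roughly like the following. This is a minimal sketch, not the asker's Papply: the name pairwise_apply and the diag_value argument are made up for illustration, and it assumes f is symmetric in its two arguments and returns a single number.

    ```r
    # Hypothetical helper: apply f to each unordered pair of columns once
    # and mirror the result, assuming f(x, y) == f(y, x).
    pairwise_apply <- function(x, f, diag_value = NA_real_) {
      x <- as.data.frame(x)          # accepts a matrix or a data frame
      n <- ncol(x)
      out <- matrix(diag_value, n, n, dimnames = list(names(x), names(x)))
      if (n < 2) return(out)
      for (i in seq_len(n - 1)) {
        for (j in (i + 1):n) {
          # One evaluation fills both the upper and the lower entry.
          out[i, j] <- out[j, i] <- f(x[[i]], x[[j]])
        }
      }
      out
    }

    set.seed(1)
    M <- matrix(rnorm(30 * 5), 30, 5)
    p <- pairwise_apply(M, function(x, y) cor.test(x, y)$p.value)
    ```

    This roughly halves the number of cor.test calls, but each call still costs the same, so the profile above would look similar.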
    
  • 2020-11-27 05:57

    It wouldn't be faster, but you can use outer to simplify the code. It does require a vectorized function, so here I've used Vectorize to make a vectorized version of the function to get the correlation between two columns.

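    df <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
    n <- ncol(df)
    
    # p-value of the correlation test between columns i and j of data
    corpij <- function(i,j,data) {cor.test(data[,i],data[,j])$p.value}
    # vectorize over the index arguments only; data is passed through unchanged
    corp <- Vectorize(corpij, vectorize.args=c("i","j"))
    outer(1:n,1:n,corp,data=df)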
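
    To see why outer needs a vectorized function, the sketch below counts how often a function is invoked. The function f and the counter calls are purely illustrative; the point is that outer calls its function once with the fully expanded index vectors, while the Vectorize wrapper turns that into one call per pair.

    ```r
    # Count how many times the function is actually invoked.
    calls <- 0
    f <- function(i, j) {
      calls <<- calls + 1
      i * 10 + j                 # plain arithmetic, already vectorized
    }

    res_plain <- outer(1:3, 1:3, f)
    calls_plain <- calls         # 1: a single call with length-9 vectors

    calls <- 0
    res_vec <- outer(1:3, 1:3, Vectorize(f))
    calls_vec <- calls           # 9: one call per (i, j) pair
    ```

    A function such as corpij only makes sense for scalar i and j, which is why it has to go through Vectorize before being handed to outer.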
    
  • 2020-11-27 05:59

    I'm not sure whether this addresses your problem properly, but take a look at William Revelle's psych package. corr.test returns a list of matrices with the correlation coefficients, the number of observations, the t-test statistics, and the p-values. I use it all the time (and AFAICS you're also a psychologist, so it may suit your needs as well). Writing loops is not the most elegant way of doing this.

    > library(psych)
    > ( k <- corr.test(mtcars[1:5]) )
    Call:corr.test(x = mtcars[1:5])
    Correlation matrix 
           mpg   cyl  disp    hp  drat
    mpg   1.00 -0.85 -0.85 -0.78  0.68
    cyl  -0.85  1.00  0.90  0.83 -0.70
    disp -0.85  0.90  1.00  0.79 -0.71
    hp   -0.78  0.83  0.79  1.00 -0.45
    drat  0.68 -0.70 -0.71 -0.45  1.00
    Sample Size 
         mpg cyl disp hp drat
    mpg   32  32   32 32   32
    cyl   32  32   32 32   32
    disp  32  32   32 32   32
    hp    32  32   32 32   32
    drat  32  32   32 32   32
    Probability value 
         mpg cyl disp   hp drat
    mpg    0   0    0 0.00 0.00
    cyl    0   0    0 0.00 0.00
    disp   0   0    0 0.00 0.00
    hp     0   0    0 0.00 0.01
    drat   0   0    0 0.01 0.00
    
    > str(k)
    List of 5
     $ r   : num [1:5, 1:5] 1 -0.852 -0.848 -0.776 0.681 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:5] "mpg" "cyl" "disp" "hp" ...
      .. ..$ : chr [1:5] "mpg" "cyl" "disp" "hp" ...
     $ n   : num [1:5, 1:5] 32 32 32 32 32 32 32 32 32 32 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:5] "mpg" "cyl" "disp" "hp" ...
      .. ..$ : chr [1:5] "mpg" "cyl" "disp" "hp" ...
     $ t   : num [1:5, 1:5] Inf -8.92 -8.75 -6.74 5.1 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:5] "mpg" "cyl" "disp" "hp" ...
      .. ..$ : chr [1:5] "mpg" "cyl" "disp" "hp" ...
     $ p   : num [1:5, 1:5] 0.00 6.11e-10 9.38e-10 1.79e-07 1.78e-05 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:5] "mpg" "cyl" "disp" "hp" ...
      .. ..$ : chr [1:5] "mpg" "cyl" "disp" "hp" ...
     $ Call: language corr.test(x = mtcars[1:5])
     - attr(*, "class")= chr [1:2] "psych" "corr.test"
    
  • 2020-11-27 06:02

    You can use mapply, but as the other answers state, it's unlikely to be much faster, since most of the time is spent in cor.test.

    matrix(mapply(function(x,y) cor.test(df[,x],df[,y])$p.value,
                  rep(1:3,3), sort(rep(1:3,3))),
           nrow=3, ncol=3)
    

    You could reduce the amount of work mapply does by using the symmetry assumption and noting that the diagonal is zero, e.g. for the three-column case:

    v <- mapply(function(x,y) cor.test(df[,x],df[,y])$p.value,rep(1:2,2:1),rev(rep(3:2,2:1)))
    m <- matrix(0,nrow=3,ncol=3)
    m[lower.tri(m)] <- v
    m[upper.tri(m)] <- v
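
    Assigning the same v to both triangles works for three columns, but for four or more the column-major order of upper.tri no longer matches the order of the pairs. A possible generalization, sketched here with combn and matrix indexing (the data frame df4 and the variable names are made up for illustration):

    ```r
    set.seed(1)
    df4 <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50), d = rnorm(50))
    n <- ncol(df4)

    # Each column of idx is one pair (i, j) with i < j.
    idx <- combn(n, 2)
    v <- mapply(function(i, j) cor.test(df4[[i]], df4[[j]])$p.value,
                idx[1, ], idx[2, ])

    # Start from the zero diagonal and fill both triangles by explicit
    # (row, column) index pairs, so the fill order does not matter.
    m <- matrix(0, n, n, dimnames = list(names(df4), names(df4)))
    m[cbind(idx[1, ], idx[2, ])] <- v
    m[cbind(idx[2, ], idx[1, ])] <- v
    ```

    Indexing with a two-column matrix avoids relying on how lower.tri and upper.tri enumerate their cells.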
    