Fastest way to get the min from every column in a matrix?

Posted by 自作多情 on 2019-12-03 19:52:39

Question


What is the fastest way to extract the min from each column in a matrix?


EDIT:

Moved all the benchmarks to the answer below.

Using a Tall, Short or Wide Matrix:

  ##  TEST DATA
  set.seed(1)
  matrix.inputs <- list(
        "Square Matrix"     = matrix(sample(seq(1e6), 4^2*1e4, TRUE), ncol=400),   #  400 x  400
        "Tall Matrix"       = matrix(sample(seq(1e6), 4^2*1e4, TRUE), nrow=4000),  # 4000 x   40
        "Wide-short Matrix" = matrix(sample(seq(1e6), 4^2*1e4, TRUE), ncol=4000),  #   40 x 4000
        "Wide-tall Matrix"  = matrix(sample(seq(1e6), 4^2*1e5, TRUE), ncol=4000),  #  400 x 4000
        "Tiny Sq Matrix"    = matrix(sample(seq(1e6), 4^2*1e2, TRUE), ncol=40)     #   40 x   40
  )

Answer 1:


Here is one that is faster on square and wide matrices. It uses pmin on the rows of the matrix. (If you know a faster way of splitting the matrix into its rows, please feel free to edit.)

do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))
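To see why this works: pmin takes several vectors and returns their element-wise minimum, so passing each row as a separate argument leaves position j holding the minimum of column j. A minimal sketch on a made-up 3 x 4 matrix (the values are illustrative only, not from the benchmark data):

```r
# Toy matrix, chosen only for illustration (3 rows, 4 columns).
# matrix() fills column by column, so each line below is one column.
mat <- matrix(c(5, 2, 9,
                1, 7, 3,
                8, 6, 4,
                0, 10, 11), nrow = 3)

# Split the matrix into a list of its rows.
rows <- lapply(seq_len(nrow(mat)), function(i) mat[i, ])

# pmin(row1, row2, row3) compares the rows position by position,
# so element j of the result is the minimum of column j.
col.mins <- do.call(pmin, rows)

# Should agree with the straightforward apply() version.
stopifnot(identical(col.mins, apply(mat, 2, min)))
col.mins
```

The cost is one list element per row, which is consistent with the approach doing well on wide matrices (few rows) and poorly on tall ones.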

Using the same benchmark as @RicardoSaporta:

$`Square Matrix`
          test elapsed relative
3 pmin.on.rows   1.370    1.000
1          apl   1.455    1.062
2         cmin   2.075    1.515

$`Wide Matrix`
          test elapsed relative
3 pmin.on.rows   0.926    1.000
2         cmin   2.302    2.486
1          apl   5.058    5.462

$`Tall Matrix`
          test elapsed relative
1          apl   1.175    1.000
2         cmin   2.126    1.809
3 pmin.on.rows   5.813    4.947



Answer 2:


The sos package is great for answering these sorts of questions.

library("sos")
findFn("colMins")
library("matrixStats")
?colMins

http://finzi.psych.upenn.edu/R/library/matrixStats/html/rowRanges.html

Oddly enough, for the one example I tried colMins was slower. Perhaps someone can point out what's funny about my example?

set.seed(101); z <- matrix(runif(1e6),nrow=1000)
library(rbenchmark)
benchmark(colMins(z),apply(z,2,min))
##               test replications elapsed relative user.self sys.self
## 2 apply(z, 2, min)          100  14.290     1.00     7.216    7.057
## 1       colMins(z)          100  25.585     1.79    15.509    9.852



Answer 3:


Update 2014-12-17:

colMins() et al. were made significantly faster in a recent version of matrixStats. Here's an updated benchmark summary using matrixStats 0.12.2, showing that it ("cmin") is roughly 7-20 times faster than the second-fastest approach on each input:

$`Square Matrix`
     test elapsed relative
2    cmin   0.216    1.000
1     apl   4.200   19.444
5 pmn.int   4.604   21.315
4     pmn   5.136   23.778
3    lapl  12.546   58.083

$`Tall Matrix`
     test elapsed relative
2    cmin   0.262    1.000
1     apl   3.006   11.473
5 pmn.int  18.605   71.011
3    lapl  22.798   87.015
4     pmn  27.583  105.279

$`Wide-short Matrix`
     test elapsed relative
2    cmin   0.346    1.000
5 pmn.int   3.766   10.884
4     pmn   3.955   11.431
3    lapl  13.393   38.708
1     apl  19.187   55.454

$`Wide-tall Matrix`
     test elapsed relative
2    cmin   5.591    1.000
5 pmn.int  39.466    7.059
4     pmn  40.265    7.202
1     apl  67.151   12.011
3    lapl 158.035   28.266

$`Tiny Sq Matrix`
     test elapsed relative
2    cmin   0.011    1.000
5 pmn.int   0.135   12.273
4     pmn   0.178   16.182
1     apl   0.202   18.364
3    lapl   0.269   24.455

Previous comment 2013-10-09:
FYI, since matrixStats v0.8.7 (2013-07-28), colMins() is roughly twice as fast as before. The reason is that the function previously utilized colRanges(), which explains the "surprisingly slow" results reported here. The same improvement applies to colMaxs(), rowMins() and rowMaxs().




Answer 4:


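# Matrices are stored column by column, so the grouping vector needs
# one label per column, each repeated nrow(mat) times.
lapply( split(mat, rep(1:dim(mat)[2], each=dim(mat)[1])), min)

which( ! apply(my.mat, 2, min, na.rm=TRUE) ==
        sapply( split(my.mat, rep(1:dim(my.mat)[2], each=dim(my.mat)[1])), min) )
# named integer(0)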
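One caveat worth checking with any split()-based approach: because a matrix is stored column by column, the grouping vector has to label elements in that order, which matters as soon as the matrix is not square. As a sketch of an alternative (not part of the original answer), base R's col() produces such labels directly. The matrix below is made up for illustration:

```r
# Non-square toy matrix (2 rows, 3 columns); values are illustrative.
# matrix() fills column by column, so each line below is one column.
mat <- matrix(c(4, 1,
                6, 9,
                3, 8), nrow = 2)

# col(mat) labels every element with its column index, in storage order.
by.col <- split(mat, col(mat))

# One minimum per column, returned as a named vector ("1", "2", "3").
mins <- sapply(by.col, min)

# Should agree with apply() even though the matrix is not square.
stopifnot(all(mins == apply(mat, 2, min)))
mins
```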



Answer 5:


Below is a collection of the answers thus far. This will be updated as more answers are contributed.

BENCHMARKS

  library(rbenchmark)
  library(matrixStats)  # for colMins


  list.of.tests <- list (
        ## Method 1: apply()  [original]
        apl = expression(apply(mat, 2, min, na.rm=TRUE)),

        ## Method 2:  matrixStats::colMins [contributed by @Ben Bolker ]
        cmin = expression(colMins(mat)),

        ## Method 3: lapply() + split()  [contributed by @DWin ]
        ## NOTE: timed as originally posted; this grouping matches the columns only when mat is square
        lapl = expression(lapply( split(mat, rep(1:dim(mat)[1], each=dim(mat)[2])), min)),

        ## Method 4: pmin() / pmin.int()  [contributed by @flodel ]
        pmn = expression(do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))),
        pmn.int = expression(do.call(pmin.int, lapply(1:nrow(mat), function(i)mat[i,]))) #,

        ## Method 5: ????
        #  e5 = expression(  ???  ),
        )  


  (times <- 
        lapply(matrix.inputs, function(mat)
            do.call(benchmark, args=c(list.of.tests, replications=500, order="relative"))[, c("test", "elapsed", "relative")]
  ))



  ############################# 
  #$         RESULTS         $#
  #$_________________________$#
  #############################

  # $`Square Matrix`
  #      test elapsed relative
  # 5 pmn.int   2.842    1.000
  # 4     pmn   3.622    1.274
  # 1     apl   3.670    1.291
  # 2    cmin   5.826    2.050
  # 3    lapl  41.817   14.714  

  # $`Tall Matrix`
  #      test elapsed relative
  # 1     apl   2.622    1.000
  # 2    cmin   5.561    2.121
  # 5 pmn.int  11.264    4.296
  # 4     pmn  18.142    6.919
  # 3    lapl  48.637   18.550  

  # $`Wide-short Matrix`
  #      test elapsed relative
  # 5 pmn.int   2.909    1.000
  # 4     pmn   3.018    1.037
  # 2    cmin   6.361    2.187
  # 1     apl  15.765    5.419
  # 3    lapl  41.479   14.259  

  # $`Wide-tall Matrix`
  #      test elapsed relative
  # 5 pmn.int  20.917    1.000
  # 4     pmn  26.188    1.252
  # 1     apl  38.635    1.847
  # 2    cmin  64.557    3.086
  # 3    lapl 434.761   20.785  

  # $`Tiny Sq Matrix`
  #      test elapsed relative
  # 5 pmn.int   0.112    1.000
  # 2    cmin   0.149    1.330
  # 4     pmn   0.174    1.554
  # 1     apl   0.180    1.607
  # 3    lapl   0.509    4.545
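If rbenchmark is not installed, a rough comparison can be sketched with base R's system.time alone. The helper name time.it, the matrix size and the repetition count below are assumptions made for illustration, and the sketch covers only the base-R methods. It also confirms the methods agree before timing them, since timings are only comparable if the results match:

```r
# Toy input; far smaller than the benchmark matrices above.
set.seed(1)
mat <- matrix(sample(seq(1e6), 40 * 400, TRUE), ncol = 400)

# Base-R methods from the benchmark, wrapped as functions of a matrix.
methods <- list(
  apl     = function(m) apply(m, 2, min),
  pmn     = function(m) do.call(pmin, lapply(seq_len(nrow(m)), function(i) m[i, ])),
  pmn.int = function(m) do.call(pmin.int, lapply(seq_len(nrow(m)), function(i) m[i, ]))
)

# All methods should return the same values before timings are compared.
results <- lapply(methods, function(f) f(mat))
stopifnot(all(sapply(results, function(r) all(r == results$apl))))

# Hypothetical helper: total elapsed seconds for n evaluations of f(mat).
time.it <- function(f, n = 20) {
  unname(system.time(for (i in seq_len(n)) f(mat))["elapsed"])
}

# Named vector of elapsed times, fastest first.
elapsed <- sort(sapply(methods, time.it))
elapsed
```

On inputs this small the timings are noisy, so treat the ordering as indicative at best and raise n for anything that matters.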



Answer 6:


mat[(1:ncol(mat)-1)*nrow(mat)+max.col(t(-mat))] seems pretty fast, and it's base R.
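To unpack the one-liner: max.col(t(-mat)) returns, for each column of mat, the row index of its smallest entry, and (j - 1) * nrow(mat) is the offset of column j in the underlying column-major vector. Adding the two gives the position of each column minimum. A sketch on a made-up matrix with well-separated values (the intermediate names are for illustration only):

```r
# Toy matrix (3 x 3); matrix() fills column by column,
# so each line below is one column.
mat <- matrix(c(5, 2, 9,
                1, 7, 3,
                8, 6, 4), nrow = 3)

# Row index of the minimum within each column of mat:
# negate so that min becomes max, transpose so columns become rows.
row.of.min <- max.col(t(-mat))

# Offset of each column's first element in the underlying vector.
col.offset <- (seq_len(ncol(mat)) - 1) * nrow(mat)

# Linear indexing picks out one element per column.
col.mins <- mat[col.offset + row.of.min]
stopifnot(identical(col.mins, apply(mat, 2, min)))
col.mins
```

Note that max.col breaks ties at random by default and, in that mode, treats entries within a small relative tolerance as tied. Passing ties.method = "first" is likely the safer choice when exact minima matter.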



Source: https://stackoverflow.com/questions/13676878/fastest-way-to-get-min-from-every-column-in-a-matrix
