Fastest way to get the min of every column in a matrix?

执念已碎 2020-12-16 16:13

What is the fastest way to extract the min from each column in a matrix?


EDIT:

Moved all the benchmarks to the answer below.

Using

6 Answers
  • 2020-12-16 16:53

    Here is one that is faster on square and wide matrices. It uses pmin on the rows of the matrix. (If you know a faster way of splitting the matrix into its rows, please feel free to edit)

    do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))
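
    To make the idea concrete, here is a minimal sketch on a tiny matrix whose values are made up for illustration. It splits the matrix into its rows, takes the element-wise minimum across them with pmin, and checks the result against apply. seq_len() is used in place of 1:nrow(mat) only as a defensive habit.

    ```r
    # Small non-square example matrix (values chosen arbitrarily for illustration).
    mat <- matrix(c(5, 2, 9,
                    1, 7, 3), nrow = 2, byrow = TRUE)

    # Split the matrix into a list of its rows, then take the element-wise
    # minimum across those row vectors; the result has one entry per column.
    rows <- lapply(seq_len(nrow(mat)), function(i) mat[i, ])
    col.mins <- do.call(pmin, rows)

    # Reference result from apply() over the columns.
    ref <- apply(mat, 2, min)

    stopifnot(identical(col.mins, ref))
    ```

    pmin is called once with nrow(mat) arguments, each of length ncol(mat), which is consistent with the timings: it does well on wide matrices and poorly on tall ones.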
    

    Using the same benchmark as @RicardoSaporta:

    $`Square Matrix`
              test elapsed relative
    3 pmin.on.rows   1.370    1.000
    1          apl   1.455    1.062
    2         cmin   2.075    1.515
    
    $`Wide Matrix`
              test elapsed relative
    3 pmin.on.rows   0.926    1.000
    2         cmin   2.302    2.486
    1          apl   5.058    5.462
    
    $`Tall Matrix`
              test elapsed relative
    1          apl   1.175    1.000
    2         cmin   2.126    1.809
    3 pmin.on.rows   5.813    4.947
    
  • 2020-12-16 16:53

    Below is a collection of the answers thus far. This will be updated as more answers are contributed.

    BENCHMARKS

      library(rbenchmark)
      library(matrixStats)  # for colMins
    
    
      list.of.tests <- list(
            ## Method 1: apply()  [original]
            apl = expression(apply(mat, 2, min, na.rm=TRUE)),
    
            ## Method 2:  matrixStats::colMins [contributed by @Ben Bolker ]
            cmin = expression(colMins(mat)),
    
            ## Method 3: lapply() + split()  [contributed by @DWin ]
            ## NOTE: this grouping lines up with the columns only when mat is square;
            ## for other shapes use rep(seq_len(ncol(mat)), each=nrow(mat)).
            lapl = expression(lapply( split(mat, rep(1:dim(mat)[1], each=dim(mat)[2])), min)),
    
            ## Method 4: pmin() / pmin.int()  [contributed by @flodel ]
            pmn = expression(do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))),
            pmn.int = expression(do.call(pmin.int, lapply(1:nrow(mat), function(i)mat[i,]))) #,
    
            ## Method 5: ????
            #  e5 = expression(  ???  ),
            )  
    
    
      ## matrix.inputs is a named list of test matrices, one per result table below.
      ## Each expression refers to `mat`, which is the argument of the function below.
      (times <- 
            lapply(matrix.inputs, function(mat)
                do.call(benchmark, args=c(list.of.tests, replications=500, order="relative"))[, c("test", "elapsed", "relative")]
      ))
    
    
    
      ############################# 
      #$         RESULTS         $#
      #$_________________________$#
      #############################
    
      # $`Square Matrix`
      #      test elapsed relative
      # 5 pmn.int   2.842    1.000
      # 4     pmn   3.622    1.274
      # 1     apl   3.670    1.291
      # 2    cmin   5.826    2.050
      # 3    lapl  41.817   14.714  
    
      # $`Tall Matrix`
      #      test elapsed relative
      # 1     apl   2.622    1.000
      # 2    cmin   5.561    2.121
      # 5 pmn.int  11.264    4.296
      # 4     pmn  18.142    6.919
      # 3    lapl  48.637   18.550  
    
      # $`Wide-short Matrix`
      #      test elapsed relative
      # 5 pmn.int   2.909    1.000
      # 4     pmn   3.018    1.037
      # 2    cmin   6.361    2.187
      # 1     apl  15.765    5.419
      # 3    lapl  41.479   14.259  
    
      # $`Wide-tall Matrix`
      #      test elapsed relative
      # 5 pmn.int  20.917    1.000
      # 4     pmn  26.188    1.252
      # 1     apl  38.635    1.847
      # 2    cmin  64.557    3.086
      # 3    lapl 434.761   20.785  
    
      # $`Tiny Sq Matrix`
      #      test elapsed relative
      # 5 pmn.int   0.112    1.000
      # 2    cmin   0.149    1.330
      # 4     pmn   0.174    1.554
      # 1     apl   0.180    1.607
      # 3    lapl   0.509    4.545
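
    The benchmark code iterates over matrix.inputs, whose definition is not part of the code shown here. A minimal sketch of what such a list could look like follows. The names match the result headings, but every dimension below is a hypothetical placeholder, not the size behind the timings above.

    ```r
    set.seed(1)  # reproducible random input

    # Hypothetical sizes: the dimensions used for the reported timings are not given.
    matrix.inputs <- list(
      "Square Matrix"     = matrix(runif(200 * 200),  nrow = 200),
      "Tall Matrix"       = matrix(runif(2000 * 20),  nrow = 2000),
      "Wide-short Matrix" = matrix(runif(20 * 2000),  nrow = 20),
      "Wide-tall Matrix"  = matrix(runif(500 * 2000), nrow = 500),
      "Tiny Sq Matrix"    = matrix(runif(10 * 10),    nrow = 10)
    )

    # Sanity check: every entry is a matrix; then list the dimensions per input.
    stopifnot(all(vapply(matrix.inputs, is.matrix, logical(1))))
    t(sapply(matrix.inputs, dim))
    ```

    With a list like this in place, the lapply() call above returns one benchmark table per named matrix.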
    
  • 2020-12-16 16:56
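    # Group the column-major elements of mat by column index, then take each group's min
    lapply( split(mat, rep(seq_len(ncol(mat)), each=nrow(mat))), min)
    
    # Check against apply(): no column differs
    which( apply(my.mat, 2, min, na.rm=TRUE) !=
            sapply( split(my.mat, rep(seq_len(ncol(my.mat)), each=nrow(my.mat))), min) )
    # named integer(0)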
    
  • 2020-12-16 17:08

    mat[(1:ncol(mat)-1)*nrow(mat)+max.col(t(-mat))] seems pretty fast, and it's base R.
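
    Here is the same one-liner broken into steps, as a sketch on a made-up matrix. One caveat, as far as I can tell from the max.col documentation: with the default ties.method = "random", entries within a small relative tolerance of the maximum are treated as ties, so the value picked could in principle be a near-minimum rather than the exact minimum. Passing ties.method = "first" is my addition to sidestep that. It is not part of the original expression.

    ```r
    mat <- matrix(c(5, 2, 9,
                    1, 7, 3), nrow = 2, byrow = TRUE)

    # Row index of the minimum within each column: negate so the minimum becomes
    # the maximum, transpose so the columns of mat become rows, then ask
    # max.col() for the position of each row's maximum.
    min.row <- max.col(t(-mat), ties.method = "first")

    # Turn (row, column) positions into linear indices of the column-major matrix.
    idx <- (seq_len(ncol(mat)) - 1) * nrow(mat) + min.row
    col.mins <- mat[idx]

    stopifnot(identical(col.mins, apply(mat, 2, min)))
    ```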

  • 2020-12-16 17:09

    Update 2014-12-17:

    colMins() et al. were made significantly faster in a recent version of matrixStats. Here's an updated benchmark summary using matrixStats 0.12.2, showing that it ("cmin") is ~7-20 times faster than the second-fastest approach:

    $`Square Matrix`
         test elapsed relative
    2    cmin   0.216    1.000
    1     apl   4.200   19.444
    5 pmn.int   4.604   21.315
    4     pmn   5.136   23.778
    3    lapl  12.546   58.083
    
    $`Tall Matrix`
         test elapsed relative
    2    cmin   0.262    1.000
    1     apl   3.006   11.473
    5 pmn.int  18.605   71.011
    3    lapl  22.798   87.015
    4     pmn  27.583  105.279
    
    $`Wide-short Matrix`
         test elapsed relative
    2    cmin   0.346    1.000
    5 pmn.int   3.766   10.884
    4     pmn   3.955   11.431
    3    lapl  13.393   38.708
    1     apl  19.187   55.454
    
    $`Wide-tall Matrix`
         test elapsed relative
    2    cmin   5.591    1.000
    5 pmn.int  39.466    7.059
    4     pmn  40.265    7.202
    1     apl  67.151   12.011
    3    lapl 158.035   28.266
    
    $`Tiny Sq Matrix`
         test elapsed relative
    2    cmin   0.011    1.000
    5 pmn.int   0.135   12.273
    4     pmn   0.178   16.182
    1     apl   0.202   18.364
    3    lapl   0.269   24.455
    

    Previous comment 2013-10-09:
    FYI, since matrixStats v0.8.7 (2013-07-28), colMins() is roughly twice as fast as before. The function previously went through colRanges(), which explains the "surprisingly slow" results reported here. The same speedup applies to colMaxs(), rowMins() and rowMaxs().
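
    One detail worth keeping in mind when comparing against apply(mat, 2, min, na.rm=TRUE): colMins() takes an na.rm argument that defaults to FALSE, so the two calls only do the same work when na.rm is passed to both. A minimal sketch on a made-up matrix, guarded so that it does nothing if matrixStats is not installed:

    ```r
    if (requireNamespace("matrixStats", quietly = TRUE)) {
      # Example matrix with one missing value (values are illustrative only).
      mat <- matrix(c(5, NA, 9,
                      1,  7, 3), nrow = 2, byrow = TRUE)

      # Pass na.rm = TRUE explicitly so that both calls skip missing values.
      cmin <- matrixStats::colMins(mat, na.rm = TRUE)
      apl  <- apply(mat, 2, min, na.rm = TRUE)

      stopifnot(isTRUE(all.equal(cmin, apl)))
    }
    ```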

  • 2020-12-16 17:13

    The sos package is great for answering these sorts of questions.

    library("sos")
    findFn("colMins")
    library("matrixStats")
    ?colMins
    

    http://finzi.psych.upenn.edu/R/library/matrixStats/html/rowRanges.html

    Oddly enough, for the one example I tried colMins was slower. Perhaps someone can point out what's funny about my example?

    set.seed(101); z <- matrix(runif(1e6),nrow=1000)
    library(rbenchmark)
    benchmark(colMins(z),apply(z,2,min))
    ##               test replications elapsed relative user.self sys.self
    ## 2 apply(z, 2, min)          100  14.290     1.00     7.216    7.057
    ## 1       colMins(z)          100  25.585     1.79    15.509    9.852
    