What is the fastest way to extract the min from each column in a matrix?
Moved all the benchmarks to the answer below.
Here is one that is faster on square and wide matrices. It uses pmin on the rows of the matrix. (If you know a faster way of splitting the matrix into its rows, please feel free to edit.)
do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))
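To see why this works: pmin takes the element-wise minimum across its arguments, so if each argument is one row of the matrix, position j of the result is the minimum of column j. Here is a small sketch on a made-up 3 x 4 matrix (the name `m` and its values are only for illustration):

```r
# Made-up 3 x 4 example matrix, filled column by column (illustrative values)
m <- matrix(c(5, 2, 9,
              1, 8, 4,
              7, 3, 6,
              0, 9, 2), nrow = 3)

# Split the matrix into a list of its rows
rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])

# Element-wise minimum across the rows gives the minimum of each column
col_min <- do.call(pmin, rows)

# Same answer as the apply() approach
stopifnot(isTRUE(all.equal(col_min, apply(m, 2, min))))
print(col_min)  # 2 1 3 0
```

This also hints at why the approach does worse on tall matrices: the list has one element per row, so a tall matrix means many short vectors to build and pass to pmin.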
Using the same benchmark as @RicardoSaporta:
$`Square Matrix`
          test elapsed relative
3 pmin.on.rows   1.370    1.000
1          apl   1.455    1.062
2         cmin   2.075    1.515
$`Wide Matrix`
          test elapsed relative
3 pmin.on.rows   0.926    1.000
2         cmin   2.302    2.486
1          apl   5.058    5.462
$`Tall Matrix`
          test elapsed relative
1          apl   1.175    1.000
2         cmin   2.126    1.809
3 pmin.on.rows   5.813    4.947
Below is a collection of the answers thus far. This will be updated as more answers are contributed.
library(rbenchmark)
library(matrixStats) # for colMins
list.of.tests <- list(
  ## Method 1: apply() [original]
  apl = expression(apply(mat, 2, min, na.rm = TRUE)),
  ## Method 2: matrixStats::colMins [contributed by @Ben Bolker]
  cmin = expression(colMins(mat)),
  ## Method 3: lapply() + split() [contributed by @DWin]
  ## (grouping follows column-major order: one group per column)
  lapl = expression(lapply(split(mat, rep(seq_len(ncol(mat)), each = nrow(mat))), min)),
  ## Method 4: pmin() / pmin.int() [contributed by @flodel]
  pmn = expression(do.call(pmin, lapply(1:nrow(mat), function(i) mat[i, ]))),
  pmn.int = expression(do.call(pmin.int, lapply(1:nrow(mat), function(i) mat[i, ]))) #,
  ## Method 5: ????
  # e5 = expression( ??? ),
)
## Run every test on every input matrix; keep only the columns of interest
(times <- lapply(matrix.inputs, function(mat) {
  res <- do.call(benchmark, args = c(list.of.tests, replications = 500, order = "relative"))
  res[, c("test", "elapsed", "relative")]
}))
#############################
#$         RESULTS         $#
#$_________________________$#
#############################
# $`Square Matrix`
#      test elapsed relative
# 5 pmn.int   2.842    1.000
# 4     pmn   3.622    1.274
# 1     apl   3.670    1.291
# 2    cmin   5.826    2.050
# 3    lapl  41.817   14.714
# $`Tall Matrix`
#      test elapsed relative
# 1     apl   2.622    1.000
# 2    cmin   5.561    2.121
# 5 pmn.int  11.264    4.296
# 4     pmn  18.142    6.919
# 3    lapl  48.637   18.550
# $`Wide-short Matrix`
#      test elapsed relative
# 5 pmn.int   2.909    1.000
# 4     pmn   3.018    1.037
# 2    cmin   6.361    2.187
# 1     apl  15.765    5.419
# 3    lapl  41.479   14.259
# $`Wide-tall Matrix`
#      test elapsed relative
# 5 pmn.int  20.917    1.000
# 4     pmn  26.188    1.252
# 1     apl  38.635    1.847
# 2    cmin  64.557    3.086
# 3    lapl 434.761   20.785
# $`Tiny Sq Matrix`
#      test elapsed relative
# 5 pmn.int   0.112    1.000
# 2    cmin   0.149    1.330
# 4     pmn   0.174    1.554
# 1     apl   0.180    1.607
# 3    lapl   0.509    4.545
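The benchmark code refers to `matrix.inputs`, but its definition does not appear in the snippet. Below is a hypothetical stand-in so the code can be run; the list names match the result headings, but the dimensions, the seed and the helper `make_mat` are my own assumptions, not the inputs behind the timings reported here.

```r
# Hypothetical inputs: dimensions are assumptions, NOT the ones that
# produced the timings in this answer.
set.seed(1)

# Illustrative helper (not from the original answer): random integer matrix
make_mat <- function(nr, nc) {
  matrix(sample.int(1e6, nr * nc, replace = TRUE), nrow = nr, ncol = nc)
}

matrix.inputs <- list(
  "Square Matrix"     = make_mat(400, 400),
  "Tall Matrix"       = make_mat(4000, 40),
  "Wide-short Matrix" = make_mat(40, 4000),
  "Wide-tall Matrix"  = make_mat(500, 5000),
  "Tiny Sq Matrix"    = make_mat(20, 20)
)

# Rows and columns of each input
print(sapply(matrix.inputs, dim))
```

With different dimensions the relative rankings may well change, so treat any numbers you get as your own measurement rather than a reproduction of the table.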
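# Group the elements by column (R stores matrices column by column),
# then take the min of each group
lapply(split(mat, rep(seq_len(ncol(mat)), each = nrow(mat))), min)
# Check against apply(): no column should differ
which(apply(my.mat, 2, min, na.rm = TRUE) !=
      sapply(split(my.mat, rep(seq_len(ncol(my.mat)), each = nrow(my.mat))), min))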
# named integer(0)
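One thing to watch when splitting a matrix by column: the grouping vector has to follow R's column-major layout, otherwise the check only passes on square matrices. A minimal sketch on a made-up non-square matrix, using `col()` to build the grouping (the name `m` and its values are just for illustration):

```r
# Made-up non-square matrix: 2 rows, 3 columns, filled column by column
m <- matrix(c(4, 1,
              6, 3,
              2, 5), nrow = 2)

# col(m) labels every element with its column index, in the same
# column-major order in which split() walks the matrix
by_col <- split(m, col(m))

# One minimum per group, i.e. per column
split_min <- sapply(by_col, min)

# Compare with apply(); unname() drops the "1", "2", "3" names from split()
stopifnot(isTRUE(all.equal(unname(split_min), apply(m, 2, min))))
print(split_min)
```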
mat[(seq_len(ncol(mat)) - 1) * nrow(mat) + max.col(t(-mat))]
This seems pretty fast, and it's base R.
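For anyone puzzling over the indexing, here is the same idea broken into steps on a made-up 3 x 3 matrix. The intermediate names are mine, and I pass `ties.method = "first"` only to make the row index deterministic (the default picks among ties at random, which does not change the minimum value).

```r
# Made-up 3 x 3 example matrix, filled column by column (illustrative values)
m <- matrix(c(5, 2, 9,
              1, 8, 4,
              7, 3, 6), nrow = 3)

# Row index of the minimum within each column:
# negate so minima become maxima, transpose so columns become rows,
# then max.col() reports the position of the maximum in each row
row_of_min <- max.col(t(-m), ties.method = "first")

# Offset of each column's first element in the underlying vector
col_offset <- (seq_len(ncol(m)) - 1L) * nrow(m)

# Linear (column-major) index of each column's minimum
col_min <- m[col_offset + row_of_min]

stopifnot(isTRUE(all.equal(col_min, apply(m, 2, min))))
print(col_min)  # 2 1 3
```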
Update 2014-12-17:
colMins() et al. were made significantly faster in a recent version of matrixStats. Here's an updated benchmark summary using matrixStats 0.12.2 showing that it ("cmin") is roughly 7-20 times faster than the second fastest approach:
$`Square Matrix`
     test elapsed relative
2    cmin   0.216    1.000
1     apl   4.200   19.444
5 pmn.int   4.604   21.315
4     pmn   5.136   23.778
3    lapl  12.546   58.083
$`Tall Matrix`
     test elapsed relative
2    cmin   0.262    1.000
1     apl   3.006   11.473
5 pmn.int  18.605   71.011
3    lapl  22.798   87.015
4     pmn  27.583  105.279
$`Wide-short Matrix`
     test elapsed relative
2    cmin   0.346    1.000
5 pmn.int   3.766   10.884
4     pmn   3.955   11.431
3    lapl  13.393   38.708
1     apl  19.187   55.454
$`Wide-tall Matrix`
     test elapsed relative
2    cmin   5.591    1.000
5 pmn.int  39.466    7.059
4     pmn  40.265    7.202
1     apl  67.151   12.011
3    lapl 158.035   28.266
$`Tiny Sq Matrix`
     test elapsed relative
2    cmin   0.011    1.000
5 pmn.int   0.135   12.273
4     pmn   0.178   16.182
1     apl   0.202   18.364
3    lapl   0.269   24.455
Previous comment 2013-10-09:
FYI, since matrixStats v0.8.7 (2013-07-28), colMins() is roughly twice as fast as before. The reason is that the function previously utilized colRanges(), which explains the "surprisingly slow" results reported here. The same speedup is seen for colMaxs(), rowMins() and rowMaxs().
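If you want to check which version you have and confirm the four functions agree with their apply() equivalents, something like the following should do. It is only a sketch: it assumes matrixStats is installed (the block is skipped otherwise), and the matrix `m` is made up.

```r
# Assumes matrixStats is installed; the whole block is skipped otherwise
if (requireNamespace("matrixStats", quietly = TRUE)) {
  # TRUE if the installed version is at least the one mentioned above
  print(packageVersion("matrixStats") >= "0.8.7")

  # Made-up 3 x 2 example matrix (illustrative values only)
  m <- matrix(c(3, 1, 2,
                9, 7, 8), nrow = 3)

  col_min <- matrixStats::colMins(m)
  col_max <- matrixStats::colMaxs(m)
  row_min <- matrixStats::rowMins(m)
  row_max <- matrixStats::rowMaxs(m)

  # Each function should match the corresponding apply() call
  stopifnot(
    isTRUE(all.equal(col_min, apply(m, 2, min))),
    isTRUE(all.equal(col_max, apply(m, 2, max))),
    isTRUE(all.equal(row_min, apply(m, 1, min))),
    isTRUE(all.equal(row_max, apply(m, 1, max)))
  )
}
```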
The sos package is great for answering these sorts of questions.
library("sos")
findFn("colMins")
library("matrixStats")
?colMins
http://finzi.psych.upenn.edu/R/library/matrixStats/html/rowRanges.html
Oddly enough, for the one example I tried, colMins was slower. Perhaps someone can point out what's funny about my example?
set.seed(101); z <- matrix(runif(1e6),nrow=1000)
library(rbenchmark)
benchmark(colMins(z),apply(z,2,min))
##               test replications elapsed relative user.self sys.self
## 2 apply(z, 2, min)          100  14.290     1.00     7.216    7.057
## 1       colMins(z)          100  25.585     1.79    15.509    9.852