I have a matrix filled with TRUE
/FALSE
values and I am trying to find the index position of the first TRUE
value on each row (or retur
Not sure this is any better, but this is one solution:
> x2 <- t(t(matrix(as.numeric(x), nrow=10)) * 1:3)
> x2[x2 == 0] <- Inf
> rowMins(x2)
[1] 2 1 1 2 1 1 2 1 1 2
Edit: Here's a better solution using base R:
> x2 <- (x2 <- which(x, arr=TRUE))[order(x2[,1]),]
> x2[as.logical(c(1,diff(x2[,1]) != 0)),2]
[1] 2 1 1 2 1 1 2 1 1 2
A couple of years later, I want to add two alternative approaches.
1) With max.col
:
> max.col(x, "first")
[1] 2 1 1 2 1 1 2 1 1 2
2) With aggregate
:
> aggregate(col ~ row, data = which(x, arr.ind = TRUE), FUN = min)$col
[1] 2 1 1 2 1 1 2 1 1 2
As performance is an issue, let's test the different methods on a larger dataset. First create a function for each method:
abiel <- function(n){apply(n, 1, function(y) which(y)[1])}
maxcol <- function(n){max.col(n, "first")}
aggr.min <- function(n){aggregate(col ~ row, data = which(n, arr.ind = TRUE), FUN = min)$col}
shane.bR <- function(n){x2 <- (x2 <- which(n, arr=TRUE))[order(x2[,1]),]; x2[as.logical(c(1,diff(x2[,1]) != 0)),2]}
joris <- function(n){z <- which(t(n))-1;((z%%ncol(n))+1)[match(1:nrow(n), (z%/%ncol(n))+1)]}
Second, create a larger dataset:
xl <- matrix(sample(c(F,T),9e5,replace=TRUE), nrow=1e5)
Third, run the benchmark:
library(microbenchmark)
microbenchmark(abiel(xl), maxcol(xl), aggr.min(xl), shane.bR(xl), joris(xl),
unit = 'relative')
which results in:
Unit: relative
expr min lq mean median uq max neval cld
abiel(xl) 55.102815 33.458994 15.781460 33.243576 33.196486 2.911675 100 d
maxcol(xl) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 a
aggr.min(xl) 439.863935 262.595535 118.436328 263.387427 256.815607 16.709754 100 e
shane.bR(xl) 12.477856 8.522470 7.389083 13.549351 24.626431 1.748501 100 c
joris(xl) 7.922274 5.449662 4.418423 5.964554 9.855588 1.491417 100 b
You can gain a lot of speed by using %%
and %/%
:
x <- matrix(rep(c(F,T,T),10), nrow=10)
z <- which(t(x))-1
((z%%ncol(x))+1)[match(1:nrow(x), (z%/%ncol(x))+1)]
This can be adapted as needed: if you want to do this for columns, you don't have to transpose the matrix.
Tried out on a 1,000,000 X 5 matrix :
x <- matrix(sample(c(F,T),5000000,replace=T), ncol=5)
system.time(apply(x,1,function(y) which(y)[1]))
#> user system elapsed
#> 12.61 0.07 12.70
system.time({
z <- which(t(x))-1
(z%%ncol(x)+1)[match(1:nrow(x), (z%/%ncol(x))+1)]}
)
#> user system elapsed
#> 1.11 0.00 1.11
You could gain quite a lot this way.