How to vectorize this operation on every row of a matrix

醉酒成梦 2021-01-20 13:59

I have a matrix filled with TRUE/FALSE values and I am trying to find the index position of the first TRUE value on each row (or retur

3 answers
  •  小鲜肉 (original poster)
    2021-01-20 14:38

    A couple of years later, I want to add two alternative approaches.

    1) With max.col:

    > max.col(x, "first")
     [1] 2 1 1 2 1 1 2 1 1 2
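
    One caveat is worth illustrating: for a row that contains no TRUE at all, max.col(x, "first") returns 1, because every entry ties at 0 and the tie goes to the first column. The sketch below uses a small made-up matrix m and a hypothetical wrapper first_true (neither is from the original post) that returns NA for such rows instead.

    ```r
    # Small illustrative matrix (an assumption, not the data from the question)
    m <- matrix(c(FALSE, TRUE,  FALSE,
                  TRUE,  TRUE,  FALSE,
                  FALSE, FALSE, FALSE),
                nrow = 3, byrow = TRUE)

    # max.col() treats the logical matrix as 0/1; the all-FALSE third row
    # ties at 0, so ties.method = "first" reports column 1 for it
    raw <- max.col(m, "first")

    # Hypothetical NA-safe wrapper: blank out rows that contain no TRUE
    first_true <- function(n) {
      idx <- max.col(n, "first")
      idx[rowSums(n) == 0] <- NA
      idx
    }
    res <- first_true(m)
    ```

    The extra rowSums() pass costs little compared with the row-wise alternatives, so this is probably a reasonable trade if NA is required for empty rows.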
    

    2) With aggregate:

    > aggregate(col ~ row, data = which(x, arr.ind = TRUE), FUN = min)$col
     [1] 2 1 1 2 1 1 2 1 1 2
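
    To see why this works: which(x, arr.ind = TRUE) returns one (row, col) pair per TRUE cell, and aggregate() then takes the smallest col within each row. Rows without any TRUE contribute no pairs, so they are absent from the result. The sketch below shows this on a small made-up matrix m; the names pos, agg and out are illustrative only, and the explicit as.data.frame() call is a defensive assumption rather than something the original answer uses.

    ```r
    # Small illustrative matrix (an assumption, not the data from the question)
    m <- matrix(c(FALSE, TRUE,  FALSE,
                  TRUE,  TRUE,  FALSE,
                  FALSE, FALSE, FALSE),
                nrow = 3, byrow = TRUE)

    # One (row, col) pair per TRUE cell, in columns named "row" and "col"
    pos <- as.data.frame(which(m, arr.ind = TRUE))

    # Smallest column index per row; row 3 has no TRUE and is dropped
    agg <- aggregate(col ~ row, data = pos, FUN = min)

    # Re-expand to one value per matrix row, with NA for rows without TRUE
    out <- rep(NA_integer_, nrow(m))
    out[agg$row] <- agg$col
    ```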
    

    As performance is an issue, let's test the different methods on a larger dataset. First, create a function for each method:

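    # apply() over rows; which(y)[1] is NA when a row has no TRUE
    abiel <- function(n){apply(n, 1, function(y) which(y)[1])}
    # max.col() with ties resolved to the first column
    maxcol <- function(n){max.col(n, "first")}
    # minimum column index per row among the TRUE cells
    aggr.min <- function(n){aggregate(col ~ row, data = which(n, arr.ind = TRUE), FUN = min)$col}
    # sort the TRUE cells by row, then keep the first entry of each row
    shane.bR <- function(n){x2 <- which(n, arr.ind = TRUE); x2 <- x2[order(x2[,1]),]; x2[c(TRUE, diff(x2[,1]) != 0), 2]}
    # linear indices of t(n) run row by row through n; match() picks the first hit per row
    joris <- function(n){z <- which(t(n)) - 1; ((z %% ncol(n)) + 1)[match(seq_len(nrow(n)), (z %/% ncol(n)) + 1)]}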
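
    Before timing the functions, it may be worth confirming that they agree with each other. The sketch below repeats the five definitions in lightly reformatted form so that it runs on its own, and compares them on a small random matrix xs in which the last column is forced to TRUE, so every row has at least one hit. The matrix, the seed and the names results and agree are assumptions made for this check, not part of the original answer.

    ```r
    abiel    <- function(n) apply(n, 1, function(y) which(y)[1])
    maxcol   <- function(n) max.col(n, "first")
    aggr.min <- function(n) {
      aggregate(col ~ row, data = as.data.frame(which(n, arr.ind = TRUE)), FUN = min)$col
    }
    shane.bR <- function(n) {
      x2 <- which(n, arr.ind = TRUE)
      x2 <- x2[order(x2[, 1]), ]
      x2[c(TRUE, diff(x2[, 1]) != 0), 2]
    }
    joris <- function(n) {
      z <- which(t(n)) - 1
      ((z %% ncol(n)) + 1)[match(seq_len(nrow(n)), (z %/% ncol(n)) + 1)]
    }

    set.seed(1)  # arbitrary seed, only to make this sketch reproducible
    xs <- matrix(sample(c(FALSE, TRUE), 45, replace = TRUE), nrow = 5)
    xs[, ncol(xs)] <- TRUE  # guarantee at least one TRUE in every row

    results <- list(abiel(xs), maxcol(xs), aggr.min(xs), shane.bR(xs), joris(xs))

    # TRUE when every method returns the same column indices as the first one
    agree <- all(vapply(results, function(r) {
      isTRUE(all.equal(as.numeric(r), as.numeric(results[[1]])))
    }, logical(1)))
    ```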
    

    Second, create a larger dataset:

    # 100,000 rows x 9 columns of random TRUE/FALSE values
    xl <- matrix(sample(c(FALSE, TRUE), 9e5, replace = TRUE), nrow = 1e5)
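
    Note that xl has only 9 columns, so a row is entirely FALSE with probability 1/2^9 = 1/512, which is about 195 of the 100,000 rows on average. On those rows the five functions do not return the same thing: maxcol gives 1, abiel and joris give NA, and aggr.min and shane.bR skip the row altogether. The benchmark therefore compares speed rather than identical output. A minimal sketch for checking how many such rows a given sample contains (the seed and the names expected_empty, n_empty and all_first are assumptions):

    ```r
    set.seed(42)  # arbitrary seed, an assumption for reproducibility only
    xl <- matrix(sample(c(FALSE, TRUE), 9e5, replace = TRUE), nrow = 1e5)

    # Expected number of all-FALSE rows: 1e5 / 2^9
    expected_empty <- nrow(xl) / 2^ncol(xl)

    # Observed number of rows without any TRUE in this sample
    empty_rows <- rowSums(xl) == 0
    n_empty <- sum(empty_rows)

    # max.col() reports column 1 for each of those rows
    all_first <- all(max.col(xl, "first")[empty_rows] == 1)
    ```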
    

    Third, run the benchmark:

    library(microbenchmark)
    microbenchmark(abiel(xl), maxcol(xl), aggr.min(xl), shane.bR(xl), joris(xl),
                   unit = 'relative')
    

    which results in:

    Unit: relative
             expr        min         lq       mean     median         uq       max neval   cld
        abiel(xl)  55.102815  33.458994  15.781460  33.243576  33.196486  2.911675   100    d 
       maxcol(xl)   1.000000   1.000000   1.000000   1.000000   1.000000  1.000000   100 a    
     aggr.min(xl) 439.863935 262.595535 118.436328 263.387427 256.815607 16.709754   100     e
     shane.bR(xl)  12.477856   8.522470   7.389083  13.549351  24.626431  1.748501   100   c  
        joris(xl)   7.922274   5.449662   4.418423   5.964554   9.855588  1.491417   100  b   
    
