How to vectorize this operation on every row of a matrix

醉酒成梦 2021-01-20 13:59

I have a matrix filled with TRUE/FALSE values and I am trying to find the index position of the first TRUE value on each row (or retur

3 Answers
  • 2021-01-20 14:36

    Not sure this is any better, but here is one solution. Note that rowMins is not part of base R; the matrixStats package, for example, provides one:

    > x2 <- t(t(matrix(as.numeric(x), nrow=10)) * 1:3)
    > x2[x2 == 0] <- Inf
    > rowMins(x2)
     [1] 2 1 1 2 1 1 2 1 1 2
    

    Edit: Here's a better solution using base R:

    > x2 <- (x2 <- which(x, arr.ind=TRUE))[order(x2[,1]),]
    > x2[as.logical(c(1,diff(x2[,1]) != 0)),2]
     [1] 2 1 1 2 1 1 2 1 1 2
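
    The one-liner above can be hard to read, so here is a step-by-step sketch of the same idea. The variable names (pos, is_first, first_col) are made up for illustration, and the example matrix is the 10 x 3 one used elsewhere in this thread. It sorts explicitly by row and then column, and uses duplicated() in place of the diff() trick.

```r
# Example matrix from this thread: 10 rows, 3 columns, filled column-wise.
x <- matrix(rep(c(FALSE, TRUE, TRUE), 10), nrow = 10)

# One (row, col) pair per TRUE cell.
pos <- which(x, arr.ind = TRUE)

# Sort by row, then by column, so the first entry of each row group
# is the left-most TRUE of that row.
pos <- pos[order(pos[, "row"], pos[, "col"]), , drop = FALSE]

# Keep only the first entry of each row group.
is_first <- !duplicated(pos[, "row"])
first_col <- unname(pos[is_first, "col"])

first_col
```

    Like the original, this only returns entries for rows that contain at least one TRUE, so the result is shorter than nrow(x) when some rows are entirely FALSE.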
    
  • 2021-01-20 14:38

    A couple of years later, I want to add two alternative approaches.

    1) With max.col:

    > max.col(x, "first")
     [1] 2 1 1 2 1 1 2 1 1 2
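
    One caveat: for a row with no TRUE at all, every value ties at the maximum, so max.col(x, "first") returns 1 instead of NA. A minimal sketch of a wrapper that masks such rows follows. The name first_true and the small test matrix are hypothetical, chosen only for this illustration.

```r
# Hypothetical test data: row 2 contains no TRUE value.
x <- matrix(c(FALSE, TRUE,  TRUE,
              FALSE, FALSE, FALSE,
              TRUE,  FALSE, TRUE),
            nrow = 3, byrow = TRUE)

# `first_true` is an illustrative helper name, not a base R function.
first_true <- function(m) {
  idx <- max.col(m, ties.method = "first")
  # Rows whose sum is 0 have no TRUE; report NA for them.
  idx[rowSums(m) == 0] <- NA_integer_
  idx
}

first_true(x)
```

    The extra rowSums() pass costs some time, so it is only worth adding when all-FALSE rows can actually occur.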
    

    2) With aggregate:

    > aggregate(col ~ row, data = which(x, arr.ind = TRUE), FUN = min)$col
     [1] 2 1 1 2 1 1 2 1 1 2
    

    As performance is an issue, let's test the different methods on a larger dataset. First create a function for each method:

    abiel <- function(n){apply(n, 1, function(y) which(y)[1])}
    maxcol <- function(n){max.col(n, "first")}
    aggr.min <- function(n){aggregate(col ~ row, data = which(n, arr.ind = TRUE), FUN = min)$col}
    shane.bR <- function(n){x2 <- (x2 <- which(n, arr.ind=TRUE))[order(x2[,1]),]; x2[as.logical(c(1,diff(x2[,1]) != 0)),2]}
    joris <- function(n){z <- which(t(n))-1;((z%%ncol(n))+1)[match(1:nrow(n), (z%/%ncol(n))+1)]}
    

    Second, create a larger dataset:

    xl <- matrix(sample(c(FALSE, TRUE), 9e5, replace = TRUE), nrow = 1e5)
    

    Third, run the benchmark:

    library(microbenchmark)
    microbenchmark(abiel(xl), maxcol(xl), aggr.min(xl), shane.bR(xl), joris(xl),
                   unit = 'relative')
    

    which results in (timings are relative to the fastest method, maxcol):

    Unit: relative
             expr        min         lq       mean     median         uq       max neval   cld
        abiel(xl)  55.102815  33.458994  15.781460  33.243576  33.196486  2.911675   100    d 
       maxcol(xl)   1.000000   1.000000   1.000000   1.000000   1.000000  1.000000   100 a    
     aggr.min(xl) 439.863935 262.595535 118.436328 263.387427 256.815607 16.709754   100     e
     shane.bR(xl)  12.477856   8.522470   7.389083  13.549351  24.626431  1.748501   100   c  
        joris(xl)   7.922274   5.449662   4.418423   5.964554   9.855588  1.491417   100  b   
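
    Before comparing speed, it is worth checking that the methods agree. On random data some rows will be entirely FALSE, and the methods handle those differently. The sketch below uses a small hypothetical matrix m; the aggregate variant is wrapped in as.data.frame() here as a precaution, which is a deviation from the function defined above.

```r
# Hypothetical test data: row 2 contains no TRUE value.
m <- matrix(c(FALSE, TRUE,
              FALSE, FALSE,
              TRUE,  TRUE),
            nrow = 3, byrow = TRUE)

abiel  <- function(n) apply(n, 1, function(y) which(y)[1])
maxcol <- function(n) max.col(n, "first")
joris  <- function(n) {
  z <- which(t(n)) - 1
  ((z %% ncol(n)) + 1)[match(1:nrow(n), (z %/% ncol(n)) + 1)]
}
aggr.min <- function(n) {
  d <- as.data.frame(which(n, arr.ind = TRUE))
  aggregate(col ~ row, data = d, FUN = min)$col
}

res_abiel  <- abiel(m)     # NA for the all-FALSE row
res_joris  <- joris(m)     # NA for the all-FALSE row
res_maxcol <- maxcol(m)    # 1 for the all-FALSE row
res_aggr   <- aggr.min(m)  # drops the all-FALSE row entirely
```

    So the benchmark compares functions that are only equivalent when every row has at least one TRUE. If NA is the required answer for empty rows, the apply and %%/%/% versions give it directly, while the others need an extra step.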
    
  • 2021-01-20 14:39

    You can gain a lot of speed by using %% and %/%:

    x <- matrix(rep(c(F,T,T),10), nrow=10)
    
    z <- which(t(x))-1
    ((z%%ncol(x))+1)[match(1:nrow(x), (z%/%ncol(x))+1)]
    

    This can be adapted as needed: to find the first TRUE in each column instead, skip the transpose and swap the roles of ncol(x) and nrow(x).
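
    As a sketch of that column-wise adaptation (the variable names rows, cols and first_row_per_col are made up for illustration): which() returns linear indices in column-major order, so on the untransposed matrix the remainder modulo nrow(x) gives the row and the integer quotient gives the column.

```r
x <- matrix(rep(c(FALSE, TRUE, TRUE), 10), nrow = 10)

# Zero-based linear indices of the TRUE cells, in column-major order.
z <- which(x) - 1

rows <- z %% nrow(x) + 1    # row of each TRUE cell
cols <- z %/% nrow(x) + 1   # column of each TRUE cell

# match() returns the first position of each column number, i.e. its
# top-most TRUE; columns without any TRUE yield NA.
first_row_per_col <- rows[match(seq_len(ncol(x)), cols)]

first_row_per_col
```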

    Tried on a 1,000,000 x 5 matrix:

    x <- matrix(sample(c(FALSE, TRUE), 5000000, replace = TRUE), ncol = 5)
    
    system.time(apply(x,1,function(y) which(y)[1]))
    
    #>   user  system elapsed 
    #>  12.61    0.07   12.70 
    
    system.time({
      z <- which(t(x)) - 1
      (z %% ncol(x) + 1)[match(1:nrow(x), (z %/% ncol(x)) + 1)]
    })
    
    #>   user  system elapsed 
    #>   1.11    0.00    1.11 
    

    In this run that is roughly an 11-fold speed-up over the apply version.
