How should I count the number of unique rows in a 'binary' matrix?

青春惊慌失措 2021-01-04 13:51

Suppose I have a matrix whose entries are only 0 and 1, e.g.

set.seed(123)
m <- matrix( sample(0:1, 10, TRUE), nrow=5 )


        
3 answers
  •  攒了一身酷
    2021-01-04 14:19

    I was curious how a pure R solution would perform:

    set.seed(123)
    m <- matrix( sample(0:1, 1E5, TRUE), ncol=5 )
    
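    rowCountsR <- function(x) {
      ## hash each row: read its 0/1 entries as the digits of a binary number
      h <- x %*% matrix(2^(0:(ncol(x)-1)), ncol=1)
      ## index of the first occurrence of each distinct row
      i <- which(!duplicated(h))
      ## occurrences per hash value, looked up for each first occurrence
      ## (also correct when some hash values never occur)
      counts <- tabulate(h+1)[h[i]+1]
      list(counts=counts, idx=i)
    }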
    
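    library("rbenchmark")
    ## rowCounts() is not defined in this answer; it is assumed to be loaded from elsewhere in the thread
    benchmark(rowCounts(m), rowCountsR(m))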
    #            test replications elapsed relative user.self sys.self user.child sys.child
    # 1  rowCounts(m)          100   0.189    1.000     0.188        0          0         0
    # 2 rowCountsR(m)          100   0.258    1.365     0.256        0          0         0
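
    To make the hashing idea concrete, here is a minimal sketch on a tiny hand-written matrix. The matrix `x` and the names `weights` and `first` are illustrative assumptions, and the lookup `tabulate(h + 1)[h[first] + 1]` is just one way to write the counting step:

    ```r
    # Tiny 0/1 matrix in which row 1 and row 3 are identical.
    x <- rbind(c(0, 1, 0),
               c(1, 1, 0),
               c(0, 1, 0),
               c(1, 0, 1))

    # Column j gets weight 2^(j-1), so every distinct row maps to a distinct integer.
    weights <- 2^(0:(ncol(x) - 1))            # 1 2 4
    h <- as.vector(x %*% weights)             # 2 3 2 5

    # First occurrence of each distinct row, and how often that row appears.
    first  <- which(!duplicated(h))           # 1 2 4
    counts <- tabulate(h + 1)[h[first] + 1]   # 2 1 1
    ```

    Because the weights are distinct powers of two, two rows share a hash only if they are identical, so counting hash values is the same as counting rows.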
    

    Edit: a benchmark with more columns and rows (thanks to @Arun for pointing this out).

    set.seed(123)
    m <- matrix( sample(0:1, 1e7, TRUE), ncol=10)
    benchmark(rowCounts(m), rowCountsR(m), replications=100)
    #           test replications elapsed relative user.self sys.self user.child sys.child
    #1  rowCounts(m)          100  20.659    1.077    20.533    0.024          0         0
    #2 rowCountsR(m)          100  19.183    1.000    15.641    3.408          0         0
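
    One caveat when adding columns: `tabulate(h+1)` needs a bin for every value up to the largest hash, so its memory grows roughly as 2^ncol, and doubles represent integers exactly only up to 2^53. The following is a rough sketch of a variant that tabulates over the distinct hashes instead. `rowCountsWide` is a hypothetical name, not part of the original answer, and it has not been benchmarked here:

    ```r
    # Hypothetical variant: same binary hash, but the bins are the distinct
    # hash values that actually occur, not every value up to the maximum.
    rowCountsWide <- function(x) {
      stopifnot(ncol(x) <= 53)  # beyond this the double-precision hash is no longer exact
      h <- as.vector(x %*% 2^(0:(ncol(x) - 1)))
      first <- which(!duplicated(h))
      # match() maps every row to the position of its first occurrence in `first`
      counts <- tabulate(match(h, h[first]), nbins = length(first))
      list(counts = counts, idx = first)
    }

    set.seed(1)
    x <- matrix(sample(0:1, 40 * 200, TRUE), ncol = 40)
    x <- rbind(x, x[1:3, ])   # append copies of the first three rows
    res <- rowCountsWide(x)
    ```

    The trade-off is an extra `match()` pass over all rows, which may well be slower than the direct `tabulate(h+1)` when the number of columns is small.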
    
