Suppose I have a matrix whose entries are only 0 and 1, e.g.
set.seed(123)
m <- matrix( sample(0:1, 10, TRUE), nrow=5 )
I was curious how a pure R solution would perform:
set.seed(123)
m <- matrix( sample(0:1, 1E5, TRUE), ncol=5 )
rowCountsR <- function(x) {
## calculate hash: read each 0/1 row as a binary number
h <- x %*% matrix(2^(0:(ncol(x)-1)), ncol=1)
i <- which(!duplicated(h))
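## counts per distinct row, in order of first appearance
## (indexing by hash also works when some 0/1 patterns never occur)
counts <- tabulate(h+1)[h[i]+1]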
list(counts=counts, idx=i)
}
library("rbenchmark")
benchmark(rowCounts(m), rowCountsR(m))
#            test replications elapsed relative user.self sys.self user.child sys.child
# 1  rowCounts(m)          100   0.189    1.000     0.188        0          0         0
# 2 rowCountsR(m)          100   0.258    1.365     0.256        0          0         0
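To make the hashing idea easier to follow, here is a small worked sketch on a hand-written 5x2 matrix. The names `row_counts_hash` and `m_small` are made up for this illustration and are not part of the benchmark above. The sketch assumes the matrix contains only 0 and 1 and has few columns, since `tabulate` allocates a vector of length up to 2^ncol.

```r
# Hypothetical helper (illustration only): count identical rows of a 0/1
# matrix by reading each row as a binary number.
row_counts_hash <- function(x) {
  # Weight column j by 2^(j-1), so distinct rows map to distinct integers.
  weights <- 2^(seq_len(ncol(x)) - 1)
  h <- as.vector(x %*% weights)
  # Index of the first occurrence of each distinct row.
  first <- which(!duplicated(h))
  # Count per distinct row, in order of first appearance.
  counts <- tabulate(h + 1)[h[first] + 1]
  list(counts = counts, idx = first)
}

m_small <- rbind(c(0, 1),
                 c(1, 1),
                 c(0, 1),
                 c(0, 0),
                 c(1, 1))

res <- row_counts_hash(m_small)
res$counts  # 2 2 1
res$idx     # 1 2 4
```

Here rows 1 and 3 hash to 2, rows 2 and 5 hash to 3, and row 4 hashes to 0, so the distinct rows first appear at positions 1, 2 and 4 with counts 2, 2 and 1. The pattern (1, 0) never occurs, which is why the counts are looked up by hash value rather than by position.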
Edit: more columns, thanks @Arun for pointing this out.
set.seed(123)
m <- matrix( sample(0:1, 1e7, TRUE), ncol=10)
benchmark(rowCounts(m), rowCountsR(m), replications=100)
#            test replications elapsed relative user.self sys.self user.child sys.child
# 1  rowCounts(m)          100  20.659    1.077    20.533    0.024          0         0
# 2 rowCountsR(m)          100  19.183    1.000    15.641    3.408          0         0