I have a vector that gives the number of "1"s in each row of a matrix. Now I have to create this matrix from the vector.
For example, let's say I want to create a 4 x 9 matrix.
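To make the target concrete, here is a minimal sketch of the desired result. It assumes the count vector c(2, 6, 3, 9) and the 9 columns used in the code further down; the helper name make_indicator is hypothetical and only serves as a plain reference implementation.

```r
# Hypothetical helper: row i gets v[i] leading ones, followed by zeros.
make_indicator <- function(v, ncols) {
  m <- matrix(0, nrow = length(v), ncol = ncols)
  for (i in seq_along(v)) {
    m[i, seq_len(v[i])] <- 1   # seq_len() also copes with a count of 0
  }
  m
}

v <- c(2, 6, 3, 9)
m <- make_indicator(v, ncols = 9)
dim(m)       # 4 9
rowSums(m)   # 2 6 3 9
```

The loop is slow for large inputs, but it is easy to check the faster approaches against.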
Here is my approach using sapply and do.call, with some timings on a small sample.
library(microbenchmark)
library(Matrix)
v <- c(2,6,3,9)
microbenchmark(
  roman = {
    xy <- sapply(v, FUN = function(x, ncols) {
      c(rep(1, x), rep(0, ncols - x))
    }, ncols = 9, simplify = FALSE)
    xy <- do.call("rbind", xy)
  },
  fourtytwo = {
    t(vapply(v, function(y) { x <- numeric(length = 9); x[1:y] <- 1; x }, numeric(9)))
  },
  akrun = {
    m1 <- sparseMatrix(i = rep(seq_along(v), v), j = sequence(v), x = 1)
    m1 <- as.matrix(m1)
  })
Unit: microseconds
      expr      min        lq       mean    median       uq
     roman   26.436   30.0755   36.42011   36.2055   37.930
 fourtytwo   43.676   47.1250   55.53421   54.7870   57.852
     akrun 1261.634 1279.8330 1501.81596 1291.5180 1318.720
and for a bit larger sample
v <- sample(2:9, size = 10e3, replace = TRUE)
Unit: milliseconds
      expr      min       lq     mean   median       uq
     roman 33.52430 35.80026 37.28917 36.46881 37.69137
 fourtytwo 37.39502 40.10257 41.93843 40.52229 41.52205
     akrun 10.00342 10.34306 10.66846 10.52773 10.72638
As the object size grows, the benefits of sparseMatrix come to light.
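Timings only mean something if the three expressions build the same matrix. The following is a small sanity check on the 4 x 9 example, assuming the Matrix package is installed; the variable names are made up for this sketch, and seq_len(y) is used in place of 1:y as a defensive choice.

```r
library(Matrix)  # provides sparseMatrix()

v <- c(2, 6, 3, 9)

# sapply + do.call approach
m_list <- sapply(v, function(x, ncols) c(rep(1, x), rep(0, ncols - x)),
                 ncols = 9, simplify = FALSE)
m_rbind <- do.call("rbind", m_list)

# vapply approach
m_vapply <- t(vapply(v, function(y) { x <- numeric(9); x[seq_len(y)] <- 1; x },
                     numeric(9)))

# sparseMatrix approach, converted to a dense matrix
m_sparse <- as.matrix(sparseMatrix(i = rep(seq_along(v), v), j = sequence(v), x = 1))

# unname() drops any dimnames so that only the values are compared
same <- isTRUE(all.equal(unname(m_rbind), unname(m_vapply))) &&
  isTRUE(all.equal(unname(m_rbind), unname(m_sparse)))
same
```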
Update on 2016-11-24
I got another solution when answering Ragged rowSums in R today:
outer(v, 1:9, ">=") + 0L
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#[1,] 1 1 0 0 0 0 0 0 0
#[2,] 1 1 1 1 1 1 0 0 0
#[3,] 1 1 1 0 0 0 0 0 0
#[4,] 1 1 1 1 1 1 1 1 1
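A short sketch of why this one-liner works, using the same v as above: outer() compares every count with every column index, and adding 0L turns the resulting logical matrix into integers.

```r
v <- c(2, 6, 3, 9)

# outer() evaluates v[i] >= j for every row i and column j
cmp <- outer(v, 1:9, ">=")
is.logical(cmp)        # TRUE: a matrix of TRUE/FALSE

# adding the integer 0L coerces TRUE/FALSE to 1L/0L
m <- cmp + 0L
storage.mode(m)        # "integer"
rowSums(m)             # recovers the counts in v
```

Because the result is stored as integers rather than doubles, it is a reasonable choice when the matrix only ever holds 0 and 1.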
This has the same memory usage as the f function in my initial answer, and it won't be any slower than f. Consider the benchmark in my original answer:
## the column index must be the vector 1:n; a scalar n gives only a 10000 x 1 matrix
microbenchmark(my_old = f(v, n), my_new = outer(v, 1:n, ">=") + 0L, unit = "ms")
#Unit: milliseconds
#   expr      min       lq        mean    median        uq       max neval cld
# my_old 109.3422 111.0355 121.0382120 111.16752 112.44472 210.36808   100   b
# my_new   0.3094   0.3199   0.3691904   0.39816   0.40608   0.45556   100   a
These timings come from the call as originally posted, outer(v, n, ">="), where n is the scalar 500, so my_new built only a 10000 x 1 matrix and the cost of the full 10000 x 500 result is understated. My old method is already the fastest among the existing solutions (see below).
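When benchmarking, it helps to confirm that both expressions return the same 10000 x 500 matrix. The sketch below repeats f from the original answer in compact form; the fixed seed is an assumption added for repeatability. It also shows that the second argument of outer() has to be the vector of column indices, not the scalar n.

```r
# f() as defined in the original answer, written compactly
f <- function(v, n) {
  u <- n - v
  d <- rep.int(c(1, 0), length(u))
  matrix(rep.int(d, rbind(v, u)), byrow = TRUE, ncol = n)
}

set.seed(1)                                # assumption: fixed seed for repeatability
n <- 500
v <- sample.int(n, 10000, replace = TRUE)

dim(outer(v, n, ">="))                     # scalar n: 10000 x 1
new <- outer(v, seq_len(n), ">=") + 0L     # indices 1..n: 10000 x 500
dim(new)

# f() returns doubles and `new` holds integers, so compare with all.equal()
isTRUE(all.equal(f(v, n), new))
```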
Original answer on 2016-11-07
Here is my "awkward" solution:
f <- function (v, n) {
  # n <- 9             ## total number of columns
  # v <- c(2,6,3,9)    ## number of 1s in each row
  u <- n - v                 ## number of 0s in each row
  m <- length(u)             ## number of rows
  d <- rep.int(c(1,0), m)    ## values to repeat: 1, 0 for each row
  asn <- rbind(v, u)         ## repeat counts for `d`
  fill <- rep.int(d, asn)    ## matrix elements, row by row
  matrix(fill, byrow = TRUE, ncol = n)
}
n <- 9 ## total number of columns
v <- c(2,6,3,9) ## number of 1s in each row
f(v, n)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#[1,] 1 1 0 0 0 0 0 0 0
#[2,] 1 1 1 1 1 1 0 0 0
#[3,] 1 1 1 0 0 0 0 0 0
#[4,] 1 1 1 1 1 1 1 1 1
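To see what f does internally, the intermediate objects can be inspected on the small example. This is only a walkthrough of the steps inside f, using the same variable names.

```r
n <- 9
v <- c(2, 6, 3, 9)

u   <- n - v                          # zeros per row: 7 3 6 0
d   <- rep.int(c(1, 0), length(v))    # values: 1 0 1 0 1 0 1 0
asn <- rbind(v, u)                    # 2 x 4 matrix of repeat counts
as.vector(asn)                        # read column by column: 2 7 6 3 3 6 9 0

# each value in d is repeated by the matching count, giving the matrix row by row
fill <- rep.int(d, asn)
length(fill)                          # 36 = 4 rows * 9 columns
m <- matrix(fill, byrow = TRUE, ncol = n)
```

The key point is that rbind(v, u) interleaves the counts of ones and zeros when it is read as a vector, so a single rep.int call produces all rows at once.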
Consider a benchmark with a large problem size:
n <- 500 ## 500 columns
v <- sample.int(n, 10000, replace = TRUE) ## 10000 rows
microbenchmark(
  my_bad = f(v, n),
  roman = {
    xy <- sapply(v, FUN = function(x, ncols) {
      c(rep(1, x), rep(0, ncols - x))
    }, ncols = n, simplify = FALSE)
    do.call("rbind", xy)
  },
  fourtytwo = {
    t(vapply(v, function(y) { x <- numeric(length = n); x[1:y] <- 1; x }, numeric(n)))
  },
  akrun = {
    sparseMatrix(i = rep(seq_along(v), v), j = sequence(v), x = 1)
  },
  unit = "ms")
#Unit: milliseconds
#      expr      min       lq     mean   median       uq      max neval cld
#    my_bad 105.7507 118.6946 160.6818 138.5855 186.3762 327.3808   100   a
#     roman 176.9003 194.7467 245.0450 213.8680 305.9537 435.5974   100   b
# fourtytwo 235.0930 256.5129 307.3099 273.2280 358.8224 587.3256   100   c
#     akrun 316.7131 351.6184 408.5509 389.9576 456.0704 604.2667   100   d
My method is in fact the fastest!!
vapply is usually faster than sapply. This assigns the desired number of ones to a length-9 vector for each row and then transposes the result.
> t( vapply( c(2,6,3,9), function(y) { x <- numeric( length=9); x[1:y] <- 1;x}, numeric(9) ) )
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 1 0 0 0 0 0 0 0
[2,] 1 1 1 1 1 1 0 0 0
[3,] 1 1 1 0 0 0 0 0 0
[4,] 1 1 1 1 1 1 1 1 1
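One caveat with x[1:y] is worth a sketch: if a count could ever be 0, 1:0 evaluates to c(1, 0) and the first element is still set. The input v0 below is hypothetical (the question's vector has no zeros); seq_len(y) avoids the problem.

```r
n <- 9
v0 <- c(2, 0, 3)   # hypothetical input containing a zero count

# 1:0 evaluates to c(1, 0), so x[1:0] <- 1 still sets the first element
with_colon <- t(vapply(v0, function(y) { x <- numeric(n); x[1:y] <- 1; x },
                       numeric(n)))

# seq_len(0) is empty, so nothing is assigned for a zero count
with_seq_len <- t(vapply(v0, function(y) { x <- numeric(n); x[seq_len(y)] <- 1; x },
                         numeric(n)))

rowSums(with_colon)     # 2 1 3
rowSums(with_seq_len)   # 2 0 3
```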
With 100,000 rows and 500 columns, this takes less than 5 seconds on an old Mac.
system.time( M <- t( vapply( sample(1:500, 100000, rep=TRUE), function(y) { x <- numeric( length=500); x[1:y] <- 1;x}, numeric(500) ) ) )
user system elapsed
3.531 1.208 4.676
One option is sparseMatrix from the Matrix package:
library(Matrix)
m1 <- sparseMatrix(i = rep(seq_along(v), v), j = sequence(v), x = 1)
m1
#4 x 9 sparse Matrix of class "dgCMatrix"
#[1,] 1 1 . . . . . . .
#[2,] 1 1 1 1 1 1 . . .
#[3,] 1 1 1 . . . . . .
#[4,] 1 1 1 1 1 1 1 1 1
This can be converted to a regular matrix with as.matrix:
as.matrix(m1)
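A point to keep in mind with this approach: when dims is not supplied, sparseMatrix infers the shape from the largest row and column index. The example works because one row has all 9 ones. The sketch below uses a hypothetical vector v_short in which no row is full, and passes the dims argument to fix the number of columns.

```r
library(Matrix)

v_short <- c(2, 6, 3)   # hypothetical input: no row uses all 9 columns

# without dims, the shape is inferred from the largest i and j
m_inferred <- sparseMatrix(i = rep(seq_along(v_short), v_short),
                           j = sequence(v_short), x = 1)
dim(m_inferred)         # 3 6

# dims fixes the intended shape
m_fixed <- sparseMatrix(i = rep(seq_along(v_short), v_short),
                        j = sequence(v_short), x = 1,
                        dims = c(length(v_short), 9L))
dim(m_fixed)            # 3 9
```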