Fast way to create a binary matrix with known number of 1 each row in R


忘了有多久 2021-01-28 23:10

I have a vector that gives the number of 1s in each row of a matrix, and I need to build the matrix from that vector.

For example, let's say I want to create a 4 x 9

4 answers

执念已碎 (original poster)

2021-01-29 00:10

Update on 2016-11-24

I got another solution when answering Ragged rowSums in R today:

outer(v, 1:9, ">=") + 0L

#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#[1,]    1    1    0    0    0    0    0    0    0
#[2,]    1    1    1    1    1    1    0    0    0
#[3,]    1    1    1    0    0    0    0    0    0
#[4,]    1    1    1    1    1    1    1    1    1
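To see why this one-liner works, the two steps can be separated. The sketch below uses the small example from this answer; the intermediate names lgl and m are only for illustration.

```r
v <- c(2, 6, 3, 9)   # number of 1s wanted in each row
n <- 9               # total number of columns

# Logical matrix: entry [i, j] is TRUE exactly when v[i] >= j,
# i.e. for the first v[i] columns of row i
lgl <- outer(v, seq_len(n), ">=")

# Adding 0L coerces TRUE/FALSE to integer 1/0 and keeps the dimensions
m <- lgl + 0L

# The row sums recover the original counts
rowSums(m)
```

Because the comparison is vectorised over the whole grid at once, no per-row loop or list of row vectors is needed.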

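This has the same memory usage as the f function in my initial answer, and it is no slower than f. Consider the benchmark from my original answer, where the second argument of outer must be the full column index seq_len(n), not the scalar n:

microbenchmark(my_old = f(v, n), my_new = outer(v, seq_len(n), ">=") + 0L, unit = "ms")

The timings below are as originally posted, for the call written as outer(v, n, ">=") + 0L. With n a scalar, that call builds only a 10000 x 1 matrix, so the my_new row understates the cost of building the full 10000 x 500 matrix:

#Unit: milliseconds
#   expr      min       lq        mean    median        uq       max neval cld
# my_old 109.3422 111.0355 121.0382120 111.16752 112.44472 210.36808   100   b
# my_new   0.3094   0.3199   0.3691904   0.39816   0.40608   0.45556   100  a

My old method is already the fastest among the existing solutions (see below). How much faster the new method really is should be judged by rerunning the benchmark with seq_len(n).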
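When reproducing this benchmark, it may help to confirm that outer receives the full column index as its second argument, since a scalar there yields a single column. A minimal sketch, assuming the same problem size as the benchmark; the seed and the names one_col and full are arbitrary choices for illustration.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
n <- 500                                      # number of columns
v <- sample.int(n, 10000, replace = TRUE)     # number of 1s in each of 10000 rows

one_col <- outer(v, n, ">=") + 0L             # scalar n: one column only
full    <- outer(v, seq_len(n), ">=") + 0L    # full column index: n columns

dim(one_col)
dim(full)

# Rough base-R timing of the full-size call; results depend on the machine
system.time(outer(v, seq_len(n), ">=") + 0L)
```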

Original answer on 2016-11-07

Here is my "awkward" solution:

f <- function (v, n) {
  ## v: number of 1s in each row, e.g. c(2, 6, 3, 9)
  ## n: total number of columns, e.g. 9
  u <- n - v                 ## number of 0s in each row
  m <- length(u)             ## number of rows
  d <- rep.int(c(1, 0), m)   ## values to repeat: the pair 1, 0 for each row
  asn <- rbind(v, u)         ## repeat counts, interleaved as v[1], u[1], v[2], u[2], ...
  fill <- rep.int(d, asn)    ## matrix elements in row-major order
  matrix(fill, byrow = TRUE, ncol = n)
}

n <- 9    ## total number of column
v <- c(2,6,3,9)  ## number of 1 each row

f(v, n)
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#[1,]    1    1    0    0    0    0    0    0    0
#[2,]    1    1    1    1    1    1    0    0    0
#[3,]    1    1    1    0    0    0    0    0    0
#[4,]    1    1    1    1    1    1    1    1    1
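The intermediate values inside f can be inspected on the small example to see how the run lengths are interleaved. This is only a walk-through sketch; it flattens the counts explicitly with as.vector, and the name times is introduced here for illustration.

```r
n <- 9                  # total number of columns
v <- c(2, 6, 3, 9)      # number of 1s in each row

u <- n - v                           # number of 0s in each row
d <- rep.int(c(1, 0), length(v))     # 1, 0, 1, 0, ... one pair per row
times <- as.vector(rbind(v, u))      # v[1], u[1], v[2], u[2], ... (column-major flattening)

# Each value in d is repeated by the matching count in times,
# which produces the matrix elements in row-major order
fill <- rep.int(d, times)
res <- matrix(fill, byrow = TRUE, ncol = n)

u
times
res
```

A count of zero (the last row here, where every entry is 1) simply contributes no 0s.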

Now consider a benchmark on a larger problem:

n <- 500    ## 500 columns
v <- sample.int(n, 10000, replace = TRUE)    ## 10000 rows

library(microbenchmark)    ## provides microbenchmark()
library(Matrix)            ## provides sparseMatrix()

microbenchmark(
  my_bad = f(v, n),
  roman = {
    xy <- sapply(v, FUN = function(x, ncols) {
      c(rep(1, x), rep(0, ncols - x))
    }, ncols = n, simplify = FALSE)

    do.call("rbind", xy)
  },
  fourtytwo = {
    t(vapply(v, function(y) { x <- numeric( length=n); x[1:y] <- 1;x}, numeric(n) ) )
  },
  akrun = {
    sparseMatrix(i = rep(seq_along(v), v), j = sequence(v), x = 1)
  },
  unit = "ms")

#Unit: milliseconds
#      expr      min       lq     mean   median       uq      max neval  cld
#    my_bad 105.7507 118.6946 160.6818 138.5855 186.3762 327.3808   100 a   
#     roman 176.9003 194.7467 245.0450 213.8680 305.9537 435.5974   100  b  
# fourtytwo 235.0930 256.5129 307.3099 273.2280 358.8224 587.3256   100   c 
#     akrun 316.7131 351.6184 408.5509 389.9576 456.0704 604.2667   100    d

In this benchmark, my method is in fact the fastest.
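Before comparing speed, it is worth checking that the approaches build the same matrix. The sketch below does this in base R on a small random input; the seed, the sizes, and the names by_rep, by_rows, by_outer and same are assumptions made for illustration, and the sparseMatrix variant is left out to avoid the package dependency.

```r
set.seed(42)                              # arbitrary seed, for reproducibility only
n <- 20                                   # columns (kept small for a quick check)
v <- sample.int(n, 50, replace = TRUE)    # number of 1s in each of 50 rows

# rep.int-based construction, following the idea of f()
by_rep <- matrix(
  rep.int(rep.int(c(1, 0), length(v)), as.vector(rbind(v, n - v))),
  byrow = TRUE, ncol = n
)

# Row-by-row construction, following the "roman" entry
by_rows <- do.call("rbind", lapply(v, function(x) c(rep(1, x), rep(0, n - x))))

# Comparison of each count against the column index
by_outer <- outer(v, seq_len(n), ">=") + 0L

# Element-wise comparison ignores the integer/double storage difference
same <- all(by_rep == by_rows) && all(by_rep == by_outer)
same
```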
