Select a subset of combinations

前端 未结 2 878
孤街浪徒
孤街浪徒 2021-01-20 14:53

Suppose that I have a 20 X 5 matrix, I would like to select subsets of the matrix and do some computation with them. Further suppose that each sub-matrix is 7 X 5. I could o

相关标签:
2条回答
  • 2021-01-20 15:29

    I ended up doing what @Roland suggested, by modifying combn(), and byte-compiling the code:

    combn_sub <- function (x, m, nset = 5000, seed=123, simplify = TRUE, ...) {
        stopifnot(length(m) == 1L)
        if (m < 0) 
            stop("m < 0", domain = NA)
        if (is.numeric(x) && length(x) == 1L && x > 0 && trunc(x) == 
            x) 
            x <- seq_len(x)
        n <- length(x)
        if (n < m) 
            stop("n < m", domain = NA)
        m <- as.integer(m)
        e <- 0
        h <- m
        a <- seq_len(m)
        len.r <- length(r <-  x[a] )
        count <- as.integer(round(choose(n, m)))
        if( count < nset ) nset <- count
        dim.use <- c(m, nset)       
    
        ##-----MOD 1: Change the output matrix size--------------
        out <- matrix(r, nrow = len.r, ncol = nset) 
    
        if (m > 0) {
            i <- 2L
            nmmp1 <- n - m + 1L
    
            ##----MOD 2: Select a subset of indices
            set.seed(seed)
            samp <- sort(c(1, sample( 2:count, nset - 1 )))  
    
            ##----MOD 3: Start a counter.
            counter <- 2L    
    
            while (a[1L] != nmmp1 ) {
                if (e < n - h) {
                    h <- 1L
                    e <- a[m]
                    j <- 1L
                }
                else {
                    e <- a[m - h]
                    h <- h + 1L
                    j <- 1L:h
                }
                a[m - h + j] <- e + j
    
                #-----MOD 4: Whenever the counter matches an index in samp, 
                #a combination of row indices is produced and stored in the matrix `out`
                if(samp[i] == counter){ 
                    out[, i] <- x[a]
                    if( i == nset ) break
                    i <- i + 1L
                }
                #-----Increase the counter by 1 for each iteration of the while-loop
                counter <- counter + 1L
            }
        }
        array(out, dim.use)
    }
    
    library("compiler")
    comb_sub <- cmpfun(comb_sub)
    
    0 讨论(0)
  • 2021-01-20 15:54

    Your approach:

    op <- function(){
        ncomb <- combn(20, 7)
        ncombsub <- ncomb[, sample(choose(20,7), 5000)]
        return(ncombsub)
    }
    

    A different strategy that simply samples seven rows from the original matrix 5000 times (replacing any duplicate samples with a new sample until 5000 unique row combinations are found):

    me <- function(){
      rowsample <- replicate(5000,sort(sample(1:20,7,FALSE)),simplify=FALSE)
      while(length(unique(rowsample))<5000){
         rowsample <- unique(rowsample)
         rowsample <- c(rowsample,
                        replicate(5000-length(rowsample),
                                  sort(sample(1:20,7,FALSE)),simplify=FALSE))
      }
      return(do.call(cbind,rowsample))
    }
    

    This should be more efficient because it prevents you from having to calculate all of the combinations first, which will get costly as the matrix gets larger.

    And yet, some benchmarking reveals that is not the case. At least on this matrix:

    library(microbenchmark)
    microbenchmark(op(),me())
    
    Unit: milliseconds
     expr      min       lq   median      uq      max neval
     op() 184.5998 201.9861 206.3408 241.430 299.9245   100
     me() 411.7213 422.9740 429.4767 474.047 490.3177   100
    
    0 讨论(0)
提交回复
热议问题