Reducing nested for loop to single loop in R

前端 未结 1 986
臣服心动
臣服心动 2021-01-26 15:08

This nested for loop can take quite some time to run depending on inputs to specs, perms and K. \'pop\' is just an array to store all values. Perms is a large value, say 10,000.

相关标签:
1条回答
  • 2021-01-26 15:54

    Let's use RcppArmadillo. But first, I need to change 2 things to your code:

    • It is easier (and faster) to work with pop as an array of integers rather than characters. It is easy to make a correspondence table using unique and match.
    • I need to permute the first two dimensions of pop so that the accesses are more contiguous.

    New code to generate pop:

    K <- 1 
    N <- 100 
    Hstar <- 10 
    perms <- 10000
    specs <- 1:N 
    pop <- array(dim = c(N, perms, K))
    haps <- 1:Hstar
    probs <- rep(1/Hstar, Hstar) 
    
    for(j in 1:perms){
      for(i in 1:K){ 
        if(i == 1){
          pop[, j, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
        }
        else{
          pop[, j, 1] <- sample(haps[s1], size = N, replace = TRUE, prob = probs[s1])
          pop[, j, 2] <- sample(haps[s2], size = N, replace = TRUE, prob = probs[s2])
        }
      }
    }
    

    RcppArmadillo code to generate HAC.mat:

    // [[Rcpp::depends(RcppArmadillo)]]
    #define ARMA_DONT_PRINT_OPENMP_WARNING
    #include <RcppArmadillo.h>
    #include <RcppArmadilloExtensions/sample.h>
    #include <set>
    using namespace Rcpp;
    
    
    int sample_one(int n) {
      return n * unif_rand();
    } 
    
    int sample_n_distinct(const IntegerVector& x, 
                          int k,
                          const int * pop_ptr) {
    
      IntegerVector ind_index = RcppArmadillo::sample(x, k, false); 
      std::set<int> distinct_container;
    
      for (int i = 0; i < k; i++) {
        distinct_container.insert(pop_ptr[ind_index[i]]);
      }
    
      return distinct_container.size();
    }
    
    // [[Rcpp::export]]
    arma::Cube<int> fillCube(const arma::Cube<int>& pop,
                             const IntegerVector& specs,
                             int perms,
                             int K) {
    
      int N = specs.size();
      arma::Cube<int> res(perms, N, K);
    
      IntegerVector specs_C = specs - 1;
      const int * pop_ptr;
      int i, j, k;
    
      for (i = 0; i < K; i++) {
        for (k = 0; k < N; k++) {
          for (j = 0; j < perms; j++) {
            pop_ptr = &(pop(0, sample_one(perms), sample_one(K)));
            res(j, k, i) = sample_n_distinct(specs_C, k + 1, pop_ptr);
          }
        }
      }
    
      return res;
    }
    

    In R:

    Rcpp::sourceCpp('cube-sample.cpp')
    HAC.mat <- fillCube(pop, specs, perms, K)
    

    This is 10 times as fast as your version on my computer.

    0 讨论(0)
提交回复
热议问题