This nested for loop can take quite some time to run depending on inputs to specs, perms and K. \'pop\' is just an array to store all values. Perms is a large value, say 10,000.
Let's use RcppArmadillo. But first, I need to change 2 things to your code:
pop
as an array of integers rather than characters. It is easy to make a correspondence table using unique
and match
.pop
so that the accesses are more contiguous.New code to generate pop
:
K <- 1
N <- 100
Hstar <- 10
perms <- 10000
specs <- 1:N
pop <- array(dim = c(N, perms, K))
haps <- 1:Hstar
probs <- rep(1/Hstar, Hstar)
for(j in 1:perms){
for(i in 1:K){
if(i == 1){
pop[, j, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
}
else{
pop[, j, 1] <- sample(haps[s1], size = N, replace = TRUE, prob = probs[s1])
pop[, j, 2] <- sample(haps[s2], size = N, replace = TRUE, prob = probs[s2])
}
}
}
RcppArmadillo code to generate HAC.mat
:
// [[Rcpp::depends(RcppArmadillo)]]
#define ARMA_DONT_PRINT_OPENMP_WARNING
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <set>
using namespace Rcpp;
int sample_one(int n) {
return n * unif_rand();
}
int sample_n_distinct(const IntegerVector& x,
int k,
const int * pop_ptr) {
IntegerVector ind_index = RcppArmadillo::sample(x, k, false);
std::set<int> distinct_container;
for (int i = 0; i < k; i++) {
distinct_container.insert(pop_ptr[ind_index[i]]);
}
return distinct_container.size();
}
// [[Rcpp::export]]
arma::Cube<int> fillCube(const arma::Cube<int>& pop,
const IntegerVector& specs,
int perms,
int K) {
int N = specs.size();
arma::Cube<int> res(perms, N, K);
IntegerVector specs_C = specs - 1;
const int * pop_ptr;
int i, j, k;
for (i = 0; i < K; i++) {
for (k = 0; k < N; k++) {
for (j = 0; j < perms; j++) {
pop_ptr = &(pop(0, sample_one(perms), sample_one(K)));
res(j, k, i) = sample_n_distinct(specs_C, k + 1, pop_ptr);
}
}
}
return res;
}
In R:
Rcpp::sourceCpp('cube-sample.cpp')
HAC.mat <- fillCube(pop, specs, perms, K)
This is 10 times as fast as your version on my computer.