Alternative for sample | 易学教程

问题

I have the following sample code that uses sapply which takes long to process (since executed many times):

samples = sapply(rowIndices, function(idx){
  sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})

The issue is that I have to draw from the weights which are in the matrix, dependent on the indices in rowIndices.

Does somebody have a better idea in mind to draw from the rows of the matrix?

Reproducable example:

rowIndices = floor(runif(1000, 1, 100))
vectorToDrawFrom = runif(5000, 0.0, 2.0)
weights = matrix(runif(100 * 5000, 1, 10), nrow = 100, ncol = 5000)

timer = 0
for (i in 1:2500){
  ptm = proc.time()
  samples = sapply(rowIndices, function(idx){
    sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
  })
  timer = timer + (proc.time() - ptm)[3]
}

print(timer) # too long!!

回答1:

So here is a way I would speed up your code. One thing to note: the sampled value will not "match" with rowIndices though it would be trivial to get things in the right order. 2) You only store the last iteration, though maybe that is just because this a Minimal Reproducible example...

Basically you should only need to call sample once per value of rowIndices since rowIndices ranges from 1-99, that's 99 calls instead of 1000, which provides a huge speed up.

We can just sort the row indices before we start

rowIndices <- sort(rowIndices) ##sort the row indices and then loop
for (i in 1:15){
    samples = unlist(sapply(unique(rowIndices), 
        function(idx){
            sample(vectorToDrawFrom, sum(rowIndices %in% idx), 
                TRUE, weights[idx, ])
    }))       
}

Unit: milliseconds

expr
                      min       lq     mean   median       uq      max neval cld
 newForLoop      263.5668 266.6329 292.8301 268.8920 275.3378  515.899   100  a 
 OriginalForLoop 698.2982 705.6911 792.2846 712.9985 887.9447 1263.779   100   b

Edit

The way to maintain the original vector ordering is to save the index or the orignal rowIndices vector. Then sort the row indices and proceed.

set.seed(8675309)
weights = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0), 
                 nrow = 5, ncol = 3, byrow = T)

rowIndices = c(2,1,2,4)
vectorToDrawFrom = runif(3, 0.0, 2.0)

set.seed(8675309)
##This is the origal code
sample2 = sapply(rowIndices, function(idx){       
  sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})

rowIndx <- order(rowIndices)   #get ordering index
rowIndices <- sort(rowIndices) 

set.seed(8675309)
samples = unlist(sapply(unique(rowIndices), function(idx){
  sample(vectorToDrawFrom, sum(rowIndices %in% idx), TRUE, weights[idx, ])
}))

samples = samples[order(rowIndx)]
all(samples == sample2)
#[1] TRUE

来源：https://stackoverflow.com/questions/46771186/alternative-for-sample

标签

loops

sapply