问题
I have the following sample
code that uses sapply
which takes long to process (since executed many times):
samples = sapply(rowIndices, function(idx){
sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})
The issue is that I have to draw from the weights which are in the matrix, dependent on the indices in rowIndices
.
Does somebody have a better idea in mind to draw from the rows of the matrix?
Reproducable example:
rowIndices = floor(runif(1000, 1, 100))
vectorToDrawFrom = runif(5000, 0.0, 2.0)
weights = matrix(runif(100 * 5000, 1, 10), nrow = 100, ncol = 5000)
timer = 0
for (i in 1:2500){
ptm = proc.time()
samples = sapply(rowIndices, function(idx){
sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})
timer = timer + (proc.time() - ptm)[3]
}
print(timer) # too long!!
回答1:
So here is a way I would speed up your code. One thing to note: the sampled value will not "match" with rowIndices
though it would be trivial to get things in the right order. 2) You only store the last iteration, though maybe that is just because this a Minimal Reproducible example...
Basically you should only need to call sample
once per value of rowIndices
since rowIndices
ranges from 1-99, that's 99 calls instead of 1000, which provides a huge speed up.
We can just sort the row indices before we start
rowIndices <- sort(rowIndices) ##sort the row indices and then loop
for (i in 1:15){
samples = unlist(sapply(unique(rowIndices),
function(idx){
sample(vectorToDrawFrom, sum(rowIndices %in% idx),
TRUE, weights[idx, ])
}))
}
Unit: milliseconds
expr
min lq mean median uq max neval cld
newForLoop 263.5668 266.6329 292.8301 268.8920 275.3378 515.899 100 a
OriginalForLoop 698.2982 705.6911 792.2846 712.9985 887.9447 1263.779 100 b
Edit
The way to maintain the original vector ordering is to save the index or the orignal rowIndices
vector. Then sort the row indices and proceed.
set.seed(8675309)
weights = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
nrow = 5, ncol = 3, byrow = T)
rowIndices = c(2,1,2,4)
vectorToDrawFrom = runif(3, 0.0, 2.0)
set.seed(8675309)
##This is the origal code
sample2 = sapply(rowIndices, function(idx){
sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})
rowIndx <- order(rowIndices) #get ordering index
rowIndices <- sort(rowIndices)
set.seed(8675309)
samples = unlist(sapply(unique(rowIndices), function(idx){
sample(vectorToDrawFrom, sum(rowIndices %in% idx), TRUE, weights[idx, ])
}))
samples = samples[order(rowIndx)]
all(samples == sample2)
#[1] TRUE
来源:https://stackoverflow.com/questions/46771186/alternative-for-sample