I have the following MATLAB code:
N = 1000;                          % define N before it is used
randarray = gpuArray(rand(N,1));
tic
g = 0;
for i = 1:N
    if randarray(i) > 10
        g = g + 1;
    end
end
toc
Using MATLAB R2011b and Parallel Computing Toolbox on a now rather old GPU (Tesla C1060), here's what I see:
>> g = 100*parallel.gpu.GPUArray.rand(1, 1000);
>> tic, sum(g>10); toc
Elapsed time is 0.000474 seconds.
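Operating on the scalar elements of a gpuArray one at a time is always going to be slow, because every indexing operation and comparison in the loop is issued to the GPU separately. Replacing the loop with a single vectorized call to sum is much quicker.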
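For comparison, here is a minimal sketch of the original loop rewritten in vectorized form. It assumes the data is scaled by 100 as in the timing above (values from rand alone lie between 0 and 1, so the test against 10 would never be true), and it uses gather to copy the final count back to the host, which is my addition rather than part of the measurement shown:

```matlab
N = 1000;
randarray = gpuArray(100 * rand(N, 1));   % assumed scaling, so that some values exceed 10

tic
% One vectorized comparison and one reduction, both executed on the GPU
g = sum(randarray > 10);
% Copy the scalar result back to host memory (assumed step, only needed
% if the count is used on the CPU afterwards)
g = gather(g);
toc
```

If the loop really has to stay, calling gather on randarray once before the loop and iterating over the resulting host array should avoid the per-element GPU overhead, although for an array this small the GPU is unlikely to help either way.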