I have the following MATLAB code:
N = 1000;
randarray = gpuArray(rand(N,1));
tic
g = 0;
for i = 1:N
    if randarray(i) > 10
        g = g + 1;
    end
end
toc
I'm no expert on the MATLAB gpuArray implementation, but I suspect that each randarray(i) access in the loop triggers a PCIe transaction to retrieve a single value from GPU memory, and each of those transactions incurs a very large latency penalty. You would probably be better served by calling gather to transfer the whole array to the host in a single transaction, and then looping over that local copy in host memory.
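As a rough sketch of that suggestion (assuming the Parallel Computing Toolbox and a supported GPU are available; `hostarray` and `g_gpu` are names I made up for illustration), the loop could look like this:

```matlab
N = 1000;
randarray = gpuArray(rand(N,1));   % data lives in GPU memory

tic
hostarray = gather(randarray);     % one bulk device-to-host transfer
g = 0;
for i = 1:N
    % Indexing hostarray reads ordinary host memory, not the GPU.
    if hostarray(i) > 10           % same threshold as in the question
        g = g + 1;
    end
end
toc

% Alternative sketch: do the comparison and reduction on the GPU and
% transfer only the final scalar back to the host.
g_gpu = gather(sum(randarray > 10));
```

Note that rand returns values strictly between 0 and 1, so with the threshold of 10 from the question both counts come out as 0. The point of the sketch is only the access pattern: one bulk transfer (or one vectorized GPU operation) instead of N element-wise reads from a gpuArray.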