matlab if statements with CUDA

Asked by 鱼传尺愫 on 2021-01-20 01:59

I have the following matlab code:

N = 1000;
randarray = gpuArray(rand(N,1));

tic
g=0;
for i=1:N

    if randarray(i)>10
        g=g+1;
    end

end
toc

4 Answers
  •  面向向阳花
    2021-01-20 03:02

    I cannot comment on a prior solution because I'm too new, but I'd like to extend the solution from Pavan: the nnz function is not (yet) implemented for gpuArrays, at least in the MATLAB version I'm using (R2012a).

    In general, it is much better to vectorize MATLAB code. However, in some cases looped code can run fast in MATLAB because of JIT compilation.

    Check the results from

    N = 1000;
    randarray_cpu = rand(N,1);
    randarray_gpu = gpuArray(randarray_cpu);
    threshold     = 0.5;
    
    % CPU: looped
    g=0;
    tic
    for i=1:N
        if randarray_cpu(i)>threshold
            g=g+1;
        end
    end
    toc
    
    % CPU: vectorized
    tic
    g = nnz(randarray_cpu>threshold);
    toc
    
    % GPU: looped
    tic
    g=0;
    for i=1:N
        if randarray_gpu(i)>threshold
            g=g+1;
        end
    end
    toc
    
    % GPU: vectorized
    tic
    g_d = sum(randarray_gpu > threshold);
    g = gather(g_d); % I'm assuming that you want this in the CPU at some point
    toc
    

    The output (on my Core i7 + GeForce 560 Ti) is:

    Elapsed time is 0.000014 seconds.
    Elapsed time is 0.000580 seconds.
    Elapsed time is 0.310218 seconds.
    Elapsed time is 0.000558 seconds.
    

    So what we see from this case is:

    Loops in MATLAB are not considered good practice, but in your particular case the loop does run fast because MATLAB "precompiles" it internally with its JIT. Note that I changed your threshold from 10 to 0.5, since rand never returns a value greater than 1, so randarray(i)>10 would always be false.

    The looped GPU version performs horribly because each loop iteration launches a kernel (or reads a single element back from the GPU, however TMW implemented it), which is slow. Many small memory transfers while computing essentially nothing is the worst thing you can do on a GPU.
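    If, for some reason, the element-wise checks really must stay in a loop, one workaround (a sketch of my own, not part of the timings above) is to gather the whole array back to the CPU once and loop over the local copy, turning N tiny device reads into a single transfer:

    ```matlab
    % Gather once, then loop over the plain CPU copy:
    % one bulk transfer instead of N single-element reads.
    local = gather(randarray_gpu);
    g = 0;
    for i = 1:N
        if local(i) > threshold
            g = g + 1;
        end
    end
    ```

    This only makes sense when the loop body cannot be vectorized; otherwise the nnz/sum one-liners above are both simpler and faster.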

    From the last (best) GPU result, the answer is: unless the data is already on the GPU, it doesn't make sense to do this computation there. The arithmetic intensity of the operation is essentially zero, so the memory-transfer overhead never pays off. If this is part of a bigger GPU calculation, it's fine. If not, better stick to the CPU ;)
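    To illustrate the "part of a bigger GPU calculation" case, here is a hypothetical sketch (the sin/exp pipeline is made up, not from the question) where the count sits at the end of a chain of device-side operations, so only one transfer happens at the very end:

    ```matlab
    % Hypothetical pipeline: heavier element-wise GPU work first,
    % then the threshold count, with a single gather at the end.
    x_gpu = gpuArray(rand(N,1));
    y_gpu = sin(2*pi*x_gpu) .* exp(-x_gpu);  % stays on the device
    g_d   = sum(y_gpu > threshold);          % count stays on the device too
    g     = gather(g_d);                     % one transfer, amortized
    ```

    Here the transfer cost is amortized over the whole pipeline, which is exactly when the GPU starts to pay off.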
