Copying from GPU to CPU is slower than copying from CPU to GPU

生来不讨喜 2021-01-18 16:18

I started learning CUDA a while ago and I have run into the following problem.

Here is what I am doing:

Copy GPU

int* B;
// ...         


        
2 answers
  •  广开言路
    2021-01-18 16:47

    As for your second question:

     B[ind(tid,1,Nel)]=j; // in most cases j does not go all the way up to Nel
    

    When performing calculations on the GPU, for synchronization reasons, every thread that has finished its work stays idle until all the threads in the same workgroup (a thread block, in CUDA terms) have finished.

    In other words, the time needed for this calculation is that of the worst case; it does not matter that most of the threads do not go all the way.
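
    The worst-case behaviour described above can be checked with a small experiment. The following is a minimal sketch, not the asker's code: the kernel `variableWork`, the helper `timeLaunch`, and the trip counts are hypothetical, and it assumes the hardware runs threads in lockstep groups of 32 (warps), which the answer does not state. It times a launch where only one thread in every 32 runs the long loop against a launch where every thread does. Error checking is omitted for brevity.

    ```cuda
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // Hypothetical kernel: thread tid runs trips[tid] loop iterations.
    __global__ void variableWork(unsigned *out, const int *trips, int n) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= n) return;
        int limit = trips[tid];
        unsigned acc = tid;
        for (int j = 0; j < limit; ++j)
            acc = acc * 1664525u + 1013904223u;  // arbitrary per-iteration work
        out[tid] = acc;
    }

    // Time a single launch, in milliseconds, using CUDA events.
    static float timeLaunch(unsigned *d_out, const int *d_trips, int n) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        variableWork<<<(n + 255) / 256, 256>>>(d_out, d_trips, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // wait until the kernel has finished
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

    int main() {
        const int n = 1 << 20, shortTrips = 10, longTrips = 10000;
        const size_t tripBytes = n * sizeof(int);

        unsigned *d_out = nullptr;
        int *d_trips = nullptr;
        cudaMalloc(&d_out, n * sizeof(unsigned));
        cudaMalloc(&d_trips, tripBytes);

        // Case A: only one thread in every group of 32 runs the long loop.
        std::vector<int> h_trips(n, shortTrips);
        for (int i = 0; i < n; i += 32) h_trips[i] = longTrips;
        cudaMemcpy(d_trips, h_trips.data(), tripBytes, cudaMemcpyHostToDevice);
        timeLaunch(d_out, d_trips, n);  // warm-up launch, result discarded
        float mostlyShort = timeLaunch(d_out, d_trips, n);

        // Case B: every thread runs the long loop.
        h_trips.assign(n, longTrips);
        cudaMemcpy(d_trips, h_trips.data(), tripBytes, cudaMemcpyHostToDevice);
        float allLong = timeLaunch(d_out, d_trips, n);

        printf("one long thread per 32: %.3f ms\n", mostlyShort);
        printf("all threads long:       %.3f ms\n", allLong);

        cudaFree(d_out);
        cudaFree(d_trips);
        return 0;
    }
    ```

    If the reasoning above holds, the two timings should be of the same order even though case A does far less useful work; the exact numbers depend on the device.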

    I am not sure about your first question. How do you measure the time? I am not too familiar with CUDA, but I think that when copying from CPU to GPU the implementation buffers your data, hiding the actual time involved.
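
    One way to answer the measurement question is to time each step separately with CUDA events. This is a minimal sketch under assumptions: the kernel `fill` is a hypothetical stand-in for the asker's computation, and the buffer size is arbitrary. Because kernel launches are asynchronous, a host timer wrapped around the device-to-host `cudaMemcpy` alone may also include the time spent waiting for the kernel to finish, which could make that copy look slower than it is. Error checking is omitted for brevity.

    ```cuda
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Hypothetical placeholder kernel standing in for the real computation.
    __global__ void fill(int *B, int n) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid < n) B[tid] = tid;
    }

    // Milliseconds elapsed between two recorded events.
    static float elapsedMs(cudaEvent_t start, cudaEvent_t stop) {
        float ms = 0.0f;
        cudaEventSynchronize(stop);  // make sure 'stop' has been reached
        cudaEventElapsedTime(&ms, start, stop);
        return ms;
    }

    int main() {
        const int n = 1 << 22;
        const size_t bytes = n * sizeof(int);

        int *h_B = (int *)calloc(n, sizeof(int));
        int *d_B = nullptr;
        cudaMalloc(&d_B, bytes);

        cudaEvent_t t0, t1, t2, t3;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventCreate(&t2);
        cudaEventCreate(&t3);

        // Events are recorded in the same stream as the work they bracket,
        // so each interval covers exactly one step.
        cudaEventRecord(t0);
        cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(t1);
        fill<<<(n + 255) / 256, 256>>>(d_B, n);
        cudaEventRecord(t2);
        cudaMemcpy(h_B, d_B, bytes, cudaMemcpyDeviceToHost);
        cudaEventRecord(t3);

        printf("host to device: %.3f ms\n", elapsedMs(t0, t1));
        printf("kernel:         %.3f ms\n", elapsedMs(t1, t2));
        printf("device to host: %.3f ms\n", elapsedMs(t2, t3));

        cudaEventDestroy(t0);
        cudaEventDestroy(t1);
        cudaEventDestroy(t2);
        cudaEventDestroy(t3);
        cudaFree(d_B);
        free(h_B);
        return 0;
    }
    ```

    If the kernel interval turns out to dominate, the apparent asymmetry between the two copies is probably a measurement artifact rather than a property of the transfers themselves.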
