CUDA device to host copy very slow

太阳男子 2021-01-03 08:17

I'm running Windows 7 64-bit, CUDA 4.2, Visual Studio 2010.

First, I run some code on CUDA, then download the data back to the host. Then I do some processing and move b

2 answers
  • 2021-01-03 08:22

    I suggest using CUDPP; in my experience it is faster than Thrust (I'm writing a master's thesis on optimization and have tried both libraries). If the copy is very slow, you could also try writing your own kernel to copy the data.

  • 2021-01-03 08:42

    The problem is one of timing, not of any change in copy performance. Kernel launches are asynchronous in CUDA, so what you are measuring is not just the time for thrust::copy, but also the time for the previously launched kernel to complete. If you change your code for timing the copy operation to something like this:

    cudaDeviceSynchronize(); // wait until the prior kernel has finished
    start = clock();
    thrust::copy(d_b.begin(), d_b.end(), h_a.begin()); // device-to-host copy
    end = clock();
    cout << "Time Spent:" << end - start << endl; // elapsed time in clock() ticks
    

    You should find that the transfer times are restored to their previous performance. So your real question isn't "why is thrust::copy slow", it is "why is my kernel slow". And based on the rather terrible pseudocode you posted, the answer is "because it is full of atomicExch() calls which serialise kernel memory transactions".
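The timing effect described in this answer can be reproduced without a GPU. The following is only a host-side analogy, not CUDA code: `launch_fake_kernel`, `fake_copy`, and `time_copy` are hypothetical names invented for this sketch, `std::async` stands in for an asynchronous kernel launch, and the 200 ms sleep is an arbitrary stand-in for kernel run time. It assumes, as with the copy in the question, that the copy blocks until earlier work has finished.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a kernel launch: returns immediately while the
// "kernel" (a 200 ms sleep plus a fill) runs on another thread.
std::future<void> launch_fake_kernel(std::vector<int>& device_buf) {
    return std::async(std::launch::async, [&device_buf] {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        for (int& x : device_buf) x = 1;
    });
}

// Hypothetical stand-in for a blocking device-to-host copy: it first waits
// for any pending "kernel", then copies the buffer.
void fake_copy(std::future<void>& pending,
               const std::vector<int>& device_buf,
               std::vector<int>& host_buf) {
    pending.wait();
    host_buf = device_buf;
}

// Returns the measured "copy" time in milliseconds. With sync_first == true
// the wait happens before the timer starts, which plays the role of calling
// cudaDeviceSynchronize() before start = clock().
long long time_copy(bool sync_first) {
    std::vector<int> device_buf(1 << 20, 0);
    std::vector<int> host_buf;
    std::future<void> pending = launch_fake_kernel(device_buf);

    if (sync_first) pending.wait();

    auto start = Clock::now();
    fake_copy(pending, device_buf, host_buf);
    auto end = Clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Calling `time_copy(false)` charges the whole sleep to the copy, while `time_copy(true)` reports only the copy itself, which mirrors the difference between the original measurement and the synchronized one.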
