CUDA device to host copy very slow

I'm running Windows 7 64-bit, CUDA 4.2 and Visual Studio 2010.

First, I run some code with CUDA, then download the data back to the host. Then I do some processing and move b

2 Answers

心在旅途

2021-01-03 08:22

I suggest using CUDPP. In my opinion it is faster than Thrust (I'm writing a master's thesis about optimization and I tried both libraries). If the copy is very slow, you can try writing your own kernel to copy the data.

Happy的楠姐

2021-01-03 08:42
The problem is one of timing, not of any change in copy performance. Kernel launches are asynchronous in CUDA, so what you are measuring is not just the time for thrust::copy but also the time for the previously launched kernel to complete. If you change your code for timing the copy operation to something like this:
```
cudaDeviceSynchronize(); // wait until the prior kernel has finished
start = clock();
thrust::copy(d_b.begin(), d_b.end(), h_a.begin()); // device-to-host copy only
end = clock();
std::cout << "Time Spent: " << (end - start) << std::endl; // elapsed clock ticks
```
You should find that the transfer times return to their previous performance. So your real question isn't "why is thrust::copy slow", it is "why is my kernel slow". And based on the rather terrible pseudocode you posted, the answer is "because it is full of atomicExch() calls, which serialise kernel memory transactions".
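To illustrate the timing pitfall without needing a GPU, here is a host-only C++ analogy (not CUDA code). `std::async` stands in for the asynchronous kernel launch, `future::wait()` stands in for `cudaDeviceSynchronize()`, and a blocking "copy" that must first wait on the pending work stands in for `thrust::copy`. The functions `fake_kernel`, `fake_copy` and `time_copy_ms`, and the 200 ms / 10 ms durations, are hypothetical and chosen only for illustration.

```cpp
// Host-only analogy, NOT CUDA: std::async plays the asynchronous kernel launch,
// future::wait() plays cudaDeviceSynchronize(). All names here are hypothetical.
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical "kernel": pretends to do 200 ms of work.
static void fake_kernel() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}

// Hypothetical blocking "copy": like a blocking device-to-host transfer, it cannot
// start until the prior work has finished, then it does 10 ms of its own work.
static void fake_copy(std::future<void>& pending_kernel) {
    assert(pending_kernel.valid());  // wait() on an invalid future is undefined
    pending_kernel.wait();           // implicit synchronization inside the "copy"
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

// Launches the "kernel", then times only the "copy" call, in milliseconds.
// With sync_first == true, the kernel is waited on before the timer starts.
long long time_copy_ms(bool sync_first) {
    // Returns immediately, just as a kernel launch does.
    std::future<void> kernel = std::async(std::launch::async, fake_kernel);
    if (sync_first) {
        kernel.wait();  // analogous to cudaDeviceSynchronize()
    }
    const auto start = Clock::now();
    fake_copy(kernel);
    const auto end = Clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Calling `time_copy_ms(false)` should report roughly the remaining "kernel" time plus the copy, while `time_copy_ms(true)` should report only the short copy. This mirrors why adding the synchronize call before `start=clock();` makes the measured transfer time drop back down.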