How does cudaMalloc work?

I am trying to modify the imageDenoising code in the CUDA SDK. I need to run the filter many times in order to measure its running time, but my code doesn't work properly.

//st

3 answers

你的背包

2020-12-11 09:23
I already answered this when you posted the same question earlier: you need to wait for the kernel to complete before running it again. Add:
```
cudaThreadSynchronize(); // *** wait for kernel to complete ***
```
after the kernel call.
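For the timing goal in the question, a minimal sketch, assuming a hypothetical `scaleKernel` in place of the question's `F1D` and `float` data. Kernel launches return to the host immediately, so a timer must wait for the GPU before it is read; here CUDA events recorded around the loop, followed by `cudaEventSynchronize`, do that waiting. Launches issued to the same stream still execute one after another on the device.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's F1D kernel: doubles each element.
__global__ void scaleKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Launches scaleKernel `iterations` times and returns the average time per
// launch in milliseconds, or -1.0f if setup or any launch fails.
float timeRepeatedLaunches(int n, int iterations)
{
    if (n <= 0 || iterations <= 0) return -1.0f;

    float* d_data = nullptr;
    if (cudaMalloc((void**)&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);               // recorded in the default stream
    for (int it = 0; it < iterations; ++it) {
        // Each launch returns to the host at once; the GPU runs them in order.
        scaleKernel<<<blocks, threads>>>(d_data, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);           // block the host until every launch is done

    float totalMs = 0.0f;
    cudaEventElapsedTime(&totalMs, start, stop);
    cudaError_t err = cudaGetLastError(); // catches any failed launch

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return (err == cudaSuccess) ? totalMs / iterations : -1.0f;
}
```

A host-side timer read without the synchronization step would mostly measure launch overhead rather than kernel execution.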

南笙

2020-12-11 09:32

Your kernel runs asynchronously, so you need to wait for it to complete, e.g.:

```
cudaMalloc((void **)&dst2, size);
cudaMemcpy(dst2, dst, imageW * imageH * sizeof(TColor), cudaMemcpyHostToDevice);
F1D<<<grid2, threads2>>>(dst, imageW, imageH, dst2);
cudaThreadSynchronize(); // *** wait for kernel to complete ***
cudaFree(dst2);
```
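The same launch-then-wait pattern with error checking, as a sketch; `copyKernel` and `launchAndWait` are hypothetical names. `cudaGetLastError` reports launch-configuration errors, while the return value of the synchronizing call reports errors raised while the kernel ran. `cudaThreadSynchronize` is deprecated; `cudaDeviceSynchronize` is its replacement in current CUDA toolkits.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel standing in for F1D: copies src into dst.
__global__ void copyKernel(const float* src, float* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];
}

// Launches copyKernel and blocks until it finishes.
// Returns true only if both the launch and the execution succeeded.
bool launchAndWait(const float* d_src, float* d_dst, int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copyKernel<<<blocks, threads>>>(d_src, d_dst, n);

    cudaError_t err = cudaGetLastError();   // launch/configuration errors
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    err = cudaDeviceSynchronize();          // waits; reports execution errors
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```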

夕颜

2020-12-11 09:32
The statement
```
image[imageW * iy + ix] =   buffer[imageW * iy + ix];
```
is causing the problem: you are overwriting your input image inside the kernel. Depending on the order in which threads execute, some threads read pixels that other threads have already overwritten, so parts of the image get blurred more than once.
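One common way to avoid overwriting the input is double buffering (ping-pong buffers): each pass reads one buffer and writes the other, and the host swaps the two pointers between passes. This is a sketch only; `horizontalBoxBlur` is a hypothetical 3-tap filter standing in for the question's kernel, and the image is assumed to hold `float` pixels.

```cuda
#include <cuda_runtime.h>
#include <utility>

// Hypothetical 3-tap horizontal box blur: reads only from `in`,
// writes only to `out`, and clamps at the left and right borders.
__global__ void horizontalBoxBlur(const float* in, float* out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    int xl = (x > 0) ? x - 1 : x;
    int xr = (x < w - 1) ? x + 1 : x;
    out[y * w + x] = (in[y * w + xl] + in[y * w + x] + in[y * w + xr]) / 3.0f;
}

// Applies the blur `passes` times without ever writing into the buffer the
// current pass reads from. Returns the device pointer (d_a or d_b) that
// holds the final result.
float* horizontalBlurRepeatedly(float* d_a, float* d_b, int w, int h, int passes)
{
    dim3 threads(16, 16);
    dim3 grid((w + threads.x - 1) / threads.x, (h + threads.y - 1) / threads.y);
    float* src = d_a;
    float* dst = d_b;
    for (int p = 0; p < passes; ++p) {
        horizontalBoxBlur<<<grid, threads>>>(src, dst, w, h);
        std::swap(src, dst);   // this pass's output becomes the next pass's input
    }
    cudaDeviceSynchronize();   // wait for the last pass before returning
    return src;                // after the final swap, src holds the newest result
}
```

For example, one pass over the single row `0 0 0 3 0 0 0 0` gives `0 0 1 1 1 0 0 0` regardless of thread order, because no thread reads a value that another thread in the same pass writes.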

Also, I don't see the purpose of
```
cudaMemcpy(dst2, dst, imageW*imageH*sizeof(TColor),cudaMemcpyHostToDevice);
```
dst appears to be device memory, since you access it inside the CUDA kernel.
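If a second copy of a device buffer such as `dst` really is needed, both pointers live in GPU memory, so the copy kind would be `cudaMemcpyDeviceToDevice` rather than `cudaMemcpyHostToDevice`. A minimal sketch, assuming `float` pixels and a hypothetical helper name `cloneDeviceBuffer`:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Allocates a new device buffer and fills it from an existing device buffer.
// No host memory is involved. Returns nullptr on failure.
float* cloneDeviceBuffer(const float* d_src, size_t count)
{
    float* d_copy = nullptr;
    size_t bytes = count * sizeof(float);
    if (cudaMalloc((void**)&d_copy, bytes) != cudaSuccess) return nullptr;
    if (cudaMemcpy(d_copy, d_src, bytes, cudaMemcpyDeviceToDevice) != cudaSuccess) {
        cudaFree(d_copy);      // don't leak the allocation on a failed copy
        return nullptr;
    }
    return d_copy;
}
```

When the kernel only needs a separate output buffer rather than a snapshot of the input, allocating it without any copy is enough.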