How does cudaMalloc work?

忘了有多久 2020-12-11 08:52

I am trying to modify the imageDenoising sample in the CUDA SDK. I need to run the filter many times in order to measure its execution time, but my code doesn't work properly.

//st

3 Answers
  • 2020-12-11 09:23

    I already answered this when you posted the same question previously: you need to wait for a kernel to complete before running it again. Add:

    cudaDeviceSynchronize(); // *** wait for kernel to complete (replaces the deprecated cudaThreadSynchronize()) ***
    

    after the kernel call.
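A minimal sketch of that pattern, assuming a hypothetical stand-in kernel `addOneKernel` in place of the SDK filter (whose code is not shown in the question). It launches the kernel repeatedly and, after each launch, calls `cudaDeviceSynchronize()` (the current replacement for the deprecated `cudaThreadSynchronize()`) so the host blocks until the kernel has finished, checking both launch and execution errors. Launches into the same stream already run in order; the explicit wait is what lets the host observe completion and any error the kernel reports.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the SDK filter: adds 1 to every element.
__global__ void addOneKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

// Runs the kernel `iterations` times, waiting for each launch to finish,
// and returns the device result copied back to the host.
std::vector<float> runRepeated(int n, int iterations)
{
    assert(n > 0 && iterations >= 0);
    std::vector<float> host(n, 0.0f);
    float *dev = nullptr;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 threads(256);
    dim3 grid((n + threads.x - 1) / threads.x);
    for (int it = 0; it < iterations; ++it) {
        addOneKernel<<<grid, threads>>>(dev, n);
        cudaError_t launchErr = cudaGetLastError();       // bad launch configuration, etc.
        cudaError_t execErr   = cudaDeviceSynchronize();  // host blocks until the kernel finishes
        if (launchErr != cudaSuccess || execErr != cudaSuccess) {
            std::fprintf(stderr, "CUDA error: %s\n",
                         cudaGetErrorString(launchErr != cudaSuccess ? launchErr : execErr));
            break;
        }
    }

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

For example, `runRepeated(1000, 10)` should return 1000 elements that are all `10.0f`.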

  • 2020-12-11 09:32

    Your kernel runs asynchronously, so you need to wait for it to complete, e.g.:

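    cudaMalloc((void **)&dst2, size);        // scratch buffer on the device
    cudaMemcpy(dst2, dst, imageW * imageH * sizeof(TColor), cudaMemcpyHostToDevice);
    F1D<<<grid2, threads2>>>(dst, imageW, imageH, dst2);  // returns before the kernel finishes
    cudaDeviceSynchronize(); // *** wait for kernel to complete (replaces the deprecated cudaThreadSynchronize()) ***
    cudaFree(dst2);                          // safe: the kernel is no longer using dst2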
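Because launches return immediately, a host-side timer wrapped around a launch loop may measure little more than launch overhead unless the host waits. Since the goal is to time the repeated filter, one common approach is CUDA events, sketched below with a hypothetical `busyKernel` standing in for `F1D`: record an event before and after the launches, wait on the stop event with `cudaEventSynchronize()`, then read the elapsed time with `cudaEventElapsedTime()`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel doing some per-element work.
__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 0.5f + 1.0f;
}

// Returns the average milliseconds per launch over `iterations` launches.
float timeKernel(int n, int iterations)
{
    assert(n > 0 && iterations > 0);
    float *dev = nullptr;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemset(dev, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    dim3 threads(256);
    dim3 grid((n + threads.x - 1) / threads.x);

    cudaEventRecord(start);          // enqueued on the default stream before the launches
    for (int it = 0; it < iterations; ++it)
        busyKernel<<<grid, threads>>>(dev, n);
    cudaEventRecord(stop);           // enqueued after the last launch
    cudaEventSynchronize(stop);      // host waits until everything before `stop` is done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    return ms / iterations;
}
```

`timeKernel(1 << 20, 20)` returns the average time per launch in milliseconds; the value depends on the GPU.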
    
  • 2020-12-11 09:32

    The statement

    image[imageW * iy + ix] = buffer[imageW * iy + ix];
    

    is causing the problem. You are overwriting your input image inside the kernel, so other threads may read pixels that have already been filtered. Depending on thread execution order, parts of the image end up blurred more than once.
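One way to avoid the in-place hazard is to ping-pong between two device buffers: each pass reads only from `src` and writes only to `dst`, and the host swaps the two pointers between launches instead of copying the result back into the input inside the kernel. The sketch below assumes a hypothetical 3-tap horizontal box blur on a `float` image (the sample's real filter and its `TColor` type are not reproduced), plus a CPU reference for checking.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical 3-tap horizontal blur. It only READS src and only WRITES dst,
// so no thread can observe a pixel another thread has already changed.
__global__ void blurRow(const float *src, float *dst, int imageW, int imageH)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= imageW || iy >= imageH) return;

    int left  = ix > 0 ? ix - 1 : ix;              // clamp at the image border
    int right = ix < imageW - 1 ? ix + 1 : ix;
    const float *row = src + imageW * iy;
    dst[imageW * iy + ix] = (row[left] + row[ix] + row[right]) / 3.0f;
}

// Applies the blur `passes` times, swapping the two device buffers each pass.
std::vector<float> blurRepeated(const std::vector<float> &image, int imageW, int imageH, int passes)
{
    assert(image.size() == static_cast<size_t>(imageW) * imageH);
    size_t bytes = image.size() * sizeof(float);
    float *src = nullptr, *dst = nullptr;
    cudaMalloc((void **)&src, bytes);
    cudaMalloc((void **)&dst, bytes);
    cudaMemcpy(src, image.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 grid((imageW + threads.x - 1) / threads.x, (imageH + threads.y - 1) / threads.y);
    for (int p = 0; p < passes; ++p) {
        blurRow<<<grid, threads>>>(src, dst, imageW, imageH);
        std::swap(src, dst);                       // this pass's output feeds the next pass
    }
    cudaDeviceSynchronize();

    std::vector<float> result(image.size());
    cudaMemcpy(result.data(), src, bytes, cudaMemcpyDeviceToHost);  // src now holds the latest output
    cudaFree(src);
    cudaFree(dst);
    return result;
}

// CPU reference using the same border rule, for checking the GPU result.
std::vector<float> blurRepeatedCpu(std::vector<float> image, int imageW, int imageH, int passes)
{
    std::vector<float> out(image.size());
    for (int p = 0; p < passes; ++p) {
        for (int iy = 0; iy < imageH; ++iy)
            for (int ix = 0; ix < imageW; ++ix) {
                int left  = ix > 0 ? ix - 1 : ix;
                int right = ix < imageW - 1 ? ix + 1 : ix;
                const float *row = image.data() + imageW * iy;
                out[imageW * iy + ix] = (row[left] + row[ix] + row[right]) / 3.0f;
            }
        image.swap(out);                           // image now holds this pass's output
    }
    return image;
}
```

With this arrangement the result no longer depends on thread scheduling, so the GPU output should match the CPU reference for any number of passes.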

    Also, I don't see the purpose of

    cudaMemcpy(dst2, dst, imageW * imageH * sizeof(TColor), cudaMemcpyHostToDevice);
    

    dst looks to be device memory, since you access it in the CUDA kernel. If a copy is needed at all, it should use cudaMemcpyDeviceToDevice.
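If a separate device copy really is needed, both pointers live in device memory, so the transfer kind is `cudaMemcpyDeviceToDevice`. A minimal sketch, with a hypothetical helper `duplicateOnDevice` that reuses the `dst`/`dst2` names from the question and uses `float` elements in place of `TColor`:

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Uploads `host` into dst, copies dst into dst2 on the device, and returns
// dst2 read back on the host.
std::vector<float> duplicateOnDevice(const std::vector<float> &host)
{
    assert(!host.empty());
    size_t bytes = host.size() * sizeof(float);
    float *dst = nullptr, *dst2 = nullptr;
    cudaMalloc((void **)&dst, bytes);
    cudaMalloc((void **)&dst2, bytes);

    cudaMemcpy(dst, host.data(), bytes, cudaMemcpyHostToDevice);   // host -> device
    cudaMemcpy(dst2, dst, bytes, cudaMemcpyDeviceToDevice);        // both pointers are device memory

    std::vector<float> back(host.size());
    cudaMemcpy(back.data(), dst2, bytes, cudaMemcpyDeviceToHost);  // device -> host
    cudaFree(dst);
    cudaFree(dst2);
    return back;
}
```

Calling it on `{1.0f, 2.5f, -3.0f}` should return an identical vector.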
