CUDA global memory access speed

太阳男子 2021-01-28 09:49

Here is some simple CUDA code.
I am testing how long it takes to access global memory, for both reads and writes.

Below is the kernel function (test1()).


__glo         


        
1 Answer
  • 2021-01-28 10:32

    When you delete the code line:

    direct_map[index] = -1; 
    

    your kernel isn't doing anything useful. The compiler can recognize this and eliminate most of the kernel's code. Without that store, the kernel no longer affects any global state, so from the compiler's perspective its code is effectively useless.
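    The two cases can be sketched as follows. This is a minimal illustration, not the question's code: test1() is truncated above, so the kernels `test_with_store` and `test_without_store` and the host helper `count_marked` are hypothetical, and they only assume the kernel computes a global index and touches direct_map[index]. In the second kernel the load produces a value nobody uses and nothing is written to global memory, so the compiler is free to drop the load along with most of the body.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernels: the question's test1() is truncated, so these only
// assume it computes a global index and touches direct_map[index].

// Case 1: the store is an observable side effect, so the compiler must keep it.
__global__ void test_with_store(int *direct_map, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        direct_map[index] = -1;
    }
}

// Case 2: the store is deleted. The loaded value is never used and nothing
// affects global state, so the compiler may remove the load and most of the body.
__global__ void test_without_store(const int *direct_map, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        int v = direct_map[index];
        (void)v;  // silence the unused-variable warning; the value is still dead
    }
}

// Hypothetical host helper: zero-fills n ints on the device, runs one of the
// kernels, and returns how many elements equal -1 afterwards (-1 on error).
int count_marked(bool with_store, int n)
{
    int *d_map = nullptr;
    if (cudaMalloc(&d_map, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_map, 0, n * sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (with_store) {
        test_with_store<<<blocks, threads>>>(d_map, n);
    } else {
        test_without_store<<<blocks, threads>>>(d_map, n);
    }

    std::vector<int> h_map(n);
    // cudaMemcpy on the default stream waits for the kernel before copying back.
    cudaError_t err = cudaMemcpy(h_map.data(), d_map, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_map);
    if (err != cudaSuccess) return -1;

    int count = 0;
    for (int v : h_map) {
        if (v == -1) ++count;
    }
    return count;
}
```

    Calling count_marked(true, n) should report all n elements marked and count_marked(false, n) none. The functional difference is expected; the point is that the second kernel likely has very few instructions left after optimization, which would explain a much shorter measured time.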

    You can verify this by dumping the assembly (SASS) code that the compiler generates in each case, for example with cuobjdump -sass myexecutable.

    Anytime you make a small change to the code and see a large change in timing, you should suspect that the change you made has allowed the compiler to make different optimization decisions.
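    One common way to keep a read-only timing test from being optimized away (a workaround not taken from the answer itself) is to make each load feed a store that the compiler cannot prove is dead, then time the kernel with CUDA events. The sketch below assumes a read-only test over n ints; `read_kernel`, `time_read_kernel`, and the sentinel value -1 are hypothetical choices for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical read-only benchmark kernel. Each loaded value feeds a store that
// only runs if it equals `sentinel`. The sentinel is a run-time argument, so the
// compiler cannot prove the branch is dead and must keep the global load.
__global__ void read_kernel(const int *in, int *sink, int n, int sentinel)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        int v = in[index];
        if (v == sentinel) {
            *sink = v;  // never taken with the data used below
        }
    }
}

// Hypothetical helper: times one launch of read_kernel over n ints with CUDA
// events. Returns elapsed milliseconds, or a negative value on error.
float time_read_kernel(int n)
{
    int *d_in = nullptr;
    int *d_sink = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_sink, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }
    cudaMemset(d_in, 0, n * sizeof(int));  // all zeros, never equal to -1

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const int sentinel = -1;

    // Warm-up launch so one-time initialization is not part of the measurement.
    read_kernel<<<blocks, threads>>>(d_in, d_sink, n, sentinel);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    read_kernel<<<blocks, threads>>>(d_in, d_sink, n, sentinel);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the timed kernel has finished

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_sink);
    return ms;
}
```

    Because the input is zero-filled and the sentinel is -1, the guarded store never executes at run time, so the measured time should mostly reflect the global loads rather than extra writes. Checking the SASS of a kernel like this for the global load instruction is still a good idea before trusting the numbers.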
