CUDA: out of memory (threads and blocks issue), address is out of bounds

天命终不由人 2021-01-28 16:39

I am using 63 registers per thread, so with a maximum of 32768 registers I can use about 520 threads. In this example I am using 512 threads.
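The arithmetic behind those numbers can be checked with a minimal sketch. It assumes that 32768 is the register budget available to one block and that the warp size is 32; the function name and defaults are illustrative only. Real hardware allocates registers in coarser units, so the true limit may be somewhat lower.

```python
def max_threads_per_block(regs_per_thread, regs_per_block=32768, warp_size=32):
    """Largest warp-multiple thread count whose registers fit in the budget.

    regs_per_block and warp_size are assumptions taken from the question
    (32768 registers) and from common NVIDIA hardware (32 threads per warp).
    """
    raw_limit = regs_per_block // regs_per_thread   # 32768 // 63 = 520
    return (raw_limit // warp_size) * warp_size     # round down to a whole warp


raw_limit = 32768 // 63
print(raw_limit)                    # 520
print(max_threads_per_block(63))    # 512
```

So 512 threads per block is consistent with the register budget as stated.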

(The parallelism is in the function "comp

2 answers
  •  面向向阳花
    2021-01-28 17:08

    Using R=1000 with

    block=(R/2,1,1) and grid=(1,1), everything is OK.

    If I try R=10000 with

    block=(R/20,1,1) and grid=(20,1), it shows "out of memory".
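To compare the two launches quoted above, here is a small sketch that computes the block shape, grid shape, and total thread count for each. The helper name `launch_shape` is hypothetical. Integer division (`//`) is used because, as far as I know, pycuda expects integer block and grid dimensions, and `R/2` yields a float in Python 3.

```python
def launch_shape(R, divisor, grid_x):
    """Return (block, grid, total_threads) for the configurations in the question."""
    block = (R // divisor, 1, 1)   # threads per block in x, y, z
    grid = (grid_x, 1)             # blocks in x, y
    total_threads = block[0] * block[1] * block[2] * grid[0] * grid[1]
    return block, grid, total_threads


small = launch_shape(1000, 2, 1)
large = launch_shape(10000, 20, 20)
print(small)   # ((500, 1, 1), (1, 1), 500)
print(large)   # ((500, 1, 1), (20, 1), 10000)
```

Both launches use 500 threads per block, so the per-block limits are unchanged. What grows is the number of blocks, and with it the total thread count (500 versus 10000).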

    I'm not familiar with pycuda and didn't read your code too closely. However, you now have more blocks and more threads, so the launch will use more of at least one of the following:

    • local memory (probably the kernel's stack, it's allocated per thread),

    • shared memory (allocated per block), or

    • global memory that gets allocated based on grid or gridDim.

    You can reduce the stack size by calling

    cudaDeviceSetLimit(cudaLimitStackSize, N);
    

    (the code is for the C runtime API, but the pycuda equivalent shouldn't be too hard to find).
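A possible pycuda counterpart is sketched below. It assumes that pycuda exposes the driver-level limit as `pycuda.driver.Context.set_limit` and `get_limit` with `pycuda.driver.limit.STACK_SIZE`; please check this against the documentation for your installed version. The wrapper name `set_stack_size` is hypothetical.

```python
def set_stack_size(n_bytes):
    """Set the per-thread stack size limit and return (before, after) in bytes.

    Assumption: pycuda provides Context.get_limit / Context.set_limit and
    limit.STACK_SIZE, mirroring cuCtxGetLimit / cuCtxSetLimit.
    """
    # Imported lazily so the module loads on machines without a GPU or pycuda.
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as drv

    before = drv.Context.get_limit(drv.limit.STACK_SIZE)
    drv.Context.set_limit(drv.limit.STACK_SIZE, n_bytes)
    after = drv.Context.get_limit(drv.limit.STACK_SIZE)
    return before, after
```

Calling something like `set_stack_size(1024)` before launching the kernel would show whether a smaller stack makes the "out of memory" error go away. If the kernel actually needs more stack than the new limit allows, it will fail in a different way, so treat this as a diagnostic step.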
