I am using 63 registers/thread, so (32768 being the maximum) I can use about 520 threads. I am using 512 threads in this example.
(The parallelism is in the function "comp
Using R=1000 with
block=(R/2, 1, 1) and grid=(1, 1), everything is OK.
If I try R=10000 with
block=(R/20, 1, 1) and grid=(20, 1), it shows me "out of memory".
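To make the setup concrete, here is a minimal sketch of the launch configuration (the kernel body and its argument are placeholders, not the real kernel; the real one is much larger and is what uses the 63 registers/thread):

    import numpy as np
    import pycuda.autoinit          # creates a context on the default device
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    R = 1000

    # Placeholder kernel standing in for the real one.
    mod = SourceModule("""
    __global__ void comp(float *data)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        data[i] *= 2.0f;
    }
    """)
    comp = mod.get_function("comp")

    data = np.zeros(R, dtype=np.float32)

    # Works: 500 threads in a single block.
    comp(drv.InOut(data), block=(R // 2, 1, 1), grid=(1, 1))

    # Fails with "out of memory" in my case: R = 10000, 500 threads/block, 20 blocks.
    # comp(drv.InOut(data), block=(R // 20, 1, 1), grid=(20, 1))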
I'm not familiar with pycuda and didn't read into your code too deeply. However, you have more blocks and more threads, so the kernel will consume more of one of the following (a quick check is sketched after this list):
local memory (probably the kernel's stack, it's allocated per thread),
shared memory (allocated per block), or
global memory that gets allocated based on the grid size (gridDim).
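To narrow down which one it is, you could compare the free device memory right before the working launch and the failing launch. A small sketch, assuming pycuda exposes the driver's memory query (I believe pycuda.driver.mem_get_info() does):

    import pycuda.autoinit
    import pycuda.driver as drv

    # (free, total) in bytes for the current context's device.
    free, total = drv.mem_get_info()
    print("free: %d MiB / total: %d MiB" % (free // 2**20, total // 2**20))

If free memory is already low before the launch, the problem is your own global allocations; if it only drops once you scale up the grid, it is the per-thread/per-block memory that the launch itself reserves.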
You can reduce the stack size by calling
cudaDeviceSetLimit(cudaLimitStackSize, N);
(the code is for the C runtime API, but the pycuda equivalent shouldn't be too hard to find).
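My untested guess at the pycuda equivalent (Context.set_limit and the limit enum are what I'd look for in pycuda.driver; treat the names and the value of N as assumptions to verify against your pycuda version):

    import pycuda.autoinit
    import pycuda.driver as drv

    # Shrink the per-thread stack before launching the kernel.
    # N is a placeholder; use whatever stack size your kernel actually needs.
    N = 1024
    drv.Context.set_limit(drv.limit.STACK_SIZE, N)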