Where does CUDA allocate the stack frame for kernels?

My kernel call fails with "out of memory". It makes heavy use of the stack frame, and I was wondering whether that is the reason for the failure.

When invoking n

2 Answers

温柔的废话

2020-12-19 07:03

The stack frame is most likely in local memory.

I believe there is some limit on local memory usage, but even without one, I think the CUDA driver may allocate local memory for more than the single thread in your <<<1,1>>> launch configuration.

Either way, even if you do manage to run your code, I fear it may be quite slow because of all those stack operations. Try to reduce the number of function calls (e.g. by inlining those functions).

余生分开走

2020-12-19 07:19

The stack is allocated in local memory. Allocation is per physical thread, i.e. per thread that can be resident on the device at once (GTX480: 15 SMs * 1536 threads/SM = 23040 threads). You are requesting 150,352 bytes/thread, which comes to ~3.4 GB of stack space. CUDA may reduce the maximum number of physical threads per launch if the size is that high. The CUDA language is not designed for a large per-thread stack.
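The arithmetic above can be checked with a short host-side sketch. The helper `stack_footprint_bytes` is a hypothetical name introduced here for illustration. The runtime calls (`cudaGetDeviceProperties`, `cudaDeviceGetLimit`, `cudaDeviceSetLimit`) are standard CUDA runtime API. The sketch assumes device 0 and does not claim where a request this large gets rejected: it only prints the status returned by the set call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: worst-case stack reservation, assuming one stack
// per thread that can be resident on the device at the same time.
static unsigned long long stack_footprint_bytes(int sm_count,
                                                int threads_per_sm,
                                                size_t stack_bytes_per_thread) {
    return (unsigned long long)sm_count * threads_per_sm * stack_bytes_per_thread;
}

int main() {
    // Figures quoted in the answer for a GTX480.
    printf("GTX480 estimate: %llu bytes\n",
           stack_footprint_bytes(15, 1536, 150352));

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device available\n");
        return 0;
    }

    // Current per-thread stack limit and the footprint it implies here.
    size_t stack_bytes = 0;
    cudaDeviceGetLimit(&stack_bytes, cudaLimitStackSize);
    printf("current stack limit: %zu bytes/thread, footprint: %llu bytes\n",
           stack_bytes,
           stack_footprint_bytes(prop.multiProcessorCount,
                                 prop.maxThreadsPerMultiProcessor,
                                 stack_bytes));

    // Request the per-thread size from the question and report the status.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, 150352);
    printf("cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
    return 0;
}
```

Comparing the printed footprint with the device's total global memory (`prop.totalGlobalMem`) gives a quick sanity check before launching a stack-heavy kernel.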

In terms of registers, the GTX480 is limited to 63 registers per thread and 32K registers per SM.
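To see how a particular kernel sits against these limits, the runtime can report its per-thread register and local memory usage. The following is a minimal sketch: `example_kernel` and `max_threads_by_registers` are hypothetical names used only for illustration, while `cudaFuncGetAttributes` and the `numRegs` and `localSizeBytes` fields are part of the CUDA runtime API. The division is only an upper bound, since real register allocation is rounded to hardware-specific granularities.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so there is something to inspect.
__global__ void example_kernel(float *out) {
    out[threadIdx.x] = 2.0f * threadIdx.x;
}

// Hypothetical helper: upper bound on resident threads per SM when the
// register file is the only constraint (ignores allocation granularity).
static int max_threads_by_registers(int regs_per_sm, int regs_per_thread) {
    return regs_per_sm / regs_per_thread;
}

int main() {
    // GTX480 figures from the answer: 32K registers per SM, 63 per thread.
    printf("threads/SM at 63 registers each: at most %d\n",
           max_threads_by_registers(32 * 1024, 63));

    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, example_kernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 0;
    }
    printf("registers per thread: %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

Compiling with `nvcc -Xptxas -v` prints similar per-kernel figures at build time.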
