program fails for array 30 x 30

南旧 2021-01-24 21:05

This is a program for matrix multiplication on the CUDA architecture. The code works fine when the array size is 30 x 30, but it gives a series of 0s as output when the size is greater than that.

2 Answers

小鲜肉 (OP)

2021-01-24 21:51
You probably have a maximum of 1024 threads per block on your GPU. 30 x 30 = 900, so that should be OK, but e.g. 40 x 40 = 1600 would result in a kernel launch failure (take-home message: always check for errors!).
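The error checking recommended here can be sketched as follows. This is a minimal sketch that assumes a single GPU (device 0); `CUDA_CHECK`, `empty_kernel` and `try_square_block_launch` are hypothetical names used only for illustration, not part of the original question. It reads `maxThreadsPerBlock` from the device properties and launches one square block of 30 x 30 and then 40 x 40 threads. A bad launch configuration is not reported by the launch statement itself, only by a following call to `cudaGetLastError()`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical kernel that does nothing; only the launch configuration matters here.
__global__ void empty_kernel() {}

// Launch one block of side x side threads and return the first error seen.
cudaError_t try_square_block_launch(int side) {
    empty_kernel<<<1, dim3(side, side)>>>();
    // Configuration errors (e.g. too many threads per block) are reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    // Errors raised while the kernel runs surface at the next synchronization.
    return cudaDeviceSynchronize();
}

int main() {
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
    std::printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);

    const int sides[] = {30, 40};
    for (int side : sides) {
        cudaError_t err = try_square_block_launch(side);
        std::printf("%d x %d = %d threads: %s\n", side, side, side * side,
                    cudaGetErrorString(err));
    }
    return 0;
}
```

Without the `cudaGetLastError()` check, a failed launch leaves the output buffer untouched, so copying it back silently returns whatever it held before the launch.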

You probably want to consider organizing your data differently, e.g. as SIZE blocks of SIZE threads each, and then call the kernel as:
```
matrix_multiply<<<SIZE, SIZE>>>(c_input1,c_input2,c_result,SIZE);
```
(Obviously you'll need to modify your array indexing within the kernel code, e.g. use the block index as the row and the thread index as the column.)
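A kernel organized this way might look like the sketch below. It assumes square, row-major `float` matrices and a value of 40 for `SIZE`, because the question's actual element type and kernel body are not shown. The kernel name `matrix_multiply` and the buffers `c_input1`, `c_input2`, `c_result` follow the launch line in the answer. `CUDA_CHECK`, `count_mismatches` and the test data are hypothetical additions that check the GPU result against a plain CPU loop.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Assumed matrix dimension: one block per row, one thread per column,
// so SIZE must not exceed maxThreadsPerBlock (typically 1024).
#define SIZE 40

// Assumes square, row-major float matrices: c = a * b.
__global__ void matrix_multiply(const float* a, const float* b, float* c, int n) {
    int row = blockIdx.x;   // block index selects the row
    int col = threadIdx.x;  // thread index selects the column
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}

// Multiply two test matrices on the GPU and count mismatches against a CPU reference.
int count_mismatches() {
    const size_t bytes = SIZE * SIZE * sizeof(float);
    std::vector<float> h_a(SIZE * SIZE), h_b(SIZE * SIZE), h_c(SIZE * SIZE);
    for (int i = 0; i < SIZE * SIZE; ++i) {
        h_a[i] = static_cast<float>(i % 7);  // small integers keep float sums exact
        h_b[i] = static_cast<float>(i % 5);
    }

    float *c_input1, *c_input2, *c_result;
    CUDA_CHECK(cudaMalloc(&c_input1, bytes));
    CUDA_CHECK(cudaMalloc(&c_input2, bytes));
    CUDA_CHECK(cudaMalloc(&c_result, bytes));
    CUDA_CHECK(cudaMemcpy(c_input1, h_a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(c_input2, h_b.data(), bytes, cudaMemcpyHostToDevice));

    matrix_multiply<<<SIZE, SIZE>>>(c_input1, c_input2, c_result, SIZE);
    CUDA_CHECK(cudaGetLastError());       // launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors during kernel execution
    CUDA_CHECK(cudaMemcpy(h_c.data(), c_result, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int r = 0; r < SIZE; ++r) {
        for (int col = 0; col < SIZE; ++col) {
            float ref = 0.0f;
            for (int k = 0; k < SIZE; ++k)
                ref += h_a[r * SIZE + k] * h_b[k * SIZE + col];
            if (std::fabs(ref - h_c[r * SIZE + col]) > 1e-3f) ++mismatches;
        }
    }
    CUDA_CHECK(cudaFree(c_input1));
    CUDA_CHECK(cudaFree(c_input2));
    CUDA_CHECK(cudaFree(c_result));
    return mismatches;
}

int main() {
    int mismatches = count_mismatches();
    std::printf("%d x %d product, mismatches: %d\n", SIZE, SIZE, mismatches);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

With this layout the number of threads per block equals the number of columns, so `SIZE` is still bounded by the threads-per-block limit; it only moves the cap from SIZE x SIZE threads to SIZE threads.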