Program fails for arrays larger than 30 x 30


This is a program for matrix multiplication on the CUDA architecture. The code works fine when the size of the array is 30 x 30, but gives a series of 0's as output when the size is greate


2 Answers

小鲜肉

2021-01-24 21:51
You probably have a maximum of 1024 threads per block on your GPU. 30 x 30 = 900, so that is OK, but 40 x 40 = 1600, for example, would result in a kernel launch failure (take-home message: always check for errors!).
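As a minimal sketch of what "check for errors" could look like, the snippet below follows a launch with `cudaGetLastError()` and `cudaDeviceSynchronize()`. The names `dummy_kernel`, `check_launch` and `launch_with_threads` are hypothetical and only stand in for the original program, which is not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; it stands in for matrix_multiply.
__global__ void dummy_kernel() {}

// Returns the first error reported for the most recent kernel launch.
// cudaGetLastError() catches invalid launch configurations, and
// cudaDeviceSynchronize() surfaces errors raised while the kernel runs.
cudaError_t check_launch(const char *label)
{
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "%s failed: %s\n", label, cudaGetErrorString(err));
    return err;
}

// Launches a single block of `threads` threads and reports the outcome.
cudaError_t launch_with_threads(int threads)
{
    dummy_kernel<<<1, threads>>>();
    return check_launch("dummy_kernel");
}
```

On a device with a 1024-thread block limit, `launch_with_threads(30 * 30)` should succeed, while `launch_with_threads(40 * 40)` should return an error instead of silently leaving the result buffer full of zeros. The actual limit can be read from the `maxThreadsPerBlock` field filled in by `cudaGetDeviceProperties`.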

You probably want to consider organizing your data differently, e.g. SIZE blocks of SIZE threads and then call the kernel as:
```
matrix_multiply<<<SIZE, SIZE>>>(c_input1,c_input2,c_result,SIZE);
```
(Obviously you'll need to modify your array indexing within the kernel code, e.g. use the block index as the row and the thread index as the column.)
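A sketch of how that indexing might look, assuming square `n` x `n` matrices of `float` stored in row-major order (the original kernel and its element type are not shown, so the name `matrix_multiply_rowcol` and the parameter types are assumptions):

```cuda
#include <cuda_runtime.h>

// Assumption: a, b and c are n x n row-major float matrices in device memory.
// Launch with <<<n, n>>>: one block per row, one thread per column.
__global__ void matrix_multiply_rowcol(const float *a, const float *b,
                                       float *c, int n)
{
    int row = blockIdx.x;   // block index selects the row
    int col = threadIdx.x;  // thread index selects the column

    if (row < n && col < n) {
        float sum = 0.0f;
        // Dot product of row `row` of a with column `col` of b.
        for (int k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}
```

With this layout a block holds only `n` threads rather than `n * n`, so the launch stays valid until `n` itself exceeds the per-block thread limit.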
忘掉有多难

2021-01-24 21:51
You are invoking the kernel with a grid of one block that holds SIZE * SIZE threads (30 x 30 = 900 in the working case):
```
matrix_multiply<<<1, SIZE * SIZE>>>(c_input1,c_input2,c_result,SIZE);
```
A single block cannot supply enough threads to process a larger matrix.
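One common way around this is to spread the work over a 2D grid of fixed-size 2D blocks, so the thread count per block no longer depends on the matrix size. The sketch below assumes square `n` x `n` row-major `float` matrices; the names `matrix_multiply_2d` and `launch_matrix_multiply_2d` and the 16 x 16 block shape are illustrative choices, not taken from the original code.

```cuda
#include <cuda_runtime.h>

// Assumption: a, b and c are n x n row-major float matrices in device memory.
__global__ void matrix_multiply_2d(const float *a, const float *b,
                                   float *c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads in partial blocks at the right and bottom edges do nothing.
    if (row >= n || col >= n)
        return;

    float sum = 0.0f;
    for (int k = 0; k < n; ++k)
        sum += a[row * n + k] * b[k * n + col];
    c[row * n + col] = sum;
}

// Hypothetical host helper: picks a grid large enough to cover n x n
// elements and returns any launch-configuration error.
cudaError_t launch_matrix_multiply_2d(const float *a, const float *b,
                                      float *c, int n)
{
    const int tile = 16;                       // 16 x 16 = 256 threads per block
    dim3 block(tile, tile);
    dim3 grid((n + tile - 1) / tile,           // round up so the edges are covered
              (n + tile - 1) / tile);
    matrix_multiply_2d<<<grid, block>>>(a, b, c, n);
    return cudaGetLastError();
}
```

Because each block always has 256 threads, only the number of blocks grows with `n`, and the bounds check keeps the kernel correct when `n` is not a multiple of 16.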