program fails for array 30 x 30

南旧 2021-01-24 21:05

This is a program for matrix multiplication on the CUDA architecture. The code works fine when the array size is 30 x 30, but outputs a series of 0's when the size is greate

2 answers
  •  小鲜肉 (original poster)
    2021-01-24 21:51

    You probably have a maximum of 1024 threads per block on your GPU. 30 x 30 = 900, so that should be OK, but e.g. 40 x 40 = 1600 would result in a kernel launch failure (take-home message: always check for errors!).
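
    As a rough sketch of what such a check can look like: the program below queries the per-block thread limit and then deliberately launches a 40 x 40 block. The kernel `dummy_kernel` is a hypothetical placeholder, not the code from the question; only the CUDA runtime calls are real.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; it exists only so there is something to launch.
__global__ void dummy_kernel(int *out)
{
    if (out != nullptr && threadIdx.x == 0 && threadIdx.y == 0)
        out[0] = 1;
}

int main()
{
    // Query the limit instead of assuming 1024.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);

    int *d_out = nullptr;
    err = cudaMalloc(&d_out, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    dim3 block(40, 40);  // 1600 threads in a single block
    dummy_kernel<<<1, block>>>(d_out);

    // Launch-configuration problems are reported here, right after the launch.
    err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));

    // Errors that occur while the kernel runs are reported on synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel execution failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return 0;
}
```

    Without the `cudaGetLastError()` call the failed launch goes unnoticed and the result buffer simply keeps whatever it held before, which would be consistent with the all-zero output described in the question.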

    You probably want to consider organizing your data differently, e.g. SIZE blocks of SIZE threads and then call the kernel as:

    matrix_multiply<<<SIZE, SIZE>>>(c_input1, c_input2, c_result, SIZE);
    

    (Obviously you'll need to modify your array indexing within the kernel code, e.g. use the block index as the row and the thread index as the column.)
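
    A minimal sketch of that indexing, assuming a launch of SIZE blocks with SIZE threads each (`<<<SIZE, SIZE>>>`), row-major `int` matrices, and `SIZE` = 40. The kernel signature, the element type, the `CUDA_CHECK` macro and the test data are assumptions for illustration; the question's actual kernel is not shown here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define SIZE 40  // assumed dimension; must not exceed the per-block thread limit

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t e_ = (call);                                   \
        if (e_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(e_));                       \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Assumed signature: c = a * b for row-major n x n int matrices.
__global__ void matrix_multiply(const int *a, const int *b, int *c, int n)
{
    int row = blockIdx.x;   // one block per row
    int col = threadIdx.x;  // one thread per column
    if (row >= n || col >= n) return;

    int sum = 0;
    for (int k = 0; k < n; ++k)
        sum += a[row * n + k] * b[k * n + col];
    c[row * n + col] = sum;
}

int main()
{
    const int count = SIZE * SIZE;
    const size_t bytes = count * sizeof(int);

    // Small deterministic test data.
    std::vector<int> h_a(count), h_b(count), h_c(count, 0);
    for (int i = 0; i < count; ++i) { h_a[i] = i % 7; h_b[i] = i % 5; }

    int *c_input1 = nullptr, *c_input2 = nullptr, *c_result = nullptr;
    CUDA_CHECK(cudaMalloc(&c_input1, bytes));
    CUDA_CHECK(cudaMalloc(&c_input2, bytes));
    CUDA_CHECK(cudaMalloc(&c_result, bytes));
    CUDA_CHECK(cudaMemcpy(c_input1, h_a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(c_input2, h_b.data(), bytes, cudaMemcpyHostToDevice));

    // SIZE blocks of SIZE threads: 40 threads per block, well under the limit.
    matrix_multiply<<<SIZE, SIZE>>>(c_input1, c_input2, c_result, SIZE);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    CUDA_CHECK(cudaMemcpy(h_c.data(), c_result, bytes, cudaMemcpyDeviceToHost));

    // Compare against a plain CPU reference.
    int mismatches = 0;
    for (int r = 0; r < SIZE; ++r)
        for (int c = 0; c < SIZE; ++c) {
            int ref = 0;
            for (int k = 0; k < SIZE; ++k)
                ref += h_a[r * SIZE + k] * h_b[k * SIZE + c];
            if (ref != h_c[r * SIZE + c]) ++mismatches;
        }
    printf("mismatches: %d\n", mismatches);

    cudaFree(c_input1);
    cudaFree(c_input2);
    cudaFree(c_result);
    return mismatches == 0 ? 0 : 1;
}
```

    With this layout the per-block thread count is SIZE rather than SIZE x SIZE, so the scheme works until SIZE itself exceeds the per-block limit; beyond that, each row would have to be split across several blocks.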
