program fails for array 30 x 30

南旧 2021-01-24 21:05

This is a program for matrix multiplication on the CUDA architecture. The code works fine when the size of the array is 30 x 30, but it outputs a series of 0's when the size is greate

2 answers
  • 2021-01-24 21:51

    You probably have a maximum of 1024 threads per block on your GPU. 30 x 30 = 900, so that should be OK, but e.g. 40 x 40 = 1600 would result in a kernel launch failure (take-home message: always check for errors!).
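As a rough illustration of the "always check for errors" advice, here is a minimal sketch that queries the per-block thread limit and checks the launch status. The kernel `dummy_kernel` is a hypothetical stand-in, not the asker's code, and device 0 is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; its body is irrelevant to the launch check.
__global__ void dummy_kernel(int *out) {
    out[threadIdx.x] = threadIdx.x;
}

int main() {
    const int SIZE = 40;  // 40 x 40 = 1600 threads requested in one block

    // Report the actual limit of device 0 (assumed to be the GPU in use).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);

    int *d_out = nullptr;
    err = cudaMalloc(&d_out, SIZE * SIZE * sizeof(int));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // One block of SIZE * SIZE threads, as in the question.
    dummy_kernel<<<1, SIZE * SIZE>>>(d_out);

    // An invalid launch configuration is reported here, not by the launch itself.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
    }

    // Errors raised while the kernel runs surface at a synchronizing call.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel execution failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_out);
    return 0;
}
```

Without such a check, a rejected launch is silent and the result buffer simply keeps whatever it held before, which would be consistent with the output of all 0's described in the question.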

    You probably want to consider organizing your data differently, e.g. SIZE blocks of SIZE threads and then call the kernel as:

    matrix_multiply<<<SIZE, SIZE>>>(c_input1,c_input2,c_result,SIZE);
    

    (Obviously you'll need to modify your array indexing within the kernel code, e.g. use the block index as the row and the thread index as the column.)
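The indexing change described above could look roughly like the sketch below. The asker's kernel is not shown, so the kernel body, the `float` element type, the row-major layout, and the host-side test data (an identity matrix as the second input) are all assumptions; only the names `matrix_multiply`, `c_input1`, `c_input2`, `c_result`, and `SIZE` come from the thread.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define SIZE 40  // larger than 30 x 30, still within one block per row

// Assumed kernel: block index selects the row, thread index selects the column.
__global__ void matrix_multiply(const float *a, const float *b, float *c, int n) {
    int row = blockIdx.x;
    int col = threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) {
            sum += a[row * n + k] * b[k * n + col];
        }
        c[row * n + col] = sum;
    }
}

int main() {
    const size_t bytes = SIZE * SIZE * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);

    // Test data: A holds arbitrary values, B is the identity, so A * B == A.
    for (int i = 0; i < SIZE; ++i) {
        for (int j = 0; j < SIZE; ++j) {
            h_a[i * SIZE + j] = (float)(i + j);
            h_b[i * SIZE + j] = (i == j) ? 1.0f : 0.0f;
        }
    }

    float *c_input1, *c_input2, *c_result;
    cudaMalloc(&c_input1, bytes);
    cudaMalloc(&c_input2, bytes);
    cudaMalloc(&c_result, bytes);
    cudaMemcpy(c_input1, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(c_input2, h_b, bytes, cudaMemcpyHostToDevice);

    // SIZE blocks of SIZE threads: one block per row, one thread per column.
    matrix_multiply<<<SIZE, SIZE>>>(c_input1, c_input2, c_result, SIZE);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // cudaMemcpy synchronizes with the kernel before copying the result back.
    err = cudaMemcpy(h_c, c_result, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("copy back failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Since B is the identity, every element of C should equal A.
    int mismatches = 0;
    for (int i = 0; i < SIZE * SIZE; ++i) {
        if (h_c[i] != h_a[i]) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(c_input1); cudaFree(c_input2); cudaFree(c_result);
    free(h_a); free(h_b); free(h_c);
    return mismatches == 0 ? 0 : 1;
}
```

With this layout each block still holds only SIZE threads, so it works as long as SIZE does not exceed the per-block thread limit; larger matrices would need a two-dimensional grid or tiling.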

  • 2021-01-24 21:51

    You are invoking the kernel with a configuration of 1 block containing SIZE * SIZE threads (30 x 30 = 900):

    matrix_multiply<<<1, SIZE * SIZE>>>(c_input1,c_input2,c_result,SIZE);
    

    A single block cannot hold enough threads to process a larger matrix, so the launch needs more than one block.
