This is program for matrix multiplication on CUDA architecture. This code is working fine when size of array is 30 x 30 but giving output as a series of 0\'s when size is greate
You probably have a max of 1024 threads per block on your GPU. 30 x 30 = 900, so that should be OK, but e.g. 40 x 40 would results in a kernel launch failure (take-home message: always check for errors !).
You probably want to consider organizing your data differently, e.g. SIZE
blocks of SIZE
threads and then call the kernel as:
matrix_multiply<<>>(c_input1,c_input2,c_result,SIZE);
(Obviously you'll need to modify your array indexing within the kernel code, e.g. use the block index as the row and the thread index as the column.)