Matrix Multiplication giving wrong output [duplicate]
Question

This question already has an answer here: Unable to execute device kernel in CUDA (1 answer)

Closed 4 years ago.

I am trying to multiply matrix A by matrix B and then, from the product matrix, find the index of the maximum value in each column. However, only the first 128*128 values of the product are correct; the rest are garbage. I do not understand why this happens. Could you please guide me?

#include<stdio.h>
#include "cuda