I am making some benchmarks with CUDA, C++, C#, Java, and using MATLAB for verification and matrix generation. When I perform matrix multiplication with MATLAB, 2048x
2048x
This is why. MATLAB doesn't perform a naive matrix multiplication by looping over every single element the way you did in your C++ code.
Of course I'm assuming that you just used C=A*B instead of writing a multiplication function yourself.
C=A*B