I am making some benchmarks with CUDA, C++, C#, Java, and using MATLAB for verification and matrix generation. When I perform matrix multiplication with MATLAB, 2048x
When multiplying the matrices, you are using the naive multiplication method, which takes O(n^3) time. There exist matrix multiplication algorithms that take about O(n^2.4) time, which means that at n=2000 your algorithm requires roughly 100 times as many operations as the best known algorithm, going by the asymptotic counts alone and ignoring constant factors.
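
To make the arithmetic concrete, here is a minimal Python sketch. The function names `naive_matmul` and `exponent_ratio` are my own for illustration and do not come from your code. It shows the three nested loops behind the O(n^3) count and evaluates n^3 / n^2.4 = n^0.6. Constant factors are ignored, so the ratio is only a comparison of asymptotic operation counts, not a predicted speedup.

```python
def naive_matmul(a, b):
    """Multiply two matrices given as lists of rows.

    Returns the product and the number of scalar multiplications performed.
    """
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * p for _ in range(n)]
    mults = 0
    for i in range(n):          # rows of a
        for j in range(p):      # columns of b
            for k in range(m):  # shared inner dimension
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


def exponent_ratio(n, slow=3.0, fast=2.4):
    """Ratio n**slow / n**fast, i.e. n**(slow - fast), ignoring constants."""
    return n ** (slow - fast)
```

For two n x n inputs, `naive_matmul` performs exactly n^3 multiplications, and `exponent_ratio(2000)` comes out a little under 100, which is where the "~100 times" figure comes from.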
You should really check the Wikipedia page on matrix multiplication for further information on efficient ways to implement it.
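
As one example of a sub-cubic method, here is a sketch of Strassen's algorithm, which uses 7 recursive block products instead of 8 and runs in about O(n^2.807) time. This is not the O(n^2.4) class of algorithm mentioned above, just the simplest one to write down. The sketch assumes square matrices whose size is a power of two, stored as lists of rows; the helper names (`_add`, `_sub`, `_split`) are hypothetical. A practical version would fall back to the naive loop for small blocks.

```python
def _add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def _sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def _split(m):
    """Split a square matrix into its four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b):
    """Multiply two n x n matrices, n a power of two, with Strassen's method."""
    n = len(a)
    if n == 0 or n & (n - 1) or len(b) != n:
        raise ValueError("this sketch needs equal power-of-two sizes")
    if n == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)
    # Seven recursive products instead of the eight a block-wise naive
    # multiplication would need.
    m1 = strassen(_add(a11, a22), _add(b11, b22))
    m2 = strassen(_add(a21, a22), b11)
    m3 = strassen(a11, _sub(b12, b22))
    m4 = strassen(a22, _sub(b21, b11))
    m5 = strassen(_add(a11, a12), b22)
    m6 = strassen(_sub(a21, a11), _add(b11, b12))
    m7 = strassen(_sub(a12, a22), _add(b21, b22))
    # Recombine the products into the four quadrants of the result.
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

With integer inputs the result matches the naive product exactly; with floating-point inputs the extra additions and subtractions can change rounding slightly, which matters if you compare against MATLAB output bit for bit.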