I tried to implement the Strassen algorithm for matrix multiplication with C++, but the result isn\'t that, what I expected. As you can see strassen always takes more time then
I am actually shocked at how much faster my Stassen multiplcation implementation is:
http://ezekiel.vancouver.wsu.edu/~cs330/lectures/linear_algebra/mm/mm.c
I get an almost 16x speedup on my machine when n=1024. The only way I can explain this much of a speed-up is that my algorithm is more cache-friendly -- i.e., it focuses on small pieces of the matrices and thus the data is more localized.
The overhead in your C++ implementation is probably too high -- the compiler generates more temporaries than what is really necessary. My implementation tries to minimizing this by reusing memory whenever possible.