I tried to implement the Strassen algorithm for matrix multiplication in C++, but the result isn't what I expected. As you can see, Strassen always takes more time than
Long shot, but have you considered that the standard multiplication may be optimised by the compiler? Could you switch off optimisations?
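One way to check this, as a rough sketch: time the standard multiplication on its own and build the same file at different optimisation levels. The names `Matrix`, `multiply_naive` and `time_ms` below are made up for illustration and are not taken from your code; the sketch assumes square matrices stored as nested `std::vector<double>`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical matrix type for this sketch: a square matrix as nested vectors.
using Matrix = std::vector<std::vector<double>>;

// Classic O(n^3) triple-loop multiplication of two n x n matrices.
// The i-k-j loop order walks rows of b and c contiguously.
Matrix multiply_naive(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Runs f() once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would call it as `time_ms([&] { c = multiply_naive(a, b); })`, wrap your Strassen function the same way, and then compare a build with `-O0` against one with `-O2` (GCC/Clang). Make sure the result `c` is actually used afterwards (printed, for instance), otherwise an optimising build may discard part of the work and skew the timing.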