I have a school project to create two versions of a Java program that multiplies two square matrices. To make it easier, they only have to work for sizes 2x2, 4x4, 8x8, etc. We have a pseudo
The fork-join framework in the Java 7 API is designed to solve this kind of problem very quickly (by using all the CPU cores in your computer) by calling the multiply function recursively. See http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html.
You have to replace the split in the fork-join framework with a matrix partition in your code, dividing into 4 sub-tasks each time (instead of the 2 in the example at the link above). Do not copy elements to create smaller matrices; that will slow the program down considerably (and require a lot of memory!). Just change the start and end indices that define the sub-matrix when passing it to the function. The threshold in this case is 1, at which point you update the C matrix by multiplying the scalars.
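One possible reading of the description above, as a minimal sketch rather than the assignment's intended solution: each task describes a sub-matrix only by its start offsets and size, so nothing is copied, and the base case at size 1 does a single scalar multiply-add into C. The class name ForkJoinMatMul, the MulTask helper, and the choice to run two rounds of 4 sub-tasks (so that no two concurrent tasks write to the same quadrant of C) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ForkJoinMatMul {

    // Hypothetical task: adds A_block * B_block into C_block, where each block
    // is identified only by its top-left offset and its size (no copying).
    static class MulTask extends RecursiveAction {
        final int[][] a, b, c;
        final int aRow, aCol, bRow, bCol, cRow, cCol, size;

        MulTask(int[][] a, int[][] b, int[][] c,
                int aRow, int aCol, int bRow, int bCol,
                int cRow, int cCol, int size) {
            this.a = a; this.b = b; this.c = c;
            this.aRow = aRow; this.aCol = aCol;
            this.bRow = bRow; this.bCol = bCol;
            this.cRow = cRow; this.cCol = cCol;
            this.size = size;
        }

        // Builds a sub-task on the same arrays with new offsets and half the size.
        private MulTask sub(int ar, int ac, int br, int bc, int cr, int cc, int h) {
            return new MulTask(a, b, c, ar, ac, br, bc, cr, cc, h);
        }

        @Override
        protected void compute() {
            if (size == 1) {
                // Threshold of 1: update C with the product of two scalars.
                c[cRow][cCol] += a[aRow][aCol] * b[bRow][bCol];
                return;
            }
            int h = size / 2;
            // Round 1: C11 += A11*B11, C12 += A11*B12, C21 += A21*B11, C22 += A21*B12.
            // Each sub-task writes a different quadrant of C, so they can run in parallel.
            invokeAll(
                sub(aRow,     aCol, bRow, bCol,     cRow,     cCol,     h),
                sub(aRow,     aCol, bRow, bCol + h, cRow,     cCol + h, h),
                sub(aRow + h, aCol, bRow, bCol,     cRow + h, cCol,     h),
                sub(aRow + h, aCol, bRow, bCol + h, cRow + h, cCol + h, h));
            // Round 2: C11 += A12*B21, C12 += A12*B22, C21 += A22*B21, C22 += A22*B22.
            // It starts only after round 1 has finished, so the sums do not race.
            invokeAll(
                sub(aRow,     aCol + h, bRow + h, bCol,     cRow,     cCol,     h),
                sub(aRow,     aCol + h, bRow + h, bCol + h, cRow,     cCol + h, h),
                sub(aRow + h, aCol + h, bRow + h, bCol,     cRow + h, cCol,     h),
                sub(aRow + h, aCol + h, bRow + h, bCol + h, cRow + h, cCol + h, h));
        }
    }

    // Multiplies two n x n matrices, where n is assumed to be a power of two.
    static int[][] multiply(int[][] a, int[][] b) {
        int n = a.length;
        int[][] c = new int[n][n]; // Java zero-initialises the result
        new ForkJoinPool().invoke(new MulTask(a, b, c, 0, 0, 0, 0, 0, 0, n));
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```

A threshold of 1 creates a very large number of tiny tasks, so this version mainly illustrates the partitioning; a larger threshold that falls back to plain loops would likely run faster, but that goes beyond what the assignment text asks for.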
Tip: Test your code with very small non-symmetrical matrices, say 4x4, so that you can calculate the answers by hand and compare.
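To follow the tip, it may help to have a plain sequential multiplication to compare results against. The sketch below is only an illustration: the class name NaiveMatMul and the sample 4x4 values are made up, and the matrices are deliberately non-symmetrical so that a mixed-up row or column index changes the result.

```java
import java.util.Arrays;

class NaiveMatMul {

    // Straightforward triple-loop product of two n x n matrices.
    static int[][] multiply(int[][] a, int[][] b) {
        int n = a.length;
        int[][] c = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < n; j++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Small non-symmetrical inputs that can be checked by hand.
        int[][] a = {
            {1, 2, 0, 0},
            {0, 1, 3, 0},
            {4, 0, 1, 0},
            {0, 0, 2, 1}
        };
        int[][] b = {
            {1, 0, 0, 2},
            {0, 1, 0, 0},
            {3, 0, 1, 0},
            {0, 5, 0, 1}
        };
        int[][] expected = multiply(a, b);
        System.out.println(Arrays.deepToString(expected));
        // A parallel result could then be compared with
        // Arrays.deepEquals(expected, parallelResult).
    }
}
```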