Fortran code is better for matrix and vector type operation in general. But you also can produce similar performance with c/c++ code by passing hints/suggestions to the compiler to produce similar quality vector instructions. One option that gave me good boost was not to assume memory aliasing among input variables that are array objects. This way, the compiler can aggressively do inner loop unrolling and pipelining for ILP where it can overlap loads and store operation across loop iteration with right prefetches.