I tried to rewrite codes from Fortran to C++ with a 2000*2000 matrix multiplication implements through Eigen library. I found that for loop in Eigen is much slower (>3x)