I was looking over the performance benchmarks: http://eigen.tuxfamily.org/index.php?title=Benchmark
I could not help but notice that Eigen appears to consistently outper
Generic code can be fast because compile-time function evaluation (CTFE) lets the library choose an optimal register-blocking strategy at compile time, in which small temporary sub-matrices are kept in CPU registers.
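To make the register-blocking idea concrete, here is a minimal sketch in C++, using constexpr as a rough analogue of D's CTFE. It is not how Mir GLAS or Eigen actually pick their kernels: the names RegisterFile, choose_block and micro_kernel are hypothetical, the 16 x 32-byte register file is an assumed AVX2-like target, and the "maximize multiply-adds per loaded register" heuristic is a simplification chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the target's SIMD register file (an assumption).
struct RegisterFile {
    std::size_t count;        // number of vector registers
    std::size_t width_bytes;  // bytes per vector register
};

// Dimensions of the accumulator tile kept in registers.
struct Block {
    std::size_t rows;
    std::size_t cols;
};

// Runs at compile time when called in a constant expression. The tile is
// rv vector registers tall and nr columns wide, so it needs rv * nr
// accumulators plus rv registers for a column of A and 1 for an element of B.
// Among the tiles that fit, pick the one with the highest ratio of
// multiply-adds (rv * nr) to loaded registers (rv + nr). Toy heuristic only.
template <typename T>
constexpr Block choose_block(RegisterFile rf) {
    const std::size_t lanes = rf.width_bytes / sizeof(T);  // elements per register
    std::size_t best_rv = 1;
    std::size_t best_nr = 1;
    for (std::size_t rv = 1; rv <= rf.count; ++rv) {
        for (std::size_t nr = 1; nr <= rf.count; ++nr) {
            const bool fits = rv * nr + rv + 1 <= rf.count;
            // Compare rv*nr/(rv+nr) with the current best by cross-multiplying.
            const bool better =
                rv * nr * (best_rv + best_nr) > best_rv * best_nr * (rv + nr);
            if (fits && better) {
                best_rv = rv;
                best_nr = nr;
            }
        }
    }
    return Block{best_rv * lanes, best_nr};
}

// Generic micro-kernel: C[MR x NR] += A_panel * B_panel.
// a is packed as k columns of MR elements, b as k rows of NR elements,
// c is row-major with leading dimension ldc. MR and NR are compile-time
// constants, so the compiler is free to unroll the inner loops and keep
// acc in registers.
template <typename T, std::size_t MR, std::size_t NR>
void micro_kernel(std::size_t k, const T* a, const T* b, T* c, std::size_t ldc) {
    assert(a != nullptr && b != nullptr && c != nullptr);
    T acc[MR][NR] = {};
    for (std::size_t p = 0; p < k; ++p)
        for (std::size_t i = 0; i < MR; ++i)
            for (std::size_t j = 0; j < NR; ++j)
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
    for (std::size_t i = 0; i < MR; ++i)
        for (std::size_t j = 0; j < NR; ++j)
            c[i * ldc + j] += acc[i][j];
}

// Assumed AVX2-like target: 16 registers of 32 bytes each.
constexpr RegisterFile kAssumedAvx2{16, 32};
constexpr Block kDoubleBlock = choose_block<double>(kAssumedAvx2);
static_assert(kDoubleBlock.rows == 12 && kDoubleBlock.cols == 4,
              "block shape is fixed before the program runs");
```

The same generic micro_kernel can then be instantiated as micro_kernel<double, kDoubleBlock.rows, kDoubleBlock.cols> with no hand-written kernel per element type; changing the element type or the assumed register file changes the tile shape without touching the kernel code.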
Mir GLAS and Intel MKL are faster than Eigen and OpenBLAS, and Mir GLAS is more generic than Eigen. See also the benchmark and the Reddit thread.