I was looking over the performance benchmarks: http://eigen.tuxfamily.org/index.php?title=Benchmark
I could not help but notice that eigen appears to consistently outper
Benchmarks are designed to be misinterpreted.
Let's look at the matrix * matrix product. The benchmark available on this page from the Eigen website tells you than Eigen (with its own BLAS) gives timings similar to the MKL for large matrices (n = 1000). I've compared Eigen 3.2.6 with MKL 11.3 on my computer (a laptop with a core i7) and the MKL is 3 times faster than Eigen for such matrices using one thread, and 10 times faster than Eigen using 4 threads. This looks like a completely different conclusion. There are two reasons for this. Eigen 3.2.6 (its internal BLAS) does not use AVX. Moreover, it does not seem to make a good usage of multithreading. This benchmark hides this as they use a CPU that does not have AVX support without multithreading.
Usually, those C++ libraries (Eigen, Armadillo, Blaze) bring two things:
Usually, when those libraries provide benchmarks against the MKL, they usually use old hardware, and do not turn on multithreading so they can be on par with the MKL. They might also compare BLAS level 1 operations such as y = alpha x1 + beta x2 with 2 calls to a BLAS level 1 function which is a stupid thing to do anyway.
In a nutshell, those libraries are extremely convenient for their overloading of + and * which is extremely difficult to do without losing performance. They usually do a good job on this. But when they give you benchmark saying that they can be on par or beat the MKL with their own BLAS, be careful and do your own benchmark. You'll usually get different results ;-).