I was looking over the performance benchmarks: http://eigen.tuxfamily.org/index.php?title=Benchmark
I could not help but notice that eigen appears to consistently outper
Eigen has lazy evaluation. From How does Eigen compare to BLAS/LAPACK?:
For operations involving complex expressions, Eigen is inherently faster than any BLAS implementation because it can handle and optimize a whole operation globally -- while BLAS forces the programmer to split complex operations into small steps that match the BLAS fixed-function API, which incurs inefficiency due to introduction of temporaries. See for instance the benchmark result of a Y = aX + bY operation which involves two calls to BLAS level1 routines while Eigen automatically generates a single vectorized loop.
The second chart in the benchmarks is Y = a*X + b*Y
, which Eigen was specially designed to handle. It should be no wonder that a library wins at a benchmark it was created for. You'll notice that the more generic benchmarks, like matrix-matrix multiplication, don't show any advantage for Eigen.