How can the C++ Eigen library perform better than specialized vendor libraries?

前端未结

执笔经年

I was looking over the performance benchmarks: http://eigen.tuxfamily.org/index.php?title=Benchmark

I could not help but notice that Eigen appears to consistently outperform the specialized vendor libraries.

6 answers

生来不讨喜

2021-01-29 23:05

I sent the same question to the ATLAS mailing list some time ago:

http://sourceforge.net/mailarchive/message.php?msg_id=28711667

Clint (the ATLAS developer) does not trust these benchmarks. He suggested a more trustworthy benchmarking procedure. As soon as I have some free time, I will do this kind of benchmarking.

If the BLAS functionality of Eigen is actually faster than that of GotoBLAS, ATLAS, or MKL, then they should provide a standard BLAS interface anyway. This would allow linking LAPACK against such an Eigen-BLAS. In that case, it would also be an interesting option for Matlab and friends.
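In practice, "a standard BLAS interface" means exporting the Fortran-callable symbols (daxpy_, dgemm_, ...) that a LAPACK build links against. Below is a minimal sketch of one such routine, assuming the common gfortran convention of lowercase names with a trailing underscore and all arguments passed by pointer. The plain loop is a stand-in for whatever optimized kernel a library would call internally; this is an illustration, not Eigen's actual BLAS code.

```cpp
#include <cassert>

// Illustrative sketch: export y := alpha*x + y under the reference BLAS
// symbol name daxpy_. Fortran passes every argument by reference, hence the
// pointers; the trailing underscore follows the usual gfortran mangling.
extern "C" void daxpy_(const int* n, const double* alpha,
                       const double* x, const int* incx,
                       double* y, const int* incy) {
    assert(n && alpha && x && incx && y && incy);
    if (*n <= 0 || *alpha == 0.0) return;  // quick return, as in reference BLAS
    // With a negative increment the vector is traversed from its far end,
    // matching the reference BLAS semantics (0-based start index here).
    int ix = (*incx < 0) ? (1 - *n) * *incx : 0;
    int iy = (*incy < 0) ? (1 - *n) * *incy : 0;
    for (int i = 0; i < *n; ++i) {
        y[iy] += *alpha * x[ix];  // an optimized library would vectorize this
        ix += *incx;
        iy += *incy;
    }
}
```

A LAPACK library linked against an object file providing such symbols would resolve its daxpy calls to this implementation at link time; a complete BLAS would have to provide every level 1, 2, and 3 routine in the same way.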

没有蜡笔的小新

2021-01-29 23:08

It doesn't seem to consistently outperform other libraries, as can be seen on the graphs further down on that page you linked. So the different libraries are optimized for different use cases, and different libraries are faster for different problems.

This is not surprising, since you usually cannot optimize perfectly for all use cases. Optimizing for one specific operation usually limits the optimization options for other use cases.

梦如初夏

2021-01-29 23:10

Eigen has lazy evaluation. From "How does Eigen compare to BLAS/LAPACK?":

For operations involving complex expressions, Eigen is inherently faster than any BLAS implementation because it can handle and optimize a whole operation globally -- while BLAS forces the programmer to split complex operations into small steps that match the BLAS fixed-function API, which incurs inefficiency due to introduction of temporaries. See for instance the benchmark result of a Y = aX + bY operation which involves two calls to BLAS level1 routines while Eigen automatically generates a single vectorized loop.

The second chart in the benchmarks is Y = a*X + b*Y, which Eigen was specially designed to handle. It should be no wonder that a library wins at a benchmark it was created for. You'll notice that the more generic benchmarks, like matrix-matrix multiplication, don't show any advantage for Eigen.
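To make the quoted point concrete, here is a minimal sketch in plain C++ (no Eigen, no BLAS; the function names are hypothetical) of the two ways to evaluate Y = a*X + b*Y. The first mirrors two BLAS level 1 calls (for example a dscal followed by a daxpy), so Y is streamed through memory twice. The second is the kind of single fused loop the quoted text says Eigen generates for the whole expression.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass version, mimicking two BLAS level 1 calls:
// "scal" (y := b*y) followed by "axpy" (y := a*x + y).
// Every element of y is read and written twice.
void axpby_two_pass(double a, const std::vector<double>& x,
                    double b, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] *= b;         // scal
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += a * x[i];  // axpy
}

// Fused version: one loop, each element of x and y touched once.
// This is the loop an expression-based library can emit for Y = a*X + b*Y.
void axpby_fused(double a, const std::vector<double>& x,
                 double b, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] = a * x[i] + b * y[i];
}
```

Both compute the same result; for vectors too large for the cache the operation is memory-bound, so halving the passes over Y is where the fused loop gains.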

一整个雨季

2021-01-29 23:14
About the comparison ATLAS vs. Eigen

Have a look at this thread on the Eigen mailing list starting here:
- http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2012/07/msg00052.html
It shows for instance that ATLAS outperforms Eigen on the matrix-matrix product by 46%:
- http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2012/07/msg00062.html
More benchmarks results and details on how the benchmarks were done can be found here:
- Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz: http://www.mathematik.uni-ulm.de/~lehn/bench_FLENS/index.html
- http://sourceforge.net/tracker/index.php?func=detail&aid=3540928&group_id=23725&atid=379483
Edit:

For my lecture "Software Basics for High Performance Computing" I created a little framework called ulmBLAS. It contains the ATLAS benchmark suite and students could implement their own matrix-matrix product based on the BLIS papers. You can have a look at the final benchmarks which also measure Eigen:
- http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/page14/index.html#toc5
You can use the ulmBLAS framework to make your own benchmarks.

Also have a look at
- Matrix-Matrix Product Experiments with uBLAS
- Matrix-Matrix Product Experiments with BLAZE
抹茶落季

2021-01-29 23:15

Generic code can be fast because Compile Time Function Evaluation (CTFE) allows choosing the optimal register-blocking strategy (small temporary sub-matrices stored in CPU registers).

Mir GLAS and Intel MKL are faster than Eigen and OpenBLAS. Mir GLAS is more generic than Eigen. See also the benchmark and the Reddit thread.
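Mir GLAS is written in D, where CTFE is a language feature; in C++ the closest analogue is passing block sizes as template parameters. Here is a minimal C++ sketch of a register-blocked matrix product whose block size is fixed at compile time. It is not Mir's or Eigen's code, and the default block sizes MR = NR = 4 are arbitrary assumptions; a real library tunes them per CPU and also handles edge blocks and cache blocking.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n matrix stored in a flat vector.
using Mat = std::vector<double>;

// Micro-kernel: accumulates one MR x NR block of C += A * B.
// MR and NR are compile-time constants, so acc is a fixed-size local array
// that the compiler can keep in registers and whose loops it can unroll.
template <std::size_t MR, std::size_t NR>
void micro_kernel(const Mat& A, const Mat& B, Mat& C,
                  std::size_t n, std::size_t i0, std::size_t j0) {
    std::array<std::array<double, NR>, MR> acc{};  // zero-initialized block
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < MR; ++i)
            for (std::size_t j = 0; j < NR; ++j)
                acc[i][j] += A[(i0 + i) * n + k] * B[k * n + (j0 + j)];
    for (std::size_t i = 0; i < MR; ++i)
        for (std::size_t j = 0; j < NR; ++j)
            C[(i0 + i) * n + (j0 + j)] += acc[i][j];  // write block back once
}

// C += A * B for n x n matrices. For brevity, n must be a multiple of MR
// and NR; a real library would handle the leftover edges separately.
template <std::size_t MR = 4, std::size_t NR = 4>
void gemm_blocked(const Mat& A, const Mat& B, Mat& C, std::size_t n) {
    assert(n % MR == 0 && n % NR == 0);
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i0 = 0; i0 < n; i0 += MR)
        for (std::size_t j0 = 0; j0 < n; j0 += NR)
            micro_kernel<MR, NR>(A, B, C, n, i0, j0);
}
```

Because MR and NR are template arguments, the same generic source can be instantiated with a different block shape per target (for example gemm_blocked<8, 4>) without any runtime dispatch inside the kernel.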

一整个雨季

2021-01-29 23:17
Benchmarks are designed to be misinterpreted.

Let's look at the matrix * matrix product. The benchmark available on this page from the Eigen website tells you that Eigen (with its own BLAS) gives timings similar to the MKL for large matrices (n = 1000). I've compared Eigen 3.2.6 with MKL 11.3 on my computer (a laptop with a Core i7), and the MKL is 3 times faster than Eigen for such matrices using one thread, and 10 times faster than Eigen using 4 threads. This leads to a completely different conclusion. There are two reasons for this. First, Eigen 3.2.6 (its internal BLAS) does not use AVX. Second, it does not seem to make good use of multithreading. The Eigen benchmark hides both problems because it was run on a CPU without AVX support and without multithreading.

Usually, those C++ libraries (Eigen, Armadillo, Blaze) bring two things:
- Nice operator overloading: you can use + and * with vectors and matrices. To get good performance, they have to use tricky techniques known as "Smart Expression Templates" in order to avoid temporaries when that reduces the runtime (such as y = alpha x1 + beta x2 with y, x1, x2 vectors) and to introduce them when they are useful (such as A = B * C with A, B, C matrices). They can also reorder operations to reduce the amount of computation: for instance, if A, B, C are matrices, A * B * C can be computed as (A * B) * C or A * (B * C) depending on their sizes.
- Internal BLAS: to compute the product of 2 matrices, they can either rely on their internal BLAS or on an externally provided one (MKL, OpenBLAS, ATLAS). On Intel chips with large matrices, the MKL is almost impossible to beat. For small matrices, one can beat the MKL, as it was not designed for that kind of problem.
Usually, when those libraries provide benchmarks against the MKL, they use old hardware and do not turn on multithreading, so that they can be on par with the MKL. They might also compare BLAS level 1 operations such as y = alpha x1 + beta x2 against 2 calls to a BLAS level 1 function, which is a stupid thing to do anyway.
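The expression-template technique from the first point can be shown in miniature. The following is a toy sketch with hypothetical names (Expr, Add, Scale, Vec), far simpler than what Eigen or Blaze actually do: operator+ and operator* build lightweight expression objects instead of temporary vectors, and a single loop runs only when the expression is assigned to a Vec.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CRTP base that marks a type as a vector expression.
template <typename D>
struct Expr {
    const D& derived() const { return static_cast<const D&>(*this); }
};

// Node for l + r: holds references and computes elements on demand.
template <typename L, typename R>
struct Add : Expr<Add<L, R>> {
    const L& l;
    const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// Node for s * e (scalar times expression).
template <typename E>
struct Scale : Expr<Scale<E>> {
    double s;
    const E& e;
    Scale(double s_, const E& e_) : s(s_), e(e_) {}
    double operator[](std::size_t i) const { return s * e[i]; }
    std::size_t size() const { return e.size(); }
};

// The only type that owns storage.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::vector<double> d) : data(std::move(d)) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
    // Evaluating an expression: one loop, no intermediate vectors.
    template <typename E>
    Vec& operator=(const Expr<E>& expr) {
        const E& e = expr.derived();
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

template <typename L, typename R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Add<L, R>(l.derived(), r.derived());
}

template <typename E>
Scale<E> operator*(double s, const Expr<E>& e) {
    return Scale<E>(s, e.derived());
}
```

With this, `y = 2.0 * x + 3.0 * y;` compiles to one loop computing `2*x[i] + 3*y[i]` per element, with no temporary vectors. One caveat of this design: the nodes hold references to other temporary nodes, so an expression must be assigned within the same statement; storing it in an `auto` variable would leave dangling references (Eigen documents a similar pitfall with `auto`).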

In a nutshell, those libraries are extremely convenient because of their overloading of + and *, which is extremely difficult to do without losing performance. They usually do a good job of this. But when they give you benchmarks saying that they can be on par with or beat the MKL with their own BLAS, be careful and do your own benchmarks. You'll usually get different results ;-).
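For running your own benchmarks, here is a minimal timing harness that uses only std::chrono; the helper names are hypothetical. It reports the best of several runs and converts it to GFLOP/s using the usual count of 2n^3 floating-point operations for an n x n matrix product. The library call you want to measure goes inside the lambda.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// GFLOP/s for an n x n by n x n product: n^3 multiplies plus n^3 adds.
double gemm_gflops(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    const double dn = static_cast<double>(n);
    return 2.0 * dn * dn * dn / seconds / 1e9;
}

// Runs `run` several times and returns the fastest wall-clock time in
// seconds. Taking the minimum filters out one-off noise such as warm-up
// effects or interference from other processes.
template <typename F>
double best_time(F run, int repetitions = 5) {
    assert(repetitions > 0);
    double best = 0.0;
    for (int r = 0; r < repetitions; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        run();
        const auto t1 = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        if (r == 0 || s < best) best = s;
    }
    return best;
}
```

To use it, wrap each candidate (for example an Eigen `C.noalias() = A * B;` or a `cblas_dgemm` call) in a lambda, pass it to `best_time`, and report `gemm_gflops(n, t)`. Repeat the run on the hardware, matrix sizes, and thread counts you actually care about, since those are exactly the parameters the answer above says published benchmarks tend to fix in their favor.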