Question
I compared Blitz++, Armadillo, and boost::MultiArray using the following code (borrowed from an old post):
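// These macros must be defined before any header that checks them.
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS
#include <cstdio>    // printf
#include <iostream>
#include <windows.h> // GetTickCount
#include <boost/multi_array.hpp>
#include <blitz/array.h>
#include <armadillo>
using namespace std;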
int main(int argc, char* argv[])
{
const int X_SIZE = 1000;
const int Y_SIZE = 1000;
const int ITERATIONS = 100;
unsigned int startTime = 0;
unsigned int endTime = 0;
// Create the boost array
//------------------Measure boost Loop------------------------------------------
{
typedef boost::multi_array<double, 2> ImageArrayType;
ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);
startTime = ::GetTickCount();
for (int i = 0; i < ITERATIONS; ++i)
{
for (int x = 0; x < X_SIZE; ++x)
{
for (int y = 0; y < Y_SIZE; ++y)
{
boostMatrix[x][y] = 1.0001;
}
}
}
endTime = ::GetTickCount();
printf("[Boost Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
}
//------------------Measure blitz Loop-------------------------------------------
{
blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
startTime = ::GetTickCount();
for (int i = 0; i < ITERATIONS; ++i)
{
for (int x = 0; x < X_SIZE; ++x)
{
for (int y = 0; y < Y_SIZE; ++y)
{
blitzArray(x,y) = 1.0001;
}
}
}
endTime = ::GetTickCount();
printf("[Blitz Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
}
//------------------Measure armadillo loop----------------------------------------
{
arma::mat matArray( X_SIZE, Y_SIZE );
startTime = ::GetTickCount();
for (int i = 0; i < ITERATIONS; ++i)
{
for (int y = 0; y < Y_SIZE; ++y)
{
for (int x = 0; x < X_SIZE; ++x)
{
matArray(x,y) = 1.0001;
}
}
}
endTime = ::GetTickCount();
printf("[arma Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
}
//------------------Measure native loop----------------------------------------
// Create the native array
{
double *nativeMatrix = new double [X_SIZE * Y_SIZE];
startTime = ::GetTickCount();
for (int i = 0; i < ITERATIONS; ++i)
{
for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
{
nativeMatrix[y] = 1.0001;
}
}
endTime = ::GetTickCount();
printf("[Native Loop]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
delete[] nativeMatrix;
}
//------------------Measure boost computation-----------------------------------
{
typedef boost::multi_array<double, 2> ImageArrayType;
ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);
for (int x = 0; x < X_SIZE; ++x)
{
for (int y = 0; y < Y_SIZE; ++y)
{
boostMatrix[x][y] = 1.0001;
}
}
startTime = ::GetTickCount();
for (int i = 0; i < ITERATIONS; ++i)
{
for (int x = 0; x < X_SIZE; ++x)
{
for (int y = 0; y < Y_SIZE; ++y)
{
boostMatrix[x][y] += boostMatrix[x][y] * 0.5;
}
}
}
endTime = ::GetTickCount();
printf("[Boost computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
}
//------------------Measure blitz computation-----------------------------------
{
blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
blitzArray = 1.0001;
startTime = ::GetTickCount();
for (int i = 0; i < ITERATIONS; ++i)
{
blitzArray += blitzArray*0.5;
}
endTime = ::GetTickCount();
printf("[Blitz computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
}
//------------------Measure armadillo computation-------------------------------
{
arma::mat matArray( X_SIZE, Y_SIZE );
matArray.fill(1.0001);
startTime = ::GetTickCount();
for (int i = 0; i < ITERATIONS; ++i)
{
//matArray.fill(1.0001);
matArray += matArray*0.5;
}
endTime = ::GetTickCount();
printf("[arma computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
}
//------------------Measure native computation------------------------------------------
// Create the native array
{
double *nativeMatrix = new double [X_SIZE * Y_SIZE];
for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
{
nativeMatrix[y] = 1.0001;
}
startTime = ::GetTickCount();
for (int i = 0; i < ITERATIONS; ++i)
{
for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
{
nativeMatrix[y] += nativeMatrix[y] * 0.5;
}
}
endTime = ::GetTickCount();
printf("[Native computation]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
delete[] nativeMatrix;
}
return 0;
}
On Windows with VS2010, the results are:
[Boost Loop] Elapsed time: 1.217 seconds
[Blitz Loop] Elapsed time: 0.046 seconds
[arma Loop] Elapsed time: 0.078 seconds
[Native Loop]Elapsed time: 0.172 seconds
[Boost computation] Elapsed time: 2.152 seconds
[Blitz computation] Elapsed time: 0.156 seconds
[arma computation] Elapsed time: 0.078 seconds
[Native computation]Elapsed time: 0.078 seconds
On Windows with the Intel C++ compiler, the results are:
[Boost Loop] Elapsed time: 0.468 seconds
[Blitz Loop] Elapsed time: 0.125 seconds
[arma Loop] Elapsed time: 0.046 seconds
[Native Loop]Elapsed time: 0.047 seconds
[Boost computation] Elapsed time: 0.796 seconds
[Blitz computation] Elapsed time: 0.109 seconds
[arma computation] Elapsed time: 0.078 seconds
[Native computation]Elapsed time: 0.062 seconds
Two things look strange:
(1) With VS2010, the native computation (which includes the loop) is faster than the native loop alone.
(2) The Blitz loop behaves very differently under VS2010 and Intel C++.
To compile Blitz++ with the Intel C++ compiler, a file called bzconfig.h is required in the blitz/intel/ folder, but there isn't one. I just copied blitz/ms/bzconfig.h into it, which may give a non-optimal configuration. Can anyone tell me how to compile Blitz++ with the Intel C++ compiler? The manual says to run the bzconfig script to get the right bzconfig.h, but I don't understand what that means.
Thanks a lot!
Some conclusions of my own:
1. Boost multi_array is the slowest.
2. With the Intel C++ compiler, native pointers are very fast.
3. With the Intel C++ compiler, Armadillo can match the performance of native pointers.
4. I also tested Eigen; it is x0% slower than Armadillo in my simple cases.
5. I am curious how Blitz++ behaves under the Intel C++ compiler with a proper configuration.
Please see my question.
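One possible contributor to the two oddities above is the measurement itself. GetTickCount typically has a resolution of about 10 to 16 ms, which is the same order of magnitude as several of the timings listed, and the first timed pass over a freshly allocated array may also pay for page faults that later passes do not. The following is a minimal sketch, not part of the original post, of a portable timer based on std::chrono::steady_clock with an untimed warm-up pass. The names time_seconds and fill_native are hypothetical and chosen only for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): runs `work` once untimed
// as a warm-up, then returns the elapsed seconds for `iterations` timed runs.
template <typename Work>
double time_seconds(Work work, int iterations)
{
    work();  // warm-up pass: memory is touched before the clock starts
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Same fill as the "Native Loop" section, written against std::vector.
inline void fill_native(std::vector<double>& m, double value)
{
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = value;
}
```

A call such as time_seconds([&] { fill_native(m, 1.0001); }, 100) with std::vector<double> m(1000 * 1000) would mirror the "Native Loop" measurement. An optimizer may still remove stores it can prove are never read, so it is advisable to read some element of the result afterwards.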
Answer 1:
Short answer: ./configure CXX=icpc, found by reading the Blitz++ User's Guide.
Long answer:
"To compile blitz++ with intel c++ compiler, a file called bzconfig.h is required in blitz/intel/ folder. But there isn't."
Yes and yes. Blitz++ is supposed to generate the file itself. According to the Blitz++ User's Guide (blitz.pdf, included in blitz-0.10.tar.gz), section "Installation":
Blitz++ uses GNU Autoconf, which handles rewriting Makefiles for various platforms and compilers.
More accurately, Blitz++ uses the GNU autotools tool chain (automake, autoconf, configure), which can generate makefiles, configure scripts, header files and more. The bzconfig.h files are supposed to be generated by the configure script, which comes with Blitz++, ready to use.
"I just copy the one in blitz/ms/bzconfig.h in. That may give an non-optimal configuration."
If "non-optimal" means "non-working" to you, then yes. :-)
You need an intel/bzconfig.h that accurately represents your compiler.
"Anyone can tell me how to compile blitz++ with intel c++ compiler?"
Read and follow the fine manual, in particular the section "Installation" mentioned above.
go into the ‘blitz-VERSION’ directory, and type:
./configure CXX=[compiler]
where [compiler] is one of xlc++, icpc, pathCC, xlC, cxx, aCC, CC, g++, KCC, pgCC or FCC. (If you do not choose a C++ compiler, the configure script will attempt to find an appropriate compiler for the current platform.)
Have you done this? For the Intel compiler, you would need to use ./configure CXX=icpc.
"In the manual, it said run bzconfig script to get the right bzconfig.h. But I don't understand what it means."
I assume that by "it" you mean "that". What do you mean by "manual"? My copy of the Blitz++ User's Guide does not mention bzconfig. Are you sure that you are using the manual that corresponds to your Blitz++ version?
PS: Searching the contents of blitz-0.10 for "bzconfig" suggests that the bzconfig script is no longer part of Blitz++, although it used to be:
find . -name bzconfig
-> No results
find . -print0 | xargs -0 grep -a -i -n -e bzconfig
-> Matches:
./blitz/compiler.h:44: #error In <blitz/config.h>: A working template implementation is required by Blitz++ (you may need to rerun the compiler/bzconfig script)
That needs to be updated.
./blitz/gnu/bzconfig.h:4:/* blitz/gnu/bzconfig.h. Generated automatically at end of configure. */
./configure.ac:159:# autoconf replacement of bzconfig
There you have it: these bzconfig.h files should be generated by configure.
./ChangeLog.1:1787: will now replace the old file that was generate with the bzconfig
That may be the change that switched to autoconf.
./INSTALL:107: 2. Go into the compiler subdirectory and run the bzconfig
That needs to be updated. Is this what made you look for bzconfig?
./README:27:compiler Compiler tests (used with obsolete bzconfig script)
This needs updating; a compiler directory is no longer included.
Answer 2:
As far as I can tell, you are judging the performance of each matrix library by measuring the speed of multiplying a single matrix by a scalar. Due to its template-based policy, Armadillo will do a very good job at this by breaking down each multiply into parallelizable code for most compilers.
But I suggest you rethink your test scope and methodology. For example, you've left out every BLAS implementation. The BLAS function you'd need is dscal. A vendor-provided implementation for your specific CPU would probably do a good job.
More relevantly, there are many more things any reasonable vector library would need to be able to do: matrix multiplies, dot products, vector lengths, transposes, and so forth, which aren't addressed by your test. Your test addresses exactly two things: element assignment, which practically speaking is never a bottleneck for vector libraries, and scalar/vector multiplication, which is a BLAS level 1 function provided by every CPU manufacturer.
There is a discussion of BLAS level 1 vs. compiler-emitted code here.
tl;dr: use Armadillo with the native BLAS and LAPACK libraries for your platform linked in.
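To make the point about dscal concrete: the benchmark's update m += m * 0.5 is the same as scaling every element by 1.5, which is the operation dscal performs (x becomes alpha times x). The following is a hand-written sketch of that operation in plain C++, not a call into any BLAS library. The function names scale_in_place and benchmark_update are hypothetical and used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hand-written equivalent of what BLAS dscal computes: x[i] = alpha * x[i].
// Illustration only; this is not the BLAS interface itself.
inline void scale_in_place(std::vector<double>& x, double alpha)
{
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] *= alpha;
}

// The benchmark's update m += m * 0.5, applied `iterations` times,
// expressed as repeated scaling by 1.5.
inline void benchmark_update(std::vector<double>& m, int iterations)
{
    for (int it = 0; it < iterations; ++it)
        scale_in_place(m, 1.5);
}
```

Calling benchmark_update(m, 100) on a vector of 1000 * 1000 elements would correspond to the "computation" sections of the benchmark. A tuned BLAS would replace the inner loop with its own dscal routine.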
Answer 3:
My test showed that boost arrays had the same performance as the native/hardcoded C++ code.
You need to compare them with compiler optimisations enabled. That is:
-O3
-DNDEBUG
-DBOOST_UBLAS_NDEBUG
-DBOOST_DISABLE_ASSERTS
-DARMA_NO_DEBUG
...
When I tested (em++), Boost ran at least 10X faster with its asserts deactivated, level 3 optimisation enabled using -O3, etc. Any fair comparison should use these flags.
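Since these macros only take effect if they are defined before the relevant headers are included (on the command line, as above, or at the very top of the source file), it can be worth checking what a given build actually saw. The following is a small diagnostic sketch, not taken from the answer. The function name build_config_report is hypothetical, and the function only reports whether each macro named above was defined at compile time.

```cpp
#include <string>

// Reports which of the macros named above were defined when this
// translation unit was compiled. Purely a diagnostic sketch.
inline std::string build_config_report()
{
    std::string report;
#ifdef NDEBUG
    report += "NDEBUG: defined\n";
#else
    report += "NDEBUG: not defined\n";
#endif
#ifdef BOOST_UBLAS_NDEBUG
    report += "BOOST_UBLAS_NDEBUG: defined\n";
#else
    report += "BOOST_UBLAS_NDEBUG: not defined\n";
#endif
#ifdef BOOST_DISABLE_ASSERTS
    report += "BOOST_DISABLE_ASSERTS: defined\n";
#else
    report += "BOOST_DISABLE_ASSERTS: not defined\n";
#endif
#ifdef ARMA_NO_DEBUG
    report += "ARMA_NO_DEBUG: defined\n";
#else
    report += "ARMA_NO_DEBUG: not defined\n";
#endif
    return report;
}
```

Printing the returned string at the start of a benchmark run records the configuration next to the timings, which makes results from different compilers easier to compare.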
Source: https://stackoverflow.com/questions/14414906/compare-blitz-armadillo-boostmultiarray