I am trying to compare the performance of boost::multi_array to native dynamically allocated arrays, with the following test program:
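In outline, the test does something like the following (the sizes, the constant being written, and the names here are placeholders for illustration, not the exact original listing):

    #include <boost/multi_array.hpp>
    #include <ctime>
    #include <iostream>

    int main()
    {
        // Placeholder sizes and fill value; the real test uses the same kind of
        // nested loops, not necessarily these numbers.
        const int X_SIZE = 200;
        const int Y_SIZE = 200;
        const int ITERATIONS = 500;

        typedef boost::multi_array<double, 2> ImageType;
        ImageType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);

        std::clock_t start = std::clock();
        for (int i = 0; i < ITERATIONS; ++i)
            for (int x = 0; x < X_SIZE; ++x)
                for (int y = 0; y < Y_SIZE; ++y)
                    boostMatrix[x][y] = 2.345;
        std::cout << "boost:  "
                  << double(std::clock() - start) / CLOCKS_PER_SEC << " s\n";

        double *nativeMatrix = new double[X_SIZE * Y_SIZE];

        start = std::clock();
        for (int i = 0; i < ITERATIONS; ++i)
            for (int x = 0; x < X_SIZE; ++x)
                for (int y = 0; y < Y_SIZE; ++y)
                    nativeMatrix[x * Y_SIZE + y] = 2.345;
        std::cout << "native: "
                  << double(std::clock() - start) / CLOCKS_PER_SEC << " s\n";

        delete[] nativeMatrix;
        return 0;
    }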
Are you building release or debug?
If you are running in debug mode, the Boost array might be really slow because its template machinery isn't inlined properly, giving a lot of function-call overhead. I'm not sure how multi_array is implemented, though, so this might be totally off :)
Perhaps there is some difference in storage order as well, so you might have your image stored column by column while writing it row by row. That would give poor cache behavior and could slow things down.
Try switching the order of the X and Y loops and see if you gain anything. There is some info on the storage ordering here: http://www.boost.org/doc/libs/1_37_0/libs/multi_array/doc/user.html
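For example, assuming the default C-style (row-major) storage order and a 2-D array of double, the last index should vary fastest in the inner loop; a minimal sketch (function and variable names made up for illustration):

    #include <boost/multi_array.hpp>
    #include <cstddef>

    // boost::multi_array uses C-style (row-major) storage by default, so the
    // last index should vary fastest for contiguous memory access.
    void fill(boost::multi_array<double, 2>& matrix)
    {
        const std::size_t X_SIZE = matrix.shape()[0];
        const std::size_t Y_SIZE = matrix.shape()[1];

        // Cache-friendly with the default ordering: y (the last index) innermost.
        for (std::size_t x = 0; x < X_SIZE; ++x)
            for (std::size_t y = 0; y < Y_SIZE; ++y)
                matrix[x][y] = 2.345;

        // Cache-unfriendly with the default ordering: every write jumps Y_SIZE doubles.
        for (std::size_t y = 0; y < Y_SIZE; ++y)
            for (std::size_t x = 0; x < X_SIZE; ++x)
                matrix[x][y] = 2.345;
    }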
EDIT: Since you seem to be using the two-dimensional array for image processing, you might be interested in checking out Boost's image processing library, GIL.
It might have array types with less overhead that work perfectly for your situation.
I've compiled the code (with slight modifications) under VC++ 2010 with optimisation turned on ("Maximize Speed" together with inlining of "Any Suitable" functions and "Favor Fast Code") and got times of 0.015/0.391. I've generated an assembly listing and, though I'm a terrible assembly noob, there's one line inside the Boost-measuring loop which doesn't look good to me:
call ??A?$multi_array_ref@N$01@boost@@QAE?AV?$sub_array@N$00@multi_array@detail@1@H@Z ; boost::multi_array_ref<double,2>::operator[]
One of the [] operators didn't get inlined! The called procedure makes another call, this time to multi_array::value_accessor_n<...>::access<...>():
call ??$access@V?$sub_array@N$00@multi_array@detail@boost@@PAN@?$value_accessor_n@N$01@multi_array@detail@boost@@IBE?AV?$sub_array@N$00@123@U?$type@V?$sub_array@N$00@multi_array@detail@boost@@@3@HPANPBIPBH3@Z ; boost::detail::multi_array::value_accessor_n<double,2>::access<boost::detail::multi_array::sub_array<double,1>,double *>
Altogether, the two procedures are quite a lot of code for simply accessing a single element in the array. My general impression is that the library is so complex and high-level that Visual Studio is unable to optimise it as much as we would like (posters using gcc apparently have got better results).
IMHO, a good compiler really should have inlined and optimised both procedures - they are pretty short and straightforward, and contain no loops. A lot of time may be wasted simply on passing their arguments and results.
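If the compiler refuses to inline the operator[] chain, one thing that might be worth trying (just a sketch on my part, not something from the original program) is to go through the contiguous block that multi_array exposes via data() and index it manually:

    #include <boost/multi_array.hpp>
    #include <cstddef>

    // Sketch: skip the nested operator[] proxies and index the contiguous block
    // that multi_array keeps internally. Only valid for the default C-style
    // storage order and a zero index base.
    void fill_via_data(boost::multi_array<double, 2>& matrix)
    {
        const std::size_t rows = matrix.shape()[0];
        const std::size_t cols = matrix.shape()[1];
        double* p = matrix.data();          // pointer to the underlying storage

        for (std::size_t x = 0; x < rows; ++x)
            for (std::size_t y = 0; y < cols; ++y)
                p[x * cols + y] = 2.345;    // same element as matrix[x][y]
    }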
I think I know what the problem is...maybe.
In order for the Boost implementation to support syntax like matrix[x][y], matrix[x] has to return a proxy object which acts like a 1-D array, at which point proxy[y] gives you your element.
The problem here is that you are iterating in row-major order (which is typical in C/C++, since native arrays are row-major, IIRC). The compiler has to re-evaluate matrix[x] for each y in this case. If you iterated in column-major order when using the Boost matrix, you might see better performance.
Just a theory.
EDIT: On my Linux system (with some minor changes) I tested my theory, and switching x and y did show some performance improvement, but it was still slower than a native array. This might simply be a matter of the compiler not being able to optimize away the temporary reference (proxy) type.
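If the repeated matrix[x] is indeed the problem, hoisting the row proxy out of the inner loop should show it; here is roughly what I mean (function and variable names are made up):

    #include <boost/multi_array.hpp>
    #include <cstddef>

    // Sketch: evaluate matrix[x] once per row instead of once per element.
    // operator[] returns a sub_array proxy, which can be kept in a local
    // for the duration of the inner loop.
    void fill_hoisted(boost::multi_array<double, 2>& matrix)
    {
        const std::size_t rows = matrix.shape()[0];
        const std::size_t cols = matrix.shape()[1];

        for (std::size_t x = 0; x < rows; ++x)
        {
            boost::multi_array<double, 2>::reference row = matrix[x];  // proxy for one row
            for (std::size_t y = 0; y < cols; ++y)
                row[y] = 2.345;
        }
    }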
A similar question was asked and answered here:
http://www.codeguru.com/forum/archive/index.php/t-300014.html
The short answer is that it is easiest for the compiler to optimize the simple arrays, and not so easy to optimize the Boost version. Hence, a particular compiler may not give the Boost version all the same optimization benefits.
Compilers can also vary in how well they will optimize vs. how conservative they will be (e.g. with templated code or other complications).
I modified the above code in Visual Studio 2008 v9.0.21022 and applied the container routines from Numerical Recipes for C and C++
(http://www.nrbook.com/nr3/), using their licensed routines dmatrix and MatDoub respectively.
dmatrix uses the outdated malloc-style allocation and is not recommended; MatDoub uses new.
The times in seconds for the Release build are:
Boost: 0.437
Native: 0.032
Numerical Recipes C: 0.031
Numerical Recipes C++: 0.031
So, from the above, Blitz looks like the best free alternative.
I am wondering two things:
1) Bounds checking: define the BOOST_DISABLE_ASSERTS preprocessor macro prior to including multi_array.hpp in your application (see the sketch after this list). This turns off bounds checking; I am not sure whether it is also disabled when NDEBUG is defined.
2) Base index: MultiArray can index arrays from bases different from 0. That means multi_array stores a base offset (for each dimension) and uses a more complicated formula to obtain the exact location in memory; I am wondering whether that accounts for the difference.
Otherwise I don't understand why multi_array should be slower than C arrays.
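For point 1), the define just has to appear before the header is pulled in (or be passed on the compiler command line); a minimal sketch of what I mean:

    // Sketch for point 1): disable multi_array's per-access range checks.
    // The define must come before multi_array.hpp is included (or be passed
    // on the compiler command line, e.g. -DBOOST_DISABLE_ASSERTS).
    #define BOOST_DISABLE_ASSERTS
    #include <boost/multi_array.hpp>

    int main()
    {
        boost::multi_array<double, 2> m(boost::extents[100][100]);
        m[5][7] = 1.0;   // no BOOST_ASSERT range check in this build
        return 0;
    }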