I've always heard the general wisdom that std::vector is "implemented as an array" under the hood. Today I went down and tested it, and it seems to hold up, as long as you use the vector the right way. Try this:
void UseVectorCtor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}
I get almost exactly the same performance as with the array.

The thing about vector is that it's a much more general tool than an array, and that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have, and if you use it "wrong" for your purpose, you incur a lot of overhead; but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to have their default ctor called) and then overwrote each element individually with the correct value. That is much harder for the compiler to optimize away than when you do the same thing with an array, which is why the vector provides a constructor that lets you do exactly what you want: initialize N elements with value X.
And when you use that, the vector is just as fast as an array.
So no, you haven't busted the performance myth. But you have shown that it's only true if you use the vector optimally, which is a pretty good point too. :)
On the bright side, it's really the simplest usage that turns out to be fastest. If you contrast my code snippet (a single line) with John Kugelman's answer, containing heaps and heaps of tweaks and optimizations which still don't quite eliminate the performance difference, it's pretty clear that vector is pretty cleverly designed after all. You don't have to jump through hoops to get speed equal to an array. On the contrary, you have to use the simplest possible solution.
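For contrast, here is a sketch of the slower two-step pattern described above, assuming the Pixel type and dimension variable from the question: default-construct every element first, then overwrite each one.

// Slow: every element is default-constructed, then assigned in a second pass.
std::vector<Pixel> slow(dimension * dimension);
for (int i = 0; i < dimension * dimension; ++i)
    slow[i] = Pixel(255, 0, 0);

// Fast: the fill constructor initializes every element with its final value, once.
std::vector<Pixel> fast(dimension * dimension, Pixel(255, 0, 0));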
Here's how the push_back method in vector works:

1. The vector allocates space for some number of items, say X, when it first needs it.
2. Each push_back copies the new item into the next free slot.
3. After calling push_back X items, the space is exhausted, so the vector allocates a larger block (typically double the size), copies the X existing items over, and destroys the old block.

Repeat. If you're not reserving space, it's definitely going to be slower. More than that, if it's expensive to copy the item, then push_back like that is going to eat you alive.
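A minimal sketch of what reserving buys you, again assuming the Pixel type and dimension from the question: one allocation up front, so the loop never triggers a reallocate-and-copy cycle.

std::vector<Pixel> pixels;
pixels.reserve(dimension * dimension);      // single allocation up front
for (int i = 0; i < dimension * dimension; ++i)
    pixels.push_back(Pixel(255, 0, 0));     // never reallocates, just copies one Pixel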
As to the vector versus array thing, I'm going to have to agree with the other people. Run in release mode, turn optimizations on, and put in a few more flags so that the friendly people at Microsoft don't #@%$^ it up for ya.
One more thing, if you don't need to resize, use Boost.Array.
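A sketch of that alternative; note that boost::array lives wherever you declare it, so a buffer as large as the question's 999x999 Pixel grid would be too big for the stack, and a small compile-time size is assumed here.

#include <boost/array.hpp>

boost::array<Pixel, 64> tile;               // fixed size, no heap allocation at all
for (std::size_t i = 0; i < tile.size(); ++i)
    tile[i] = Pixel(255, 0, 0);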
It was hardly a fair comparison when I first looked at your code; I definitely thought you weren't comparing apples with apples. So I thought: let's get constructors and destructors called in all tests, and then compare.
#include <vector>

// Pixel and TestTimer are as defined in the question.
const size_t dimension = 1000;

void UseArray() {
    TestTimer t("UseArray");
    for(size_t j = 0; j < dimension; ++j) {
        Pixel* pixels = new Pixel[dimension * dimension];
        for(size_t i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
        delete[] pixels;
    }
}

void UseVector() {
    TestTimer t("UseVector");
    for(size_t j = 0; j < dimension; ++j) {
        std::vector<Pixel> pixels(dimension * dimension);
        for(size_t i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
    }
}

int main() {
    TestTimer t1("The whole thing");
    UseArray();
    UseVector();
    return 0;
}
My thought was that, with this setup, they should perform exactly the same. It turns out I was wrong.
UseArray completed in 3.06 seconds
UseVector completed in 4.087 seconds
The whole thing completed in 10.14 seconds
So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.
My thought was that the cost lies in how the vector initialises all of its values using the default constructor. So I performed a test:
#include <cstdio>
#include <vector>

class Tester {
public:
    static int count;
    static int count2;
    Tester() { count++; }
    Tester(const Tester&) { count2++; }
};
int Tester::count = 0;
int Tester::count2 = 0;

int main() {
    std::vector<Tester> myvec(300);
    printf("Default Constructed: %i\nCopy Constructed: %i\n",
           Tester::count, Tester::count2);
    return 0;
}
The results were as I suspected:
Default Constructed: 1
Copy Constructed: 300
This is clearly the source of the slowdown: the vector uses the copy constructor to initialise the elements from a single default-constructed object.
This means that the following pseudo-operation sequence happens during construction of the vector:

Pixel pixel;
for (auto i = 0; i < N; ++i) vector[i] = pixel;
Which, due to the implicit copy constructor made by the compiler, is expanded to the following:

Pixel pixel;
for (auto i = 0; i < N; ++i) {
    vector[i].r = pixel.r;
    vector[i].g = pixel.g;
    vector[i].b = pixel.b;
}
So the default Pixel remains un-initialised, while the rest are initialised with the default Pixel's un-initialised values.
Compare this to the alternative situation with new[]/delete[]:
int main() {
    Tester* myvec = new Tester[300];
    printf("Default Constructed: %i\nCopy Constructed: %i\n",
           Tester::count, Tester::count2);
    delete[] myvec;
    return 0;
}
Default Constructed: 300
Copy Constructed: 0
All 300 elements are default-constructed once and left with their un-initialised values, without the double iteration over the sequence.
Armed with this information, how can we test it? Let's try overriding the implicit copy constructor:
Pixel(const Pixel&) {}
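For context, here is what the modified Pixel might look like with that no-op copy constructor in place (a sketch; the rest of the definition is assumed from the question):

struct Pixel {
    unsigned char r, g, b;
    Pixel() {}                              // members deliberately left un-initialised
    Pixel(unsigned char rr, unsigned char gg, unsigned char bb)
        : r(rr), g(gg), b(bb) {}
    Pixel(const Pixel&) {}                  // no-op copy: skips the member-wise copy
};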
And the results?
UseArray completed in 2.617 seconds
UseVector completed in 2.682 seconds
The whole thing completed in 5.301 seconds
So in summary: if you're making hundreds of vectors very often, re-think your algorithm.

In any case, the STL implementation isn't slower for some unknown reason; it just does exactly what you ask, hoping you know better.
My laptop is a Lenovo G770 (4 GB RAM).
The OS is Windows 7 64-bit (the one that came with the laptop).
The compiler is MinGW 4.6.1.
The IDE is Code::Blocks.
I tested the source code from the first post.

-O2 optimization:
UseArray completed in 2.841 seconds
UseVector completed in 2.548 seconds
UseVectorPushBack completed in 11.95 seconds
The whole thing completed in 17.342 seconds
-O3 optimization:
UseArray completed in 1.452 seconds
UseVector completed in 2.514 seconds
UseVectorPushBack completed in 12.967 seconds
The whole thing completed in 16.937 seconds
It looks like the performance of vector is worse under -O3 optimization.

If you change the loop to

    pixels[i].r = i;
    pixels[i].g = i;
    pixels[i].b = i;

the speed of array and vector under -O2 and -O3 is almost the same.
With the right options, vectors and arrays can generate identical asm. In these cases, they are of course the same speed, because you get the same executable file either way.
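One way to verify this yourself; a sketch, assuming GCC and the benchmark saved as test.cpp:

g++ -O3 -S -o test.s test.cpp    # emit assembly instead of an executable; inspect or diff the loops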
Some profiler data (pixel is aligned to 32 bits):
g++ -msse3 -O3 -ftree-vectorize -g test.cpp -DNDEBUG && ./a.out
UseVector completed in 3.123 seconds
UseArray completed in 1.847 seconds
UseVectorPushBack completed in 9.186 seconds
The whole thing completed in 14.159 seconds
andrey@nv:~$ opannotate --source libcchem/src/a.out | grep "Total samples for file" -A3
Overflow stats not available
* Total samples for file : "/usr/include/c++/4.4/ext/new_allocator.h"
*
* 141008 52.5367
*/
--
* Total samples for file : "/home/andrey/libcchem/src/test.cpp"
*
* 61556 22.9345
*/
--
* Total samples for file : "/usr/include/c++/4.4/bits/stl_vector.h"
*
* 41956 15.6320
*/
--
* Total samples for file : "/usr/include/c++/4.4/bits/stl_uninitialized.h"
*
* 20956 7.8078
*/
--
* Total samples for file : "/usr/include/c++/4.4/bits/stl_construct.h"
*
* 2923 1.0891
*/
In allocator:
: // _GLIBCXX_RESOLVE_LIB_DEFECTS
: // 402. wrong new expression in [some_] allocator::construct
: void
: construct(pointer __p, const _Tp& __val)
141008 52.5367 : { ::new((void *)__p) _Tp(__val); }
In vector:
:void UseVector()
:{ /* UseVector() total: 60121 22.3999 */
...
:
:
10790 4.0201 : for (int i = 0; i < dimension * dimension; ++i) {
:
495 0.1844 : pixels[i].r = 255;
:
12618 4.7012 : pixels[i].g = 0;
:
2253 0.8394 : pixels[i].b = 0;
:
: }
In array:
:void UseArray()
:{ /* UseArray() total: 35191 13.1114 */
:
...
:
136 0.0507 : for (int i = 0; i < dimension * dimension; ++i) {
:
9897 3.6874 : pixels[i].r = 255;
:
3511 1.3081 : pixels[i].g = 0;
:
21647 8.0652 : pixels[i].b = 0;
Most of the overhead is in the copy constructor. For example:

std::vector<Pixel> pixels;
pixels.resize(dimension * dimension);   // resize rather than reserve: with reserve alone the
                                        // vector's size stays 0, and pixels[i] below would be
                                        // undefined behaviour
for (int i = 0; i < dimension * dimension; ++i) {
    pixels[i].r = 255;
    pixels[i].g = 0;
    pixels[i].b = 0;
}

It has the same performance as an array.
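Alternatively, the single-pass fill constructor from earlier in this thread avoids touching the elements twice at all; a sketch, assuming the three-argument Pixel constructor from the question:

// Construct every element with its final value in one pass.
std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));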