Is std::vector so much slower than plain arrays?

后端未结

关注

 22  2402

I\'ve always thought it\'s the general wisdom that std::vector is \"implemented as an array,\" blah blah blah. Today I went down and tested it, and it seems to

相关标签:

22条回答

花落未央

2020-11-22 12:24
Try with this:
```
void UseVectorCtor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}
```
I get almost exactly the same performance as with array.

The thing about vector is that it's a much more general tool than an array. And that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have. And if you use it "wrong" for your purpose, you incur a lot of overhead, but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to have their default ctor called), and then overwriting each element individually with the correct value. That is much harder for the compiler to optimize away than when you do the same thing with an array. Which is why the vector provides a constructor which lets you do exactly that: initialize N elements with value X.

And when you use that, the vector is just as fast as an array.

So no, you haven't busted the performance myth. But you have shown that it's only true if you use the vector optimally, which is a pretty good point too. :)

On the bright side, it's really the simplest usage that turns out to be fastest. If you contrast my code snippet (a single line) with John Kugelman's answer, containing heaps and heaps of tweaks and optimizations, which still don't quite eliminate the performance difference, it's pretty clear that vector is pretty cleverly designed after all. You don't have to jump through hoops to get speed equal to an array. On the contrary, you have to use the simplest possible solution.
0 讨论(0)
发布评论:

提交评论
- 加载中...
花落未央

2020-11-22 12:26
Here's how the push_back method in vector works:
1. The vector allocates X amount of space when it is initialized.
2. As stated below it checks if there is room in the current underlying array for the item.
3. It makes a copy of the item in the push_back call.
After calling push_back X items:
1. The vector reallocates kX amount of space into a 2nd array.
2. It Copies the entries of the first array onto the second.
3. Discards the first array.
4. Now uses the second array as storage until it reaches kX entries.
Repeat. If you're not reserving space its definitely going to be slower. More than that, if it's expensive to copy the item then 'push_back' like that is going to eat you alive.

As to the vector versus array thing, I'm going to have to agree with the other people. Run in release, turn optimizations on, and put in a few more flags so that the friendly people at Microsoft don't #@%$^ it up for ya.

One more thing, if you don't need to resize, use Boost.Array.
0 讨论(0)
发布评论:

提交评论
- 加载中...

星月不相逢

2020-11-22 12:27

It was hardly a fair comparison when I first looked at your code; I definitely thought you weren't comparing apples with apples. So I thought, let's get constructors and destructors being called on all tests; and then compare.

const size_t dimension = 1000;

void UseArray() {
    TestTimer t("UseArray");
    for(size_t j = 0; j < dimension; ++j) {
        Pixel* pixels = new Pixel[dimension * dimension];
        for(size_t i = 0 ; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
        delete[] pixels;
    }
}

void UseVector() {
    TestTimer t("UseVector");
    for(size_t j = 0; j < dimension; ++j) {
        std::vector<Pixel> pixels(dimension * dimension);
        for(size_t i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
    }
}

int main() {
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();

    return 0;
}

My thoughts were, that with this setup, they should be exactly the same. It turns out, I was wrong.

UseArray completed in 3.06 seconds
UseVector completed in 4.087 seconds
The whole thing completed in 10.14 seconds

So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.

My thoughts were that it is in how the loop initialises all values to the default constructor. So I performed a test:

class Tester {
public:
    static int count;
    static int count2;
    Tester() { count++; }
    Tester(const Tester&) { count2++; }
};
int Tester::count = 0;
int Tester::count2 = 0;

int main() {
    std::vector<Tester> myvec(300);
    printf("Default Constructed: %i\nCopy Constructed: %i\n", Tester::count, Tester::count2);

    return 0;
}

The results were as I suspected:

Default Constructed: 1
Copy Constructed: 300

This is clearly the source of the slowdown, the fact that the vector uses the copy constructor to initialise the elements from a default constructed object.

This means, that the following pseudo-operation order is happening during construction of the vector:

Pixel pixel;
for (auto i = 0; i < N; ++i) vector[i] = pixel;

Which, due to the implicit copy constructor made by the compiler, is expanded to the following:

Pixel pixel;
for (auto i = 0; i < N; ++i) {
    vector[i].r = pixel.r;
    vector[i].g = pixel.g;
    vector[i].b = pixel.b;
}

So the default Pixel remains un-initialised, while the rest are initialised with the default Pixel's un-initialised values.

Compared to the alternative situation with New[]/Delete[]:

int main() {
    Tester* myvec = new Tester[300];

    printf("Default Constructed: %i\nCopy Constructed:%i\n", Tester::count, Tester::count2);

    delete[] myvec;

    return 0;
}

Default Constructed: 300
Copy Constructed: 0

They are all left to their un-initialised values, and without the double iteration over the sequence.

Armed with this information, how can we test it? Let's try over-writing the implicit copy constructor.

Pixel(const Pixel&) {}

And the results?

UseArray completed in 2.617 seconds
UseVector completed in 2.682 seconds
The whole thing completed in 5.301 seconds

So in summary, if you're making hundreds of vectors very often: re-think your algorithm.

In any case, the STL implementation isn't slower for some unknown reason, it just does exactly what you ask; hoping you know better.

0 讨论(0)

难免孤独

2020-11-22 12:27
My laptop is Lenova G770 (4 GB RAM).

The OS is Windows 7 64-bit (the one with laptop)

Compiler is MinGW 4.6.1.

The IDE is Code::Blocks.

I test the source codes of the first post.

The results

O2 optimization

UseArray completed in 2.841 seconds

UseVector completed in 2.548 seconds

UseVectorPushBack completed in 11.95 seconds

The whole thing completed in 17.342 seconds

system pause

O3 optimization

UseArray completed in 1.452 seconds

UseVector completed in 2.514 seconds

UseVectorPushBack completed in 12.967 seconds

The whole thing completed in 16.937 seconds

It looks like the performance of vector is worse under O3 optimization.

If you change the loop to
```
    pixels[i].r = i;
    pixels[i].g = i;
    pixels[i].b = i;
```
The speed of array and vector under O2 and O3 are almost the same.
0 讨论(0)
发布评论:

提交评论
- 加载中...
忘掉有多难

2020-11-22 12:28

With the right options, vectors and arrays can generate identical asm. In these cases, they are of course the same speed, because you get the same executable file either way.

0 讨论(0)
发布评论:

提交评论
- 加载中...

没有蜡笔的小新

2020-11-22 12:29

Some profiler data (pixel is aligned to 32 bits):

g++ -msse3 -O3 -ftree-vectorize -g test.cpp -DNDEBUG && ./a.out
UseVector completed in 3.123 seconds
UseArray completed in 1.847 seconds
UseVectorPushBack completed in 9.186 seconds
The whole thing completed in 14.159 seconds

Blah

andrey@nv:~$ opannotate --source libcchem/src/a.out  | grep "Total samples for file" -A3
Overflow stats not available
 * Total samples for file : "/usr/include/c++/4.4/ext/new_allocator.h"
 *
 * 141008 52.5367
 */
--
 * Total samples for file : "/home/andrey/libcchem/src/test.cpp"
 *
 *  61556 22.9345
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_vector.h"
 *
 *  41956 15.6320
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_uninitialized.h"
 *
 *  20956  7.8078
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_construct.h"
 *
 *   2923  1.0891
 */

In allocator:

               :      // _GLIBCXX_RESOLVE_LIB_DEFECTS
               :      // 402. wrong new expression in [some_] allocator::construct
               :      void
               :      construct(pointer __p, const _Tp& __val)
141008 52.5367 :      { ::new((void *)__p) _Tp(__val); }

vector:

               :void UseVector()
               :{ /* UseVector() total:  60121 22.3999 */
...
               :
               :
 10790  4.0201 :        for (int i = 0; i < dimension * dimension; ++i) {
               :
   495  0.1844 :            pixels[i].r = 255;
               :
 12618  4.7012 :            pixels[i].g = 0;
               :
  2253  0.8394 :            pixels[i].b = 0;
               :
               :        }

array

               :void UseArray()
               :{ /* UseArray() total:  35191 13.1114 */
               :
...
               :
   136  0.0507 :        for (int i = 0; i < dimension * dimension; ++i) {
               :
  9897  3.6874 :            pixels[i].r = 255;
               :
  3511  1.3081 :            pixels[i].g = 0;
               :
 21647  8.0652 :            pixels[i].b = 0;

Most of the overhead is in the copy constructor. For example,

    std::vector < Pixel > pixels;//(dimension * dimension, Pixel());

    pixels.reserve(dimension * dimension);

    for (int i = 0; i < dimension * dimension; ++i) {

        pixels[i].r = 255;

        pixels[i].g = 0;

        pixels[i].b = 0;
    }

It has the same performance as an array.

0 讨论(0)