Using arrays or std::vectors in C++, what's the performance gap?

后端 未结 19 1460
忘了有多久
忘了有多久 2020-11-22 03:23

In our C++ course they suggest not to use C++ arrays on new projects anymore. As far as I know Stroustroup himself suggests not to use arrays. But are there significant perf

相关标签:
19条回答
  • 2020-11-22 04:14

    There is definitely a performance impact to using an std::vector vs a raw array when you want an uninitialized buffer (e.g. to use as destination for memcpy()). An std::vector will initialize all its elements using the default constructor. A raw array will not.

    The c++ spec for the std:vector constructor taking a count argument (it's the third form) states:

    `Constructs a new container from a variety of data sources, optionally using a user supplied allocator alloc.

    1. Constructs the container with count default-inserted instances of T. No copies are made.

    Complexity

    2-3) Linear in count

    A raw array does not incur this initialization cost.

    Note that with a custom allocator, it is possible to avoid "initialization" of the vector's elements (i.e. to use default initialization instead of value initialization). See these questions for more details:

    • Is this behavior of vector::resize(size_type n) under C++11 and Boost.Container correct?
    • How can I avoid std::vector<> to initialize all its elements?
    0 讨论(0)
  • Using C++ arrays with new (that is, using dynamic arrays) should be avoided. There is the problem you have to keep track of the size, and you need to delete them manually and do all sort of housekeeping.

    Using arrays on the stack is also discouraged because you don't have range checking, and passing the array around will lose any information about its size (array to pointer conversion). You should use boost::array in that case, which wraps a C++ array in a small class and provides a size function and iterators to iterate over it.

    Now the std::vector vs. native C++ arrays (taken from the internet):

    // Comparison of assembly code generated for basic indexing, dereferencing, 
    // and increment operations on vectors and arrays/pointers.
    
    // Assembly code was generated by gcc 4.1.0 invoked with  g++ -O3 -S  on a 
    // x86_64-suse-linux machine.
    
    #include <vector>
    
    struct S
    {
      int padding;
    
      std::vector<int> v;
      int * p;
      std::vector<int>::iterator i;
    };
    
    int pointer_index (S & s) { return s.p[3]; }
      // movq    32(%rdi), %rax
      // movl    12(%rax), %eax
      // ret
    
    int vector_index (S & s) { return s.v[3]; }
      // movq    8(%rdi), %rax
      // movl    12(%rax), %eax
      // ret
    
    // Conclusion: Indexing a vector is the same damn thing as indexing a pointer.
    
    int pointer_deref (S & s) { return *s.p; }
      // movq    32(%rdi), %rax
      // movl    (%rax), %eax
      // ret
    
    int iterator_deref (S & s) { return *s.i; }
      // movq    40(%rdi), %rax
      // movl    (%rax), %eax
      // ret
    
    // Conclusion: Dereferencing a vector iterator is the same damn thing 
    // as dereferencing a pointer.
    
    void pointer_increment (S & s) { ++s.p; }
      // addq    $4, 32(%rdi)
      // ret
    
    void iterator_increment (S & s) { ++s.i; }
      // addq    $4, 40(%rdi)
      // ret
    
    // Conclusion: Incrementing a vector iterator is the same damn thing as 
    // incrementing a pointer.
    

    Note: If you allocate arrays with new and allocate non-class objects (like plain int) or classes without a user defined constructor and you don't want to have your elements initialized initially, using new-allocated arrays can have performance advantages because std::vector initializes all elements to default values (0 for int, for example) on construction (credits to @bernie for reminding me).

    0 讨论(0)
  • 2020-11-22 04:16

    Preamble for micro-optimizer people

    Remember:

    "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%".

    (Thanks to metamorphosis for the full quote)

    Don't use a C array instead of a vector (or whatever) just because you believe it's faster as it is supposed to be lower-level. You would be wrong.

    Use by default vector (or the safe container adapted to your need), and then if your profiler says it is a problem, see if you can optimize it, either by using a better algorithm, or changing container.

    This said, we can go back to the original question.

    Static/Dynamic Array?

    The C++ array classes are better behaved than the low-level C array because they know a lot about themselves, and can answer questions C arrays can't. They are able to clean after themselves. And more importantly, they are usually written using templates and/or inlining, which means that what appears to a lot of code in debug resolves to little or no code produced in release build, meaning no difference with their built-in less safe competition.

    All in all, it falls on two categories:

    Dynamic arrays

    Using a pointer to a malloc-ed/new-ed array will be at best as fast as the std::vector version, and a lot less safe (see litb's post).

    So use a std::vector.

    Static arrays

    Using a static array will be at best:

    • as fast as the std::array version
    • and a lot less safe.

    So use a std::array.

    Uninitialized memory

    Sometimes, using a vector instead of a raw buffer incurs a visible cost because the vector will initialize the buffer at construction, while the code it replaces didn't, as remarked bernie by in his answer.

    If this is the case, then you can handle it by using a unique_ptr instead of a vector or, if the case is not exceptional in your codeline, actually write a class buffer_owner that will own that memory, and give you easy and safe access to it, including bonuses like resizing it (using realloc?), or whatever you need.

    0 讨论(0)
  • 2020-11-22 04:21

    To respond to something Mehrdad said:

    However, there might be cases where you still need arrays. When interfacing with low level code (i.e. assembly) or old libraries that require arrays, you might not be able to use vectors.

    Not true at all. Vectors degrade nicely into arrays/pointers if you use:

    vector<double> vector;
    vector.push_back(42);
    
    double *array = &(*vector.begin());
    
    // pass the array to whatever low-level code you have
    

    This works for all major STL implementations. In the next standard, it will be required to work (even though it does just fine today).

    0 讨论(0)
  • 2020-11-22 04:22

    You have even fewer reasons to use plain arrays in C++11.

    There are 3 kind of arrays in nature from fastest to slowest, depending on the features they have (of course the quality of implementation can make things really fast even for case 3 in the list):

    1. Static with size known at compile time. --- std::array<T, N>
    2. Dynamic with size known at runtime and never resized. The typical optimization here is, that if the array can be allocated in the stack directly. -- Not available. Maybe dynarray in C++ TS after C++14. In C there are VLAs
    3. Dynamic and resizable at runtime. --- std::vector<T>

    For 1. plain static arrays with fixed number of elements, use std::array<T, N> in C++11.

    For 2. fixed size arrays specified at runtime, but that won't change their size, there is discussion in C++14 but it has been moved to a technical specification and made out of C++14 finally.

    For 3. std::vector<T> will usually ask for memory in the heap. This could have performance consequences, though you could use std::vector<T, MyAlloc<T>> to improve the situation with a custom allocator. The advantage compared to T mytype[] = new MyType[n]; is that you can resize it and that it will not decay to a pointer, as plain arrays do.

    Use the standard library types mentioned to avoid arrays decaying to pointers. You will save debugging time and the performance is exactly the same as with plain arrays if you use the same set of features.

    0 讨论(0)
  • 2020-11-22 04:22

    The following simple test:

    C++ Array vs Vector performance test explanation

    contradicts the conclusions from "Comparison of assembly code generated for basic indexing, dereferencing, and increment operations on vectors and arrays/pointers."

    There must be a difference between the arrays and vectors. The test says so... just try it, the code is there...

    0 讨论(0)
提交回复
热议问题