Why is std::vector slower than an array? [duplicate]

Question

This question already has answers here:

Performance: memset (2 answers)

Why might std::vector be faster than a raw dynamically allocated array? (2 answers)

Why is iterating though `std::vector` faster than iterating though `std::array`? (2 answers)

Idiomatic way of performance evaluation? (1 answer)

Closed last month.

When I run the following program (with optimization on), the for loop with the std::vector takes about 0.04 seconds while the for loop with the array takes 0.0001 seconds.

#include <iostream>
#include <vector>
#include <chrono>

int main()
{
    int len = 800000;
    int* Data = new int[len];

    int arr[3] = { 255, 0, 0 };
    std::vector<int> vec = { 255, 0, 0 };

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < len; i++) {
        Data[i] = vec[0];
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "The vector took " << elapsed.count() << "seconds\n";

    start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < len; i++) {
        Data[i] = arr[0];
    }
    finish = std::chrono::high_resolution_clock::now();
    elapsed = finish - start;
    std::cout << "The array took " << elapsed.count() << "seconds \n";

    char s;
    std::cin >> s;
    delete[] Data;
}

The code is a simplified version of a performance issue I ran into while writing a raycaster. The len variable corresponds to how many times the loop in the original program needs to run (400 pixels * 400 pixels * 50 maximum render distance). For complicated reasons (perhaps because I don't fully understand how to use arrays) I have to use a vector rather than an array in the actual raycaster. However, as this program demonstrates, that would only give me 20 frames per second, as opposed to the enviable 10,000 frames per second that an array would supposedly give me (obviously, this is just a simplified performance test).

Regardless of how accurate those numbers are, I still want to boost my frame rate as much as possible. So, why is the vector performing so much slower than the array? Is there a way to speed it up?

Thanks for your help. If there's anything else I'm doing oddly that might be affecting performance, please let me know. I didn't even know about optimization until I started researching this question, so if there are more settings like that which might boost performance, please let me know (and I'd prefer an explanation of where those settings are in the properties manager rather than on the command line, since I don't yet know how to use the command line).

Answer 1:

Let us observe how GCC optimizes this test program:

#include <vector>

int main()
{
    int len = 800000;
    int* Data = new int[len];

    int arr[3] = { 255, 0, 0 };
    std::vector<int> vec = { 255, 0, 0 };

    for (int i = 0; i < len; i++) {
        Data[i] = vec[0];
    }
    for (int i = 0; i < len; i++) {
        Data[i] = arr[0];
    }
    delete[] Data;
}

The compiler rightly notices that the vector is constant and eliminates it. Exactly the same code is generated for both loops, so it should be irrelevant whether the first loop uses the array or the vector.

.L2:
    movups  XMMWORD PTR [rcx], xmm0
    add     rcx, 16
    cmp     rsi, rcx
    jne     .L2

What makes the difference in your test program is the order of the loops. The comments point out that when a third loop is added at the beginning, both of the original loops take the same time.

I would expect that, with a modern compiler, accessing a vector is approximately as fast as accessing an array when optimization is enabled and debugging is disabled. If there is an observable difference in your actual program, the problem lies somewhere else.

Answer 2:

It is about caches. I don't know in detail how it works, but the CPU gets "more familiar" with Data[] as it is used, so later passes over it run faster. If you reverse the order of the two loops, you will see that the vector appears to be faster.

But actually, you are testing neither the vector nor the array. Suppose vec[0] resides at memory location 0x01 and arr[0] at 0xf1. The only difference between the loops is which single memory address the word is read from. So what you are really testing is how fast a value can be assigned to the elements of a dynamically allocated array.

Note: std::chrono::high_resolution_clock might not be suitable for measuring time intervals. It is better to use steady_clock, as cppreference recommends.

Source: https://stackoverflow.com/questions/60293633/why-is-stdvector-slower-than-an-array

Tags

c++

arrays

vector

benchmarking