I've always thought it's the general wisdom that std::vector
is "implemented as an array," blah blah blah. Today I went down and tested it, and it seems to
The vector ones are additionally calling Pixel constructors.
Each is causing almost a million ctor runs that you're timing.
edit: then there's the outer 1...1000 loop, so make that a billion ctor calls!
edit 2: it'd be interesting to see the disassembly for the UseArray case. An optimizer could optimize the whole thing away, since it has no effect other than burning CPU.
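To illustrate the point about the optimizer: one common way to make it harder for a compiler to throw a benchmark loop away is to give the loop an observable result. The following is only a sketch. The Pixel layout is an assumption (the question's definition is not reproduced here) and fill_and_checksum is a name made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Assumed layout for illustration; not necessarily the Pixel from the question.
struct Pixel {
    unsigned char r, g, b;
};

// Hypothetical helper: fills n pixels, then returns a checksum of what was written.
// Using the returned value (printing it, say) gives the loop an observable effect.
unsigned long long fill_and_checksum(std::size_t n) {
    std::vector<Pixel> pixels(n);
    for (std::size_t i = 0; i < n; ++i) {
        pixels[i].r = 255;
        pixels[i].g = 0;
        pixels[i].b = 0;
    }
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += pixels[i].r + pixels[i].g + pixels[i].b;
    }
    return sum;
}
```

A sufficiently clever optimizer may still simplify this, so checking the disassembly remains the only way to be sure what is actually being timed.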
I just want to mention that vector (and smart_ptr) is just a thin layer added on top of raw arrays (and raw pointers). And in my test the access time of a vector in contiguous memory is actually faster than that of an array. The following code shows the results of initializing and accessing a vector and an array.
#include <boost/date_time/posix_time/posix_time.hpp>
#include <cstdlib>   // rand, srand
#include <ctime>     // time
#include <iostream>
#include <vector>
using namespace std;
#define SIZE 20000
int main() {
srand (time(NULL));
vector<vector<int>> vector2d;
vector2d.reserve(SIZE);
int index(0);
boost::posix_time::ptime start_total = boost::posix_time::microsec_clock::local_time();
// timer start - build + access
for (int i = 0; i < SIZE; i++) {
vector2d.push_back(vector<int>(SIZE));
}
boost::posix_time::ptime start_access = boost::posix_time::microsec_clock::local_time();
// timer start - access
for (int i = 0; i < SIZE; i++) {
index = rand()%SIZE;
for (int j = 0; j < SIZE; j++) {
vector2d[index][index]++;
}
}
boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();
boost::posix_time::time_duration msdiff = end - start_total;
cout << "Vector total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
msdiff = end - start_access;
cout << "Vector access time: " << msdiff.total_milliseconds() << "milliseconds.\n";
index = 0;  // reuse the variable declared above instead of redeclaring it
int** raw2d = nullptr;
raw2d = new int*[SIZE];
start_total = boost::posix_time::microsec_clock::local_time();
// timer start - build + access
for (int i = 0; i < SIZE; i++) {
raw2d[i] = new int[SIZE];
}
start_access = boost::posix_time::microsec_clock::local_time();
// timer start - access
for (int i = 0; i < SIZE; i++) {
index = rand()%SIZE;
for (int j = 0; j < SIZE; j++) {
raw2d[index][index]++;
}
}
end = boost::posix_time::microsec_clock::local_time();
msdiff = end - start_total;
cout << "Array total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
msdiff = end - start_access;
cout << "Array access time: " << msdiff.total_milliseconds() << "milliseconds.\n";
for (int i = 0; i < SIZE; i++) {
delete [] raw2d[i];
}
delete [] raw2d;  // also release the array of row pointers
return 0;
}
The output is:
Vector total time: 925milliseconds.
Vector access time: 4milliseconds.
Array total time: 30milliseconds.
Array access time: 21milliseconds.
So the speed will be almost the same if you use it properly (as others mentioned, by using reserve() or resize()).
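For anyone unsure about the difference between the two calls mentioned here, this is a minimal sketch. The struct and function names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type used only for this illustration.
struct SizeAndCapacity {
    std::size_t size;
    std::size_t capacity;
};

// reserve() only allocates storage; the vector stays empty.
SizeAndCapacity after_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);   // no elements are constructed
    return {v.size(), v.capacity()};
}

// resize() allocates storage and also creates n value-initialized elements.
SizeAndCapacity after_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);    // n ints, each set to zero
    return {v.size(), v.capacity()};
}
```

So reserve() followed by push_back() constructs each element once, while resize() followed by assignment touches each element twice.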
Using the following:
g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseArray completed in 2.196 seconds
UseVector completed in 4.412 seconds
UseVectorPushBack completed in 8.017 seconds
The whole thing completed in 14.626 seconds
So array is twice as quick as vector.
But after looking at the code in more detail this is expected, as you run across the vector twice and the array only once. Note: when you resize() the vector you are not only allocating the memory but also running through the vector and calling the constructor on each member.
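The constructor runs triggered by resize() can be made visible with an instrumented type. This is just a sketch; Counted and constructions_from_resize are hypothetical names, not part of the code being benchmarked.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instrumented type: counts every object that gets constructed.
struct Counted {
    static std::size_t constructions;
    Counted() { ++constructions; }
    Counted(const Counted&) { ++constructions; }
};
std::size_t Counted::constructions = 0;

// Returns how many Counted objects were constructed by resize(n) on an empty vector.
// In C++11 and later this is n; older libraries that build one prototype and copy
// it would report n + 1.
std::size_t constructions_from_resize(std::size_t n) {
    std::vector<Counted> v;
    Counted::constructions = 0;
    v.resize(n);
    return Counted::constructions;
}
```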
Rearranging the code slightly so that the vector only initializes each object once:
std::vector<Pixel> pixels(dimensions * dimensions, Pixel(255,0,0));
Now doing the same timing again:
g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseVector completed in 2.216 seconds
The vector now performs only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.
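For reference, the single-initialization form above can be tried in isolation with something like the following sketch. The Pixel definition is an assumed reconstruction of the one in the question, and make_red_image is a made-up helper name.

```cpp
#include <cstddef>
#include <vector>

// Assumed reconstruction of the Pixel type from the question.
struct Pixel {
    Pixel() {}
    Pixel(unsigned char r_, unsigned char g_, unsigned char b_)
        : r(r_), g(g_), b(b_) {}
    unsigned char r, g, b;
};

// Hypothetical helper: every element is copy-constructed from one prototype,
// so no second pass is needed to assign the colour.
std::vector<Pixel> make_red_image(std::size_t dimension) {
    return std::vector<Pixel>(dimension * dimension, Pixel(255, 0, 0));
}
```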
I would also take into account that you are not correctly initializing/destroying the Pixel objects in the UseArray() method, as neither the constructor nor the destructor is called (this may not be an issue for this simple class, but anything slightly more complex, i.e. with pointers or members with pointers, will cause problems).
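If you do want to keep malloc() but still construct and destroy the objects properly, placement new plus explicit destructor calls is one way to do it. A minimal sketch follows; Tracked and construct_in_malloc_storage are hypothetical names used only to make the object lifetimes visible.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical instrumented type: tracks how many objects are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Constructs n objects in malloc()ed storage, then destroys them and frees it.
// Returns the number of objects alive while the array was in use,
// or -1 if the allocation failed.
int construct_in_malloc_storage(std::size_t n) {
    void* raw = std::malloc(n * sizeof(Tracked));
    if (raw == nullptr) return -1;
    Tracked* objects = static_cast<Tracked*>(raw);
    for (std::size_t i = 0; i < n; ++i) {
        new (objects + i) Tracked();   // placement new: run the constructor in place
    }
    int alive_during_use = Tracked::alive;
    for (std::size_t i = n; i > 0; --i) {
        objects[i - 1].~Tracked();     // explicit destructor call, in reverse order
    }
    std::free(raw);
    return alive_during_use;
}
```

This is roughly the bookkeeping that std::vector does for you.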
GNU's STL (and others), given vector<T>(n), default constructs a prototypal object T() - the compiler will optimise away the empty constructor - but then a copy of whatever garbage happened to be in the memory addresses now reserved for the object is taken by the STL's __uninitialized_fill_n_aux, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping constructing, but constructing then loop/copying. It's counterintuitive, but I should have remembered, as I commented on a recent Stack Overflow question about this very point: the construct/copy can be more efficient for reference-counted objects etc.
So:
vector<T> x(n);
or
vector<T> x;
x.resize(n);
is - on many STL implementations - something like:
T temp;
for (int i = 0; i < n; ++i)
x[i] = temp;
The issue being that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (usual caveats about 'identical'/operator== vs memcmp/operator= etc apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<> or the later usage of the data that would suggest this optimisation safe.
This can be contrasted with the more obvious, direct implementation:
for (int i = 0; i < n; ++i)
x[i] = T();
Which we can expect a compiler to optimise out.
To be a bit more explicit about the justification for this aspect of vector's behaviour, consider:
std::vector<big_reference_counted_object> x(10000);
Clearly it's a major difference if we make 10000 independent objects versus 10000 referencing the same data. There's a reasonable argument that the advantage of protecting casual C++ users from accidentally doing something so expensive outweighs the very small real-world cost of hard-to-optimise copy construction.
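To make the shared-versus-independent distinction concrete, here is a sketch in which std::shared_ptr<int> stands in for big_reference_counted_object. That substitution is an assumption for illustration only, and the helper names are made up. It uses the explicit (n, prototype) constructor, which copies one prototype regardless of library version.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for big_reference_counted_object (assumed, for illustration only).
typedef std::shared_ptr<int> Handle;

// Copying one prototype: all n handles refer to the same underlying int.
std::vector<Handle> make_shared_copies(std::size_t n) {
    return std::vector<Handle>(n, std::make_shared<int>(0));
}

// Constructing each element separately: n independent ints, n allocations.
std::vector<Handle> make_independent(std::size_t n) {
    std::vector<Handle> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(std::make_shared<int>(0));
    }
    return v;
}
```

With the first helper there is a single allocation whose reference count equals the vector's size. With the second, every element owns its own allocation.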
ORIGINAL ANSWER (for reference / making sense of the comments): No chance. vector is as fast as an array, at least if you reserve space sensibly. ...
Try disabling checked iterators and building in release mode. You shouldn't see much of a performance difference.
Great question. I came in here expecting to find some simple fix that would speed the vector tests right up. That didn't work out quite like I expected!
Optimization helps, but it's not enough. With optimization on I'm still seeing a 2X performance difference between UseArray and UseVector. Interestingly, UseVector was significantly slower than UseVectorPushBack without optimization.
# g++ -Wall -Wextra -pedantic -o vector vector.cpp
# ./vector
UseArray completed in 20.68 seconds
UseVector completed in 120.509 seconds
UseVectorPushBack completed in 37.654 seconds
The whole thing completed in 178.845 seconds
# g++ -Wall -Wextra -pedantic -O3 -o vector vector.cpp
# ./vector
UseArray completed in 3.09 seconds
UseVector completed in 6.09 seconds
UseVectorPushBack completed in 9.847 seconds
The whole thing completed in 19.028 seconds
I tried changing malloc() to new[] in UseArray so the objects would get constructed, and changing from individual field assignment to assigning a Pixel instance. Oh, and renaming the inner loop variable to j.
void UseArray()
{
TestTimer t("UseArray");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
// Same speed as malloc().
Pixel * pixels = new Pixel[dimension * dimension];
for(int j = 0 ; j < dimension * dimension; ++j)
pixels[j] = Pixel(255, 0, 0);
delete[] pixels;
}
}
Surprisingly (to me), none of those changes made any difference whatsoever. Not even the change to new[], which will default construct all of the Pixels. It seems that gcc can optimize out the default constructor calls when using new[], but not when using vector.
I also attempted to get rid of the triple operator[] lookup and cache the reference to pixels[j]. That actually slowed UseVector down! Oops.
for(int j = 0; j < dimension * dimension; ++j)
{
// Slower than accessing pixels[j] three times.
Pixel &pixel = pixels[j];
pixel.r = 255;
pixel.g = 0;
pixel.b = 0;
}
# ./vector
UseArray completed in 3.226 seconds
UseVector completed in 7.54 seconds
UseVectorPushBack completed in 9.859 seconds
The whole thing completed in 20.626 seconds
What about removing the constructors entirely? Then perhaps gcc can optimize out the construction of all of the objects when the vectors are created. What happens if we change Pixel to:
struct Pixel
{
unsigned char r, g, b;
};
Result: about 10% faster. Still slower than an array. Hm.
# ./vector
UseArray completed in 3.239 seconds
UseVector completed in 5.567 seconds
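One possible contributor to the remaining gap (an assumption on my part, not something measured here): for the constructor-free Pixel, std::vector<Pixel>(n) value-initializes every element, which zeroes the fields, whereas malloc() leaves the memory untouched. So the vector version may still write the memory twice. The zeroing itself is easy to check; the helper name below is made up.

```cpp
#include <cstddef>
#include <vector>

// The constructor-free Pixel from the text above.
struct Pixel {
    unsigned char r, g, b;
};

// Hypothetical helper: reports whether every field of a freshly sized vector is zero.
bool vector_elements_are_zeroed(std::size_t n) {
    std::vector<Pixel> pixels(n);   // value-initializes each Pixel
    for (std::size_t i = 0; i < n; ++i) {
        if (pixels[i].r != 0 || pixels[i].g != 0 || pixels[i].b != 0) {
            return false;
        }
    }
    return true;
}
```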
How about using a vector<Pixel>::iterator instead of a loop index?
for (std::vector<Pixel>::iterator j = pixels.begin(); j != pixels.end(); ++j)
{
j->r = 255;
j->g = 0;
j->b = 0;
}
Result:
# ./vector
UseArray completed in 3.264 seconds
UseVector completed in 5.443 seconds
Nope, no different. At least it's not slower. I thought this would have performance similar to #2 where I used a Pixel& reference.
Even if some smart cookie figures out how to make the vector loop as fast as the array one, this does not speak well of the default behavior of std::vector. So much for the compiler being smart enough to optimize out all the C++ness and make STL containers as fast as raw arrays.
The bottom line is that the compiler is unable to optimize away the no-op default constructor calls when using std::vector. If you use plain new[] it optimizes them away just fine. But not with std::vector. Even if you can rewrite your code to eliminate the constructor calls, that flies in the face of the mantra around here: "The compiler is smarter than you. The STL is just as fast as plain C. Don't worry about it."