Why does libc++'s std::vector internally keep three pointers instead of one pointer and two sizes?

Submitted by 和自甴很熟 on 2019-11-28 10:49:51
Mehrdad

The rationale is that performance should be optimized for iterators, not indices.
(In other words, performance should be optimized for begin()/end(), not size()/operator[].)
Why? Because iterators are generalized pointers, so C++ encourages their use and in return ensures that their performance matches that of raw pointers when the two are equivalent.

To see why it's a performance issue, notice that the typical for loop is as follows:

for (It i = items.begin(); i != items.end(); ++i)
    ...

Except in the most trivial cases, if we kept track of sizes instead of pointers, the comparison i != items.end() would turn into i != items.begin() + items.size(), which takes more instructions than you'd expect. (The optimizer often cannot hoist that computation out of the loop, for example when it cannot prove that the loop body leaves the container unchanged.) This slows things down noticeably in a tight loop, which is why this design is avoided.

(I've verified this is a performance problem when trying to write my own replacement for std::vector.)
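
As a rough illustration of the two layouts being compared, here is a minimal sketch. The names RangeView, SizedView, sum and layouts_agree are hypothetical and are not taken from libc++. Both layouts produce the same result; the difference is only in how much work end() does each time it is evaluated.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout A: the end is stored as a pointer.
struct RangeView {
    const int* begin_;
    const int* end_;
    const int* begin() const { return begin_; }
    const int* end() const { return end_; }            // returns a stored pointer
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
};

// Hypothetical layout B: the end is derived from pointer + size.
struct SizedView {
    const int* begin_;
    std::size_t size_;
    const int* begin() const { return begin_; }
    const int* end() const { return begin_ + size_; }  // recomputed on every call
    std::size_t size() const { return size_; }
};

// The iterator loop from the answer, written once for either layout.
template <class View>
long sum(const View& v) {
    long total = 0;
    for (const int* i = v.begin(); i != v.end(); ++i)
        total += *i;
    return total;
}

// Builds both views over the same data and checks that they agree.
inline bool layouts_agree(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    RangeView a{data, data + n};
    SizedView b{data, n};
    return a.size() == b.size() && sum(a) == sum(b);
}
```

In a loop this small a compiler may well hoist begin_ + size_ out of the loop, so any measurable difference would only be expected when the loop body is opaque enough to prevent that.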


Edit: As Yakk pointed out in the comments, using indices instead of pointers can also cause the compiler to generate a multiplication instruction when the element size isn't a power of 2, which is fairly expensive and noticeable in a tight loop. I didn't think of this when writing this answer, but it's a phenomenon that's bitten me before (e.g. see here)... The bottom line is that in a tight loop, everything matters.
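
To make the multiplication point concrete, here is a small sketch. The Rgb struct and both function names are hypothetical, and the 12-byte size assumes a 4-byte int with no padding. Conceptually, items[i] has to compute base + i * sizeof(Rgb), while the pointer loop only adds a constant at each step. Whether a real multiply is emitted depends on the compiler and optimization level, since optimizers can often rewrite the index form into the pointer form.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type: three ints, typically 12 bytes, not a power of two.
struct Rgb { int r, g, b; };

// Index form: each access is conceptually base + i * sizeof(Rgb).
inline long sum_red_by_index(const Rgb* items, std::size_t n) {
    assert(items != nullptr || n == 0);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += items[i].r;
    return total;
}

// Pointer form: each step advances the pointer by a constant sizeof(Rgb).
inline long sum_red_by_pointer(const Rgb* first, const Rgb* last) {
    long total = 0;
    for (const Rgb* p = first; p != last; ++p)
        total += p->r;
    return total;
}
```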

It's more convenient for implementers.

Storing size makes exactly one operation easier to implement: size()

size_t size() { return size_; }

On the other hand, it makes other operations harder to write and makes code reuse harder:

iterator end() { return iterator(end_); } // range version
iterator end() { return iterator(begin_ + size_); } // pointer + size version

void push_back(const T& v) // range version
{
    // assume only the case where there is enough capacity
    ::new(static_cast<void*>(end_)) T(v);
    ++end_;
}

void push_back(const T& v) // pointer + size version
{
    // assume only the case where there is enough capacity
    ::new(static_cast<void*>(begin_ + size_)) T(v);
    // it could use some internal `get_end` function, but the point still stands:
    // we need to get to the end
    ++size_;
}

If we have to find the end anyway, we might as well store it directly; it's more useful than the size.
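
Putting the range version together, a minimal sketch of a three-pointer container might look like the following. MiniVec and the member name cap_ are hypothetical, growth is deliberately omitted, and over-aligned element types are not handled. This is not how libc++ is written; it only shows that size() and capacity() fall out of pointer subtraction once the end pointers are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity container using the three-pointer layout.
template <class T>
class MiniVec {
    T* begin_;  // start of the allocation
    T* end_;    // one past the last constructed element
    T* cap_;    // one past the end of the allocation
public:
    explicit MiniVec(std::size_t capacity)
        : begin_(static_cast<T*>(::operator new(capacity * sizeof(T)))),
          end_(begin_),
          cap_(begin_ + capacity) {}
    MiniVec(const MiniVec&) = delete;
    MiniVec& operator=(const MiniVec&) = delete;
    ~MiniVec() {
        while (end_ != begin_) (--end_)->~T();  // destroy in reverse order
        ::operator delete(begin_);
    }

    T* begin() { return begin_; }
    T* end() { return end_; }

    // Both sizes are derived from the stored pointers.
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    std::size_t capacity() const { return static_cast<std::size_t>(cap_ - begin_); }

    void push_back(const T& v) {
        assert(end_ != cap_);  // assumption: the caller never exceeds capacity
        ::new (static_cast<void*>(end_)) T(v);
        ++end_;
    }
};
```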

I would imagine it's primarily a speed thing. When iterating over the vector, the generated instructions for the bounds check are simply a compare against the end pointer (and maybe a load), rather than a load, an add, and a compare (and maybe another load, too).

Likewise, the code that produces the iterators for begin() and end() is just return pointer; rather than return pointer + offset; for end().

These are very minor optimizations, but the standard template library is intended to be used in production code where every cycle counts.

PS: Regarding the different compilers implementing it the same way: there is a reference implementation that most (all?) compiler vendors base their STL implementations on. This design decision is likely part of that reference implementation, which would explain why all the implementations you looked at handle vectors this way.
