I am curious why the following piece of code:
#include
int main()
{
std::string a = \"ABCDEFGHIJKLMNO\";
}
when compiled
This is due to the small string optimization. When the string data is less than or equal 16 characters, including the null terminator, it is stored in a buffer local to the std::string
object itself. Otherwise, it allocates memory on the heap and stores the data over there.
The first string "ABCDEFGHIJKLMNO"
plus the null terminator is exactly of size 16. Adding "P"
makes it exceed the buffer, hence new
is being called internally, inevitably leading to a system call. The compiler can optimize something away if it's possible to ensure that there are no side effects. A system call probably makes it impossible to do this - by constrast, changing a buffer local to the object under construction allows for such a side effect analysis.
Tracing the local buffer in libstdc++, version 9.1, reveals these parts of bits/basic_string.h
:
template
class basic_string { // ... enum { _S_local_capacity = 15 / sizeof(_CharT) }; union { _CharT _M_local_buf[_S_local_capacity + 1]; size_type _M_allocated_capacity; }; // ... };
which lets you spot the local buffer size _S_local_capacity
and the local buffer itself (_M_local_buf
). When the constructor triggers basic_string::_M_construct
being called, you have in bits/basic_string.tcc
:
void _M_construct(_InIterator __beg, _InIterator __end, ...) { size_type __len = 0; size_type __capacity = size_type(_S_local_capacity); while (__beg != __end && __len < __capacity) { _M_data()[__len++] = *__beg; ++__beg; }
where the local buffer is filled with its content. Right after this part, we get to the branch where the local capacity is exhausted - new storage is allocated (through the allocate in M_create
), the local buffer is copied into the new storage and filled with the rest of the initializing argument:
while (__beg != __end) { if (__len == __capacity) { // Allocate more space. __capacity = __len + 1; pointer __another = _M_create(__capacity, __len); this->_S_copy(__another, _M_data(), __len); _M_dispose(); _M_data(__another); _M_capacity(__capacity); } _M_data()[__len++] = *__beg; ++__beg; }
As a side note, small string optimization is quite a topic on its own. To get a feeling for how tweaking individual bits can make a difference at large scale, I'd recommend this talk. It also mentions how the std::string
implementation that ships with gcc
(libstdc++) works and changed during the past to match newer versions of the standard.