Both libstdc++ and libc++ make a moved-from std::string object empty, even if the original stored string is short and the short string optimization is applied. It se
I know I thought about whether to zero the moved-from string when I was implementing the libstdc++ version, but I don't remember my reasons for deciding to zero it out. I think I probably decided that leaving the moved-from string empty would be following the principle of least astonishment. The most "obvious" state for a moved-from string is to be empty, even if sometimes being non-empty would perform slightly better.
As suggested in the comments, it avoids breaking any code that (maybe unintentionally) relied on the string being empty. I don't think that was one of my considerations though. C++11 code that relies on the COW string semantics will be broken by more than just moved-from strings being non-empty.
Worth noting is that at -O2 the current libstdc++ code compiles to fewer instructions than your suggested alternative. However, something like this compiles even smaller, and is probably faster (I didn't measure it though, or even test that it works):
basic_string(basic_string&& __str) noexcept
: _M_dataplus(_M_local_data(), std::move(__str._M_get_allocator()))
{
    memcpy(_M_local_buf, __str._M_local_buf, sizeof(_M_local_buf));
    _M_length(__str.length());
    if (!__str._M_is_local())
    {
        _M_data(__str._M_data());
        __str._M_data(__str._M_local_data());
        __str._M_set_length(0);
    }
}
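To check what a given standard library actually leaves behind, a small probe like the following can help. It is only a sketch: `MoveResult` and `move_and_inspect` are hypothetical names made up for this example, and the standard itself only promises that a moved-from string is in a valid but unspecified state, so the `source_empty` flag reports an implementation choice rather than a guarantee.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper type: the moved-to string plus what we observed
// about the moved-from string.
struct MoveResult {
    std::string dst;
    bool source_empty;
};

// Moves out of `src` and records whether `src` was left empty.
// Reading src.empty() after the move is allowed: the moved-from object is
// still a valid string, only its value is unspecified by the standard.
MoveResult move_and_inspect(std::string src) {
    std::string dst = std::move(src);
    const bool source_empty = src.empty();
    return MoveResult{std::move(dst), source_empty};
}
```

Calling it once with a short literal and once with a string long enough to need a heap allocation covers both the SSO and the non-SSO path. Portable code should only rely on the `dst` half of the result.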
In the case of libc++, the string move constructor does empty the source, but it is not unnecessary. Indeed, the author of this string implementation was the same person that led the move semantics proposal for C++11. ;-)
This implementation of the libc++ string was actually designed from the move members outwards!
Here is the code, with some unnecessary details (like debug-mode code) left out:
template <class _CharT, class _Traits, class _Allocator>
basic_string<_CharT, _Traits, _Allocator>::basic_string(basic_string&& __str)
    _NOEXCEPT
    : __r_(_VSTD::move(__str.__r_))
{
    __str.__zero();
}
In a nutshell, this code copies all of the bytes of the source, and then zeros all of the bytes of the source. One thing to immediately note: There is no branching: this code does the same thing for long and short strings.
Long string mode
In "long mode", the layout is 3 words: a data pointer and two integral types to store size and capacity, minus 1 bit for the long/short flag, plus space for an allocator (optimized away for empty allocators).
So this copies the pointer/sizes, and then nulls out the source to release ownership of the pointer. This also sets the source to "short mode" as the short/long bit means short in the zero state. Also all zero bits in the short mode represent a zero-size, non-zero capacity short string.
Short string mode
When the source is a short string, the code is identical: The bytes are copied over, and the source bytes are zeroed out. In short mode there are no self-referencing pointers, and so copying bytes is the correct algorithm.
Now it is true that in "short mode", the zeroing of the 3 words of the source might seem unnecessary, but to do that one would have to check the long/short bit and zero bytes when in long mode. Doing this check-and-branch would actually be more expensive than just zeroing the 3 words because of the occasional branch mis-prediction (breaking the pipeline).
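The copy-then-zero idea can be sketched with a toy type. This is not libc++'s real layout or code: `ToyString`, its byte layout, and the use of `malloc` are all assumptions made for illustration. The point it tries to show is only that, when the all-zero representation means "empty short string", the move constructor needs no branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical toy type for illustration only; NOT libc++'s real layout.
class ToyString {
    static constexpr std::size_t kBytes = 3 * sizeof(void*);  // "3 words"
    static constexpr std::size_t kShortCap = kBytes - 2;      // header byte + terminator
    static_assert(sizeof(std::size_t) <= sizeof(void*), "size must fit in one word");

    // Byte 0: low bit is the long flag; in short mode the other bits hold the size.
    // Short mode: characters live in rep_[1..]. Long mode: word 1 = pointer, word 2 = size.
    alignas(void*) unsigned char rep_[kBytes];

    bool is_long() const { return (rep_[0] & 1u) != 0; }
    char* long_ptr() const {
        char* p;
        std::memcpy(&p, rep_ + sizeof(void*), sizeof p);
        return p;
    }
    std::size_t long_size() const {
        std::size_t n;
        std::memcpy(&n, rep_ + 2 * sizeof(void*), sizeof n);
        return n;
    }

public:
    ToyString() { std::memset(rep_, 0, kBytes); }  // all zero == empty short string

    explicit ToyString(const char* s) {
        std::memset(rep_, 0, kBytes);
        const std::size_t n = std::strlen(s);
        if (n <= kShortCap) {
            rep_[0] = static_cast<unsigned char>(n << 1);
            std::memcpy(rep_ + 1, s, n);  // terminator is already zero
        } else {
            char* p = static_cast<char*>(std::malloc(n + 1));
            if (p == nullptr) std::abort();
            std::memcpy(p, s, n + 1);
            rep_[0] = 1;  // set the long flag
            std::memcpy(rep_ + sizeof(void*), &p, sizeof p);
            std::memcpy(rep_ + 2 * sizeof(void*), &n, sizeof n);
        }
    }

    // Branch-free move: copy every byte, then zero every byte of the source.
    // Zeroing releases ownership in long mode and is harmless in short mode.
    ToyString(ToyString&& other) noexcept {
        std::memcpy(rep_, other.rep_, kBytes);
        std::memset(other.rep_, 0, kBytes);
    }

    ToyString(const ToyString&) = delete;
    ToyString& operator=(const ToyString&) = delete;

    ~ToyString() {
        if (is_long()) std::free(long_ptr());
    }

    std::size_t size() const {
        return is_long() ? long_size() : static_cast<std::size_t>(rep_[0] >> 1);
    }
    const char* c_str() const {
        return is_long() ? long_ptr() : reinterpret_cast<const char*>(rep_ + 1);
    }
};
```

In this sketch the only mode check left is in the destructor and the accessors; the move constructor itself treats the object as a fixed-size block of bytes, which is the property the answer attributes to the libc++ design.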
Here is the optimized x86 (64bit) assembly for the libc++ string move constructor.
std::string
test(std::string& s)
{
    return std::move(s);
}
__Z4testRNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE: ## @_Z4testRNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
movq 16(%rsi), %rax
movq %rax, 16(%rdi)
movq (%rsi), %rax
movq 8(%rsi), %rcx
movq %rcx, 8(%rdi)
movq %rax, (%rdi)
movq $0, 16(%rsi)
movq $0, 8(%rsi)
movq $0, (%rsi)
movq %rdi, %rax
popq %rbp
retq
.cfi_endproc
(no branches!)
<aside>
The size of the internal buffer for the short string is also optimized for the move members. The internal buffer is "union'ed" with the 3 words required for "long mode", so that the sizeof(string) requires no more space than when in long mode. Despite this compact sizeof (the smallest among the 3 major implementations), libc++ enjoys the largest internal buffer on 64 bit architectures: 22 char.
The small sizeof translates into faster move members since all these members do is copy and zero bytes of the object layout.
See this Stackoverflow answer for more details on the internal buffer size.
</aside>
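The object size and internal buffer size mentioned in the aside can be checked on any toolchain with a few lines. This is a rough probe, not part of either library: `sso_capacity` and `print_layout` are hypothetical helper names, and the probe assumes that the capacity of a default-constructed string equals the size of the internal buffer, which holds for SSO implementations but is not required by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: capacity of a default-constructed string. On SSO
// implementations this is the number of chars that fit in the internal
// buffer without allocating (an assumption, not a standard guarantee).
std::size_t sso_capacity() {
    return std::string().capacity();
}

// Prints the object size next to the short-string capacity.
void print_layout() {
    std::printf("sizeof(std::string) = %zu, SSO capacity = %zu\n",
                sizeof(std::string), sso_capacity());
}
```

Per the figures in the aside, a 64-bit libc++ build would be expected to report a 3-word object and a capacity of 22; other libraries will print different numbers.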
Summary
So in summary, the setting of the source to an empty string is necessary in "long mode" to transfer ownership of the pointer, and also necessary in short mode for performance reasons to avoid a broken pipeline.
I have no comment on the libstdc++ implementation as I did not author that code and your question already does a good job of that anyway. :-)