It had been my understanding that copy-on-write is not a viable way to implement a conforming std::string
in C++11, but when it came up in discussion recently I
It is, CoW is an acceptable mechanism for making faster strings... but...
it makes multithreaded code slower (all the locking needed to check whether you're the only writer kills performance when you use a lot of strings). This was the main reason CoW was killed off years ago.
The other reason is that operator[] returns the string data with nothing to stop you from overwriting a string that someone else expects to stay unchanged. The same applies to c_str() and data().
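To make the operator[] problem concrete, here is a minimal sketch. NaiveSharedString and leak_demo are hypothetical names invented for illustration, not any real library's string: it is a toy type whose copies share one buffer and never unshare it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy type: copies share one buffer and nothing ever unshares it.
class NaiveSharedString {
public:
    explicit NaiveSharedString(const char* text)
        : buf_(std::make_shared<std::string>(text)) {}

    // The implicitly generated copy constructor copies only the shared_ptr.

    // Hands out a mutable reference straight into the shared buffer.
    char& operator[](std::size_t i) { return (*buf_)[i]; }

    const char* c_str() const { return buf_->c_str(); }

private:
    std::shared_ptr<std::string> buf_;
};

// Returns the text seen through the copy after the original is written to.
inline std::string leak_demo() {
    NaiveSharedString a("cat");
    NaiveSharedString b(a);  // b shares a's buffer
    a[0] = 'b';              // meant to change only a
    return b.c_str();        // but b sees the write too
}
```

Calling leak_demo() gives back the modified text through b, which is exactly the overwrite the owner of b did not expect. A real CoW string has to unshare before handing out that reference, and that is where the cost and the invalidation come from.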
Quick google says that the multithreading is basically the reason it was effectively disallowed (not explicitly).
The proposal says:
Proposal
We propose to make all iterator and element access operations safely concurrently executable.
We are increasing the stability of operations even in sequential code.
This change effectively disallows copy-on-write implementations.
followed by
The largest potential loss in performance due to a switch away from copy-on-write implementations is the increased consumption of memory for applications with very large read-mostly strings. However, we believe that for those applications ropes are a better technical solution, and recommend a rope proposal be considered for inclusion in Library TR2.
Ropes are part of STLport and SGI's STL.
It's not allowed, because per 21.4.1 p6 of the standard, references, pointers and iterators may be invalidated only by the following uses of the basic_string object:
— as an argument to any standard library function taking a reference to non-const basic_string as an argument.
— Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.
For a COW string, calling non-const operator[] would require making a copy (and invalidating references), which is disallowed by the paragraph above. Hence, it's no longer legal to have a COW string in C++11.
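A rough sketch of why the copy is unavoidable, assuming a toy reference-counted design. ToyCowString and element_access_moved_storage are made-up names, and this is not how libstdc++ actually implemented its string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy COW string, not libstdc++'s actual implementation.
class ToyCowString {
public:
    explicit ToyCowString(const char* text)
        : buf_(std::make_shared<std::string>(text)) {}

    const char* data() const { return buf_->data(); }

    // Non-const access might be used for writing, so unshare first.
    char& operator[](std::size_t i) {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);  // the "copy" in copy-on-write
        }
        return (*buf_)[i];
    }

private:
    std::shared_ptr<std::string> buf_;
};

// Returns true if a plain element access moved the string's storage.
inline bool element_access_moved_storage() {
    ToyCowString s("a string long enough to avoid any small-buffer storage");
    const char* before = s.data();
    ToyCowString copy(s);       // s and copy now share one buffer
    (void)s[0];                 // nothing is written, but s must unshare anyway
    return s.data() != before;  // the earlier pointer no longer refers to s
}
```

In this sketch element_access_moved_storage() returns true: the pointer taken before the access now belongs only to copy, and would dangle once copy is destroyed. That is the invalidation 21.4.1 p6 does not allow operator[] to cause.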
The answers by Dave S and gbjbaanb are correct. (And Luc Danton's is correct too, although it's more a side-effect of forbidding COW strings rather than the original rule that forbids it.)
But to clear up some confusion, I'm going to add some further exposition. Various comments link to a comment of mine on the GCC bugzilla which gives the following example:
std::string s("str");
const char* p = s.data();
{
std::string s2(s);
(void) s[0];
}
std::cout << *p << '\n'; // p is dangling
The point of that example is to demonstrate why GCC's reference counted (COW) string is not valid in C++11. The C++11 standard requires this code to work correctly. Nothing in the code permits p to be invalidated in C++11.
Using GCC's old reference-counted std::string implementation, that code has undefined behaviour, because p is invalidated, becoming a dangling pointer. (What happens is that when s2 is constructed it shares the data with s, but obtaining a non-const reference via s[0] requires the data to be unshared, so s does a "copy on write" because the reference s[0] could potentially be used to write into s. Then s2 goes out of scope, destroying the array pointed to by p.)
The C++03 standard explicitly permits that behaviour in 21.3 [lib.basic.string] p5, where it says that subsequent to a call to data() the first call to operator[]() may invalidate pointers, references and iterators. So GCC's COW string was a valid C++03 implementation.
The C++11 standard no longer permits that behaviour, because no call to operator[]() may invalidate pointers, references or iterators, irrespective of whether they follow a call to data().
So the example above must work in C++11, but does not work with libstdc++'s kind of COW string, therefore that kind of COW string is not permitted in C++11.
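The same example can be turned into a small self-check. This is only a sketch: pointer_survives_element_access is a name made up here. On a C++11-conforming std::string it should return true, whereas with a COW string of the libstdc++ kind I would expect the comparison to fail.

```cpp
#include <cassert>
#include <string>

// Returns true if the pointer from data() still refers to s after a copy of s
// has been made, s[0] has been called, and the copy has been destroyed.
inline bool pointer_survives_element_access() {
    std::string s("str");
    const char* p = s.data();
    {
        std::string s2(s);
        (void)s[0];  // non-const operator[], but no write
    }
    // Compare addresses first, so *p is only read when p still points into s.
    return p == s.data() && *p == 's';
}
```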
Since it is now guaranteed that strings are stored contiguously and you are now allowed to take a pointer to the internal storage of a string (i.e. &str[0] works like it would for an array), it's not possible to make a useful COW implementation. You would have to make a copy for way too many things. Even just using operator[] or begin() on a non-const string would require a copy.
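As an illustration of what the contiguity guarantee permits, here is a small sketch. fill_via_raw_pointer is a hypothetical helper, with memcpy standing in for whatever C-style API wants a raw char buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Fills a std::string through a raw pointer, the way a C API would.
inline std::string fill_via_raw_pointer(const char* src) {
    const std::size_t n = std::strlen(src);
    std::string out(n, '\0');          // allocate n characters up front
    if (n != 0) {
        std::memcpy(&out[0], src, n);  // write straight into the contiguous storage
    }
    return out;
}
```

For example, fill_via_raw_pointer("hello") yields a string equal to "hello". A COW string would have to unshare at the &out[0] call, since it cannot know what will be done through the pointer afterwards.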
I was always wondering about immutable cows: once a cow is created it can be changed only through assignment from another cow, hence it would be compliant with the standard.
I had time to try it today for a simple comparison test: a map of size N keyed by string/cow with every node holding a set of all strings in the map (we have NxN number of objects).
With strings sized ~300 bytes and N=2000, cows are slightly faster and use almost an order of magnitude less memory. See below; sizes are in KB, and run b is with cows.
~/icow$ ./tst 2000
preparation a
run
done a: time-delta=6 mem-delta=1563276
preparation b
run
done a: time-delta=3 mem-delta=186384
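The class used for that test isn't shown, so the following is only a guess at what an immutable cow could look like. ImmutableString and sharing_and_rebinding_work are hypothetical names: a shared read-only buffer that can only be replaced wholesale by assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable shared string; the class used in the test is not shown.
class ImmutableString {
public:
    explicit ImmutableString(std::string text)
        : buf_(std::make_shared<std::string>(std::move(text))) {}

    // Read-only access only: there is no non-const operator[] to unshare for.
    char operator[](std::size_t i) const { return (*buf_)[i]; }
    const char* data() const { return buf_->data(); }
    std::size_t size() const { return buf_->size(); }

    // Ordering, so the type can be a std::map key or a std::set element.
    friend bool operator<(const ImmutableString& a, const ImmutableString& b) {
        return *a.buf_ < *b.buf_;
    }

private:
    // Copy construction shares this buffer; assignment rebinds it.
    std::shared_ptr<const std::string> buf_;
};

// Checks that copies share storage and that assignment leaves other copies alone.
inline bool sharing_and_rebinding_work() {
    ImmutableString a(std::string(300, 'x'));
    ImmutableString b(a);
    const char* shared = b.data();
    const bool shared_before = (a.data() == shared);
    a = ImmutableString(std::string(300, 'y'));  // a rebinds; b is untouched
    return shared_before && b.data() == shared && b[0] == 'x' && a[0] == 'y';
}
```

Because nothing can write through an existing object, a copy never needs to unshare, which would be consistent with the memory saving in the run above. Note that this is a separate type with a much smaller interface, not a drop-in replacement for std::string.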
From 21.4.2 basic_string constructors and assignment operators [string.cons]
basic_string(const basic_string<charT,traits,Allocator>& str);
[...]
2 Effects: Constructs an object of class basic_string as indicated in Table 64. [...]
Table 64 helpfully documents that after construction of an object via this (copy) constructor, this->data() has as value:
points at the first element of an allocated copy of the array whose first element is pointed at by str.data()
There are similar requirements for other similar constructors.
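One way to observe that requirement is the sketch below. copy_has_its_own_array is a made-up name, and the check assumes a C++11-conforming std::string; a reference-counted implementation would be expected to hand back the same pointer for both objects.

```cpp
#include <cassert>
#include <string>

// Returns true if a copy-constructed string holds an equal but separate array.
inline bool copy_has_its_own_array() {
    const std::string original("text that is comfortably longer than a small buffer");
    const std::string copy(original);
    return copy == original && copy.data() != original.data();
}
```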