Legality of COW std::string implementation in C++11

天涯浪人 2020-11-22 09:30

It had been my understanding that copy-on-write is not a viable way to implement a conforming std::string in C++11, but when it came up in discussion recently I

7 Answers
  • 2020-11-22 09:43

    It is, CoW is an acceptable mechanism for making faster strings... but...

    It makes multithreaded code slower: all the locking needed to check whether you are the only writer kills performance when a lot of strings are in use. This was the main reason CoW was killed off years ago.

    The other reason is that operator[] hands you the string data, with nothing to stop you from overwriting a string that someone else expects to remain unchanged. The same applies to c_str() and data().

    A quick Google search suggests that multithreading is basically the reason it was effectively (though not explicitly) disallowed.

    The proposal says :

    Proposal

    We propose to make all iterator and element access operations safely concurrently executable.

    We are increasing the stability of operations even in sequential code.

    This change effectively disallows copy-on-write implementations.

    followed by

    The largest potential loss in performance due to a switch away from copy-on-write implementations is the increased consumption of memory for applications with very large read-mostly strings. However, we believe that for those applications ropes are a better technical solution, and recommend a rope proposal be considered for inclusion in Library TR2.

    Ropes are part of STLport and SGI's STL.

  • 2020-11-22 09:47

    It's not allowed because, per 21.4.1 p6 of the standard, invalidation of iterators and references is only allowed for:

    — as an argument to any standard library function taking a reference to non-const basic_string as an argument.

    — Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.

    For a COW string, calling non-const operator[] would require making a copy (and invalidating references), which is disallowed by the paragraph above. Hence, it's no longer legal to have a COW string in C++11.

  • 2020-11-22 09:50

    The answers by Dave S and gbjbaanb are correct. (And Luc Danton's is correct too, although it's more a side-effect of forbidding COW strings rather than the original rule that forbids it.)

    But to clear up some confusion, I'm going to add some further exposition. Various comments link to a comment of mine on the GCC bugzilla which gives the following example:

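    #include <iostream>
    #include <string>

    int main() {
        std::string s("str");
        const char* p = s.data();   // pointer into s's character array
        {
            std::string s2(s);      // a COW string shares s's array here
            (void) s[0];            // non-const access: a COW string unshares s
        }
        std::cout << *p << '\n';    // must work in C++11; p dangles with a COW string
    }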
    

    The point of that example is to demonstrate why GCC's reference-counted (COW) string is not valid in C++11. The C++11 standard requires this code to work correctly. Nothing in the code permits p to be invalidated in C++11.

    Using GCC's old reference-counted std::string implementation, that code has undefined behaviour, because p is invalidated, becoming a dangling pointer. What happens is this: when s2 is constructed it shares the data with s. Obtaining a non-const reference via s[0] requires the data to be unshared, because that reference could potentially be used to write into s, so s does a "copy on write". Then s2 goes out of scope, destroying the array pointed to by p.

    The C++03 standard explicitly permits that behaviour in 21.3 [lib.basic.string] p5 where it says that subsequent to a call to data() the first call to operator[]() may invalidate pointers, references and iterators. So GCC's COW string was a valid C++03 implementation.

    The C++11 standard no longer permits that behaviour, because no call to operator[]() may invalidate pointers, references or iterators, irrespective of whether they follow a call to data().

    So the example above must work in C++11, but does not work with libstdc++'s kind of COW string, therefore that kind of COW string is not permitted in C++11.

  • 2020-11-22 09:50

    Since it is now guaranteed that strings are stored contiguously and you are now allowed to take a pointer to the internal storage of a string, (i.e. &str[0] works like it would for an array), it's not possible to make a useful COW implementation. You would have to make a copy for way too many things. Even just using operator[] or begin() on a non-const string would require a copy.

  • 2020-11-22 09:59

    I have always wondered about immutable cows: once a cow is created, it can be changed only through assignment from another cow, hence it would be compliant with the standard.

    I had time today to try it in a simple comparison test: a map of size N keyed by string/cow, with every node holding a set of all the strings in the map (so we have N×N objects).

    With strings of about 300 bytes and N=2000, cows are slightly faster and use almost an order of magnitude less memory. See below; sizes are in KB, and run b is with cows.

    ~/icow$ ./tst 2000
    preparation a
    run
    done a: time-delta=6 mem-delta=1563276
    preparation b
    run
    done a: time-delta=3 mem-delta=186384
    
  • 2020-11-22 10:05

    From 21.4.2 basic_string constructors and assignment operators [string.cons]

    basic_string(const basic_string<charT,traits,Allocator>& str);

    [...]

    2 Effects: Constructs an object of class basic_string as indicated in Table 64. [...]

    Table 64 helpfully documents that after construction of an object via this (copy) constructor, this->data() has as value:

    points at the first element of an allocated copy of the array whose first element is pointed at by str.data()

    There are similar requirements for other similar constructors.
