(Disclaimer: I don't know what the C++ standard might say about this. I know, I'm horrible.)
While operating on very large strings, I noticed that std::string is using copy-on-write. I wrote the smallest loop that reproduces the observed behaviour; the following one, for instance, runs suspiciously fast:
#include <string>
using std::string;
int main(void) {
    string basestr(1024 * 1024 * 10, 'A');
    for (int i = 0; i < 100; i++) {
        string a_copy = basestr;
    }
}
When I added a write in the loop body, a_copy[1] = 'B';, an actual copy apparently took place, and the program ran in 0.3s instead of a few milliseconds. 100 writes slowed it down by about 100 times.
But then it got weird. Some of my strings weren't written to, only read from, and this was not reflected in the execution time, which was almost exactly proportional to the number of operations on the strings. With some digging, I found that simply reading from a string still gave me that performance hit, so it led me to assume GNU STL strings are using copy-on-read (?).
#include <string>
using std::string;
int main(void) {
    string basestr(1024 * 1024 * 10, 'A');
    for (int i = 0; i < 100; i++) {
        string a_copy = basestr;
        a_copy[99]; // this also ran in 0.3s!
    }
}
After revelling in my discovery for a while, I found out that reading (with operator[]) from the base string also takes 0.3s for the entire toy program. I'm not 100% comfortable with this. Are STL strings indeed copy-on-read, or are they allowing copy-on-write at all? I'm led to think that operator[] has some safeguards against someone who would keep the reference it returns and later write to it; is this really the case? If not, what is really happening? If someone can point to the relevant section of the C++ standard, that'd also be appreciated.
For reference, I'm using g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3, and the GNU STL.
C++ doesn't distinguish between an operator[] for reading and one for writing, only between the operator[] for a const object and the one for a mutable (non-const) object. Since a_copy is mutable, the mutable operator[] will be chosen, which forces the copy because that operator returns a (mutable) reference.
If efficiency is a concern, you could cast a_copy to a const string& to force the const version of operator[] to be used, which won't make a copy of the internal buffer. Cast to a reference rather than to a plain const string: the latter creates a temporary string object first.
char f = static_cast<const string&>(a_copy)[99];
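The overload selection described above can be seen in isolation with a toy class. This is only a sketch: Probe, read_plain and read_via_const_ref are hypothetical names invented for illustration, not part of std::string or any library. The class records which operator[] overload the compiler picked.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy type (not a library class) that records which
// operator[] overload was selected by the compiler.
class Probe {
public:
    explicit Probe(const std::string& s) : data_(s), last_("none") {}

    // Chosen for non-const objects, even when the caller only reads.
    char& operator[](std::size_t i) { last_ = "mutable"; return data_[i]; }

    // Chosen for const objects and for references to const.
    const char& operator[](std::size_t i) const { last_ = "const"; return data_[i]; }

    const char* last() const { return last_; }

private:
    std::string data_;
    mutable const char* last_;  // mutable so the const overload can record itself
};

// Reads one character through a non-const object.
const char* read_plain() {
    Probe p("hello");
    char c = p[0];
    (void)c;
    return p.last();
}

// Reads the same character through a reference to const.
const char* read_via_const_ref() {
    Probe p("hello");
    const Probe& cp = p;
    char c = cp[0];
    (void)c;
    return cp.last();
}
```

Both helpers only read a character, yet read_plain() ends up in the mutable overload because the choice depends on the constness of the object, not on what the caller does with the result.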
The C++ standard doesn't prohibit or mandate copy-on-write or any other implementation details for std::string. So long as the semantics and complexity requirements are met, an implementation may choose whatever implementation strategy it likes.
Note that operator[] on a non-const string is effectively a "write" operation, as it returns a reference that can be used to modify the string at any point up to the next operation that mutates the string. No copies should be affected by such a modification.
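A minimal sketch of that requirement follows. The helper name copy_is_isolated is made up for illustration. It keeps the reference returned by the non-const operator[], takes a copy afterwards, and only then writes through the reference.

```cpp
#include <string>

// Hypothetical helper: returns true if a write through a previously
// obtained reference leaves a later copy untouched.
bool copy_is_isolated() {
    std::string a(16, 'A');
    char& ref = a[1];      // non-const operator[]: the reference may be written later
    std::string b = a;     // copy taken while ref is still valid
    ref = 'B';             // write through the old reference
    const std::string& ca = a;
    const std::string& cb = b;
    return ca[1] == 'B' && cb[1] == 'A';
}
```

A conforming implementation has to make this return true whatever its strategy. A copy-on-write implementation therefore cannot keep sharing a buffer once a mutable reference into it has been handed out, which is consistent with the copy observed on a plain a_copy[99].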
Have you tried profiling one of these two?
const string a_copy = basestr;
a_copy[99];
Or
string a_copy = basestr;
const std::string& a_copy_ref = a_copy;
a_copy_ref[99];
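One way to compare the two variants is to time them side by side. The sketch below is an assumption-laden illustration: time_copy_and_read is a hypothetical helper, and it uses std::chrono, which needs C++11 and so is newer than the g++ 4.4.3 setup in the question. The size argument must be greater than 99.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper: runs `iterations` copy-then-read rounds on a string
// of `size` characters (size must exceed 99) and returns elapsed milliseconds.
// When via_const_ref is true, the read goes through a reference to const.
double time_copy_and_read(std::size_t size, int iterations, bool via_const_ref) {
    const std::string basestr(size, 'A');
    volatile char sink = 0;  // keeps the reads from being optimised away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::string a_copy = basestr;
        if (via_const_ref) {
            const std::string& a_copy_ref = a_copy;
            sink = a_copy_ref[99];   // const operator[]
        } else {
            sink = a_copy[99];       // non-const operator[]
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_copy_and_read(1024 * 1024 * 10, 100, true) and then the same with false mirrors the loop from the question. On a copy-on-write implementation the const variant would be expected to be much faster; on an implementation without copy-on-write both variants pay for a full copy each round.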
Try this code:
#include <cstddef>
#include <iostream>
#include <iomanip>
#include <string>
using namespace std;

// Print the raw bytes of an object's representation in hex.
template<typename T>
void dump(std::ostream & ostr, const T & val)
{
    const unsigned char * cp = reinterpret_cast<const unsigned char *>(&val);
    for (std::size_t i = 0; i < sizeof(T); i++)
        ostr << setw(2) << setfill('0') << hex << (int)cp[i] << ' ';
    ostr << endl;
}

int main(void) {
    string a = "hello world";
    string b = a;    // copy of a
    dump(cout, a);
    dump(cout, b);
    char c = b[0];   // read through the non-const operator[]
    (void)c;         // silence the unused-variable warning
    dump(cout, a);
    dump(cout, b);
}
On GCC, this is the output I get:
3c 10 51 00
3c 10 51 00
3c 10 51 00
5c 10 51 00
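Only b's bytes change after the read through b[0] (3c becomes 5c), so its internal pointer presumably refers to a different buffer afterwards. This would seem to indicate that yes, they are copy on read in this case.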
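The same check can be sketched without dumping raw bytes, by comparing the buffer addresses reported by data(). SharingReport and probe_sharing are hypothetical names introduced for this illustration, and the 64-character length is an assumption chosen to stay clear of any small-string buffer an implementation might use.

```cpp
#include <string>

// Hypothetical result type for the probe below.
struct SharingReport {
    bool shared_after_copy;  // did a and b share a buffer right after the copy?
    bool shared_after_read;  // do they still share it after a non-const read?
};

SharingReport probe_sharing() {
    std::string a(64, 'x');
    std::string b = a;
    // Inspect the buffers through references to const so that the
    // inspection itself cannot trigger a copy.
    const std::string& ca = a;
    const std::string& cb = b;
    SharingReport r;
    r.shared_after_copy = (ca.data() == cb.data());
    char c = b[0];           // read through the non-const operator[]
    (void)c;
    r.shared_after_read = (ca.data() == cb.data());
    return r;
}
```

On a copy-on-write implementation shared_after_copy would be expected to be true and shared_after_read false, matching the dump above. On an implementation without copy-on-write both fields are false.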
Source: https://stackoverflow.com/questions/4067395/gnu-stl-string-is-copy-on-write-involved-here