How efficient is std::string compared to null-terminated strings?

心在旅途 2020-12-28 19:23

I've discovered that std::strings are very slow compared to old-fashioned null-terminated strings, so much slower that they significantly slow down my overall pr

14 Answers
  • 2020-12-28 20:21

    It looks like you're misusing char* in the code you pasted. If you have

    std::string a = "this is a";
    std::string b = "this is b";
    a = b;
    

    you're performing a string copy operation. If you do the same with char*, you're performing a pointer copy operation.

    The std::string assignment makes sure a has enough storage for the contents of b (allocating if a's current capacity is too small), then copies the characters. In the case of char*, there is no memory allocation and no copying of individual characters; the assignment just says "a now points to the same memory that b is pointing to."

    My guess is that this is why std::string looks slower: it's actually copying the string, which appears to be what you want. To do a copy operation on a char* you'd need to use the strcpy() function to copy into a buffer that's already appropriately sized. Then you'll have an accurate comparison. But for the purposes of your program you should almost certainly use std::string instead.
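    To make the three cases concrete, here is a minimal sketch. The function names are made up for illustration and the 16-byte buffer size is an assumption chosen to fit these particular strings; it is not taken from the code in the question.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Pointer assignment: no characters are copied; both names refer to one buffer.
bool pointer_copy_shares_buffer() {
    const char* a = "this is a";
    const char* b = "this is b";
    a = b;                        // copies the address only
    return a == b;                // same address, hence the same storage
}

// strcpy: the char* counterpart of std::string assignment (a deep copy).
bool strcpy_makes_independent_copy() {
    char a[16] = "this is a";     // destination must already be large enough
    const char* b = "this is b";
    std::strcpy(a, b);            // copies every character, including the '\0'
    return &a[0] != b && std::strcmp(a, b) == 0;
}

// std::string assignment: also a deep copy, with the storage managed for you.
bool string_assignment_makes_independent_copy() {
    std::string a = "this is a";
    std::string b = "this is b";
    a = b;
    a[0] = 'T';                   // changing a must not affect b
    return a == "This is b" && b == "this is b";
}
```

    Only the last two do comparable work, so those are the two to time against each other.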

  • 2020-12-28 20:21

    This test is testing two fundamentally different things: a shallow copy vs. a deep copy. It's essential to understand the difference and how to avoid deep copies in C++, since a C++ object, by default, provides value semantics for its instances (as is the case with plain old data types), which means that assigning one to another is generally going to copy.

    I "corrected" your test and got this:

    char* loop = 19.921
    string = 0.375
    slowdown = 0.0188244
    

    Apparently we should cease using C-style strings since they are soooo much slower! In actuality, I deliberately made my test as flawed as yours by testing shallow copying on the string side vs. strcpy on the char* side:

    #include <cstdlib>   // malloc, free
    #include <cstring>   // strcpy, strlen
    #include <ctime>     // clock, CLOCKS_PER_SEC
    #include <iostream>
    #include <string>
    
    using namespace std;
    
    #define LIMIT 100000000
    
    // Deep-copies src into a freshly malloc'ed buffer; the caller must free() it.
    char* make_string(const char* src)
    {
        return strcpy((char*)malloc(strlen(src)+1), src);
    }
    
    int main(int argc, char* argv[])
    {
        clock_t start;
    
        // string side (deliberately cheap): swap() exchanges the two strings
        // without allocating anything.
        string foo1 = "Hello there buddy";
        string foo2 = "Hello there buddy, yeah you too";
        start = clock();
        for (int i=0; i < LIMIT; i++)
            foo1.swap(foo2);
        double stl = double(clock() - start) / CLOCKS_PER_SEC;
    
        // char* side (deliberately expensive): every iteration allocates and
        // deep-copies both strings.
        char* goo1 = make_string("Hello there buddy");
        char* goo2 = make_string("Hello there buddy, yeah you too");
        char *g;
        start = clock();
        for (int i=0; i < LIMIT; i++) {
            g = make_string(goo1);
            free(goo1);
            goo1 = make_string(goo2);
            free(goo2);
            goo2 = g;
        }
        double charLoop = double(clock() - start) / CLOCKS_PER_SEC;
        free(goo1);
        free(goo2);
    
        cout << "char* loop = " << charLoop << "\n";
        cout << "string = " << stl << "\n";
        cout << "slowdown = " << stl / charLoop << "\n";
    
        // Wait for input so the console window stays open.
        string wait;
        cin >> wait;
    }
    

    The main point is, and this actually gets to the heart of your ultimate question, you have to know what you are doing with the code. If you use a C++ object, you have to know that assigning one to the other is going to make a copy of that object (unless assignment is disabled, in which case you'll get an error). You also have to know when it's appropriate to use a reference, pointer, or smart pointer to an object, and with C++11, you should also understand the difference between move and copy semantics.
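    As a rough illustration of those choices, here is a small sketch. The names count_spaces, Label, and move_transfers_contents are hypothetical, invented for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Read-only access: take a const reference so the caller's string is not copied.
std::size_t count_spaces(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}

// Taking ownership: accept by value, then move into the member.
// A caller passing a temporary pays for a move, not a deep copy.
struct Label {
    std::string text;
    explicit Label(std::string t) : text(std::move(t)) {}
};

// Move vs. copy: after std::move, dst owns the characters and src is left
// in a valid but unspecified state, so only dst is inspected here.
bool move_transfers_contents() {
    std::string src = "Hello there buddy, yeah you too";
    std::string dst = std::move(src);
    return dst == "Hello there buddy, yeah you too";
}
```

    Whether a move is actually cheaper than a copy depends on the implementation and the string length (short strings are often stored inline), so treat this as a sketch of the semantics, not a performance guarantee.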

    "My real question is: why don't people use reference counting implementations anymore, and does this mean we all need to be much more careful about avoiding common performance pitfalls of std::string?"

    People do use reference-counting implementations. Here's an example of one:

    shared_ptr<string> ref_counted = make_shared<string>("test");
    shared_ptr<string> shallow_copy = ref_counted; // no deep copies, just 
                                                   // increase ref count
    

    The difference is that string doesn't do it internally, as that would be inefficient for those who don't need it. Copy-on-write is generally no longer done for strings either, for similar reasons (plus the fact that it makes thread safety an issue; since C++11, the standard's requirements on std::string effectively rule out copy-on-write implementations). Yet we have all the building blocks right here to do copy-on-write if we wish to do so: we have the ability to swap strings without any deep copying, and we have the ability to make pointers, references, or smart pointers to them.
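    For instance, a copy-on-write wrapper could be assembled from those building blocks roughly like this. CowString is a hypothetical name, not a standard or library type, and this sketch is not thread-safe: the use_count() check is only meaningful when a single thread touches the handles.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write handle built on shared_ptr<std::string>.
class CowString {
public:
    explicit CowString(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    // Read access never copies.
    const std::string& read() const { return *data_; }

    // Write access clones the buffer first if another handle shares it.
    std::string& write() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
        return *data_;
    }

    // True when both handles refer to the same underlying std::string.
    bool shares_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::string> data_;
};

// Copying a handle is shallow; the deep copy happens on the first write.
bool cow_demo() {
    CowString a("test");
    CowString b = a;                        // only the ref count changes
    bool shared_before = a.shares_with(b);
    b.write() += "ing";                     // b detaches with its own copy
    return shared_before && !a.shares_with(b)
        && a.read() == "test" && b.read() == "testing";
}
```

    The point of the design is that the cost of the deep copy is deferred until someone actually modifies a shared string, and code that never shares strings never pays for the bookkeeping.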

    To use C++ effectively, you have to get used to thinking in terms of value semantics. If you don't, you might enjoy the added safety and convenience, but at a heavy cost to the efficiency of your code (unnecessary copies are certainly a significant part of what makes poorly written C++ code slower than C). After all, your original test is still dealing with pointers to strings, not char[] arrays. If you were using character arrays and not pointers to them, you'd likewise need strcpy to swap them. With strings you even have a built-in swap method that does exactly what your test does, efficiently, so my advice is to spend a bit more time learning C++.
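    The three kinds of swap can be sketched side by side. The function names and the 64-byte buffer size are assumptions made for this example.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>

// Pointers: swapping exchanges two addresses; no characters move.
bool swap_pointers() {
    const char* p1 = "Hello there buddy";
    const char* p2 = "Hello there buddy, yeah you too";
    const char* old1 = p1;
    const char* old2 = p2;
    std::swap(p1, p2);
    return p1 == old2 && p2 == old1;
}

// Character arrays: the characters themselves must be copied, here through
// a temporary buffer (all three buffers are assumed to be large enough).
bool swap_arrays() {
    char a1[64] = "Hello there buddy";
    char a2[64] = "Hello there buddy, yeah you too";
    char tmp[64];
    std::strcpy(tmp, a1);
    std::strcpy(a1, a2);
    std::strcpy(a2, tmp);
    return std::strcmp(a1, "Hello there buddy, yeah you too") == 0
        && std::strcmp(a2, "Hello there buddy") == 0;
}

// std::string: the member swap exchanges the two values without allocating.
bool swap_strings() {
    std::string s1 = "Hello there buddy";
    std::string s2 = "Hello there buddy, yeah you too";
    s1.swap(s2);
    return s1 == "Hello there buddy, yeah you too"
        && s2 == "Hello there buddy";
}
```

    A fair benchmark compares swap_pointers with swap_strings, or swap_arrays with plain std::string assignment, but not one kind against the other.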
