How efficient is std::string compared to null-terminated strings?

心在旅途 2020-12-28 19:23

I've discovered that std::strings are very slow compared to old-fashioned null-terminated strings, so much slower that they significantly slow down my overall pr

14 answers
  • 2020-12-28 20:10
                                      string    const string&    char*    Java string
    -----------------------------------------------------------------------------------
    Efficient assignment              no **     yes              yes      yes
    Thread-safe                       yes       yes              yes      yes
    Memory management done for you    yes       no               no       yes
    

    ** There are two common implementation strategies for std::string: reference counting or deep copy. Reference counting introduces performance problems in multi-threaded programs, even when strings are only read, and deep copy makes assignment slower, as the table above shows. See: Why VC++ Strings are not reference counted?

    As this table shows, 'string' is better than 'char*' in some ways and worse in others, and 'const string&' has properties similar to 'char*'. Personally, I'm going to continue using 'char*' in many places. The enormous amount of copying of std::string objects that happens silently, through implicit copy constructors and temporaries, makes me somewhat ambivalent about std::string.

  • 2020-12-28 20:11

    They didn't go wrong. The STL implementation is, generally speaking, better than yours.

    I'm sure that you can write something better for a very particular case, but a factor of 2 is too much... you really must be doing something wrong.

  • 2020-12-28 20:12

    Well, there are definitely known problems regarding the performance of strings and other containers. Most of them have to do with temporaries and unnecessary copies.

    It's not too hard to use it right, but it's also quite easy to Do It Wrong. For example, if you see your code accepting strings by value where you don't need a modifiable parameter, you Do It Wrong:

    // you do it wrong
    void setMember(string a) {
        this->a = a; // better: swap(this->a, a);
    }
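
    As a minimal sketch of cheaper alternatives to the setter above: the class name Widget and the three method names are hypothetical, and the move-based variant assumes C++11, which goes beyond what the original snippet uses.

```cpp
#include <string>
#include <utility>

// Hypothetical class, used only to illustrate the three setter styles.
class Widget {
public:
    // Option 1: const reference. The only copy is the assignment itself.
    void setMemberByRef(const std::string& a) { a_ = a; }

    // Option 2: by value, then swap. The copy is made once at the call
    // site, and the swap only exchanges the internal buffers.
    void setMemberBySwap(std::string a) { a_.swap(a); }

    // Option 3 (assumes C++11): by value, then move into the member.
    void setMemberByMove(std::string a) { a_ = std::move(a); }

    const std::string& member() const { return a_; }

private:
    std::string a_;
};
```

    Options 2 and 3 pay off mainly when the caller passes a temporary, because then no full copy of the character data should be needed at all.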
    

    You had better take that by const reference or do a swap operation inside, instead of making yet another copy. The performance penalty grows when the parameter is a vector or list. However, you are definitely right that there are known problems. For example, in this:

    // let's add a Foo into the vector
    v.push_back(Foo(a, b));
    

    We are creating one temporary Foo just to add a new Foo to our vector. A manual solution might create the Foo directly in the vector. And if the vector reaches its capacity limit, it has to reallocate a larger memory buffer for its elements. What does it do? It copies each element separately to its new place using the copy constructor. A manual solution might behave more intelligently if it knows the type of the elements beforehand.
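
    One way to observe the reallocation cost described above, and to reduce it when the final size is known, is to count copies with and without a call to reserve. This is only a sketch: CountedFoo and copiesWhileFilling are hypothetical names, and the element type deliberately has no move constructor so that every transfer shows up as a copy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it is copy-constructed.
// Declaring the copy constructor suppresses the implicit move constructor,
// so temporaries and reallocations both go through the copy constructor.
struct CountedFoo {
    static int copies;
    int a, b;
    CountedFoo(int a_, int b_) : a(a_), b(b_) {}
    CountedFoo(const CountedFoo& other) : a(other.a), b(other.b) { ++copies; }
};
int CountedFoo::copies = 0;

// Fills a vector with n elements and returns the number of copy-constructor
// calls that happened along the way.
int copiesWhileFilling(std::size_t n, bool reserveFirst) {
    CountedFoo::copies = 0;
    std::vector<CountedFoo> v;
    if (reserveFirst) {
        v.reserve(n);  // one allocation up front, no reallocation later
    }
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(CountedFoo(static_cast<int>(i), 0));  // temporary, then copy
    }
    return CountedFoo::copies;
}
```

    With reserve, each push_back costs one copy of the temporary; without it, every reallocation adds a copy of each element already stored.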

    Another common problem is the introduction of temporaries. Have a look at this:

    string a = b + c + e;
    

    There are loads of temporaries created, which you might avoid in a custom solution that you actually optimize for performance. Back then, the interface of std::string was designed to be copy-on-write friendly. However, with threads becoming more popular, transparent copy-on-write strings have problems keeping their state consistent. Recent implementations tend to avoid copy-on-write strings and instead apply other tricks where appropriate.
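
    A hand-tuned version of the concatenation above might look like the following sketch, which reserves the final size once and appends in place. The helper name concat3 is hypothetical; whether this is measurably faster depends on the implementation and the string lengths.

```cpp
#include <string>

// Builds b + c + e without intermediate temporary strings: the result
// buffer is sized once, then each piece is appended in place.
std::string concat3(const std::string& b,
                    const std::string& c,
                    const std::string& e) {
    std::string a;
    a.reserve(b.size() + c.size() + e.size());
    a += b;
    a += c;
    a += e;
    return a;
}
```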

    Most of those problems are solved, however, in the next version of the Standard (published as C++11). For example, instead of push_back, you can use emplace_back to create a Foo directly in your vector:

    v.emplace_back(a, b);
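
    The difference between the two calls can be made visible by counting constructor calls. This is a sketch assuming C++11; TrackedFoo, Counts, viaPushBack and viaEmplaceBack are hypothetical names introduced only for illustration.

```cpp
#include <vector>

// Hypothetical element type that records how it was created.
struct TrackedFoo {
    static int constructed, copied, moved;
    int a, b;
    TrackedFoo(int a_, int b_) : a(a_), b(b_) { ++constructed; }
    TrackedFoo(const TrackedFoo& o) : a(o.a), b(o.b) { ++copied; }
    TrackedFoo(TrackedFoo&& o) noexcept : a(o.a), b(o.b) { ++moved; }
};
int TrackedFoo::constructed = 0;
int TrackedFoo::copied = 0;
int TrackedFoo::moved = 0;

struct Counts { int constructed, copied, moved; };

static void resetCounts() {
    TrackedFoo::constructed = TrackedFoo::copied = TrackedFoo::moved = 0;
}

// push_back: builds a temporary, then moves it into the vector.
Counts viaPushBack() {
    resetCounts();
    std::vector<TrackedFoo> v;
    v.reserve(1);  // rule out reallocation so only the insertion is counted
    v.push_back(TrackedFoo(1, 2));
    return Counts{TrackedFoo::constructed, TrackedFoo::copied, TrackedFoo::moved};
}

// emplace_back: forwards the arguments and constructs the element in place.
Counts viaEmplaceBack() {
    resetCounts();
    std::vector<TrackedFoo> v;
    v.reserve(1);
    v.emplace_back(1, 2);
    return Counts{TrackedFoo::constructed, TrackedFoo::copied, TrackedFoo::moved};
}
```

    Here push_back costs one construction plus one move, while emplace_back costs a single construction.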
    

    And instead of creating copies in the concatenation above, std::string will recognize when it concatenates temporaries and optimize for those cases. Reallocation will also avoid making copies and will instead move elements to their new places where appropriate.
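
    The reallocation behaviour can be checked with a small counting type. The sketch below assumes C++11 and a move constructor marked noexcept, which is what lets std::vector move rather than copy when it grows; Record and copiesDuringGrowth are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts copies and moves separately.
struct Record {
    static int copies;
    static int moves;
    std::string text;
    explicit Record(std::string t) : text(std::move(t)) {}
    Record(const Record& o) : text(o.text) { ++copies; }
    // noexcept matters: without it, vector falls back to copying on growth.
    Record(Record&& o) noexcept : text(std::move(o.text)) { ++moves; }
};
int Record::copies = 0;
int Record::moves = 0;

// Grows a vector past its capacity several times and returns how many
// copy-constructor calls that caused.
int copiesDuringGrowth(std::size_t n) {
    Record::copies = 0;
    Record::moves = 0;
    std::vector<Record> v;  // deliberately no reserve, to force reallocation
    for (std::size_t i = 0; i < n; ++i) {
        v.emplace_back("some text long enough to need a heap buffer");
    }
    return Record::copies;
}
```

    Every element transfer during growth is then a move, so the string buffers are handed over instead of being duplicated.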

    For an excellent read, consider Move Constructors by Andrei Alexandrescu.

    Sometimes, however, comparisons also tend to be unfair. Standard containers have to support the features they have to support. For example, if your container does not keep references to map elements valid while elements are added to or removed from the map, then comparing your "faster" map to the standard map can become unfair, because the standard map has to ensure that references to elements remain valid. That was just an example, of course, and there are many such cases that you have to keep in mind when stating "my container is faster than standard ones!!!".
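
    The guarantee mentioned for the standard map can be demonstrated with a short sketch: a pointer to one element is taken, the map is modified, and the pointer is checked again. The function name referenceSurvivesChanges is hypothetical.

```cpp
#include <map>
#include <string>

// Returns true if a pointer to an existing element still refers to the
// same, unchanged element after many inserts and erases of other keys.
bool referenceSurvivesChanges(int extraInserts) {
    std::map<int, std::string> m;
    m[0] = "first";
    std::string* p = &m[0];  // address of the mapped value for key 0

    for (int i = 1; i <= extraInserts; ++i) {
        m[i] = "other";      // inserting does not invalidate references
    }
    for (int i = 1; i <= extraInserts; i += 2) {
        m.erase(i);          // erasing only invalidates the erased element
    }
    return p == &m[0] && *p == "first";
}
```

    A hand-written container that stores its elements contiguously could not offer this without extra work, which is part of why such comparisons can be unfair.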

  • 2020-12-28 20:16

    The main rules of optimization:

    • Rule 1: Don't do it.
    • Rule 2: (For experts only) Don't do it yet.

    Are you sure that you have proven that it is really the STL that is slow, and not your algorithm?
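
    One way to gather such evidence is to time the suspect code in isolation before blaming the library. The following is a rough sketch using std::chrono; timeMicros and the two length helpers are hypothetical, and a real profiler will usually give a more trustworthy picture, since an optimizing compiler may remove work whose result is unused.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <string>

// Runs work() the given number of times and returns the elapsed
// wall-clock time in microseconds.
template <typename Work>
long long timeMicros(Work work, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Two hypothetical candidates one might want to compare.
std::size_t lengthViaString(const std::string& s) { return s.size(); }
std::size_t lengthViaCStr(const char* s) { return std::strlen(s); }
```

    Each candidate can be wrapped in a lambda and passed to timeMicros with the same iteration count, accumulating the results into a variable that is used afterwards so the calls are not optimized away.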

  • 2020-12-28 20:17

    Good performance isn't always easy with STL, but generally, it is designed to give you the power. I found Scott Meyers' "Effective STL" an eye-opener for understanding how to deal with the STL efficiently. Read!

    As others said, you are probably running into frequent deep copies of the string, and comparing that to a pointer assignment or a reference-counting implementation.

    Generally, any class designed for your specific needs will beat a generic class that's designed for the general case. But learn to use the generic class well, and learn to ride the 80:20 rule, and you will be much more efficient than someone rolling everything on their own.


    One specific drawback of std::string is that it doesn't give performance guarantees, which makes sense. As Tim Cooper mentioned, the STL does not say whether a string assignment creates a deep copy. That's good for a generic class, because reference counting can become a real killer in highly concurrent applications, even though it's usually the best way for a single-threaded app.
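
    Which strategy a given library uses can be probed by comparing buffer addresses after an assignment. This is a sketch with a hypothetical function name; it assumes the string is long enough not to matter for any small-string optimization, and the result depends on the implementation.

```cpp
#include <string>

// Returns true if assigning `original` to another string produced a
// separate character buffer (deep copy). On a reference-counted
// (copy-on-write) implementation, such as older libstdc++ releases,
// the two strings would share one buffer and this would return false.
bool assignmentDeepCopies(const std::string& original) {
    std::string copy;
    copy = original;
    const std::string& constCopy = copy;  // const access, so no forced unsharing
    return constCopy.data() != original.data();
}
```

    On current mainstream standard libraries this is expected to report a deep copy, which matches the move away from copy-on-write described in the earlier answer.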

  • 2020-12-28 20:18

    I would say that STL implementations are better than the traditional implementations. Also, did you try using a list instead of a vector? A vector is efficient for some purposes and a list for others.
