How can I increase the performance in a map lookup with key type std::string?

天涯浪人 2021-02-05 23:25

I'm using a std::map (VC++ implementation) and it's a little slow for lookups via the map's find method.

The key type is std::string.

14 Answers
  • 2021-02-05 23:50

    std::map's comparator isn't std::equal_to, it's std::less. I'm not sure what the best way is to short-circuit a < comparison so that it would be faster than the built-in one.

    If there are always < 15 elements, perhaps you could use a key type other than std::string?

  • 2021-02-05 23:54

    You can try using a sorted vector. This may turn out to be faster (you'll have to profile it to make sure, of course).

    Reasons to think it'll be faster:

    1. Fewer memory allocations and deallocations (the vector will expand to the maximal size used and then reuse freed memory).
    2. Binary search with random access should be faster than tree traversal (especially due to data locality).

    Reasons to think it'll be slower:

    1. Deletions and additions will mean moving strings around in memory. Since string's swap is efficient and the size of the data set is small, this may not be an issue.
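
    A minimal sketch of the sorted-vector idea, assuming int values and C++11. The names Entry, KeyLess, flat_insert and flat_find are made up for illustration and are not from any library:

    ```cpp
    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical flat map: a vector of (key, value) pairs kept sorted by key.
    typedef std::pair<std::string, int> Entry;

    // Compares a stored entry against a bare key, as std::lower_bound requires.
    struct KeyLess {
        bool operator()(const Entry& e, const std::string& key) const {
            return e.first < key;
        }
    };

    // Inserts a new key or overwrites an existing one, keeping the vector sorted.
    inline void flat_insert(std::vector<Entry>& v, const std::string& key, int value) {
        auto it = std::lower_bound(v.begin(), v.end(), key, KeyLess());
        if (it != v.end() && it->first == key)
            it->second = value;               // key already present
        else
            v.insert(it, Entry(key, value));  // shifts the later elements
    }

    // Binary search; returns a pointer to the value, or nullptr if the key is absent.
    inline const int* flat_find(const std::vector<Entry>& v, const std::string& key) {
        auto it = std::lower_bound(v.begin(), v.end(), key, KeyLess());
        return (it != v.end() && it->first == key) ? &it->second : nullptr;
    }
    ```

    Lookups touch contiguous memory, while each insert may shift elements, which is the trade-off listed above. Whether it actually beats std::map for your data is something only a profile can tell.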
  • 2021-02-05 23:56

    Try std::tr1::unordered_map (found in the header <tr1/unordered_map> on gcc, or <unordered_map> on MSVC). This is a hash map, and, while it doesn't maintain a sorted order of elements, it will likely be far faster than a regular map.

    If your compiler doesn't support TR1, get a newer version. MSVC and gcc both support TR1, and I believe the newest versions of most other compilers also have support. Unfortunately, a lot of the library reference sites haven't been updated, so TR1 remains a largely-unknown piece of technology.

    I hope C++0x isn't the same way.

    EDIT: Note that the default hashing method for tr1::unordered_map is tr1::hash, which already works for std::string but probably needs to be specialized to work on a UDT.
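
    A rough sketch of the hash map approach, written against std::unordered_map from C++11 (the standardized successor of the TR1 container) rather than the tr1 namespace. PathTable, make_table and lookup are hypothetical names, and the keys are made-up sample paths:

    ```cpp
    #include <string>
    #include <unordered_map>

    // Hypothetical lookup table from a path string to an integer id.
    typedef std::unordered_map<std::string, int> PathTable;

    // Builds a small sample table; the default hash handles std::string keys.
    inline PathTable make_table() {
        PathTable t;
        t.reserve(16);  // room for the small key set up front, so inserts do not rehash
        t["/usr/bin"] = 1;
        t["/usr/lib"] = 2;
        t["/usr/share"] = 3;
        return t;
    }

    // Returns the stored id, or `fallback` if the key is not present.
    inline int lookup(const PathTable& t, const std::string& key, int fallback) {
        PathTable::const_iterator it = t.find(key);
        return it == t.end() ? fallback : it->second;
    }
    ```

    Note that a lookup still hashes the whole key, so for very small tables the gain over std::map is not guaranteed.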

  • 2021-02-05 23:57

    As Even said, the operator used in a set is <, not ==.

    If you don't care about the order of the strings in your set you can pass the set a custom comparator that performs better than the regular less-than.

    For example, if a lot of your strings have similar prefixes (but they vary in length), you can sort by string length first (since string.length() is constant time).

    If you do so beware a common mistake:

    struct comp {
        bool operator()(const std::string& lhs, const std::string& rhs)
        {
            if (lhs.length() < rhs.length())
                return true;
            return lhs < rhs;
        }
    };
    

    This operator does not maintain a strict weak ordering, as it can treat two strings as each less than the other.

    string a = "z";
    string b = "aa";
    

    Follow the logic and you'll see that comp(a, b) == true and comp(b, a) == true.

    The correct implementation is:

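    struct comp {
        // const so the comparator can be called through a const map or set
        bool operator()(const std::string& lhs, const std::string& rhs) const
        {
            // Shorter strings sort first; equal lengths fall back to operator<
            if (lhs.length() != rhs.length())
                return lhs.length() < rhs.length();
            return lhs < rhs;
        }
    };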
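
    To see the difference, here is a small sketch that puts both comparators side by side and plugs the correct one into a std::map. BrokenComp, LengthFirstComp, violates_asymmetry and LengthOrderedMap are names invented for this example:

    ```cpp
    #include <map>
    #include <string>

    // The mistaken version from above: not a strict weak ordering.
    struct BrokenComp {
        bool operator()(const std::string& lhs, const std::string& rhs) const {
            if (lhs.length() < rhs.length())
                return true;
            return lhs < rhs;
        }
    };

    // The corrected version: length first, then the usual lexicographic order.
    struct LengthFirstComp {
        bool operator()(const std::string& lhs, const std::string& rhs) const {
            if (lhs.length() != rhs.length())
                return lhs.length() < rhs.length();
            return lhs < rhs;
        }
    };

    // True if the comparator claims both a < b and b < a, which is never allowed.
    template <class Comp>
    bool violates_asymmetry(Comp c, const std::string& a, const std::string& b) {
        return c(a, b) && c(b, a);
    }

    // A map whose keys are ordered by length first.
    typedef std::map<std::string, int, LengthFirstComp> LengthOrderedMap;
    ```

    With a = "z" and b = "aa", violates_asymmetry returns true for BrokenComp and false for LengthFirstComp. Iterating a LengthOrderedMap visits shorter keys before longer ones.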
    
  • 2021-02-06 00:00

    The first thing is to try using a hash_map if that's possible - you are right that the standard string compare doesn't first check for size (since it compares lexicographically), but writing your own map code is something you'd be better off avoiding. From your question it sounds like you do not need to iterate over ranges; in that case map doesn't have anything hash_map doesn't.

    It also depends on what sort of keys you have in your map. Are they typically very long? Also what does "a little slow" mean? If you have not profiled the code it's quite possible that it's a different part taking time.

    Update: Hmm, the bottleneck in your program is a map::find, but the map always has fewer than 15 elements. This makes me suspect that the profile was somehow misleading, because a find on a map this small should not be slow at all. In fact, a map::find should be so fast that the overhead of profiling alone could be more than the find call itself. I have to ask again: are you sure this is really the bottleneck in your program? You say the strings are paths, but are you sure you're not doing any sort of OS calls, file system access, or disk access in this loop? Any of those should be orders of magnitude slower than a map::find on a small map. Really, any way of getting a string should be slower than the map::find.
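
    One way to sanity-check the profile is to time the lookups in isolation. This is only a rough sketch, assuming C++11; LookupTiming and time_lookups are hypothetical names, and the numbers it produces depend heavily on compiler flags and hardware:

    ```cpp
    #include <chrono>
    #include <cstddef>
    #include <map>
    #include <string>
    #include <vector>

    // Result of one timing run: how many lookups hit, and how long they all took.
    struct LookupTiming {
        std::size_t hits;
        long long nanoseconds;
    };

    // Looks up every key in `keys`, `rounds` times over, and times the whole loop.
    inline LookupTiming time_lookups(const std::map<std::string, int>& m,
                                     const std::vector<std::string>& keys,
                                     int rounds) {
        std::size_t hits = 0;
        std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
        for (int r = 0; r < rounds; ++r)
            for (std::size_t i = 0; i < keys.size(); ++i)
                if (m.find(keys[i]) != m.end())
                    ++hits;  // counting hits keeps the loop from being optimized away
        std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
        LookupTiming t;
        t.hits = hits;
        t.nanoseconds =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        return t;
    }
    ```

    Dividing nanoseconds by rounds * keys.size() gives an average cost per find, which you can compare against what the profiler attributes to the same call.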

  • 2021-02-06 00:01

    Maybe you could reverse the strings prior to using them as keys in the map? That could help if the first few letters of each string are identical.
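
    A variation on this idea, sketched here as an assumption rather than a tested fix: instead of physically reversing each key, give the map a comparator that compares from the back. ReverseLess and ReverseOrderedMap are hypothetical names:

    ```cpp
    #include <algorithm>
    #include <map>
    #include <string>

    // Orders strings lexicographically by their characters read back to front,
    // so keys that share a long common prefix are told apart after few comparisons.
    struct ReverseLess {
        bool operator()(const std::string& lhs, const std::string& rhs) const {
            return std::lexicographical_compare(lhs.rbegin(), lhs.rend(),
                                                rhs.rbegin(), rhs.rend());
        }
    };

    // A map keyed on strings, ordered by ReverseLess instead of std::less.
    typedef std::map<std::string, int, ReverseLess> ReverseOrderedMap;
    ```

    This only helps when the keys differ near the end. If they share a common suffix instead (for example the same file extension), it makes things worse.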
