Why is std::unordered_map slow, and can I use it more effectively to alleviate that?

南楼画角 提交于 2020-01-21 04:56:47

问题


I’ve recently found out an odd thing. It seems that calculating Collatz sequence lengths with no caching at all is over 2 times faster than using std::unordered_map to cache all elements.

Note I did take hints from question Is gcc std::unordered_map implementation slow? If so - why? and I tried to used that knowledge to make std::unordered_map perform as well as I could (I used g++ 4.6, it did perform better than recent versions of g++, and I tried to specify a sound initial bucket count, I made it exactly equal to the maximum number of elements the map must hold).

In comparision, using std::vector to cache a few elements was almost 17 times faster than no caching at all and almost 40 times faster than using std::unordered_map.

Am I doing something wrong or is this container THAT slow and why? Can it be made performing faster? Or maybe hashmaps are inherently ineffective and should be avoided whenever possible in high-performance code?

The problematic benchmark is:

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    static std::unordered_map <std::uint_fast64_t, std::uint_fast16_t> cache ({{1,1}}, 2168611);

    if(cache.count(val) == 0) {
        if(val%2 == 0)
            cache[val] = getCollatzLength(val/2) + 1;
        else
            cache[val] = getCollatzLength(3*val+1) + 1;
    }

    return cache[val];
}

int main()
{
    std::clock_t tStart = std::clock();

    std::uint_fast16_t largest = 0;
    for(int i = 1; i <= 999999; ++i) {
        auto cmax = getCollatzLength(i);
        if(cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';

    std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}

It outputs: Time taken: 0.761717

Whereas a benchmark with no caching at all:

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    std::uint_fast16_t length = 1;
    while(val != 1) {
        if(val%2 == 0)
            val /= 2;
        else
            val = 3*val + 1;
        ++length;
    }
    return length;
}

int main()
{
    std::clock_t tStart = std::clock();

    std::uint_fast16_t largest = 0;
    for(int i = 1; i <= 999999; ++i) {
        auto cmax = getCollatzLength(i);
        if(cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';

    std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}

Outputs Time taken: 0.324586


回答1:


The standard library's maps are, indeed, inherently slow (std::map especially but std::unoredered_map as well). Google's Chandler Carruth explains this in his CppCon 2014 talk; in a nutshell: std::unordered_map is cache-unfriendly because it uses linked lists as buckets.

This SO question mentioned some efficient hash map implementations - use one of those instead.



来源:https://stackoverflow.com/questions/42588264/why-is-stdunordered-map-slow-and-can-i-use-it-more-effectively-to-alleviate-t

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!