What is the default hash function used in C++ std::unordered_map?

爱一瞬间的悲伤 2020-11-30 23:48

I am using two instantiations of unordered_map, one keyed by strings and one keyed by integers. What hash function is used in each case?

2 answers
  • 2020-12-01 00:28

    Though the hashing algorithm is implementation-dependent, I'll present the one GCC uses for C++11. @Avidan Borisov astutely discovered that the GCC hashing algorithm used for strings is "MurmurHashUnaligned2," by Austin Appleby. I did some searching and found a mirrored copy of GCC on GitHub. Therefore:

    The GCC C++11 hashing functions used for unordered_map (a hash table template) and unordered_set (a hash set template) appear to be as follows.

    • Thanks to Avidan Borisov for his background research on which hash functions GCC uses for C++11; he found that GCC uses an implementation of "MurmurHashUnaligned2", by Austin Appleby (http://murmurhash.googlepages.com/).
    • In the file "gcc/libstdc++-v3/libsupc++/hash_bytes.cc", here (https://github.com/gcc-mirror/gcc/blob/master/libstdc++-v3/libsupc++/hash_bytes.cc), I found the implementations. Here's the one for the "32-bit size_t" return value, for example (pulled 11 Aug 2017):

    Code:

    // Implementation of Murmur hash for 32-bit size_t.
    size_t _Hash_bytes(const void* ptr, size_t len, size_t seed)
    {
      const size_t m = 0x5bd1e995;
      size_t hash = seed ^ len;
      const char* buf = static_cast<const char*>(ptr);
    
      // Mix 4 bytes at a time into the hash.
      while (len >= 4)
      {
        size_t k = unaligned_load(buf);
        k *= m;
        k ^= k >> 24;
        k *= m;
        hash *= m;
        hash ^= k;
        buf += 4;
        len -= 4;
      }
    
      // Handle the last few bytes of the input array.
      switch (len)
      {
        case 3:
          hash ^= static_cast<unsigned char>(buf[2]) << 16;
          [[gnu::fallthrough]];
        case 2:
          hash ^= static_cast<unsigned char>(buf[1]) << 8;
          [[gnu::fallthrough]];
        case 1:
          hash ^= static_cast<unsigned char>(buf[0]);
          hash *= m;
      };
    
      // Do a few final mixes of the hash.
      hash ^= hash >> 13;
      hash *= m;
      hash ^= hash >> 15;
      return hash;
    }
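
    To experiment with the mixing scheme quoted above outside of libstdc++, here is a minimal portable sketch. It is not the GCC source: the names load_u32, murmur2_32 and hash_string are hypothetical, fixed-width uint32_t replaces the 32-bit size_t, and the tail switch is written as plain if statements. Like the original, the result depends on the host byte order because four bytes are loaded at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: read 4 bytes without assuming alignment.
// memcpy keeps this well-defined; the value depends on host byte order.
inline std::uint32_t load_u32(const char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Sketch of the 32-bit Murmur-style mix shown above (illustrative name,
// not a libstdc++ symbol).
inline std::uint32_t murmur2_32(const void* ptr, std::size_t len,
                                std::uint32_t seed) {
    const std::uint32_t m = 0x5bd1e995u;
    std::uint32_t hash = seed ^ static_cast<std::uint32_t>(len);
    const char* buf = static_cast<const char*>(ptr);

    // Mix 4 bytes at a time into the hash.
    while (len >= 4) {
        std::uint32_t k = load_u32(buf);
        k *= m;
        k ^= k >> 24;
        k *= m;
        hash *= m;
        hash ^= k;
        buf += 4;
        len -= 4;
    }

    // Handle the last 0 to 3 bytes (same effect as the fallthrough switch).
    if (len == 3)
        hash ^= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[2])) << 16;
    if (len >= 2)
        hash ^= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[1])) << 8;
    if (len >= 1) {
        hash ^= static_cast<unsigned char>(buf[0]);
        hash *= m;
    }

    // Final mixing steps.
    hash ^= hash >> 13;
    hash *= m;
    hash ^= hash >> 15;
    return hash;
}

// Convenience wrapper for strings; the default seed of 0 is an arbitrary
// choice for this sketch, not the seed libstdc++ uses.
inline std::uint32_t hash_string(const std::string& s, std::uint32_t seed = 0) {
    return murmur2_32(s.data(), s.size(), seed);
}
```

    Calling hash_string("abc") twice returns the same value, and changing a single byte of a short input changes the result, since every step of the mix is reversible.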
    

    For additional hashing functions, including djb2, and the 2 versions of the K&R hashing functions (one apparently terrible, one pretty good), see my other answer here: https://stackoverflow.com/a/45641002/4561887.

  • 2020-12-01 00:33

    The function object std::hash<> is used.

    Standard specializations exist for all built-in types, and for some other standard library types such as std::string and std::thread::id. See the std::hash documentation for the full list.

    For other types to be used in a std::unordered_map, you will have to specialize std::hash<> or create your own function object.
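
    Both options can be sketched as follows. Point, PointHasher and the two aliases are hypothetical names made up for this illustration, and the way the two coordinates are combined is just one common choice, not something the standard prescribes. The second hasher also assumes int is 32 bits wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type used only for illustration.
struct Point {
    int x;
    int y;
    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

// Option 1: specialize std::hash for the user-defined type.
namespace std {
template <>
struct hash<Point> {
    size_t operator()(const Point& p) const noexcept {
        const size_t h1 = hash<int>{}(p.x);
        const size_t h2 = hash<int>{}(p.y);
        // Combine the two member hashes (one common mixing formula).
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};
}  // namespace std

// Option 2: a standalone function object, passed as the third template
// argument of unordered_map. Assumes a 32-bit int.
struct PointHasher {
    std::size_t operator()(const Point& p) const noexcept {
        const std::uint64_t packed =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(p.x)) << 32) |
            static_cast<std::uint32_t>(p.y);
        return std::hash<std::uint64_t>{}(packed);
    }
};

// Uses the std::hash<Point> specialization implicitly.
using PointNames = std::unordered_map<Point, std::string>;
// Uses the explicit function object instead.
using PointCounts = std::unordered_map<Point, int, PointHasher>;
```

    In both cases the key type also needs operator== (or a custom equality predicate), because the map compares keys that land in the same bucket.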

    The chance of collision is completely implementation-dependent, but since integers are confined to a fixed range while strings can in theory be arbitrarily long, I'd say collisions are much more likely with strings.
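
    Rather than guessing, the collisions that actually matter for lookup cost can be measured through the container's bucket interface. The sketch below counts elements that share a bucket with at least one other element; count_colliding is a hypothetical helper name. Note that this measures bucket sharing, which depends on the bucket count as well as on the hash function, so two keys with different hash values can still end up in the same bucket.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Counts the elements that share their bucket with at least one other
// element. Works for any unordered associative container.
template <typename Map>
std::size_t count_colliding(const Map& m) {
    std::size_t colliding = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        const std::size_t n = m.bucket_size(b);
        if (n > 1) {
            colliding += n;  // every element in a crowded bucket collides
        }
    }
    return colliding;
}
```

    Running it on an unordered_map with int keys and on one with std::string keys holding the same number of elements gives a direct comparison for a particular data set and standard library.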

    As for the implementation in GCC, the specializations for built-in integer and pointer types simply return the value itself, converted to size_t. Here's how they are defined in bits/functional_hash.h:

      /// Partial specializations for pointer types.
      template<typename _Tp>
        struct hash<_Tp*> : public __hash_base<size_t, _Tp*>
        {
          size_t
          operator()(_Tp* __p) const noexcept
          { return reinterpret_cast<size_t>(__p); }
        };
    
      // Explicit specializations for integer types.
    #define _Cxx_hashtable_define_trivial_hash(_Tp)         \
      template<>                                            \
        struct hash<_Tp> : public __hash_base<size_t, _Tp>  \
        {                                                   \
          size_t                                            \
          operator()(_Tp __val) const noexcept              \
          { return static_cast<size_t>(__val); }            \
        };
    
      /// Explicit specialization for bool.
      _Cxx_hashtable_define_trivial_hash(bool)
    
      /// Explicit specialization for char.
      _Cxx_hashtable_define_trivial_hash(char)
    
      /// ...
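
    The effect of these trivial specializations can be checked directly. The helpers below are hypothetical names for this sketch; they only report whether std::hash behaves like the casts quoted above. That is expected to hold with libstdc++, but other standard libraries are free to hash integers and pointers differently.

```cpp
#include <cstddef>
#include <functional>

// True when std::hash<int> returns the value itself, converted to size_t.
inline bool int_hash_is_identity(int v) {
    return std::hash<int>{}(v) == static_cast<std::size_t>(v);
}

// True when std::hash<T*> returns the pointer's address as a size_t.
template <typename T>
bool pointer_hash_is_address(T* p) {
    return std::hash<T*>{}(p) == reinterpret_cast<std::size_t>(p);
}
```

    A practical consequence is that consecutive integer keys get consecutive hash values, so any spreading across buckets comes from the container's bucket mapping and not from the hash itself.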
    

    The specialization for std::string is defined as:

    #ifndef _GLIBCXX_COMPATIBILITY_CXX0X
      /// std::hash specialization for string.
      template<>
        struct hash<string>
        : public __hash_base<size_t, string>
        {
          size_t
          operator()(const string& __s) const noexcept
          { return std::_Hash_impl::hash(__s.data(), __s.length()); }
        };
    

    Some further search leads us to:

    struct _Hash_impl
    {
      static size_t
      hash(const void* __ptr, size_t __clength,
           size_t __seed = static_cast<size_t>(0xc70f6907UL))
      { return _Hash_bytes(__ptr, __clength, __seed); }
      ...
    };
    ...
    // Hash function implementation for the nontrivial specialization.
    // All of them are based on a primitive that hashes a pointer to a
    // byte array. The actual hash algorithm is not guaranteed to stay
    // the same from release to release -- it may be updated or tuned to
    // improve hash quality or speed.
    size_t
    _Hash_bytes(const void* __ptr, size_t __len, size_t __seed);
    

    _Hash_bytes is an external function from libstdc++. A bit more searching led me to this file, which states:

    // This file defines Hash_bytes, a primitive used for defining hash
    // functions. Based on public domain MurmurHashUnaligned2, by Austin
    // Appleby.  http://murmurhash.googlepages.com/
    

    So the default hashing algorithm GCC uses for strings is MurmurHashUnaligned2.
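
    The chain traced above can be verified on a libstdc++ build by calling the internal primitive directly. This is only a sketch: std::_Hash_bytes and the seed 0xc70f6907 are internal details taken from the snippets quoted above, they are not part of the standard, and the library's own comment warns that they may change between releases. The function name string_hash_matches_hash_bytes is hypothetical, and on other standard libraries the check is skipped.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Returns true if std::hash<std::string> agrees with a direct call to
// libstdc++'s internal _Hash_bytes using the default seed quoted above.
// When not building against libstdc++ the comparison is skipped and the
// function returns true.
inline bool string_hash_matches_hash_bytes(const std::string& s) {
#ifdef __GLIBCXX__
    const std::size_t seed = static_cast<std::size_t>(0xc70f6907UL);
    return std::hash<std::string>{}(s) ==
           std::_Hash_bytes(s.data(), s.size(), seed);
#else
    (void)s;  // nothing to compare against on other implementations
    return true;
#endif
}
```

    Code that needs hash values to be stable across compilers or library versions should not rely on std::hash at all and should use an explicitly chosen hash function instead.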
