hash function to size of buckets for unordered containers?

Submitted by 为君一笑 on 2019-12-11 02:30:04

Question


To put an element into, say, an unordered set, we calculate its hash and put it in the corresponding bucket. However, we usually have far fewer buckets than the hash function has possible values. How is the correspondence between hash values and buckets calculated? It seems some function is used that maps (0 ... max of size_t) -> (0 ... number_of_buckets - 1). But using such a function could lead to a large number of collisions even for a good hash function.


Answer 1:


I'm not sure whether the exact behavior of std::unordered_map is defined in the standard. However, the basic principle is this: always keep the number of buckets at least as large as the size of the container multiplied by a small number (this small number is 1.0/max_load_factor). In other words, load_factor = size / bucket_count is kept at or below max_load_factor. This way, collisions should be rare.
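As a rough illustration of that principle, the sketch below inserts elements into a std::unordered_set and checks after every insertion that load_factor() has not exceeded max_load_factor(). The helper names load_factor_invariant_holds and invariant_holds_during_inserts are made up for this example; only the member functions of std::unordered_set come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: true if size() / bucket_count() <= max_load_factor(),
// which is the same as bucket_count() >= size() / max_load_factor().
bool load_factor_invariant_holds(const std::unordered_set<int>& s) {
    return s.load_factor() <= s.max_load_factor();
}

// Hypothetical helper: inserts n distinct integers and checks the
// invariant after every single insertion.
bool invariant_holds_during_inserts(int n) {
    std::unordered_set<int> s;
    for (int i = 0; i < n; ++i) {
        s.insert(i);
        if (!load_factor_invariant_holds(s)) {
            return false;
        }
    }
    return true;
}
```

Calling invariant_holds_during_inserts(1000) from main should return true on the common standard library implementations, because the container rehashes into more buckets whenever an insertion would push the load factor past the limit.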

For a hash table, usually there are two ways to calculate bucket_index:

  1. The number of buckets is chosen to be a power of 2: the hash is calculated, then some of its low or high bits are extracted with bit operations. This method needs a "good" hash function, where every bit is random.
  2. The number of buckets is chosen to be a prime number: the hash is calculated, then bucket_index is obtained with a modulo operation. This method doesn't need a "too good" hash function.

For method 1, if the hash function is of poor quality, you can get a lot of collisions. For method 2, collisions are usually rare even with a mediocre hash function. However, method 1 is usually faster, because bit operations are much faster than a modulo operation (although there are techniques to speed up the modulo), and a good-enough hash function is usually cheap.
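The two methods can be sketched as follows. This is a minimal illustration, not the code of any particular standard library: the function names index_by_mask, index_by_modulo and distinct_buckets are invented here, and the "weak hash" is simulated by feeding in hash values that are all multiples of a fixed stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Method 1 (assumes bucket_count is a nonzero power of two):
// keep only the low bits of the hash.
std::size_t index_by_mask(std::size_t hash, std::size_t bucket_count) {
    return hash & (bucket_count - 1);
}

// Method 2 (bucket_count is typically a prime): take the remainder.
std::size_t index_by_modulo(std::size_t hash, std::size_t bucket_count) {
    return hash % bucket_count;
}

// Counts how many different buckets are hit by the n hash values
// 0, stride, 2*stride, ... under the given index function.
std::size_t distinct_buckets(std::size_t (*index)(std::size_t, std::size_t),
                             std::size_t bucket_count,
                             std::size_t stride,
                             std::size_t n) {
    std::vector<bool> used(bucket_count, false);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t b = index(i * stride, bucket_count);
        if (!used[b]) {
            used[b] = true;
            ++count;
        }
    }
    return count;
}
```

With 17 hash values that are all multiples of 16, distinct_buckets(index_by_mask, 16, 16, 17) is 1, since the low four bits are always zero and every key lands in bucket 0. distinct_buckets(index_by_modulo, 17, 16, 17) is 17, because 16 and 17 share no common factor and the remainders are all different.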




Answer 2:


Many hash tables are built to be general enough to support many different hash functions; as such, most of them do not "calculate" a correspondence between the range of the hash function and the number of buckets.

The number of buckets does, however, depend on the internals of the hash table (collision resolution technique, etc.) and especially on a value called the load factor. When the load factor limit is reached, implementations usually increase the number of buckets by a predetermined constant factor.

You should look more closely at the std::unordered_map interface and play around with the following functions to learn more:

http://en.cppreference.com/w/cpp/container/unordered_map/max_bucket_count
http://en.cppreference.com/w/cpp/container/unordered_map/bucket_count
http://en.cppreference.com/w/cpp/container/unordered_map/bucket_size
http://en.cppreference.com/w/cpp/container/unordered_map/max_load_factor
http://en.cppreference.com/w/cpp/container/unordered_map/load_factor
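One possible way to play around with these functions is sketched below. The helpers bucket_count_history and total_of_bucket_sizes are hypothetical names chosen for this example; the actual bucket counts they report depend on the standard library implementation, so no specific numbers are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: inserts n keys and records bucket_count()
// at the start and every time it changes (that is, after each rehash).
std::vector<std::size_t> bucket_count_history(int n) {
    std::unordered_map<int, std::string> m;
    std::vector<std::size_t> history{m.bucket_count()};
    for (int i = 0; i < n; ++i) {
        m.emplace(i, "value");
        if (m.bucket_count() != history.back()) {
            history.push_back(m.bucket_count());
        }
    }
    return history;
}

// Hypothetical helper: adds up bucket_size(b) over all buckets.
// Every element lives in exactly one bucket, so this equals size().
std::size_t total_of_bucket_sizes(
        const std::unordered_map<int, std::string>& m) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        total += m.bucket_size(b);
    }
    return total;
}
```

Printing the vector returned by bucket_count_history(1000) shows how a given implementation grows its table; with the default max_load_factor() of 1.0 the final entry has to be at least 1000. For a single key, m.bucket(key) reports which bucket it was assigned to.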



Source: https://stackoverflow.com/questions/44668672/hash-function-to-size-of-buckets-for-unordered-containers
