What are the disadvantages of hashing function using multiplication method

拜拜、爱过 提交于 2019-12-18 17:22:46

问题


There are two basic methods for implementing a hash function that are cited in pretty much every text book and CS courses:

  1. Division method where we simply do k mod m essentially picking m as prime not too close to power of 2.
  2. Multiplication method where we multiply k with some well-chosen irrational number (Knuth suggests using number based on Golden ratio) between 0 to 1, take the fractional part of the product and use desired number of most significant bits from it.

Most text books and courses cite several disadvantages for method 1 including the fact that it's expensive and things depends on m. However I've never yet seen any textbook or courses mention single disadvantage for method 2.

This makes method 2 more desirable. Plus the method 2 can be made very efficient on modern computers eliminating floating point arithmetic all together. So it would look like that method 2 is hands down winner and no body should be talking about method 1. But that's obviously not the case. In fact, I've never seen method 2 getting used in any practical implementations. So it does have some disadvantages.

Question is what are they and why method 1 gets used more often despite of its disadvantages?


回答1:


Division method is used in conjunction with hash table algorithms which require prime table size - for example, open addressing with double hashing or QHash, when you anyway need to divide the key, or it's hash, by table size to get the index.

Multiplication method is suitable when the table size is a power of two, then getting the index from the hash could be implemented as bitwise AND operation, therefore the whole path of computing the table index by the key, with multiplication hashing, is very fast. You can explore some actual implementations by searching for magic constant 2654435769 on Github.

There is a recent trend of using MurmurHash3 avalanche procedure instead of multiplication method:

int hash = key;
hash ^= (hash >> 16);
hash *= 0x85ebca6b;
hash ^= (hash >> 13);
hash *= 0xc2b2ae35;
hash ^= (hash >> 16);
// see this code and the version for 64 bits here:
// https://smhasher.googlecode.com/svn/trunk/MurmurHash3.cpp

Because it is just a bit slower, but considered more robust to bad key distribution. That is why you might get wrong (or right?) impression that multiplication method is used unfairly rarely.



来源:https://stackoverflow.com/questions/25217770/what-are-the-disadvantages-of-hashing-function-using-multiplication-method

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!