Knuth multiplicative hash

一生所求 2021-02-02 03:10

Is this a correct implementation of the Knuth multiplicative hash?

int hash(int v)
{
    v *= 2654435761;
    return v >> 32;
}

Does over

4 answers
  •  不知归路
    2021-02-02 03:42

    Knuth's multiplicative hash is used to compute a hash value in {0, 1, 2, ..., 2^p - 1} from an integer k.

    Suppose that p is between 0 and 32. The algorithm goes like this:

    • Compute alpha as the closest integer to 2^32 (-1 + sqrt(5)) / 2. We get alpha = 2 654 435 769.

    • Compute k * alpha and reduce the result modulo 2^32:

      k * alpha = n0 * 2^32 + n1 with 0 <= n1 < 2^32

    • Keep the highest p bits of n1:

      n1 = m1 * 2^(32-p) + m2 with 0 <= m2 < 2^(32 - p)

      The hash value of k is m1.
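    The three steps can be traced on concrete numbers by doing the multiplication in 64 bits, so that n0, n1, m1 and m2 can each be inspected. This is only an illustrative sketch: the names golden_alpha, KnuthSteps and knuth_steps are hypothetical, and k is taken as an unsigned 32-bit value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Step 1: alpha is the integer closest to 2^32 * (sqrt(5) - 1) / 2.
// Double precision suffices: the exact value is about 2654435769.497,
// far enough from .5 that rounding error cannot change the result.
std::uint32_t golden_alpha() {
    return static_cast<std::uint32_t>(std::llround(std::ldexp((std::sqrt(5.0) - 1.0) / 2.0, 32)));
}

// The quantities named in steps 2 and 3.
struct KnuthSteps {
    std::uint64_t n0;  // floor(k * alpha / 2^32), discarded
    std::uint32_t n1;  // (k * alpha) mod 2^32
    std::uint32_t m1;  // highest p bits of n1: the hash value
    std::uint32_t m2;  // lowest 32 - p bits of n1, discarded
};

// Assumes 0 <= p <= 32 (64-bit shifts keep p = 0 and p = 32 well defined).
KnuthSteps knuth_steps(std::uint32_t k, int p) {
    assert(p >= 0 && p <= 32);
    // Step 2: the full product is below 2^64, so a 64-bit integer holds it exactly.
    const std::uint64_t product = std::uint64_t{k} * golden_alpha();
    const std::uint32_t n1 = static_cast<std::uint32_t>(product);  // product mod 2^32
    // Step 3: split n1 into its high p bits (m1) and low 32 - p bits (m2).
    const int low = 32 - p;
    const std::uint32_t m1 = static_cast<std::uint32_t>(std::uint64_t{n1} >> low);
    const std::uint32_t m2 = static_cast<std::uint32_t>(n1 & ((std::uint64_t{1} << low) - 1));
    return {product >> 32, n1, m1, m2};
}
```

    For k = 1 and p = 8, n1 is alpha itself (0x9E3779B9), so m1 is its top byte, 0x9E = 158.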

    So, a correct implementation of Knuth's multiplicative algorithm in C++ is:

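    #include <cassert>
    #include <cstdint>

    std::uint32_t knuth(int x, int p) {
        assert(p >= 0 && p <= 32);
        const std::uint32_t alpha = 2654435769u;  // closest integer to 2^32 * (sqrt(5) - 1) / 2
        const std::uint32_t y = x;                // x taken modulo 2^32
        if (p == 0) return 0;                     // shifting a 32-bit value by 32 is undefined
        return (y * alpha) >> (32 - p);           // product wraps mod 2^32; keep the top p bits
    }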
    

    Forgetting to shift the result by (32 - p) is a major mistake: you would lose all the good properties of the hash. Since alpha is odd, k * alpha has the same parity as k, so if the product is then reduced modulo 2^p, an even sequence of keys is transformed into an even sequence of slots. That is very bad, because all the odd slots stay unoccupied. It's like taking a good wine and mixing it with Coke.

    By the way, the web is full of people misquoting Knuth and multiplying by 2 654 435 761 without taking the higher bits. I just opened Knuth's book, and he never said such a thing. It looks like someone who thought he was "smart" decided to take a prime number close to 2 654 435 769.
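    The parity argument can be checked directly. The sketch below compares keeping the high p bits of the 32-bit product with the mistaken variant that reduces the product modulo 2^p (keeps the low p bits). The function names are hypothetical, chosen for this illustration, and both variants assume 1 <= p <= 31.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// alpha from the answer: the integer closest to 2^32 * (sqrt(5) - 1) / 2.
constexpr std::uint32_t kAlpha = 2654435769u;

// (k * alpha) mod 2^32, computed in 64 bits to avoid integer-promotion surprises.
std::uint32_t product_mod_2_32(std::uint32_t k) {
    return static_cast<std::uint32_t>(std::uint64_t{k} * kAlpha);
}

// Correct: keep the HIGH p bits of the 32-bit product. Assumes 1 <= p <= 31.
std::uint32_t slot_high_bits(std::uint32_t k, int p) {
    return product_mod_2_32(k) >> (32 - p);
}

// The mistake: reduce the product modulo 2^p, i.e. keep the LOW p bits. Assumes 1 <= p <= 31.
std::uint32_t slot_low_bits(std::uint32_t k, int p) {
    return product_mod_2_32(k) & ((1u << p) - 1u);
}

// Number of distinct odd slots reached by the even keys 0, 2, 4, ..., 2 * (n - 1).
template <typename SlotFn>
std::size_t odd_slots_hit_by_even_keys(SlotFn slot, int p, std::uint32_t n) {
    std::set<std::uint32_t> odd_slots;
    for (std::uint32_t i = 0; i < n; ++i) {
        const std::uint32_t s = slot(2 * i, p);
        if (s % 2 == 1) odd_slots.insert(s);
    }
    return odd_slots.size();
}
```

    With p = 8 and the first 1000 even keys, the low-bit variant never reaches an odd slot, while the high-bit variant does (key 6, for instance, lands in slot 181).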

    Bear in mind that most hash table implementations don't allow this kind of signature in their interface, as they only allow

    uint32_t hash(int x);
    

    and reduce hash(x) modulo 2^p to compute the slot for x. Such hash tables cannot accept the Knuth multiplicative hash, which might be why so many people have ruined the algorithm by forgetting to take the higher p bits. So you can't use the Knuth multiplicative hash with std::unordered_map or std::unordered_set. But I think those hash tables use a prime number as their size, so the Knuth multiplicative hash is not useful there anyway; hash(x) = x would be a good fit for those tables.
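    The signature problem goes away if the table itself knows p and passes it to the hash. Below is a minimal sketch of such a table, assuming a hypothetical fixed-capacity set of ints with 2^p slots and linear probing (not a standard container, and not something the answer proposes); it uses the high p bits of x * alpha directly as the home slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fixed-capacity set of ints with 2^p slots and linear probing.
// The table owns p and passes it to the hash, so the Knuth multiplicative
// hash can keep the high p bits as intended. No resizing, no deletion.
class KnuthIntSet {
public:
    explicit KnuthIntSet(int p) : p_(p), slots_(capacity_for(p)) {}

    // Returns true if x was added; false if it was already present or the table is full.
    bool insert(int x) {
        std::size_t i = home_slot(x);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            if (!slots_[i]) {
                slots_[i] = x;
                return true;
            }
            if (*slots_[i] == x) return false;
            i = (i + 1) & (slots_.size() - 1);  // wrap around (size is a power of two)
        }
        return false;  // every slot is occupied
    }

    bool contains(int x) const {
        std::size_t i = home_slot(x);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            if (!slots_[i]) return false;  // an empty slot ends the probe sequence
            if (*slots_[i] == x) return true;
            i = (i + 1) & (slots_.size() - 1);
        }
        return false;
    }

private:
    static std::size_t capacity_for(int p) {
        assert(p >= 1 && p <= 31);  // assumed range for this sketch
        return std::size_t{1} << p;
    }

    // Knuth multiplicative hash: the high p bits of (x * alpha) mod 2^32.
    std::size_t home_slot(int x) const {
        const std::uint32_t y = static_cast<std::uint32_t>(x);  // x taken modulo 2^32
        const std::uint32_t product = static_cast<std::uint32_t>(std::uint64_t{y} * 2654435769u);
        return product >> (32 - p_);
    }

    int p_;
    std::vector<std::optional<int>> slots_;
};
```

    Growing such a table would mean choosing a larger p and re-inserting every element, because each home slot depends on p.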

    Source: "Introduction to Algorithms", third edition, Cormen et al., section 11.3.2, p. 263

    Source: "The Art of Computer Programming, Volume 3: Sorting and Searching", D. E. Knuth, section 6.4, p. 516
