What integer hash functions are good that accept an integer hash key?

孤街浪徒 2020-11-22 17:23

11 Answers
  •  有刺的猬
    2020-11-22 17:51

    Fast, good hash functions can be composed from fast permutations of lesser quality, such as

    • multiplication by an odd integer
    • binary rotations
    • xorshift

    Combining them yields a hash function of superior quality, as PCG demonstrates for random number generation.

    This is in fact also the recipe that rrxmrrxmsx_0 and MurmurHash use, knowingly or unknowingly.

    I personally found

    #include <cstdint>

    // XOR n with itself shifted right by i bits.
    uint64_t xorshift(uint64_t n, int i) {
      return n ^ (n >> i);
    }

    uint64_t hash(uint64_t n) {
      const uint64_t p = 0x5555555555555555ull;   // pattern of alternating 0 and 1 bits (odd)
      const uint64_t c = 17316035218449499591ull; // random odd integer constant
      return c * xorshift(p * xorshift(n, 32), 32);
    }
    

    to be good enough.

    A good hash function should

    1. be bijective where possible, so that no information is lost and collisions are minimized
    2. cascade as much and as evenly as possible, i.e. flipping any input bit should flip each output bit with probability 0.5.
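
    Criterion 2 can be checked empirically. The following is a minimal sketch, not the code behind the plots discussed in this answer: `avalanche` and `AvalancheMatrix` are hypothetical names, and the sample count and seed are arbitrary assumptions. It flips each input bit of random inputs and records how often each output bit changes.

```cpp
#include <array>
#include <cstdint>
#include <random>

// m[i][j] estimates the probability that toggling input bit i toggles output bit j.
using AvalancheMatrix = std::array<std::array<double, 64>, 64>;

// f is any callable taking and returning a 64-bit integer.
// samples and seed are illustrative defaults, not tuned values.
template <typename F>
AvalancheMatrix avalanche(F f, int samples = 10000, uint64_t seed = 1) {
  std::mt19937_64 rng(seed);
  AvalancheMatrix m{};
  for (int s = 0; s < samples; ++s) {
    const uint64_t x = rng();
    const uint64_t hx = f(x);
    for (int i = 0; i < 64; ++i) {
      // Output bits that changed after flipping input bit i.
      const uint64_t diff = hx ^ f(x ^ (1ull << i));
      for (int j = 0; j < 64; ++j) m[i][j] += (diff >> j) & 1;
    }
  }
  // Turn counts into relative frequencies.
  for (auto& row : m)
    for (auto& v : row) v /= samples;
  return m;
}
```

    For the identity function the matrix is 1 on the diagonal and 0 elsewhere; a function that cascades well should give entries close to 0.5 throughout.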

    Let's first look at the identity function. It satisfies 1. but not 2.:

    Input bit n determines output bit n with a correlation of 100% (red) and influences no other bit; all other cells are therefore blue, giving a perfect red line across.

    xorshift(n,32) is not much better, yielding one and a half lines. It still satisfies 1., because applying it a second time inverts it.
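
    A second application only inverts the xorshift when the shift is at least half the word width, as with 32 here. For smaller shifts the inverse needs more terms. A minimal sketch, where `unxorshift` is a hypothetical helper name that is not part of the answer's code:

```cpp
#include <cstdint>

inline uint64_t xorshift(uint64_t n, int i) { return n ^ (n >> i); }

// Inverse of xorshift for 0 < i < 64.
// If y = x ^ (x >> i), then x = y ^ (y >> i) ^ (y >> 2i) ^ ...
// The series stops once the shift reaches the word width.
inline uint64_t unxorshift(uint64_t y, int i) {
  uint64_t x = y;
  for (int s = i; s < 64; s += i) x ^= y >> s;
  return x;
}
```

    With i = 32 the loop runs once, so `unxorshift(y, 32)` is the same computation as `xorshift(y, 32)`.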

    Multiplication by an odd unsigned integer is much better: it cascades more strongly and flips more output bits with a probability of 0.5 (green), which is what you want. It satisfies 1. because every odd integer has a multiplicative inverse modulo 2^64.
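
    The inverse can be computed with Newton's iteration, which makes the bijectivity concrete. A minimal sketch, assuming 64-bit wraparound arithmetic; `inverse_odd` is a hypothetical name and the function is only meaningful for odd arguments:

```cpp
#include <cstdint>

// Multiplicative inverse of an odd 64-bit integer modulo 2^64.
// Start: a * a == 1 (mod 8) for every odd a, so x = a is correct to 3 bits.
// Each Newton step x = x * (2 - a * x) doubles the number of correct low bits.
inline uint64_t inverse_odd(uint64_t a) {
  uint64_t x = a;
  for (int k = 0; k < 5; ++k)  // 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
    x *= 2 - a * x;
  return x;
}
```

    Multiplying a value by `inverse_odd(c)` undoes an earlier multiplication by c, so no two inputs can map to the same product.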

    Combining the two gives the following output, still satisfying 1. as the composition of two bijective functions yields another bijective function.

    A second application of multiplication and xorshift will yield the following:

    Alternatively, you can use Galois field multiplications like GHash. They have become reasonably fast on modern CPUs and achieve superior quality in a single step.

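       #include <cstdint>
       #include <immintrin.h>  // _mm_clmulepi64_si128; build with -mpclmul

       // Carry-less multiply of i and j, folded back to 64 bits using the constant M.
       // Relies on the GCC/Clang vector subscript extension (I[0], A[1], ...).
       inline uint64_t gfmul(uint64_t i, uint64_t j) {
         __m128i I{}; I[0] ^= i;
         __m128i J{}; J[0] ^= j;
         __m128i M{}; M[0] ^= 0xb000000000000000ull;
         __m128i X = _mm_clmulepi64_si128(I, J, 0);  // 128-bit carry-less product of i and j
         __m128i A = _mm_clmulepi64_si128(X, M, 0);  // low half of X times M
         __m128i B = _mm_clmulepi64_si128(A, M, 0);  // low half of A times M
         return A[0] ^ A[1] ^ B[1] ^ X[0] ^ X[1];
       }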
    
