What are good methods for hashing bits in an Int32 or UInt32?

痴心易碎 提交于 2019-12-22 08:24:44

问题


I have an implementation of a pseudo random number generator, specifically of George Marsaglia's XOR-Shift RNG. My implementation is here:

FastRandom.cs

It turns out that the first random sample is very closely correlated with the seed, which is fairly obvious if you take a look at the Reinitialise(int seed) method. This is bad. My proposed solution is to mix up the bits of the seed as follows:

_x = (uint)(  (seed * 2147483647) 
           ^ ((seed << 16 | seed >> 48) * 28111) 
           ^ ((seed << 32 | seed >> 32) * 69001)
           ^ ((seed << 48 | seed >> 16) * 45083));

So I have significantly weakened any correlation by multiplying the seed's bits with four primes and XORing back to form _x. I also rotate the seed's bits before multiplication to ensure that bits of varying magnitudes get mixed up across the full range of values for a 32 bit value.

The four-way rotation just seemed liked a nice balance between doing nothing and every possible rotation (32). The primes are 'finger in the air' - enough magnitude and bit structure to jumble up the bits and 'spread' them over the full 32 bits regardless of the starting seed.

Should I use bigger primes? Is there a standard approach to this problem, perhaps with a more formal basis? I am trying to do this with minimal CPU overhead.

Thanks

=== UPDATE ===

I decided to use some primes with set bits better distributed across all 32 bits. The result is that I can omit the shifts as the multiplications achieve the same effect (hashing bits across the full range of 32 bits), so I then just add the four products to give the final seed...

_x = (uint)(  (seed * 1431655781) 
            + (seed * 1183186591) 
            + (seed * 622729787)
            + (seed * 338294347));

I could possibly get away with fewer primes/multiplciations. Two seemed too few (I could still see patterns in the first samples), three looked OK, so for a safety margin I made it four.

=== UPDATE 2 ===

FYI the above reduces to the functionally equivalent:

_x = seed * 3575866506U;

I didn't spot this initially and when I did I was wondering if overflowing at different stages in the calculation would cause a different result. I believe the answer is no - the two calculations always give the same answer.


回答1:


According to some researchers, CrapWow, Crap8 and Murmur3 are the best non-cryptographic hash algorithms available today that are both fast, simple and statistically good.

More information is available at Non-Cryptographic Hash Function Zoo.



来源:https://stackoverflow.com/questions/12717413/what-are-good-methods-for-hashing-bits-in-an-int32-or-uint32

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!