I have many integers in range [0; 2^63-1]. There is only 10^8 integers, however. There is no duplicates. Full list is known at compile-time but it is just unique random numbers. These numbers never changes.
To store one integer explicitly, 8 bytes required, and there is associated 1-byte values, so explicit storing requires about 860 MB.
So I want to find minimal perfect hash function to map each of 10^8 integers from [0;2^63-1] to [0;10^8-1]. I should find this function only once, data never changes, and function can be complicated. But it should be minimal, perfect, and calculating should be fast. How I can do this better? Maybe it is possible to find and use some subsequences if they happens?
Thanks.
Let your computer do the work for you:
http://www.gnu.org/software/gperf/
Quote: "GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only. "
I'm working on an algorithm and Java implementation that needs less than 1.6 bits per key.
Previously, I have implemented a minimal perfect hash function tool in Java that needs less than 2.0 bits per key.
Other algorithms are implemented in CMPH. For example CHD needs about 2.06 bits per key by default. It can be configured to use less space, but generation is then slower.
来源:https://stackoverflow.com/questions/6743316/minimal-perfect-hash-function