The problem in general: I have a big 2d point space, sparsely populated with dots. Think of it as a big white canvas sprinkled with black dots. I have to it
Depending on your use case, you might be able to use a quadtree and replace each point with the string of branch names that leads to it. This is effectively a sparse representation of the points. It needs a custom quadtree structure that extends the canvas by adding branches whenever you add a point that lies off the canvas, but it avoids collisions and gives you benefits such as quick nearest-neighbor searches.
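To make the "string of branch names" idea concrete, here is a minimal sketch in C. It assumes a fixed square canvas of side 2^depth (it does not grow the canvas, which the answer says a real structure would need), and the function name quadtree_path and the branch labels '0' to '3' are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: writes the quadtree branch path of (x, y) into out.
   Assumptions: the canvas is a square of side 2^depth, 0 <= x, y < 2^depth,
   depth <= 32, and out has room for depth + 1 chars.
   Branch labels (an arbitrary choice): '0' = low x / low y,
   '1' = high x / low y, '2' = low x / high y, '3' = high x / high y. */
void quadtree_path(uint32_t x, uint32_t y, unsigned depth, char *out)
{
    for (unsigned level = 0; level < depth; level++) {
        unsigned bit = depth - 1 - level;   /* most significant bit first */
        unsigned xb = (x >> bit) & 1u;      /* which half in x at this level */
        unsigned yb = (y >> bit) & 1u;      /* which half in y at this level */
        out[level] = (char)('0' + ((yb << 1) | xb));
    }
    out[depth] = '\0';
}
```

Two different points inside the canvas always differ in at least one coordinate bit, so under these assumptions their path strings differ as well. Points that share a long prefix sit in the same small cell.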
If you're already using a language or platform where every object (even primitives like integers) has a built-in hash function (Java platform languages like Java, .NET platform languages like C#, and others like Python, Ruby, etc.), you can use the built-in hash values as building blocks and add your own "hashing flavor" to the mix. Like:
// C# code snippet
public class SomeVerySimplePoint {
    public int X;
    public int Y;

    // Shift Y's hash left by 16 bits, then XOR in X's hash.
    // Distinct points can still collide.
    public override int GetHashCode() {
        return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
    }
}
Test cases also come in handy: for example, run a predefined set of a million points against each candidate hash algorithm and compare the candidates on aspects such as computation time, memory required, key collision count, and edge cases (very large or very small values).
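A collision counter for such a comparison could look roughly like the sketch below. It is written in C rather than C#, the names count_collisions and shift_xor_hash are hypothetical, and it assumes the input points are all distinct. The shift_xor_hash candidate mirrors the shift-and-XOR combination from the snippet above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Signature every candidate hash is assumed to have. */
typedef uint32_t (*point_hash_fn)(uint32_t x, uint32_t y);

/* Example candidate: same shift-and-XOR idea as the C# snippet. */
uint32_t shift_xor_hash(uint32_t x, uint32_t y)
{
    return (y << 16) ^ x;
}

static int cmp_u32(const void *a, const void *b)
{
    uint32_t ua = *(const uint32_t *)a, ub = *(const uint32_t *)b;
    return (ua > ub) - (ua < ub);
}

/* Counts points whose hash equals the hash of some other point already
   counted as "first". Returns (size_t)-1 if memory allocation fails.
   Assumes the n input points are pairwise distinct. */
size_t count_collisions(point_hash_fn hash,
                        const uint32_t *xs, const uint32_t *ys, size_t n)
{
    if (n == 0)
        return 0;
    uint32_t *h = malloc(n * sizeof *h);
    if (h == NULL)
        return (size_t)-1;
    for (size_t i = 0; i < n; i++)
        h[i] = hash(xs[i], ys[i]);
    qsort(h, n, sizeof *h, cmp_u32);    /* equal hashes become adjacent */
    size_t collisions = 0;
    for (size_t i = 1; i < n; i++)
        if (h[i] == h[i - 1])
            collisions++;
    free(h);
    return collisions;
}
```

Running the same point set through each candidate gives comparable collision counts. Timing and memory measurements would need to be added separately.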
The Fibonacci hash works very well for integer pairs.
For 32-bit words the multiplier is 0x9E3779B9.
For other word sizes w, take 2^w / phi, where 1/phi = (sqrt(5)-1)/2, and round to an odd integer.
The hash of the pair (a1, a2) is then a1 + a2*multiplier.
This gives very different values for pairs that are close together.
I do not know how it behaves over all pairs.
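As a minimal sketch, the 32-bit version of this formula could be written in C as follows. The names FIB_MULT_32 and fib_pair_hash are invented here, and the code relies on unsigned arithmetic wrapping modulo 2^32.

```c
#include <stdint.h>

/* 2^32 / phi rounded to an odd integer, as given in the answer. */
#define FIB_MULT_32 0x9E3779B9u

/* Combines a pair as a1 + a2 * multiplier; unsigned overflow wraps
   modulo 2^32, which is what the hash relies on. */
uint32_t fib_pair_hash(uint32_t a1, uint32_t a2)
{
    return a1 + a2 * FIB_MULT_32;
}
```

For example, fib_pair_hash(0, 1) is 0x9E3779B9 and fib_pair_hash(0, 2) is 0x3C6EF372, so neighbouring values of a2 land far apart. Collisions are still possible, since two 32-bit inputs are squeezed into one 32-bit output.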
Your "ideal" is impossible.
You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.
Here's why: suppose there is a function hash() so that hash(x, y) gives a different integer value for every pair. There are 2^32 (about 4 billion) values for x, and 2^32 values of y, so there are 2^64 (about 18 million trillion) possible pairs, and hash(x, y) would need that many distinct results. But there are only 2^32 possible values in a 32-bit int, so the result of hash() won't fit in a 32-bit int.
See also http://en.wikipedia.org/wiki/Counting_argument
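The same counting argument can be watched at a smaller scale. The sketch below, in C, uses 8-bit coordinates and an 8-bit hash instead of 32-bit ones; the names pairs_until_collision and xor_hash8 are hypothetical, and xor_hash8 is just an arbitrary candidate. Whatever function is plugged in, a repeat must show up within the first 257 pairs, because only 256 output values exist.

```c
#include <stdint.h>
#include <stdbool.h>

/* Scaled-down setting: 8-bit coordinates, 8-bit hash value. */
typedef uint8_t (*hash8_fn)(uint8_t x, uint8_t y);

/* Arbitrary example candidate, chosen only for illustration. */
uint8_t xor_hash8(uint8_t x, uint8_t y)
{
    return (uint8_t)(x ^ (y * 31u));
}

/* Walks the pairs row by row and returns how many pairs were examined
   when the first repeated hash value appeared. With 256 possible outputs
   the result can never exceed 257. */
unsigned pairs_until_collision(hash8_fn hash)
{
    bool seen[256] = { false };
    unsigned examined = 0;
    for (unsigned y = 0; y < 256; y++) {
        for (unsigned x = 0; x < 256; x++) {
            uint8_t h = hash((uint8_t)x, (uint8_t)y);
            examined++;
            if (seen[h])
                return examined;
            seen[h] = true;
        }
    }
    return 0;   /* not reachable: 65536 pairs cannot fit in 256 values */
}
```

With xor_hash8 the whole first row (y = 0) is collision-free, and the very next pair, (0, 1), repeats the value 31, so the function returns 257.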
Generally, you should always design your data structures to deal with collisions. (Unless your hashes are very long (at least 128 bit), very good (use cryptographic hash functions), and you're feeling lucky).
A hash function that is GUARANTEED collision-free is not a hash function :)
Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).
If you want to hash two uint32's into one uint32, do not use things like Y & 0xFFFF because that discards half of the bits. Do something like
(x * 0x1f1f1f1f) ^ y
(you need to transform one of the variables first to make sure the hash function is not commutative)
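Wrapped in a function, the expression above could look like this minimal C sketch (the name mul_xor_hash is made up for illustration):

```c
#include <stdint.h>

/* Multiplies x by an odd constant (wrapping modulo 2^32) before XORing
   with y, so that swapping x and y generally changes the result. */
uint32_t mul_xor_hash(uint32_t x, uint32_t y)
{
    return (x * 0x1f1f1f1fu) ^ y;
}
```

For instance, mul_xor_hash(1, 2) is 0x1f1f1f1d while mul_xor_hash(2, 1) is 0x3e3e3e3f, whereas a plain x ^ y would give 3 both times. Collisions remain possible: (x, y) and (0, (x * 0x1f1f1f1f) ^ y) always hash to the same value.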
If each coordinate fits in 16 bits, so that a = ((y & 0xffff) << 16) | (x & 0xffff) loses nothing, then you could afterward apply a reversible 32-bit mix to a, such as Thomas Wang's
uint32_t hash( uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
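Putting the two steps together, a sketch in C might look like the following. The names pack16, wang_mix32 and point_hash16 are hypothetical, wang_mix32 repeats the mix quoted above, and the whole thing assumes both coordinates fit in 16 bits.

```c
#include <stdint.h>

/* Packs two coordinates into one 32-bit word: y in the high half,
   x in the low half. Assumes x and y each fit in 16 bits; any higher
   bits are discarded by the masks. */
uint32_t pack16(uint32_t x, uint32_t y)
{
    return ((y & 0xffffu) << 16) | (x & 0xffffu);
}

/* The 32-bit mix quoted above. Every step (XOR with a constant,
   XOR with a right shift, multiplication by an odd number) is
   invertible modulo 2^32. */
uint32_t wang_mix32(uint32_t a)
{
    a = (a ^ 61u) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2du;
    a = a ^ (a >> 15);
    return a;
}

/* Pack first, then mix. */
uint32_t point_hash16(uint32_t x, uint32_t y)
{
    return wang_mix32(pack16(x, y));
}
```

Because the packing is lossless for 16-bit coordinates and the mix is reversible, two different points in that range cannot produce the same value. Once a coordinate needs more than 16 bits, the masks throw information away and collisions become possible again.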