hash function providing unique uint from an integer coordinate pair

The problem in general: I have a big 2d point space, sparsely populated with dots. Think of it as a big white canvas sprinkled with black dots. I have to it

11 Answers

天涯浪人

2020-12-13 10:27

Depending on your use case, you could use a quadtree and replace each point with the string of branch names that leads to it. This is effectively a sparse representation of the points. It needs a custom quadtree structure that extends the canvas by adding branches when you add points outside it, but it avoids collisions and gives you benefits such as quick nearest-neighbor searches.
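
As a rough illustration of the "string of branch names" idea, the sketch below derives such a string for a point on a canvas of fixed size. It assumes non-negative coordinates and a canvas of 2^depth by 2^depth cells, and it does not grow the canvas as the answer describes. The function name `quadtree_path` and the digit convention are hypothetical, not taken from any library.

```c
#include <stdint.h>

/* Hypothetical helper: writes the quadrant chosen at each level, from the
 * root down, as the characters '0'..'3'. Bit 1 of the digit is set when the
 * point lies in the upper half in y, bit 0 when it lies in the upper half
 * in x. `depth` must be between 1 and 32, and `out` must have room for
 * depth + 1 bytes. */
void quadtree_path(uint32_t x, uint32_t y, int depth, char *out)
{
    for (int level = depth - 1; level >= 0; level--) {
        uint32_t xbit = (x >> level) & 1u;
        uint32_t ybit = (y >> level) & 1u;
        *out++ = (char)('0' + ((ybit << 1) | xbit));
    }
    *out = '\0';
}
```

With `depth` 2, for instance, the point (1, 2) gets the path "21". Two different points inside the canvas always get different paths, because the path encodes every bit of both coordinates.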

渐次进展

2020-12-13 10:27
If you're already using a language or platform in which every object (even primitives like integers) has a built-in hash function (Java platform languages like Java, .NET platform languages like C#, and others like Python, Ruby, etc.), you can use the built-in hash values as building blocks and add your own "hashing flavor" to the mix. For example:
```
// C# code snippet
public class SomeVerySimplePoint {

    public int X;
    public int Y;

    public override int GetHashCode() {
        return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
    }

}
```
It may also be handy to have test cases, such as a predefined set of a million points, that you run against each candidate hash algorithm to compare aspects like computation time, memory required, key collision count, and edge cases (very large or very small values).
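
A minimal sketch of the collision-count part of such a test could look like the following. It is written in C rather than C#, uses `uint32_t` coordinates, and hashes every point of a square grid instead of a predefined point set. All of these choices, and every name in the block, are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t (*pair_hash_fn)(uint32_t x, uint32_t y);

/* Candidate 1: the shift-and-XOR combination used in the C# snippet. */
uint32_t shift_xor_hash(uint32_t x, uint32_t y) { return (y << 16) ^ x; }

/* Candidate 2: a deliberately weak hash, included only for contrast. */
uint32_t plain_xor_hash(uint32_t x, uint32_t y) { return x ^ y; }

static int cmp_u32(const void *a, const void *b)
{
    uint32_t ua = *(const uint32_t *)a, ub = *(const uint32_t *)b;
    return (ua > ub) - (ua < ub);
}

/* Hashes every point of a side x side grid and returns how many points
 * share their hash value with an earlier point in sorted order,
 * or -1 if the working buffer cannot be allocated. */
long count_collisions(pair_hash_fn hash, uint32_t side)
{
    size_t n = (size_t)side * side;
    uint32_t *values = malloc(n * sizeof *values);
    if (values == NULL) return -1;

    size_t i = 0;
    for (uint32_t y = 0; y < side; y++)
        for (uint32_t x = 0; x < side; x++)
            values[i++] = hash(x, y);

    /* After sorting, equal hash values sit next to each other. */
    qsort(values, n, sizeof *values, cmp_u32);
    long collisions = 0;
    for (i = 1; i < n; i++)
        if (values[i] == values[i - 1]) collisions++;

    free(values);
    return collisions;
}
```

On a 64 x 64 grid, `shift_xor_hash` produces no collisions, while `plain_xor_hash` maps all 4096 points onto only 64 distinct values. Timing and memory measurements would need a separate harness.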
孤街浪徒

2020-12-13 10:27

Fibonacci hashing works very well for integer pairs.

For 32-bit words the multiplier is 0x9E3779B9.

For other word sizes w, take 2^w / phi = 2^w * (sqrt(5) - 1) / 2 and round it to an odd integer.

The hash of the pair (a1, a2) is:

a1 + a2 * multiplier

This gives very different values for pairs that are close together.

I do not know how it behaves over all possible pairs.
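
Written out in C for 32-bit words, the formula could look like the sketch below. The function and macro names are made up for illustration, and the arithmetic relies on unsigned overflow wrapping modulo 2^32.

```c
#include <stdint.h>

/* 2^32 / phi rounded to an odd integer, as given in the answer. */
#define FIB_MULTIPLIER 0x9E3779B9u

/* a1 + a2 * multiplier, with unsigned arithmetic wrapping modulo 2^32. */
uint32_t fib_pair_hash(uint32_t a1, uint32_t a2)
{
    return a1 + a2 * FIB_MULTIPLIER;
}
```

Neighbouring pairs such as (0, 1) and (0, 2) land far apart (0x9E3779B9 and 0x3C6EF372). Because the formula is linear, though, some distant pairs do collide: (0x9E3779B9, 0) and (0, 1) hash to the same value.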

自闭症患者

2020-12-13 10:29

Your "ideal" is impossible.

You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.

Here's why: suppose there is a function hash() such that hash(x, y) gives a different integer value for every distinct pair. There are 2^32 (about 4 billion) values for x and 2^32 values for y, so there are 2^64 (about 18 million trillion) possible pairs, and hash(x, y) must produce that many distinct results. But a 32-bit int has only 2^32 possible values, so the results of hash() cannot all fit in a 32-bit int.

See also http://en.wikipedia.org/wiki/Counting_argument

Generally, you should always design your data structures to deal with collisions (unless your hashes are very long (at least 128 bits), very good (cryptographic hash functions), and you're feeling lucky).
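
The same counting argument shows what does work: 2^64 distinct pairs fit exactly into a 64-bit key. The sketch below assumes a wider key is acceptable for your data structure, which the question does not say. The function names are hypothetical.

```c
#include <stdint.h>

/* Collision-free key: x occupies the high 32 bits, y the low 32 bits. */
uint64_t pack_pair(uint32_t x, uint32_t y)
{
    return ((uint64_t)x << 32) | y;
}

/* Inverse of pack_pair, which shows that no information is lost. */
void unpack_pair(uint64_t key, uint32_t *x, uint32_t *y)
{
    *x = (uint32_t)(key >> 32);
    *y = (uint32_t)key;
}
```

Since `unpack_pair` recovers both coordinates from the key, two different pairs can never share a key.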

别那么骄傲

2020-12-13 10:33
A hash function that is GUARANTEED collision-free is not a hash function :)

Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).

If you want to hash two uint32's into one uint32, do not use things like Y & 0xFFFF because that discards half of the bits. Do something like
```
(x * 0x1f1f1f1f) ^ y
```
(You need to transform one of the variables first so that the hash function is not commutative.)
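
To make the difference concrete, here is a small sketch that puts a masking approach next to the multiply-then-XOR expression. The function names are invented for this example, and `uint32_t` arithmetic is assumed to wrap modulo 2^32.

```c
#include <stdint.h>

/* Lossy packing: keeps only the low 16 bits of each coordinate. */
uint32_t truncating_hash(uint32_t x, uint32_t y)
{
    return ((y & 0xFFFFu) << 16) | (x & 0xFFFFu);
}

/* Multiply-then-XOR from the answer: the multiplier is odd, so all 32 bits
 * of x influence the result, and the result is not symmetric in x and y. */
uint32_t multiply_xor_hash(uint32_t x, uint32_t y)
{
    return (x * 0x1f1f1f1fu) ^ y;
}
```

With these definitions, `truncating_hash(0x10000, 0)` equals `truncating_hash(0, 0)` because bit 16 of x is thrown away, whereas `multiply_xor_hash` tells the two points apart. Swapping the arguments also changes the result: (1, 2) gives 0x1f1f1f1d and (2, 1) gives 0x3e3e3e3f.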
星月不相逢

2020-12-13 10:38
If your coordinates fit in 16 bits each, so that you can compute a = ((y & 0xffff) << 16) | (x & 0xffff), then you can afterwards apply a reversible 32-bit mix to a, such as Thomas Wang's:
```
#include <stdint.h>

/* Every step is invertible, so distinct inputs give distinct outputs. */
uint32_t hash( uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
```
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
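
Putting the two steps together might look like the sketch below. The names `wang_mix` and `point_hash16` are hypothetical. The packing step silently discards any coordinate bits above the low 16, so this only stays collision-free while both coordinates are below 65536.

```c
#include <stdint.h>

/* Thomas Wang's 32-bit mix, as quoted above. */
static uint32_t wang_mix(uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}

/* Packs the low 16 bits of each coordinate into one word, then mixes it. */
uint32_t point_hash16(uint32_t x, uint32_t y)
{
    uint32_t a = ((y & 0xffffu) << 16) | (x & 0xffffu);
    return wang_mix(a);
}
```

Because the mix is a bijection on 32-bit values, two points whose packed words differ can never receive the same hash. A point such as (0x10000, 0), however, packs to the same word as (0, 0) and therefore collides with it.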