What is an appropriate `GetHashCode()` algorithm for a 2D point struct (avoiding clashes)

Submitted by 纵然是瞬间 on 2019-12-19 14:59:42

Question


Consider the following code:

struct Vec2 : IEquatable<Vec2>
{
    public double X, Y; // public so that the object initializers in the usage example compile

    public bool Equals(Vec2 other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        if (obj is Vec2)
        {
            return Equals((Vec2)obj);
        }
        return false;
    }

    // this will return the same value when X, Y are swapped
    public override int GetHashCode()
    {
        return X.GetHashCode() ^ Y.GetHashCode();
    }

}

Setting aside the debate about comparing doubles for equality (this is just demo code), my concern is that the hash codes clash when the X and Y values are swapped. For example:

Vec2 A = new Vec2() { X=1, Y=5 };
Vec2 B = new Vec2() { X=5, Y=1 };

bool test1 = A.Equals(B);  // returns false;
bool test2 = A.GetHashCode() == B.GetHashCode(); // returns true !!!!!

which should wreak havoc in a dictionary collection. So the question is how to properly form the GetHashCode() function for 2, 3 or even 4 floating-point values such that the results are not symmetric and the hashes don't clash.

Edit 1:

Point implements the inappropriate x ^ y solution, and PointF wraps ValueType.GetHashCode().

Rectangle has a very peculiar (((X ^ ((Y << 13) | (Y >> 19))) ^ ((Width << 26) | (Width >> 6))) ^ ((Height << 7) | (Height >> 25))) expression for the hash code, which seems to perform as expected.
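
The rotate-then-XOR idea in the Rectangle expression can be carried over to a two-component struct. The following is only a sketch: the Vec2Rotated name is hypothetical, and the rotation amount of 13 bits is simply borrowed from the expression above rather than being a tuned or recommended value. It removes the systematic clash on swapped coordinates, but other collisions remain possible.

```csharp
using System;

// Hypothetical variant of the question's Vec2, for illustration only.
struct Vec2Rotated : IEquatable<Vec2Rotated>
{
    public double X, Y;

    public bool Equals(Vec2Rotated other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        return obj is Vec2Rotated && Equals((Vec2Rotated)obj);
    }

    public override int GetHashCode()
    {
        unchecked // the int/uint conversions are meant to wrap
        {
            uint hx = (uint)X.GetHashCode();
            uint hy = (uint)Y.GetHashCode();
            // Rotate Y's hash left by 13 bits so that (X, Y) and (Y, X)
            // no longer combine to the same value by construction.
            uint rotatedY = (hy << 13) | (hy >> 19);
            return (int)(hx ^ rotatedY);
        }
    }
}
```

Because XOR alone is commutative, the rotation is what breaks the symmetry between the two components.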

Edit 2:

System.Double has a nice implementation, as it does not treat each bit as equally important:

public override unsafe int GetHashCode() //from System.Double
{
    double num = this;
    if (num == 0.0)
    {
        return 0;
    }
    long num2 = *((long*) &num);
    return (((int) num2) ^ ((int) (num2 >> 32)));
}

Answer 1:


Jon Skeet has this covered:

What is the best algorithm for an overridden System.Object.GetHashCode?

   public override int GetHashCode()
   {
       unchecked // Overflow is fine, just wrap
       {
           int hash = 17;
           // Suitable nullity checks etc, of course :)
           hash = hash * 23 + X.GetHashCode();
           hash = hash * 23 + Y.GetHashCode();
           return hash;
       }
   }

I originally suggested changing your Equals(object) implementation to:

return Equals(obj as FVector2);

and noted that this could treat a derived type as equal unless you compare the runtime type other.GetType() with typeof(FVector2) (and don't forget nullity checks). None of that applies here: the type is a struct, so the as operator cannot be used with it and there are no derived types to worry about. Thanks for pointing out it's a struct, LukH.

ReSharper has nice code generation for equality and hash code, so if you have ReSharper you can let it do its thing.
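
The question also asks about 3 or 4 values. The multiply-and-add pattern above extends by adding one line per field; the sketch below assumes a hypothetical Vec3 struct that is not part of the original question. On frameworks that ship System.HashCode (.NET Core 2.1 and later), HashCode.Combine(X, Y, Z) is an alternative worth considering, though its results are deliberately not stable across processes.

```csharp
using System;

// Hypothetical three-component counterpart of the question's Vec2.
struct Vec3 : IEquatable<Vec3>
{
    public double X, Y, Z;

    public bool Equals(Vec3 other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y) && Z.Equals(other.Z);
    }

    public override bool Equals(object obj)
    {
        return obj is Vec3 && Equals((Vec3)obj);
    }

    public override int GetHashCode()
    {
        unchecked // overflow is fine, just wrap
        {
            int hash = 17;
            // One step per field; the multiplication makes the result
            // depend on the order of the fields.
            hash = hash * 23 + X.GetHashCode();
            hash = hash * 23 + Y.GetHashCode();
            hash = hash * 23 + Z.GetHashCode();
            return hash;
        }
    }
}
```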




Answer 2:


Hash collisions don't wreak havoc in a dictionary collection. They'll reduce the efficiency if you're unlucky enough to get them, but dictionaries have to cope with them.
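
To illustrate, here is a small sketch that keeps the XOR-based hash from the question and stores two colliding keys in a Dictionary. The XorVec2 and CollisionDemo names are made up for this example. Both entries are stored and looked up correctly, because the dictionary falls back on Equals when hash codes match.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the question's Vec2, with the symmetric XOR hash kept on purpose.
struct XorVec2 : IEquatable<XorVec2>
{
    public double X, Y;

    public bool Equals(XorVec2 other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        return obj is XorVec2 && Equals((XorVec2)obj);
    }

    public override int GetHashCode()
    {
        return X.GetHashCode() ^ Y.GetHashCode();
    }
}

static class CollisionDemo
{
    // Builds a dictionary whose two keys share a hash code but are not equal.
    public static Dictionary<XorVec2, string> Build()
    {
        var dict = new Dictionary<XorVec2, string>();
        dict[new XorVec2 { X = 1, Y = 5 }] = "A";
        dict[new XorVec2 { X = 5, Y = 1 }] = "B";
        return dict;
    }
}
```

The cost of the collision is an extra Equals call during lookup, not a wrong result.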

Collisions should be rare if at all possible, but they don't mean the implementation is incorrect. XORs are often bad for the reasons you've given (high collisions) - ohadsc has posted a sample I gave before for an alternative, which should be fine.

Note that it would be impossible to implement Vec2 with no collisions - there are only 2^32 possible return values from GetHashCode, but there are rather more possible X and Y values, even after you've removed NaN and infinite values...

Eric Lippert has a recent blog post on GetHashCode which you may find useful.




Answer 3:


What are reasonable bounds for the coordinates?

Unless the coordinates can take every possible integer value, you could simply:

const int SOME_LARGE_NUMBER = 100000;
return SOME_LARGE_NUMBER * x + y;
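
As a sketch of how that could look in a complete type, assume integer coordinates with 0 <= X < 20000 and 0 <= Y < 100000. These bounds and the GridPoint name are assumptions made for this example, chosen so that the product fits in an int and no two in-range points share a hash code.

```csharp
using System;

// Hypothetical integer point with bounded coordinates (see assumed ranges above).
struct GridPoint : IEquatable<GridPoint>
{
    private const int SOME_LARGE_NUMBER = 100000;

    public int X, Y;

    public bool Equals(GridPoint other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is GridPoint && Equals((GridPoint)obj);
    }

    public override int GetHashCode()
    {
        unchecked // out-of-range inputs wrap instead of throwing
        {
            // Unique for every point inside the assumed bounds,
            // because Y never reaches SOME_LARGE_NUMBER.
            return SOME_LARGE_NUMBER * X + Y;
        }
    }
}
```

Outside those bounds the hash is still valid, it just stops being collision-free.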




Answer 4:


If your hash code is smaller than your struct, then clashes are inevitable anyway.




Answer 5:


The hash code approach works for integer coordinates but is not recommended for floating-point values. With floating-point coordinates, one can create a point set/pool by using a sorted sequence structure.

A sorted sequence is a leaf version balanced binary tree.

Here the keys would be the point coordinates.
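
One way to approximate this with the standard library is SortedSet<T>, which is a balanced (red-black) tree rather than the leaf-based structure described above, so treat it as a stand-in. The PoolPoint, LexicographicComparer, and PointPool names below are hypothetical; the comparer orders points by X, then by Y.

```csharp
using System.Collections.Generic;

// Hypothetical point type used as the key of the sorted pool.
struct PoolPoint
{
    public double X, Y;
}

// Orders points by X first and by Y when the X values are equal.
sealed class LexicographicComparer : IComparer<PoolPoint>
{
    public int Compare(PoolPoint a, PoolPoint b)
    {
        int byX = a.X.CompareTo(b.X);
        return byX != 0 ? byX : a.Y.CompareTo(b.Y);
    }
}

static class PointPool
{
    // Creates an empty pool; no hash codes are involved in lookups.
    public static SortedSet<PoolPoint> Create()
    {
        return new SortedSet<PoolPoint>(new LexicographicComparer());
    }
}
```

Lookups cost O(log n) comparisons instead of the expected O(1) of a hash table, but they never depend on GetHashCode.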



Source: https://stackoverflow.com/questions/5221396/what-is-an-appropriate-gethashcode-algorithm-for-a-2d-point-struct-avoiding
