问题
I'm trying to implement an IEqualityComparer<string>
which basically compares two strings in a way that,(let's assume we have two strings x
and y
) if x
starts with y
or y
starts with x
they should be treated as equal.
public bool Equals(string x, string y)
{
return x.StartsWith(y) || y.StartsWith(x);
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
Ofcourse implementing the Equals
method is pretty easy.But the GetHashCode
is not, I couldn't think any way to implement it correctly.I have written a test program like this:
string[] values = {"hell", "hello", "foo", "fooooo"};
var result = values.Distinct(new StringComparer());
foreach(var x in result)
Console.WriteLine(x);
And I get the wrong result because of GetHashCode
:
hell
hello
foo
fooooo
Obviously I can force calling Equals
method by returning same value from the GetHashCode
for all values but I wanna know if there is another way to implement it because the performance is critical. Is there a way to implement GetHashCode
method correctly for my situation ?
Note: I know it is vague but I couldn't find a better title, if you have a better idea you are free to edit.
Edit: I'm going to use this logic with web urls. In my situation first 20 characters are equal. For example:
http://www.foo.com/bar?id=3
http://www.foo.com/bar?id=3&fooId=23
回答1:
The issue is in your definition of equality: Equality must be transitive. But it is not in your case. Take the following three values:
* f
* freeze
* foo
Then f == freeze
, and foo == f
, but freeze != foo
.
See also MSDN on Implementing the Equals Method, which says:
(x.Equals(y) && y.Equals(z))
returns true if and only ifx.Equals(z)
returns true.
A proper definition of equality produces distinct sets of values that are considered equal. If you had those, you could define a "canonical" representation for each set and calculate the hash of the canonical value so each set would have its hash code. But this only works with an operation that is transitive (as well as commutative and reflexive, these two properties are covered by your definition).
Since your definition of equality is not transitive you cannot define such sets so you can't find a proper hash code either.
But it raises other questions, too. Taking your example:
string[] values = { "hell", "hello", "foo", "fooooo" };
var result = values.Distinct(new StringComparer());
Which values do you expect to go into your result
? Do you always want the shortest version? This will not be guaranteed by your code, the result will depend on the internal implementation of Distinct
.
Implementing an EqualityComparer
might possibly be a sub-optimal approach to your actual issue. What are you trying to achieve?
回答2:
As strings are equal to each other depending on what string you compare them with, any string can be equal to another. Thus there is only one way to implement the GetHashCode
method; return the same value for all strings:
public int GetHashCode(string obj) {
return 0;
}
This will naturally give a horrible distribution. A dictionary will have a O(n) lookup time instead of O(1), but it works, and it's the only way to make it work for such an equality comparison.
来源:https://stackoverflow.com/questions/25524127/how-to-implement-gethashcode-for-this-situation