I've read about 10 different questions on when and how to override GetHashCode
but there's still something I don't quite get. Most implementations of
Wow, that's actually several questions in one :-). So one after the other:
it's been cited that the value of GetHashCode should never change over the lifetime of the object. How does that work if the fields that it's based on are mutable?
This common advice is meant for the case where you want to use your object as a key in a hash table, dictionary, etc. Hash tables usually require the hash not to change, because they use it to decide where to store the key and where to look for it later. If the hash changes, the hash table will probably no longer find your object.
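To make the failure concrete, here is a minimal sketch in C#. The MutableKey type and the MutableKeyDemo class are made up for illustration; the only real API used is Dictionary<TKey, TValue>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key: Equals and GetHashCode both depend on Id,
// and Id can be changed after the key has been stored.
public sealed class MutableKey
{
    public int Id { get; set; }

    public MutableKey(int id) { Id = id; }

    public override bool Equals(object obj) =>
        obj is MutableKey other && Id == other.Id;

    public override int GetHashCode() => Id;
}

public static class MutableKeyDemo
{
    // Stores a key, mutates it in place, then looks it up again.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey(1);
        var table = new Dictionary<MutableKey, string> { [key] = "value" };

        key.Id = 2; // the hash code changes while the key sits in the table

        return table.ContainsKey(key);
    }

    // Looks the entry up through a fresh key that carries the old Id.
    public static bool FoundViaOldValue()
    {
        var key = new MutableKey(1);
        var table = new Dictionary<MutableKey, string> { [key] = "value" };

        key.Id = 2;

        return table.ContainsKey(new MutableKey(1));
    }

    public static void Main()
    {
        Console.WriteLine(FoundAfterMutation());
        Console.WriteLine(FoundViaOldValue());
    }
}
```

In this sketch neither lookup should succeed: the mutated key hashes differently from what the table recorded, and a fresh key with the old Id no longer compares equal to the stored one.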
To cite the docs of Java's Map interface:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
In general it's a bad idea to use any kind of mutable object as a key in a hash table: It's not even clear what should happen if a key changes after it's been added to the hash table. Should the hash table return the stored object via the old key, or via the new key, or via both?
So the real advice is: Only use immutable objects as keys, and make sure their hashcode never changes either (which is usually automatic if the object is immutable).
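A minimal sketch of what such an immutable key could look like in C#. UserKey, its two properties, and the demo class are hypothetical names, not anything from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable key: both properties are set once in the constructor.
public sealed class UserKey : IEquatable<UserKey>
{
    public string Domain { get; }
    public string Login { get; }

    public UserKey(string domain, string login)
    {
        Domain = domain;
        Login = login;
    }

    public bool Equals(UserKey other) =>
        !(other is null) && Domain == other.Domain && Login == other.Login;

    public override bool Equals(object obj) => Equals(obj as UserKey);

    // Built from the same read-only properties that Equals compares,
    // so the hash code cannot change during the object's lifetime.
    public override int GetHashCode() => (Domain, Login).GetHashCode();
}

public static class UserKeyDemo
{
    public static string Lookup()
    {
        var table = new Dictionary<UserKey, string>
        {
            [new UserKey("corp", "keith")] = "Keith"
        };

        // A second, separately constructed key with the same values.
        return table[new UserKey("corp", "keith")];
    }

    public static void Main() => Console.WriteLine(Lookup());
}
```

Because nothing about a UserKey can change after construction, the hash code stays stable without any extra effort.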
Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?
Well, you need a dictionary that works like that. By default the standard library dictionaries use GetHashCode and Equals; in .NET, the only way to change that is to pass your own IEqualityComparer<T> to the dictionary's constructor.
I'm primarily overriding Equals for the ease of unit testing my serialization code which I assume serializing and deserializing (to XML in my case) kills the reference equality so I want to make sure at least it's correct by value equality. Is this bad practice to override Equals in this case?
No, I'd find that perfectly acceptable. However, you should not use such objects as keys in a dictionary/hashtable, as they're mutable. See above.
The underlying topic here is how to best uniquely identify objects. You mention serialization/deserialization which is important because referential integrity is lost in that process.
The short answer is that objects should be uniquely identified by the smallest set of immutable fields that can be used to do so. These are the fields you should use when overriding GetHashCode and Equals.
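As a rough illustration, assume a hypothetical Customer type whose Id never changes while its Name may. Identity is then based on Id alone:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: Id is the immutable identity, Name is ordinary mutable state.
public sealed class Customer : IEquatable<Customer>
{
    public int Id { get; }           // set once, used for identity
    public string Name { get; set; } // may change, ignored by Equals/GetHashCode

    public Customer(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public bool Equals(Customer other) => !(other is null) && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as Customer);

    public override int GetHashCode() => Id;
}

public static class CustomerDemo
{
    // Renames a customer that is already stored in a HashSet.
    public static bool FoundAfterRename()
    {
        var customer = new Customer(42, "Smith");
        var set = new HashSet<Customer> { customer };

        customer.Name = "Jones"; // does not touch the identity field

        return set.Contains(customer);
    }

    public static void Main() => Console.WriteLine(FoundAfterRename());
}
```

The trade-off in this sketch is that two Customer objects with the same Id but different names compare as equal, which may or may not be what a serialization test wants.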
For testing it's perfectly reasonable to define whatever assertions you need; usually these are not defined on the type itself but rather as utility methods in the test suite. Maybe a TestSuite.AssertEquals(MyClass, MyClass)?
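One way such a helper might look, as a sketch only. MyClass here is a stand-in with two invented properties, and RoundTrip is an assumed helper that uses the standard XmlSerializer to mimic the serialize/deserialize step from the question.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical serializable type; note that it does not override Equals.
public class MyClass
{
    public string Name { get; set; }
    public int Count { get; set; }
}

public static class TestSuite
{
    // Value comparison kept in the test code rather than on the type.
    public static void AssertEquals(MyClass expected, MyClass actual)
    {
        if (expected.Name != actual.Name)
            throw new Exception($"Name: expected '{expected.Name}', got '{actual.Name}'");
        if (expected.Count != actual.Count)
            throw new Exception($"Count: expected {expected.Count}, got {actual.Count}");
    }

    // Serializes to XML in memory and reads the result back as a new instance.
    public static MyClass RoundTrip(MyClass source)
    {
        var serializer = new XmlSerializer(typeof(MyClass));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, source);
            using (var reader = new StringReader(writer.ToString()))
            {
                return (MyClass)serializer.Deserialize(reader);
            }
        }
    }

    public static void Main()
    {
        var original = new MyClass { Name = "widget", Count = 3 };
        var copy = RoundTrip(original);

        AssertEquals(original, copy); // passes: same values
        Console.WriteLine(ReferenceEquals(original, copy)); // a different instance
    }
}
```

This keeps the production type free of an Equals override that exists only for the tests.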
Note that GetHashCode and Equals should work together. GetHashCode must return the same value for two objects if they are equal, so Equals can only return true for two objects that have the same hash code. The converse does not hold: two objects that are not equal may still return the same hash code. There are plenty of web pages that tackle this topic head-on, just google away.
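A small sketch of that relationship, using a made-up GridPoint struct with a deliberately weak hash so that a collision is easy to see:

```csharp
using System;

// Hypothetical value type with a deliberately weak hash function.
public readonly struct GridPoint : IEquatable<GridPoint>
{
    public int X { get; }
    public int Y { get; }

    public GridPoint(int x, int y) { X = x; Y = y; }

    public bool Equals(GridPoint other) => X == other.X && Y == other.Y;

    public override bool Equals(object obj) => obj is GridPoint p && Equals(p);

    // XOR is symmetric, so (1, 2) and (2, 1) share a hash code.
    public override int GetHashCode() => X ^ Y;
}

public static class ContractDemo
{
    public static void Main()
    {
        var a = new GridPoint(1, 2);
        var b = new GridPoint(1, 2);
        var c = new GridPoint(2, 1);

        // Equal objects: the hash codes must match.
        Console.WriteLine(a.Equals(b) && a.GetHashCode() == b.GetHashCode());

        // Unequal objects: the hash codes are allowed to match.
        Console.WriteLine(!a.Equals(c) && a.GetHashCode() == c.GetHashCode());
    }
}
```

A collision like the one between a and c is legal; it only costs the hash table an extra Equals call.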
It doesn't, in the sense that the hash code will change as the object changes. That is a problem for all of the reasons listed in the articles you read. Unfortunately this is the type of problem that typically only shows up in corner cases, so developers tend to get away with the bad behavior.
As long as you implement an interface like IEquatable<T> this shouldn't be a problem. Most dictionary implementations will choose an equality comparer in a way that will use IEquatable<T> over Object.ReferenceEquals. Even without IEquatable<T>, most will default to calling Object.Equals(), which will then go into your implementation.
If you expect your objects to behave with value equality you should override == and != to enforce value equality for all comparisons. Users can still use Object.ReferenceEquals if they actually want reference equality.
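A sketch of what overriding the operators could look like, assuming a hypothetical immutable Money class:

```csharp
using System;

// Hypothetical immutable type with value equality on all comparison paths.
public sealed class Money : IEquatable<Money>
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // "is null" is used instead of "== null" so the overloaded operator is not involved.
    public bool Equals(Money other) =>
        !(other is null) && Amount == other.Amount && Currency == other.Currency;

    public override bool Equals(object obj) => Equals(obj as Money);

    public override int GetHashCode() => (Amount, Currency).GetHashCode();

    public static bool operator ==(Money left, Money right) =>
        ReferenceEquals(left, right) || (!(left is null) && left.Equals(right));

    public static bool operator !=(Money left, Money right) => !(left == right);
}

public static class OperatorDemo
{
    public static void Main()
    {
        var a = new Money(5m, "EUR");
        var b = new Money(5m, "EUR");

        Console.WriteLine(a == b);                // value equality
        Console.WriteLine(ReferenceEquals(a, b)); // reference equality is still available
    }
}
```

Whether overloading == on a reference type is a good idea is a judgment call; this only shows that the two notions of equality can coexist.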
What the BCL uses has changed a bit over time. Now most cases which use equality will take an IEqualityComparer<T> instance and use it for equality. In the cases where one is not specified they will use EqualityComparer<T>.Default to find one. In the worst case this will default to calling Object.Equals.
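The selection can be observed with a small sketch. Tag is a made-up type, and the LastEqualsCalled field exists purely to record which overload ran; it is not something you would keep in real code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that records which Equals overload was called last.
public sealed class Tag : IEquatable<Tag>
{
    public static string LastEqualsCalled = "none"; // for illustration only

    public string Text { get; }

    public Tag(string text) { Text = text; }

    public bool Equals(Tag other)
    {
        LastEqualsCalled = "IEquatable<Tag>.Equals";
        return !(other is null) && Text == other.Text;
    }

    public override bool Equals(object obj)
    {
        LastEqualsCalled = "Object.Equals";
        return obj is Tag other && Text == other.Text;
    }

    public override int GetHashCode() => Text.GetHashCode();
}

public static class DefaultComparerDemo
{
    // Compares two equal tags through the default comparer and
    // reports which overload the comparer ended up calling.
    public static string WhichEquals()
    {
        var comparer = EqualityComparer<Tag>.Default;
        comparer.Equals(new Tag("a"), new Tag("a"));
        return Tag.LastEqualsCalled;
    }

    public static void Main() => Console.WriteLine(WhichEquals());
}
```

Since Tag implements IEquatable<Tag>, the default comparer is documented to use that implementation rather than the Object.Equals override.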
If you have a mutable object, there isn't much point in overriding the GetHashCode method, as you can't really use it. It's used for example by the Dictionary and HashSet collections to place each item in a bucket. If you change the object while it's used as a key in the collection, the hash code no longer matches the bucket that the object is in, so the collection doesn't work properly and you may never find the object again.
If you want the lookup not to use the GetHashCode or Equals method of the class, you can always provide your own IEqualityComparer implementation to use instead when you create the Dictionary.
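For the reference-equality case from the question, a comparer along these lines might do. ReferenceComparer<T> and Box are names invented for this sketch; RuntimeHelpers.GetHashCode is the real framework call that returns an identity-based hash regardless of any GetHashCode override.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical comparer that ignores the type's own Equals/GetHashCode.
public sealed class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) => ReferenceEquals(x, y);

    // Identity-based hash that does not call the overridden GetHashCode.
    public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical type with value equality.
public sealed class Box
{
    public int Value { get; set; }

    public override bool Equals(object obj) => obj is Box other && Value == other.Value;

    public override int GetHashCode() => Value;
}

public static class ComparerDemo
{
    // Adds two equal-by-value boxes and returns how many keys the dictionary holds.
    public static int CountKeys(IEqualityComparer<Box> comparer)
    {
        var table = new Dictionary<Box, string>(comparer);
        table[new Box { Value = 1 }] = "first";
        table[new Box { Value = 1 }] = "second";
        return table.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountKeys(EqualityComparer<Box>.Default)); // value equality
        Console.WriteLine(CountKeys(new ReferenceComparer<Box>()));  // reference equality
    }
}
```

With a comparer like this, mutating a Box that is already a key does not disturb the lookup, because the identity hash does not depend on Value. Newer .NET versions also ship a built-in ReferenceEqualityComparer, which may make a hand-written one unnecessary.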
The Equals method is intended for value equality, so it's not wrong to implement it that way.
I don't know about C#, being a relative noob to it, but in Java, if you override equals() you need to also override hashCode() to maintain the contract between them (and vice versa)... And Java also has the same catch-22, basically forcing you to use immutable fields... But this is an issue only for classes which are used as a hash key, and Java has alternate implementations for all hash-based collections... which may not be as fast, but they do effectively allow you to use a mutable object as a key... it's just (usually) frowned upon as "poor design".
And I feel the urge to point out that this fundamental problem is timeless... It's been around since Adam was a lad.
I've worked on Fortran code which is older than I am (I'm 36) and which breaks when a username is changed (like when a girl gets married, or divorced ;-) ... Thus is engineering. The adopted solution was: the GetHashCode "method" remembers the previously calculated hashCode, recalculates the hashCode (i.e. a virtual isDirty marker) and, if the key fields have changed, returns null. This causes the cache to delete the "dirty" user (by calling another GetPreviousHashCode) and then the cache returns null, causing the user to be re-read from the database. An interesting and worthwhile hack, even if I do say so myself ;-)
I'll trade off mutability (only desirable in corner cases) for O(1) access (desirable in all cases). Welcome to engineering, the land of the informed compromise.
Cheers. Keith.