I am using Java 7, and I have the following class below. I implemented equals and hashCode correctly, but the problem is that equals r
There is no requirement that unequal objects must have different hashCodes. Equal objects are expected to have equal hashCodes, but hash collisions are not forbidden. A body of just return 1; would be a perfectly legal implementation of hashCode, if not a very useful one.
There are only 32 bits worth of possible hash codes, and an unbounded number of possible objects, after all :) Collisions will happen sometimes.
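As a small illustration of how easily collisions occur, here is a minimal sketch using two short strings that are well known to share a hash code. The class name StringCollisionDemo and the helper sameHash are made up for this example.

```java
// Two unequal strings with the same hash code: a perfectly legal collision.
class StringCollisionDemo {
    // Hypothetical helper: true when both strings have the same hash code.
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println(sameHash("Aa", "BB")); // true
        System.out.println("Aa".equals("BB"));    // false
    }
}
```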
It's not necessary for two unequal objects to have different hashes; the important thing is that two equal objects have the same hash.
I can implement hashCode() like this:

    public int hashCode() {
        return 5;
    }

and it will stay correct (but inefficient).
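To see that a constant hash code really does stay correct, here is a minimal sketch that puts such objects into a HashSet. The Key class and the build helper are hypothetical and exist only for this illustration; every entry lands in the same bucket, so lookups rely entirely on equals.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Hypothetical key class, used only for this illustration.
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 5; // legal: equal keys still share a hash code
        }
    }

    static Set<Key> build() {
        Set<Key> keys = new HashSet<Key>();
        keys.add(new Key("a"));
        keys.add(new Key("b"));
        keys.add(new Key("a")); // equal to the first key, so not added again
        return keys;
    }

    public static void main(String[] args) {
        Set<Key> keys = build();
        System.out.println(keys.size());                 // 2
        System.out.println(keys.contains(new Key("b"))); // true
        System.out.println(keys.contains(new Key("c"))); // false
    }
}
```

The set behaves correctly; the cost is that every lookup has to compare against all colliding entries instead of jumping to a nearly empty bucket.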
Since a hash code is a 32-bit int value, collisions (the same hash code for two different objects) are always possible, though they are rare. Your example is one such coincidence. Here is the explanation.
When you call Objects.hash, it internally calls Arrays.hashCode(), whose logic is as follows:

    public static int hashCode(Object a[]) {
        if (a == null)
            return 0;
        int result = 1;
        for (Object element : a)
            result = 31 * result + (element == null ? 0 : element.hashCode());
        return result;
    }

For your three-parameter hashCode, this expands to:

    31 * (31 * (31 * 1 + hashOfString1) + hashOfString2) + hashOfString3

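The expansion can be checked against Objects.hash directly. The following is a minimal sketch; the class name ObjectsHashExpansion, the helper expanded, and the sample strings "x" and "y" are assumptions chosen for the example, not part of the question.

```java
import java.util.Objects;

class ObjectsHashExpansion {
    // Hand-expanded form of Objects.hash(a, b, c) for three non-null values.
    static int expanded(Object a, Object b, Object c) {
        return 31 * (31 * (31 * 1 + a.hashCode()) + b.hashCode()) + c.hashCode();
    }

    public static void main(String[] args) {
        // "x".hashCode() is 120, "y".hashCode() is 121, "".hashCode() is 0,
        // so the expansion is 31 * (31 * (31 + 120) + 121) + 0 = 148862.
        int viaLibrary = Objects.hash("x", "y", "");
        int byHand = expanded("x", "y", "");
        System.out.println(viaLibrary);           // 148862
        System.out.println(viaLibrary == byHand); // true
    }
}
```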
For your first object, the hash values of the individual strings are:

    chamorro --> 1140493257
    english  --> 1698758127
    notes    --> 0

And for the second object:

    chamorro --> 1140494218
    english  --> 1698728336
    notes    --> 0

Notice that the first two hash values differ between the two objects. But the final hash codes are computed as:

    int hashCode1 = 31*(31*(31+1140493257) + 1698758127)+0;
    int hashCode2 = 31*(31*(31+1140494218) + 1698728336)+0;

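Coincidentally, both expressions evaluate to the same hash code, 1919283673. The intermediate products overflow the 32-bit int range and wrap around, but the collision itself comes from the differences cancelling: the chamorro hashes differ by 961 (31 * 31) and the english hashes by -29791 (-31 * 31 * 31), so 31 * 31 * 961 + 31 * (-29791) = 0.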
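The arithmetic can be replayed using only the four hash values quoted above. This is a minimal sketch; the class name CancellationArithmetic and the helper combine are hypothetical, and the constants are copied from this answer rather than recomputed from the strings.

```java
class CancellationArithmetic {
    // Field hash codes as quoted in the answer (not recomputed here).
    static final int CHAMORRO_1 = 1140493257, ENGLISH_1 = 1698758127;
    static final int CHAMORRO_2 = 1140494218, ENGLISH_2 = 1698728336;

    // Same combination step that Arrays.hashCode performs for three elements.
    static int combine(int h1, int h2, int h3) {
        return 31 * (31 * (31 + h1) + h2) + h3;
    }

    public static void main(String[] args) {
        int d1 = CHAMORRO_2 - CHAMORRO_1; // 961 = 31 * 31
        int d2 = ENGLISH_2 - ENGLISH_1;   // -29791 = -(31 * 31 * 31)
        System.out.println(d1 + " " + d2);          // 961 -29791
        System.out.println(31 * 31 * d1 + 31 * d2); // 0

        System.out.println(combine(CHAMORRO_1, ENGLISH_1, 0)); // 1919283673
        System.out.println(combine(CHAMORRO_2, ENGLISH_2, 0)); // 1919283673
    }
}
```

Because the weighted differences sum to exactly zero, the two results agree regardless of how the intermediate values wrap around.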
You can verify this yourself using the code segment below:

    public static void main(String... args) {
        ChamorroEntry entry1 = new ChamorroEntry("Åguigan",
                "Second island south of Saipan. Åguihan.", "");
        ChamorroEntry entry2 = new ChamorroEntry("Åguihan",
                "Second island south of Saipan. Åguigan.", "");
        System.out.println(entry1.equals(entry2)); // returns false
        System.out.println("Åguigan".hashCode());
        System.out.println("Åguihan".hashCode());
        System.out.println("Second island south of Saipan. Åguihan.".hashCode());
        System.out.println("Second island south of Saipan. Åguigan.".hashCode());
        System.out.println("".hashCode());
        System.out.println("".hashCode());
        int hashCode1 = 31*(31*(31+1140493257) + 1698758127)+0;
        int hashCode2 = 31*(31*(31+1140494218) + 1698728336)+0;
        System.out.println(entry1.hashCode() + "\n" + entry2.hashCode());
        System.out.println(getHashCode(
                new String[]{entry1.chamorro, entry1.english, entry1.notes})
                + "\n" + getHashCode(
                new String[]{entry2.chamorro, entry2.english, entry2.notes}));
        System.out.println(hashCode1 + "\n" + hashCode2); // returns same hash code!
    }

    public static int getHashCode(Object a[]) {
        if (a == null)
            return 0;
        int result = 1;
        for (Object element : a)
            result = 31 * result + (element == null ? 0 : element.hashCode());
        return result;
    }

If you use different string parameters, you will most likely get different hash codes.
Actually, you happened to trigger pure coincidence. :)
Objects.hash happens to be implemented by successively multiplying the running result by 31 and adding the hash code of each given object, while String.hashCode does the same with each of its characters. By coincidence, the difference between the "English" strings you used occurs exactly one position further from the end of the string than the same difference in the "Chamorro" strings, and in the opposite direction (g/h versus h/g), so everything cancels out perfectly. Congratulations!
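The same offset effect can be reproduced with much shorter strings. The sketch below is only an illustration: the class name OffsetCancellationDemo, the helper hashOf, and the strings "ag", "ah", "hx" and "gx" are invented for the example and merely mimic the shape of your data.

```java
import java.util.Objects;

class OffsetCancellationDemo {
    // Mirrors a three-field hashCode where the third field is empty.
    static int hashOf(String first, String second) {
        return Objects.hash(first, second, "");
    }

    public static void main(String[] args) {
        // First field: 'g' -> 'h' in the last position.
        //   Weight 1 inside the string, times 31 * 31 from Objects.hash: +961.
        // Second field: 'h' -> 'g' one position further from the end.
        //   Weight 31 inside the string, times 31 from Objects.hash: -961.
        System.out.println(hashOf("ag", "hx") == hashOf("ah", "gx")); // true

        // Changing only the first field leaves nothing to cancel the +961.
        System.out.println(hashOf("ag", "hx") == hashOf("ah", "hx")); // false
    }
}
```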
Try with other strings, and you'll probably find that it works as expected. As others have already pointed out, this effect is not actually wrong, strictly speaking, since hash codes may correctly collide even if the objects they represent are unequal. If anything, it might be worthwhile trying to find a more efficient hash, but I hardly think it should be necessary in realistic situations.