Hash Collision or Hashing Collision in HashMap is not a new topic and I\'ve come across several blogs and discussion boards explaining how to produce Hash Collision or how t
Actually I think the hash collision is Normal. Let talk about a case to think. We have 1000000 big numbers(the set S of x), say x is in 2^64. And now we want to do a map for this number set. lets map this number set S to [0,1000000] .
But how? use hash!!
Define a hash function f(x) = x mod 1000000. And now the x in S will be converted into [0,1000000), OK, But you will find that many numbers in S will convert into one number. for example. the number k * 1000000 + y will all be located in y which because (k * 1000000 + y ) % x = y. So this is a hash collision.
And how to deal with collision? In this case we talked above, it is very difficult to delimiter the collision because the math computing has some posibillity. We can find a more complex, more good hash function, but can not definitely say we eliminate the collision. We should do our effort to find a more good hash function to decrease the hash collision. Because the hash collision increase the time cost we use hash to find something.
Simplely there are two ways to deal with hash collision. the linked list is a more direct way, for example: if two numbers above get same value after the hash_function, we create a linkedlist from this value bucket, and all the same value is put the value's linkedlist. And another way is that just find a new position for the later number. for example, if number 1000005 has took the position in 5 and when 2000005 get value 5, it can not be located at position 5, it then go ahead and find a empty position to took.
For the last question : Does Java generate or at least try to generate unique hashCode per class during object initiation?
the hashcode of Object is typically implemented by converting the internal address of the object into an integer. So you can think different objects has different hashcode, if you use the Object's hashcode().
What exactly is Hash Collision - is it a feature, or common phenomenon which is mistakenly done but good to avoid?
What exactly causes Hash Collision - the bad definition of custom class' hashCode() method, OR to leave the equals() method un-overridden while imperfectly overriding the hashCode() method alone, OR is it not up to the developers and many popular java libraries also has classes which can cause Hash Collision?
Does anything go wrong or unexpected when Hash Collision happens? I mean is there any reason why we should avoid Hash Collision?
to summarize: hashcode collision is harmless is you know what is that for and why is not the same as an id used to prove equality
What exactly is Hash Collision - is it a feature, or common phenomenon which is mistakenly done but good to avoid?
It's a feature. It arises out of the nature of a hashCode: a mapping from a large value space to a much smaller value space. There are going to be collisions, by design and intent.
What exactly causes Hash Collision - the bad definition of custom class' hashCode() method,
A bad design can make it worse, but it is endemic in the notion.
OR to leave the equals() method un-overridden while imperfectly overriding the hashCode() method alone,
No.
OR is it not up to the developers and many popular java libraries also has classes which can cause Hash Collision?
This doesn't really make sense. Hashes are bound to collide sooner or later, and poor algorithms can make it sooner. That's about it.
Does anything go wrong or unexpected when Hash Collision happens?
Not if the hash table is competently written. A hash collision only means that the hashCode is not unique, which puts you into calling equals()
, and the more duplicates there are the worse the performance.
I mean is there any reason why we should avoid Hash Collision?
You have to trade off ease of computation against spread of values. There is no single black and white answer.
Does Java generate or atleast try to generate unique hasCode per class during object initiation?
No. 'Unique hash code' is a contradiction in terms.
If no, is it right to rely on Java alone to ensure that my program would not run into Hash Collision for JRE classes? If not right, then how to avoid hash collision for hashmaps with final classes like String as key?
The question is meaningless. If you're using String
you don't have any choice about the hashing algorithm, and you are also using a class whose hashCode has been slaved over by experts for twenty or more years.
What exactly is Hash Collision - is it a feature, or common phenomenon which is mistakenly done but good to avoid?
Neither... both... it is a common phenomenon, but not mistakenly done, that is good to avoid.
What exactly causes Hash Collision - the bad definition of custom class' hashCode() method, OR to leave the equals() method un-overridden while imperfectly overriding the hashCode() method alone, OR is it not up to the developers and many popular java libraries also has classes which can cause Hash Collision?
by poorly designing your hashCode() method, you can produce too many collisions, leaving you equals method un-overridden should not directly affect the number of collisions, many popular java libraries have classes that can cause collisions (nearly all classes actually).
Does anything go wrong or unexpected when Hash Collision happens? I mean is there any reason why we should avoid Hash Collision?
There is degradation in performance, that is a reason to avoid them, but the program should continue to work.
Does Java generate or at least try to generate unique hashCode per class during object initiation? If no, is it right to rely on Java alone to ensure that my program would not run into Hash Collision for JRE classes? If not right, then how to avoid hash collision for hashmaps with final classes like String as key?
Java doesn't try to generate a unique hash code during object initialization, but it has a default implementation of hashCode() and equals(). The default implementation works to know whether two object references point to the same instance or not, and doesn't rely on the content (field values) of the objects. Therefore, the String class has its own implementation.