About Object.hashcode() and collisions

牧云@^-^@ 提交于 2019-12-10 14:29:37

问题


I was reading the JavaDoc for Object.hashCode method, it says that

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer [...])

But whatever its implementation is, hashCode method always returns a (let's assume positive) integer, so given Integer.MAX+1 different objects, two of them are going to have the same hashcode.

Why is the JavaDoc here "denying" collisions? Is it a practical conclusion given that internal address is used and "come on, you're never going to have Integer.MAX+1 objects in memory at once, so we can say it's practically always unique"?

EDIT

This bug entry (thank you Sleiman Jneidi) gives an exact idea of what I mean (it seems to be a more that 10 years old discussion):

appears that many, perhaps majority, of programmers take this to mean that the default implementation, and hence System.identityHashCode, will produce unique hashcodes.

The qualification "As much as is reasonably practical," is, in practice, insufficient to make clear that hashcodes are not, in practice, distinct.


回答1:


The docs are misleading indeed, and there is a bug opened ages ago that says that the docs are misleading especially that the implementation is JVM dependent, and in-practice especially with massive heap sizes it is so likely to get collisions when mapping object identities to 32-bit integers




回答2:


there is an interesting discussion of hashcode collisions here:

http://eclipsesource.com/blogs/2012/09/04/the-3-things-you-should-know-about-hashcode/

In particular, this highlights that your practical conclusion, "you're never going to have Integer.MAX+1 objects in memory at once, so we can say it's practically always unique" is a long way from accurate due to the birthday paradox.

The conclusion from the link is that, assuming a random distribution of hashCodes, we only need 77,163 objects before we have a 50/50 chance of hashCode collision.




回答3:


When you read this carefully, you'll notice that this only means objects should try to avoid collisions ('as much as reasonably practical'), but also that you are not guaranteed to have different hashcodes for unequal objects.

So the promise is not very strong, but it is still very useful. For instance when using the hashcode as a quick indication of equality before doing the full check.

For instance ConcurrentHashMap which will use (a function performed on) the hashcode to assign a location to an object in the map. In practice the hashcode is used to find where an object is roughly located, and the equals is used to find the precisely located.

A hashmap could not use this optimization if objects don't try to spread their hashcodes as much as possible.



来源:https://stackoverflow.com/questions/34333792/about-object-hashcode-and-collisions

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!