Question
1. I understand the different hash map mechanisms and the ways in which key collisions are handled (either open addressing with linear/quadratic probing, chaining, extendible hashing, etc.). Which one does HashSet/HashMap make use of?
2. I realise that a good HashMap relies on a good hash function. How does Java's HashSet/HashMap hash the objects? I know that there is a hash function, but so far, for strings, I have not needed to implement this. What if I now want to hash a Java object that I create: do I need to implement the hash function? Or does Java have a built-in way of creating a hash code?
I know that the default implementation cannot be relied on as it bases the hash function on the memory address which is not constant.
Answer 1:
You could answer many of these questions yourself, by reading the source code for HashMap.
(Hint: you can usually find the source code for Java SE classes using Google; e.g. search for "java.util.HashMap source".)
I understand the different hash map mechanisms and the ways in which key collisions are handled (either open addressing with linear/quadratic probing, chaining, extendible hashing, etc.). Which one does HashSet/HashMap make use of?
Chaining. See the source code. (Line 154 in the version I linked to).
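To make "chaining" concrete, here is a deliberately minimal sketch of a separate-chaining table: an array of buckets, where each bucket is a linked list of the entries whose keys landed in that slot. The class ChainedMap is hypothetical and much simpler than java.util.HashMap (no resizing, no hash scrambling, no tree bins); it only illustrates the technique.

```java
import java.util.LinkedList;
import java.util.Objects;

// Hypothetical, minimal separate-chaining map (NOT java.util.HashMap).
// Each array slot holds a linked list ("chain") of entries with colliding indices.
class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = (LinkedList<Entry<K, V>>[]) new LinkedList[capacity];
    }

    private int indexFor(Object key) {
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    V put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Entry<K, V> e : buckets[i]) {
            if (Objects.equals(e.key, key)) { // same key already chained: replace value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[i].add(new Entry<>(key, value)); // collision or empty slot: extend the chain
        return null;
    }

    V get(Object key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        if (chain == null) return null;
        for (Entry<K, V> e : chain) {          // walk the chain, comparing with equals()
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }
}
```

For example, "Aa" and "BB" have the same String hash code, so they end up in the same chain, yet both can still be stored and retrieved because the chain is searched with equals().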
How does Java's HashSet/HashMap hash the objects?
It doesn't. The object's hashCode method is called to do this. See the source code (line 360).
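A quick way to see that the map delegates to the key's own hashCode() is a key class that counts how often it is asked for one. CountingKey and HashCodeCallDemo are hypothetical names for this sketch; the exact number of calls per operation is a HashMap implementation detail, so the sketch only relies on "at least one call per put/get".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class that records every call to hashCode().
final class CountingKey {
    static int hashCodeCalls = 0;
    private final String name;

    CountingKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        hashCodeCalls++;          // the map asked us for a hash code
        return name.hashCode();   // delegate the actual hashing to String
    }
}

class HashCodeCallDemo {
    public static void main(String[] args) {
        Map<CountingKey, Integer> map = new HashMap<>();
        map.put(new CountingKey("x"), 1);   // put() calls hashCode() on the new key
        map.get(new CountingKey("x"));      // get() calls hashCode() on the lookup key
        System.out.println("hashCode() calls: " + CountingKey.hashCodeCalls);
    }
}
```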
If you look at the code, you will see some interesting wrinkles:
- The code (in the version I linked to) hashes Strings using a special method. (It appears that this is to allow hashing of strings to be "tuned" at the platform level. I didn't dig into this ...)
- The hashcode returned by the Object.hashCode() call is "scrambled" further to reduce the chance of collisions. (Read the comment!)
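The exact scrambling differs between JDK versions (the older version linked above used a more elaborate bit mix). The sketch below mirrors the simpler "spreading" step used by OpenJDK 8 and later, h ^ (h >>> 16), together with the power-of-two masking used to pick a bucket. SpreadDemo is a hypothetical name. It shows why the step matters: two hash codes that differ only in their high bits would otherwise land in the same bucket of a small table.

```java
// Sketch of the spreading step in OpenJDK 8+ HashMap.hash(Object):
// fold the high 16 bits of the hash code into the low 16 bits.
class SpreadDemo {
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // HashMap table sizes are powers of two, so the bucket index is just the low bits.
    static int bucketIndex(int hash, int tableSize) {
        return (tableSize - 1) & hash;
    }

    public static void main(String[] args) {
        Integer k1 = 0x10000, k2 = 0x20000; // Integer.hashCode() is the int value itself
        int n = 16;
        // Without spreading only the low 4 bits count: both keys go to bucket 0.
        System.out.println(bucketIndex(k1.hashCode(), n) + " " + bucketIndex(k2.hashCode(), n)); // 0 0
        // With spreading the high bits affect the index: buckets 1 and 2.
        System.out.println(bucketIndex(spread(k1), n) + " " + bucketIndex(spread(k2), n));       // 1 2
    }
}
```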
What if I now want to Hash a Java Object that I create - do I need to implement the hash function?
You can do that.
Whether you need to do this depends on how you have defined equals for the class. Specifically, Java's HashMap, HashSet and related classes place the following requirements on hashCode() and equals(Object):
- If a.equals(b), then a.hashCode() == b.hashCode().
- While a is in a HashSet or is a key in a HashMap, the value returned by a.hashCode() must not change.
- If !a.equals(b), then the probability that a.hashCode() == b.hashCode() should be low, especially if a and b are probable hash keys for the application.
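The second requirement is the one that is easiest to break by accident. In this sketch, the hypothetical MutablePoint class satisfies the first rule (equal points have equal hash codes), but because its fields are mutable, changing one while the object sits in a HashSet changes its hash code, and the set can no longer find it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable key: equals() and hashCode() both depend on x and y.
final class MutablePoint {
    int x, y;
    MutablePoint(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MutablePoint)) return false;
        MutablePoint p = (MutablePoint) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() { return Objects.hash(x, y); }
}

class MutableKeyDemo {
    public static void main(String[] args) {
        Set<MutablePoint> set = new HashSet<>();
        MutablePoint p = new MutablePoint(1, 2);
        set.add(p);
        System.out.println(set.contains(new MutablePoint(1, 2))); // true: equal object, equal hash

        p.x = 99; // hashCode() changes while p is in the set: requirement 2 is violated
        System.out.println(set.contains(p));    // false: the set looks under the new hash
        System.out.println(set.size());         // 1: the element is still there, just unreachable
    }
}
```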
(The last requirement is for performance reasons. If you have a "poor" hash function that results in a high probability that different keys hash to the same hashcode, you get lots of collisions. The hash chains will become unbalanced, and you won't get the average O(1) performance that is normally expected of hash table operations. In the worst case, performance will be O(N); i.e. equivalent to a linear search of a linked list.)
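The cost of a poor hash function can be observed by counting equals() calls during lookups. The hypothetical ProbeKey class below can use either a good hash (its id) or a constant one (every key collides). The map still returns the right values in both cases, since correctness only needs the first requirement, but the constant hash makes each lookup compare against many other keys. The exact counts depend on the JDK (Java 8 and later convert long chains into trees), so the sketch only prints them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type with a switchable hash function.
final class ProbeKey {
    static long equalsCalls = 0;
    final int id;
    final boolean poorHash;

    ProbeKey(int id, boolean poorHash) { this.id = id; this.poorHash = poorHash; }

    @Override public boolean equals(Object o) {
        equalsCalls++; // count every comparison the map has to make
        return o instanceof ProbeKey && ((ProbeKey) o).id == id;
    }

    @Override public int hashCode() {
        return poorHash ? 42 : Integer.hashCode(id); // 42: every key collides
    }
}

class CollisionDemo {
    // Builds a map of n keys, then returns how many equals() calls n lookups needed.
    static long lookupCost(int n, boolean poorHash) {
        Map<ProbeKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new ProbeKey(i, poorHash), i);
        ProbeKey.equalsCalls = 0;
        for (int i = 0; i < n; i++) {
            // a fresh instance, so the map cannot match by reference and must call equals()
            if (map.get(new ProbeKey(i, poorHash)) != i) throw new AssertionError();
        }
        return ProbeKey.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("good hash: " + lookupCost(1000, false));
        System.out.println("poor hash: " + lookupCost(1000, true));
    }
}
```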
Or does Java have a built in way of creating a hash code?
Every class inherits a default hashCode() method from Object (unless this is overridden). It uses what is known as an "identity hash code"; i.e. a hash value that is based on the object's identity (its reference). This matches the default implementation of equals(Object) ... which simply uses == to compare references.
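A small sketch of what the inherited defaults mean in practice. Plain is a hypothetical class that overrides neither method, so two instances with the same field value are still different keys. System.identityHashCode(Object) is documented to return the same value as the default hashCode().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class that does NOT override equals() or hashCode().
class Plain {
    final int value;
    Plain(int value) { this.value = value; }
}

class IdentityDemo {
    public static void main(String[] args) {
        Plain a = new Plain(1);
        Plain b = new Plain(1); // same field value, different object

        System.out.println(a.equals(b)); // false: Object.equals() is reference comparison
        System.out.println(a.equals(a)); // true
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true: identity hash

        Set<Plain> set = new HashSet<>();
        set.add(a);
        set.add(b);
        set.add(a); // already present, not added again
        System.out.println(set.size()); // 2: a and b are distinct keys
    }
}
```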
I know that the default implementation cannot be relied on as it bases the hash function on the memory address which is not constant.
This is incorrect.
The default hashCode() method returns the "identity hashcode". This is typically based on the object's memory address at some point in time, but it is NOT the object's memory address.
In particular, if an object is moved by the garbage collector, its "identity hashcode" is guaranteed not to change. Yes. That's right, it DOES NOT CHANGE ... even though the object was moved!
(How they implement this efficiently is rather clever. See https://stackoverflow.com/a/3796963/139985 for details.)
The bottom line is that the default Object.hashCode() method satisfies all of the requirements that I listed above. It can be relied on.
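As a rough check (not a proof), the sketch below takes an object's default hash code, stores the object in a HashSet, creates garbage and requests a collection, then checks that the hash code is unchanged and the set still finds the object. StableIdentityHash is a hypothetical name; System.gc() is only a hint, and whether the object is actually moved depends on the JVM and collector. The stability itself is guaranteed by the hashCode() contract either way.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: the default (identity) hashCode() stays the same across calls,
// even if a garbage collection runs in between.
class StableIdentityHash {
    public static void main(String[] args) {
        Object o = new Object();
        int before = o.hashCode();
        Set<Object> set = new HashSet<>();
        set.add(o);

        // Create some garbage and request a collection (the JVM may ignore the request).
        for (int i = 0; i < 100_000; i++) {
            byte[] garbage = new byte[256];
        }
        System.gc();

        System.out.println(o.hashCode() == before); // true: required by the hashCode() contract
        System.out.println(set.contains(o));        // true: the set can still find it
    }
}
```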
Answer 2:
Question 1
The Java HashMap implementation uses chaining to deal with collisions. Think of it as an array of linked lists.
Question 2
Object has a default implementation of equals and hashCode. equals is implemented as return this == other, and hashCode is (to all intents and purposes) implemented as assigning a random identifier to each instance and using that as the hashCode.
As all classes in Java extend Object, they all inherit these implementations.
Some classes override these implementations. String, as you mentioned, is a very good example. Others are the classes in the collections API; for example, ArrayList implements these methods based on the elements it contains.
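Both overrides are specified in the Javadoc: String.hashCode() is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] (with int overflow), and List.hashCode() is computed from the elements. This sketch recomputes the String formula (stringHash is a hypothetical helper) and shows that equal-but-distinct objects of these classes hash alike.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ValueHashDemo {
    // Recomputes String.hashCode() from its documented formula using Horner's rule.
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        return h;
    }

    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = new String("abc"); // a different object with equal contents
        System.out.println(s1 == s2);                           // false
        System.out.println(s1.hashCode() == s2.hashCode());     // true
        System.out.println(stringHash("abc") + " " + "abc".hashCode()); // 96354 96354

        List<Integer> l1 = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> l2 = new ArrayList<>(Arrays.asList(1, 2, 3));
        // List.hashCode() is defined from the elements, so equal lists hash alike
        System.out.println(l1.equals(l2) && l1.hashCode() == l2.hashCode()); // true
    }
}
```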
As far as implementing a good hashCode goes, this is a little bit of a dark art. Here's a pretty good summary of best practice.
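One common recipe, sketched here under the assumption of a simple immutable value class (Person is hypothetical): make equals() and hashCode() use exactly the same fields, and combine the field hashes with the multiplier 31, the same scheme String.hashCode() and Arrays.hashCode() use. Making the fields final also keeps the hash code from changing while the object is a key.

```java
import java.util.Objects;

// Hypothetical immutable value class: equals() and hashCode() use the same fields.
final class Person {
    private final String name; // may be null
    private final int age;

    Person(String name, int age) { this.name = name; this.age = age; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person p = (Person) o;
        return age == p.age && Objects.equals(name, p.name);
    }

    @Override public int hashCode() {
        // Start from one field's hash, then fold in each further field with 31 * result + ...
        int result = Objects.hashCode(name); // 0 for a null name
        result = 31 * result + Integer.hashCode(age);
        return result;
    }
}
```

With this, new Person("Ann", 30) can be used as a HashMap key and looked up with a separately constructed new Person("Ann", 30). Objects.hash(name, age) would give an equivalent mix with less code, at the cost of allocating an array on each call.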
Your final comment:
I know that the default implementation cannot be relied on as it bases the hash function on the memory address which is not constant.
This is not correct. The default implementation of hashCode is constant, as that is part of the method's contract. From the Javadoc:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
Source: https://stackoverflow.com/questions/20849809/clarifying-facts-behind-javas-implementation-of-hashset-hashmap