When I implement a collection that uses hashes for optimizing access, should I cache the hash values or assume an efficient implementation of hashCode()
?
On
On the other hand, when I implement a class that overrides hashcode(), should I assume that the collection (i.e. HashSet) caches the hash?
No, you should not make any assumptions beyond the scope of the class you are writing.
Of course you should try to make your hashCode cheap. If it isn't, and your class is immutable, create the hashCode on initialization or lazily upon the first request (see java.lang.String). If your class is not immutable, I don't see any other option than to re-calculate the hashCode every time.
Why do you want to cache it? You need to ask objects what their hashcode is while you're working with it to allocate it to a hash bucket (and any objects that are in the same bucket that may have the same hashcode), but then you can forget it.
You could store objects in a wrapper HashNode
or something, but I would try implementing it first without caching (just like HashSet
et al does) and see if you need the added performance and complexity before going there.
I'd say in most cases you can rely on efficient implementations of hashCode()
. AFAIK, that method is only invoked on lookup methods (like contains
, get
etc.) or methods that change the collection (add
/put
, remove
etc.).
Thus, in most cases there shouldn't be any need to cache hashes yourself.
When designing Guava's ImmutableSet
and ImmutableMap
classes, we opted not to cache hash codes. This way, you'll get better performance from hash code caching when and only when you care enough to do the caching yourself. If we cached them ourselves, we'd be costing you extra time and memory even in the case that you care deeply about speed and space!
It's true that HashMap
does this caching, but it was HashMap
's author (Josh Bloch) who strongly suggested we not follow that precedent!
Edit: oh, also, if your hashCode()
is slow, the caching by the collection only addresses half of the problem anyway, as hashCode()
still must be invoked on the object passed in to get()
no matter what.
Considering that java.lang.String caches its hash, i guess that hashcode() is supposed to be fast.
So as first approach, I would not cache hashes in my collection. In my objects that I use, I would not cache hash code unless it is oviously slow, and only do it if profiling tell me so.
If my objects will be used by others, i would probubly consider cachnig hash codes sooner (but needs measurements anyway).