Comparing the HashMap and Hashtable source code in JDK 1.6, I saw the following code inside HashMap:
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 16;
The requirement that the table size be a power of two is an implementation detail, not exposed to users of the class -- that is why the constructor silently rounds the requested value up to the next larger power of two instead of flagging an error.
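For illustration, here is a minimal sketch of that adjustment; the helper name is my own, but the loop matches what the JDK 1.6 HashMap constructor does internally:

// Round a requested capacity up to the next power of two,
// mirroring the loop in the JDK 1.6 HashMap constructor.
static int roundUpToPowerOfTwo(int requestedCapacity) {
    int capacity = 1;
    while (capacity < requestedCapacity) {
        capacity <<= 1; // keep doubling until we reach or exceed the request
    }
    return capacity; // e.g. 100 -> 128, 16 -> 16
}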
The Hashtable implementation assumes that the hash may not be evenly distributed, so it tries to use a number of bins that is prime in the hope of avoiding peaks in the frequency distribution of the hash.
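For reference, a rough sketch of Hashtable's sizing policy as of JDK 1.6 (default capacity 11, a prime; growth on rehash to 2n + 1, which at least keeps the bin count odd; the helper name is hypothetical):

// Hashtable (JDK 1.6) starts with 11 bins and grows like this on rehash.
// An odd bin count means hash % capacity depends on all bits of the hash,
// rather than just the low bits as a power-of-two capacity would.
static int nextHashtableCapacity(int oldCapacity) {
    return oldCapacity * 2 + 1; // 11 -> 23 -> 47 -> 95 -> ...
}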
Combining a power-of-two table size with a hash function whose values are not evenly distributed leads to bad performance. (For example, a primitive hash function would be

int hash(String s, int nBins) {
    return s.charAt(0) % nBins;
}
If nBins is 32, 'e' (code point 101) and 'E' (code point 69) end up in the same bin, since upper- and lower-case ASCII letters differ by exactly 32. The distribution of bin indices therefore correlates with the distribution of occurrence of letters, which has distinct peaks -- so the bin occupancy peaks wherever frequent characters 32 apart collide, e.g. at bin 5 for 'e' and 'E' (101 % 32 == 69 % 32 == 5).)
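A small self-contained demo of that effect; the class name and sample strings are mine, and the hash function is the one from above:

public class BinPeakDemo {
    // The primitive hash function from above.
    static int hash(String s, int nBins) {
        return s.charAt(0) % nBins;
    }

    public static void main(String[] args) {
        // 'e' (101) and 'E' (69) differ by exactly 32, so with 32 bins
        // they collide; with a prime bin count such as 31 they do not.
        System.out.println(hash("ear", 32)); // 101 % 32 = 5
        System.out.println(hash("Ear", 32)); // 69 % 32 = 5  -> same bin
        System.out.println(hash("ear", 31)); // 101 % 31 = 8
        System.out.println(hash("Ear", 31)); // 69 % 31 = 7  -> different bins
    }
}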