why not:
public native long hashCode();
instead of:
public native int hashCode();
for a higher chance of unique hash codes?
Because the maximum length of an array is Integer.MAX_VALUE. Since the prime use of hashCode() is to determine which slot an object falls into in the backing array of a HashMap/Hashtable, a hash code larger than Integer.MAX_VALUE could not be used to index into that array.
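For illustration, here is a minimal sketch of how a 32-bit hash code is reduced to a bucket index, assuming a power-of-two table capacity as java.util.HashMap uses; the step that folds the high bits into the low bits mirrors the spirit of HashMap's internal hash() in OpenJDK, but the class and method names here are my own:

import java.util.Objects;

public class BucketIndexSketch {
    // Reduce a 32-bit hash code to an index in [0, capacity - 1].
    // Assumes capacity is a power of two, as in java.util.HashMap.
    static int indexFor(Object key, int capacity) {
        int h = Objects.hashCode(key); // 0 for null, key.hashCode() otherwise
        h ^= (h >>> 16);               // fold high bits into the low bits
        return (capacity - 1) & h;     // cheap modulo for power-of-two sizes
    }

    public static void main(String[] args) {
        System.out.println(indexFor("hello", 16)); // prints a slot in [0, 15]
    }
}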
Anyway, the hash code value is used to determine a row number in a table, and that is a relatively small value. In HashMap, for instance, the default table contains only 16 rows (Sun JDK 1.6.0_17). This means that the row number is determined roughly like this:
int rowNumber = (obj.hashCode() & 0x7fffffff) % rowsCount; // mask the sign bit so a negative hash code cannot yield a negative row
So, the real distribution is from 0 to rowsCount - 1.
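Why the mask matters: hashCode() can legitimately be negative, and a plain % on a negative value yields a negative row number, which would crash an array lookup. A minimal runnable sketch of the masked version above (class and method names are mine, not from the JDK):

public class RowNumberSketch {
    static int rowNumber(Object obj, int rowsCount) {
        // Clearing the sign bit keeps the remainder in [0, rowsCount - 1],
        // even for Integer.MIN_VALUE, where Math.abs(h) would still be negative.
        return (obj.hashCode() & 0x7fffffff) % rowsCount;
    }

    public static void main(String[] args) {
        // "polygenelubricants".hashCode() happens to be exactly
        // Integer.MIN_VALUE, the classic negative-hash edge case.
        System.out.println(rowNumber("polygenelubricants", 16)); // prints 0
    }
}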
UPD: I recalled the implementation of ConcurrentHashMap. In a nutshell, ConcurrentHashMap contains many relatively small tables. First the hashCode value is used to select a table, and then the same value is used to determine a row in the selected table. This approach removes the limit imposed by the array size (and would even allow building a distributed hash table).
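To make the two-level idea concrete, here is a sketch loosely modeled on the segmented design of the pre-Java-8 ConcurrentHashMap: the high bits of the hash pick a segment, the low bits pick a row inside it. The names and the exact bit split are my own illustration, not the JDK's actual code:

public class SegmentedIndexSketch {
    static final int SEGMENTS = 16;         // must be a power of two
    static final int ROWS_PER_SEGMENT = 16; // must be a power of two

    static int segmentFor(int hash) {
        // The top bits of the hash select one of the small tables:
        // 32 - log2(SEGMENTS) = 28, so the result is in [0, 15].
        return hash >>> 28;
    }

    static int rowFor(int hash) {
        // The low bits select a row inside the chosen table.
        return hash & (ROWS_PER_SEGMENT - 1);
    }

    public static void main(String[] args) {
        int h = "example".hashCode();
        System.out.printf("segment=%d, row=%d%n", segmentFor(h), rowFor(h));
    }
}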
So, I'm inclined to conclude that hashCode returns int because that covers the vast majority of use cases.
I'd assume it's a balance of computation cost vs. hash range. Hash codes are referenced so frequently that pushing around twice as much data every time you need one would be expensive, especially if you consider the more common use cases. For example, if you create a small hash table with 10, or 100, or 1000 values, the difference in the number of hash collisions you'll see is extremely negligible. For larger tables... well, think of how large a table would have to grow before a 2**32-value hash range starts producing frequent collisions, and whether that's even possible in a JVM given the amount of memory you'd need.
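To put a rough number on "extremely negligible": by the standard birthday approximation, the expected number of colliding pairs among n uniformly distributed 32-bit hash codes is about n(n-1)/2 divided by 2**32. A back-of-the-envelope sketch (an estimate, not a measurement; all names are mine):

public class CollisionEstimateSketch {
    // Expected number of colliding pairs when n values are hashed
    // uniformly into d slots: n(n-1) / (2d), the birthday approximation.
    static double expectedCollidingPairs(long n, double d) {
        return n * (n - 1) / (2.0 * d);
    }

    public static void main(String[] args) {
        double d = Math.pow(2, 32); // the size of the int hash range
        for (long n : new long[] {10, 100, 1_000, 1_000_000}) {
            System.out.printf("n=%d -> ~%.2e expected colliding pairs%n",
                    n, expectedCollidingPairs(n, d));
        }
    }
}

Even at a million entries this estimate comes out to only around a hundred colliding pairs, so widening the hash to long would buy essentially nothing at the sizes a JVM can realistically hold.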