Per the Java documentation, the hash code for a String
object is computed as:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
<
By multiplying, bits are shifted to the left. This uses more of the available space of hash codes, reducing collisions.
By not using a power of two, the lower-order, rightmost bits are populated as well, to be mixed with the next piece of data going into the hash.
The expression n * 31
is equivalent to (n << 5) - n
.
In latest version of JDK, 31 is still used. https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/lang/String.html#hashCode()
The purpose of hash string is
^
in hashcode calculation document, it help unique)31 is max value can put in 8 bit (= 1 byte) register, is largest prime number can put in 1 byte register, is odd number.
Multiply 31 is <<5 then subtract itself, therefore need cheap resources.
I'm not sure, but I would guess they tested some sample of prime numbers and found that 31 gave the best distribution over some sample of possible Strings.
This is because 31 has a nice property – it's multiplication can be replaced by a bitwise shift which is faster than the standard multiplication:
31 * i == (i << 5) - i
Goodrich and Tamassia computed from over 50,000 English words (formed as the union of the word lists provided in two variants of Unix) that using the constants 31, 33, 37, 39, and 41 will produce fewer than 7 collisions in each case. This may be the reason that so many Java implementations choose such constants.
See section 9.2 Hash Tables (page 522) of Data Structures and Algorithms in Java.
Neil Coffey explains why 31 is used under Ironing out the bias.
Basically using 31 gives you a more even set-bit probability distribution for the hash function.