Why does Java's hashCode() in String use 31 as a multiplier?

星月不相逢 2020-11-22 01:34

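Per the Java documentation, the hash code for a String object is computed as:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the i-th character of the string, n is the length of the string, and ^ indicates exponentiation. Why is 31 used as the multiplier?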
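A minimal sketch of the documented formula, evaluated left to right with Horner's rule (h = 31*h + c for each character) in int arithmetic, so that overflow wraps exactly as it does in Java. The class and method names (PolyHash, polyHash) are hypothetical; the point is that the result matches String.hashCode().

```java
class PolyHash {
    // Hypothetical helper: s[0]*31^(n-1) + ... + s[n-1], computed with 32-bit int arithmetic.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Horner step: raise every earlier term by one power of 31, then add the next character.
            h = 31 * h + s.charAt(i);
        }
        return h; // int overflow wraps modulo 2^32, as in the documented "int arithmetic"
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", "hello", "Why 31?"}) {
            System.out.println("\"" + s + "\": " + polyHash(s) + " vs " + s.hashCode());
        }
    }
}
```

The empty string hashes to 0, and a one-character string hashes to its char value.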
13 answers
  • 2020-11-22 02:26

    Multiplying shifts bits to the left, so each character's contribution spreads over more of the available hash-code space, which reduces collisions.

    Because 31 is not a power of two, the low-order (rightmost) bits are populated as well, and they are mixed with the next piece of data that goes into the hash.

    The expression n * 31 is equivalent to (n << 5) - n.
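    A small experiment for the power-of-two point, assuming a hypothetical hashWith helper that runs the same loop as String.hashCode() but with a configurable multiplier. With 32 = 2^5, the first character of an 8-character string is multiplied by 2^35, which is 0 modulo 2^32, so two strings that differ only in that character collide; with 31 they do not.

```java
class PowerOfTwoMultiplier {
    // Hypothetical helper: the String.hashCode() loop with the multiplier as a parameter.
    static int hashWith(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // These strings differ only in their first character (8th from the end).
        String a = "a1234567";
        String b = "b1234567";
        // Multiplier 32: the first character is scaled by 2^35, which wraps to 0 in an int.
        System.out.println(hashWith(a, 32) == hashWith(b, 32)); // true
        // Multiplier 31: 31^7 is not 0 modulo 2^32, so the first character still matters.
        System.out.println(hashWith(a, 31) == hashWith(b, 31)); // false
    }
}
```

    More generally, with a multiplier of 32 only the last seven characters of a string can influence its hash.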

  • 2020-11-22 02:27

    In the latest JDK version at the time of writing, 31 is still used: https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/lang/String.html#hashCode()

    The goals of a string hash function are:

    • distinct values for distinct strings, as far as possible (see the ^ exponentiation in the documented formula: weighting each character by a different power of 31 helps keep hashes apart)
    • low cost to compute

    31 is 2^5 - 1, the largest value that fits in 5 bits; it is also prime and odd.

    Multiplying by 31 can be done as a left shift by 5 followed by subtracting the original value, so it is cheap.
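    A quick check of the first goal using the real String.hashCode(): weighting by position separates anagrams such as "ab" and "ba", but hashes are still not unique; "Aa" and "BB" both hash to 2112. The class name is hypothetical.

```java
class DistinctButNotUnique {
    public static void main(String[] args) {
        // Position weighting: "ab" = 97*31 + 98, "ba" = 98*31 + 97
        System.out.println("ab".hashCode()); // 3105
        System.out.println("ba".hashCode()); // 3135
        // A well-known collision: 65*31 + 97 == 66*31 + 66 == 2112
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
    }
}
```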

  • 2020-11-22 02:29

    I'm not sure, but I would guess they tested some sample of prime numbers and found that 31 gave the best distribution over some sample of possible Strings.

  • 2020-11-22 02:29

    This is because 31 has a nice property: multiplication by it can be replaced by a shift and a subtraction, which can be faster than a general multiplication:

    31 * i == (i << 5) - i
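    A sketch that checks the identity for a few int values, including ones where 31 * i overflows, plus a hypothetical shiftHash helper that computes the string hash with shifts and subtractions only. Both forms wrap modulo 2^32, so they agree for every int. A JIT compiler may perform this strength reduction on its own, so writing it by hand is rarely necessary.

```java
class ShiftSubtract {
    // Hypothetical helper: String.hashCode() rewritten with (h << 5) - h instead of 31 * h.
    static int shiftHash(String s) {
        int h = 0;
        for (int k = 0; k < s.length(); k++) {
            h = (h << 5) - h + s.charAt(k);
        }
        return h;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            // Both sides wrap modulo 2^32, so the identity holds even when 31 * i overflows.
            System.out.println(i + ": " + (31 * i == (i << 5) - i)); // true for every sample
        }
        System.out.println(shiftHash("hello") == "hello".hashCode()); // true
    }
}
```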
    
  • 2020-11-22 02:32

    Goodrich and Tamassia found, using over 50,000 English words (the union of the word lists provided in two variants of Unix), that the constants 31, 33, 37, 39, and 41 each produce fewer than 7 collisions. This may be why so many Java implementations choose such constants.

    See section 9.2 Hash Tables (page 522) of Data Structures and Algorithms in Java.
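    The kind of experiment described above can be sketched as follows. This is not Goodrich and Tamassia's code and does not reproduce their numbers: the word file passed as args[0] (for example a Unix dictionary such as /usr/share/dict/words) is an assumption, the fallback list is a tiny made-up sample, and hashWith and countCollisions are hypothetical helpers. Duplicate words are removed first so that only distinct strings with equal hashes count as collisions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CollisionCount {
    // Polynomial hash with a configurable multiplier (31 reproduces String.hashCode()).
    static int hashWith(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    // Counts distinct words whose hash was already produced by an earlier word.
    static int countCollisions(Set<String> words, int multiplier) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (String w : words) {
            if (!seen.add(hashWith(w, multiplier))) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] names a UTF-8 file with one word per line.
        Set<String> words = new LinkedHashSet<>(args.length > 0
                ? Files.readAllLines(Path.of(args[0]))
                : List.of("Aa", "BB", "hash", "code"));
        for (int m : new int[] {31, 33, 37, 39, 41}) {
            System.out.println(m + ": " + countCollisions(words, m)
                    + " collisions among " + words.size() + " words");
        }
    }
}
```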

  • 2020-11-22 02:32

    Neil Coffey explains why 31 is used under "Ironing out the bias".

    Basically, using 31 gives a more even set-bit probability distribution for the hash function.
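    One way to look at the set-bit distribution is to measure, for each of the 32 bits, how often it is set across many hashes; an unbiased hash gives a value near 0.5 for every bit. The sketch below assumes random 8-letter lowercase strings from a seeded java.util.Random and compares multipliers 31 and 32. It is not Coffey's methodology, and hashWith and bitFrequencies are hypothetical helpers.

```java
import java.util.Locale;
import java.util.Random;

class BitBias {
    // Polynomial hash with a configurable multiplier (31 reproduces String.hashCode()).
    static int hashWith(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    // Fraction of sample hashes in which each bit (index 0 = lowest) is set.
    static double[] bitFrequencies(int multiplier, int samples, int length, long seed) {
        Random rnd = new Random(seed);
        long[] counts = new long[32];
        char[] buf = new char[length];
        for (int n = 0; n < samples; n++) {
            for (int i = 0; i < length; i++) {
                buf[i] = (char) ('a' + rnd.nextInt(26)); // random lowercase letter
            }
            int h = hashWith(new String(buf), multiplier);
            for (int bit = 0; bit < 32; bit++) {
                counts[bit] += (h >>> bit) & 1;
            }
        }
        double[] freq = new double[32];
        for (int bit = 0; bit < 32; bit++) {
            freq[bit] = (double) counts[bit] / samples;
        }
        return freq;
    }

    public static void main(String[] args) {
        for (int m : new int[] {31, 32}) {
            StringBuilder sb = new StringBuilder(m + ":");
            for (double f : bitFrequencies(m, 100_000, 8, 42L)) {
                sb.append(String.format(Locale.ROOT, " %.2f", f));
            }
            System.out.println(sb);
        }
    }
}
```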
