Explanation of HashMap#hash(int) method

后端 未结 2 1355
余生分开走
余生分开走 2020-12-01 07:58

Can someone please explain to me the static HashMap#hash(int) method?

What\'s the justification behind it to generate uniformly distributed hashes?

/         


        
相关标签:
2条回答
  • 2020-12-01 08:20

    I don't know how all the shifting works, but the motivation is laid out in the comments:

    The way the HashMap is implemented relies on the hashCode function being sufficiently well implemented. In particular, the lower bits of the hash value should be distributed evenly. If you have many collisions on the lower bits, the HashMap will not perform well.

    Because the implementation of hashCode is outside of the control of HashMap (every object can implement their own), they supply an additional hash function that shifts the object's hashCode around a little to ensure that the lower bits are distributed more randomly. Again, I have no idea how this works exactly (or how effective it is), but I assume it depends on at least the higher bits being distributed equally (it seems to mesh the higher bits into the lower bits).

    So what this does is to try to minimize collisions (and thus improve performance) in the presence of poorly implemented hashCode methods.

    0 讨论(0)
  • 2020-12-01 08:41

    >>> is the logical right shift (no sign-extension) (JLS 15.19 Shift Operators), and ^ is the bitwise exclusive-or (JLS 15.22.1 Integer Bitwise Operators).

    As to why this is done, the documentation offers a hint: HashMap uses power-of-two length tables, and hashes keys by masking away the higher bits and taking only the lower bits of their hash code.

    // HashMap.java -- edited for conciseness
    static int indexFor(int h, int length) {
        return h & (length-1);
    }
    
    public V put(K key, V value) {
        int hash = hash(key.hashCode());
        int index = indexFor(hash, table.length);
        // ...
    }
    

    So hash() attempts to bring relevancy to the higher bits, which otherwise would get masked away (indexFor basically discards the higher bits of h and takes only the lower k bits where length == (1 << k)).

    Contrast this with the way Hashtable (which should have NOT a power-of-two length table) uses a key's hash code.

    // Hashtable.java -- edited for conciseness
    public synchronized V get(Object key) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        // ...
    }
    

    By doing the more expensive % operation (instead of simple bit masking), the performance of Hashtable is less sensitive to hash codes with poor distribution in the lower bits (especially if table.length is a prime number).

    0 讨论(0)
提交回复
热议问题