A faster hash function

灰色年华 2021-02-11 11:02

I'm trying to implement my own hash function in Java: I add up the ASCII values of the characters in each string. I find the hash code by finding the mod of the size of the hash table and t

1 answer
  •  广开言路
    2021-02-11 11:37

    I would look at the source code of String and HashMap: together they give a low collision rate, avoid %, and handle negative hash codes correctly.

    From the source for String

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
    
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
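
To illustrate why the multiplier matters, here is a minimal sketch comparing the additive hash described in the question with the `31 * h + c` recurrence shown above. The class and method names are made up for this example. Summing character codes ignores character order, so anagrams always collide; the polynomial form does not have that weakness.

```java
class AdditiveVsPolynomial {
    // Sum of character codes, as described in the question (hypothetical helper).
    // The result does not depend on the order of the characters.
    static int additiveHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Same recurrence as String.hashCode(): h = 31 * h + c.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] words = {"ab", "ba", "listen", "silent"};
        for (String w : words) {
            System.out.println(w + ": additive=" + additiveHash(w)
                    + ", polynomial=" + polynomialHash(w));
        }
    }
}
```

For "ab" and "ba" the additive hash is 195 in both cases, while the polynomial hash gives 3105 and 3135.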
    

    From the source for HashMap

    /**
     * Retrieve object hash code and applies a supplemental hash function to the
     * result hash, which defends against poor quality hash functions.  This is
     * critical because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }
    
        h ^= k.hashCode();
    
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
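
A small sketch of what the supplemental hash buys you. The class and method names are illustrative; the shift-and-XOR step is copied from the snippet above, with the `useAltHashing` / `hashSeed` branch left out. The input hash codes differ only in bits 16 and above, so masking them directly with `length - 1` puts every key in bucket 0, while spreading first lets the high bits influence the index.

```java
import java.util.HashSet;
import java.util.Set;

class SpreadDemo {
    // Bit-spreading step from the quoted HashMap.hash (seed handling omitted).
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Power-of-two masking, as in the quoted indexFor.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Counts how many distinct buckets the given hash codes land in.
    static int distinctBuckets(int[] hashes, int length, boolean useSpread) {
        Set<Integer> seen = new HashSet<>();
        for (int h : hashes) {
            int mixed = useSpread ? spread(h) : h;
            seen.add(indexFor(mixed, length));
        }
        return seen.size();
    }

    // Sixteen hash codes whose low 16 bits are all zero.
    static int[] highBitHashes() {
        int[] hashes = new int[16];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = i << 16;
        }
        return hashes;
    }

    public static void main(String[] args) {
        int[] hashes = highBitHashes();
        System.out.println("without spread: " + distinctBuckets(hashes, 16, false));
        System.out.println("with spread:    " + distinctBuckets(hashes, 16, true));
    }
}
```

With a table of length 16, these keys occupy 1 bucket without spreading and 16 buckets with it.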
    

    Because a HashMap's table length is always a power of 2, you can use

            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
    

    and

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }
    

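    Using & is typically faster than % and only returns non-negative indices: length - 1 is non-negative, so the mask clears the sign bit. Note that this only works when length is a power of 2.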
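
As a quick check of the sign behaviour, the sketch below compares the three ways of turning a hash code into a bucket index. The class name is illustrative; `Math.floorMod` is the standard library alternative if the table length is not a power of 2.

```java
class IndexDemo {
    // Same masking as the quoted indexFor; assumes length is a power of 2.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = -7;
        int length = 16;

        // The remainder operator keeps the sign of the dividend,
        // so it can produce an invalid (negative) array index.
        System.out.println("h % length        = " + (h % length));

        // Masking keeps only the low bits, so the result is in [0, length).
        System.out.println("indexFor(h, len)  = " + indexFor(h, length));

        // floorMod also yields a value in [0, length) and works for any positive length.
        System.out.println("floorMod(h, len)  = " + Math.floorMod(h, length));
    }
}
```

For `h = -7` and `length = 16`, `%` gives -7, while both the mask and `Math.floorMod` give 9.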
