What is a good hash function for a collection (i.e., multi-set) of integers?

不知归路 2021-02-05 06:03

I'm looking for a function that maps a multi-set of integers to an integer, hopefully with some kind of guarantee like pairwise independence.

Ideally, memory usage woul

6 answers
  •  攒了一身酷
    2021-02-05 06:50

    Reverse-bits.

    For example, 00001011 becomes 11010000. Then just SUM all the reversed elements.


    If we need O(1) insert/delete, the usual SUM of the elements will work (that is how a Set's hashCode is computed in Java), though it is not well distributed over sets of small integers.
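The O(1) insert/delete idea above can be sketched as a running sum. This is only a minimal illustration: the class name `MultisetHash` and its `mix` parameter are hypothetical and not part of the original answer. The default `mix` leaves each value unchanged, which gives the plain SUM.

```scala
// Hypothetical sketch: a running, order-independent hash of a multi-set.
// `mix` is an assumed per-element mapping; the default is the identity,
// so the hash is the plain SUM of the elements.
final class MultisetHash(mix: Int => Int = (x: Int) => x) {
  private var sum = 0                        // Int addition wraps on overflow

  def add(x: Int): Unit = sum += mix(x)      // O(1) insert
  def remove(x: Int): Unit = sum -= mix(x)   // O(1) delete; no presence check is done
  def value: Int = sum
}

object MultisetHashDemo {
  def main(args: Array[String]): Unit = {
    val h = new MultisetHash()
    h.add(3); h.add(3); h.add(5)   // multi-set {3, 3, 5}
    h.remove(3)                    // multi-set {3, 5}
    println(h.value)               // prints 8
  }
}
```

Because addition is commutative, the result does not depend on insertion order. The sketch does not track membership, so the caller has to make sure that only elements that were previously added get removed.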

    If the elements are not uniformly distributed (which is usually the case), we need a mapping N -> f(N) such that f(N) is uniformly distributed over the expected data sample. Usually the sample contains many more close-to-zero numbers than close-to-maximum numbers. In that case, the reverse-bits hash spreads them much more evenly across the integer range.

    Example in Scala:

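    // Reverse the 32 bits of v (same result as java.lang.Integer.reverse(v)).
    def hash(v: Int): Int = {
      var h = v & 1
      for (i <- 1 to 31) {
        h <<= 1
        h |= (v >>> i) & 1
      }
      h
    }

    // Sum the reversed elements; Int addition wraps on overflow, so the
    // result does not depend on iteration order.
    // Takes an Iterable rather than a Set so that duplicates are counted.
    def hash(a: Iterable[Int]): Int = {
      var h = 0
      for (e <- a) {
        h += hash(e)
      }
      h
    }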
    

    The hash of the multi-set as a whole will still not be uniform, but it is much better distributed than a simple SUM.
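A small comparison may make both points concrete: where the reversed-bits sum helps, and where it still collides. This is a sketch only. The names `ReverseBitsDemo`, `rev`, `plainSum` and `reversedSum` are illustrative, and `Integer.reverse` from the JDK stands in for the hand-written bit reversal.

```scala
object ReverseBitsDemo {
  // Bit reversal of a 32-bit Int, using the JDK built-in.
  def rev(v: Int): Int = Integer.reverse(v)

  // Plain SUM of the elements (wraps on Int overflow).
  def plainSum(xs: Seq[Int]): Int = xs.sum

  // SUM of the bit-reversed elements (wraps on Int overflow).
  def reversedSum(xs: Seq[Int]): Int = xs.map(rev).sum

  def main(args: Array[String]): Unit = {
    // Plain SUM cannot tell {1, 4} from {2, 3}: both give 5.
    println(plainSum(Seq(1, 4)) == plainSum(Seq(2, 3)))        // true
    // The reversed-bits sum separates them.
    println(reversedSum(Seq(1, 4)) == reversedSum(Seq(2, 3)))  // false
    // Collisions remain: rev(2) + rev(2) equals rev(1).
    println(reversedSum(Seq(2, 2)) == reversedSum(Seq(1)))     // true
  }
}
```

The last line shows why the result is still not uniform: bit reversal only moves the small values to the high bits, and addition can still combine them into the same total.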
