What is a good hash function for a collection (i.e., multi-set) of integers?

不知归路 2021-02-05 06:03

I'm looking for a function that maps a multi-set of integers to an integer, hopefully with some kind of guarantee like pairwise independence.

Ideally, memory usage woul

6 answers
  •  臣服心动
    2021-02-05 06:29

    Min-hashing should work here: apply a permutation to the values, maintain a small multiset of the n minimal elements, and pick the largest of them.

    Elaborating: this is a simple way to work in O(1) time per element and O(1) space (for a fixed n). You need something like a priority queue, but one whose ordering does not make the link to the original values too obvious. So order the priority queue by some elaborate key, which is equivalent to running a priority queue on a permutation of the normal sort order. Have the queue keep track of multiplicity so that the selected elements also form a multiset.
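A minimal Python sketch of the idea above, under some assumptions that are not in the answer: values are treated as 64-bit integers, the "permutation" is a splitmix64-style mixing function (here called `mix64`, a hypothetical helper), and `n` defaults to 4. The bounded heap keeps the n smallest permuted keys, duplicates included, and the result is the largest of them.

```python
import heapq

MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """Splitmix64-style mixer (assumed choice): a bijection on 64-bit values,
    so it acts as a fixed pseudo-random permutation of the inputs."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def minhash_multiset(items, n: int = 4) -> int:
    """Return the largest of the n smallest permuted keys (with multiplicity)."""
    heap = []  # max-heap of at most n keys, stored negated for heapq
    for x in items:
        key = mix64(x)
        if len(heap) < n:
            heapq.heappush(heap, -key)
        elif key < -heap[0]:
            # Evict the current largest retained key in favor of a smaller one.
            heapq.heapreplace(heap, -key)
    # With fewer than n items this is simply the largest key seen; 0 if empty.
    return -heap[0] if heap else 0
```

The result does not depend on input order, and the heap never holds more than `n` entries, so memory stays constant for a fixed `n`. Each update costs O(log n).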

    That said, I'm not sure this disperses well enough (and running multiple permutations might become costly), so maybe build on Bradley's answer instead. Here is a tweak so that repeated elements don't cancel out:

    xor(int_hash(x_n, multiplicity_n) foreach n)
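The pseudocode above could be sketched in Python roughly as follows. The answer does not define `int_hash`, so the version here (a splitmix64-style mixer applied to the value and then to the value's hash plus its multiplicity) is only an assumed stand-in, as are the names `mix64` and `multiset_hash`.

```python
from collections import Counter

MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """Splitmix64-style mixer (assumed choice); a bijection on 64-bit values."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def int_hash(value: int, multiplicity: int) -> int:
    """Hypothetical stand-in for the answer's int_hash(x_n, multiplicity_n)."""
    return mix64(mix64(value) + multiplicity)


def multiset_hash(items) -> int:
    """XOR one hash per distinct value, keyed by (value, multiplicity)."""
    h = 0
    for value, count in Counter(items).items():
        h ^= int_hash(value, count)
    return h
```

Because each distinct value contributes exactly one term, a value that appears twice no longer cancels itself out the way it would under a plain XOR of per-element hashes. Note that this sketch uses a `Counter`, so it needs memory proportional to the number of distinct values.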
    
