Algorithm to test minimum hamming distance against a set?

Submitted by 点点圈 on 2019-12-03 17:36:42

I think the problem can be solved by splitting each number in S into substrings, so that every string within the query distance has at least one partition whose Hamming distance from the corresponding partition of the query is at most 1.

This algorithm is described in the paper: Alex X. Liu, Ke Shen, Eric Torng. Large scale Hamming distance query processing, 2011. The authors call the algorithm HEngine. I will try to explain some of the intuition.

Let:

N - the bit count of each number (its dimensionality)

k - the query Hamming distance

r-cut(α) - the function that splits a number α into r substrings {α1, α2, ..., αr}, where the first r − (N mod r) substrings have length ⌊N/r⌋ and the last N mod r substrings have length ⌈N/r⌉
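The r-cut definition above can be sketched in a few lines of Python. This is only an illustration, assuming numbers are represented as strings of '0' and '1'; the function name r_cut is my own, not taken from the paper.

```python
def r_cut(bits, r):
    """Split a bit string into r consecutive substrings.

    The first r - (n mod r) substrings have length floor(n / r) and the
    last n mod r substrings have length ceil(n / r), where n = len(bits).
    """
    n = len(bits)
    if not 1 <= r <= n:
        raise ValueError("r must be between 1 and the string length")
    short, extra = divmod(n, r)  # short = floor(n / r), extra = n mod r
    parts = []
    pos = 0
    for i in range(r):
        # The longer substrings come last, as in the definition.
        length = short if i < r - extra else short + 1
        parts.append(bits[pos:pos + length])
        pos += length
    return parts
```

For instance, r_cut("10001110", 2) gives the two 4-bit halves of the string, and a 7-bit string cut with r = 3 gives substrings of lengths 2, 2 and 3.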

The algorithm is based on the theorem:

For any two binary strings β and γ such that HD(β, γ) ≤ k, consider r-cut(β) and r-cut(γ) where r ≥ ⌊k/2⌋ + 1. It must be the case that HD(βi, γi) ≤ 1 for at least q = r − ⌊k/2⌋ different values of i.

For example, take binary strings of length N = 8 bits, and suppose we want to find strings within Hamming distance k = 2.

α = 10001110
β = 10100110
HD(α, β) = 2

Then the minimum value of r is ⌊2/2⌋ + 1 = 2. In this case, the r-cut of α and β produces 2 substrings of 4 bits each:

    α1 = 1000    α2 = 1110
    β1 = 1010    β2 = 0110
HD(α1, β1) = 1,  HD(α2, β2) = 1

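q = 2 - ⌊2/2⌋ = 1, so at least one pair of corresponding substrings must differ in at most one bit. Here both pairs do.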
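The worked example can be checked with a small script. This is a sketch under the same assumption as before (bit strings as Python strings); the helper names hamming and r_cut are mine.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def r_cut(bits, r):
    """Split bits into r substrings; the longer substrings come last."""
    short, extra = divmod(len(bits), r)
    parts, pos = [], 0
    for i in range(r):
        length = short if i < r - extra else short + 1
        parts.append(bits[pos:pos + length])
        pos += length
    return parts


alpha, beta = "10001110", "10100110"
k = 2
r = k // 2 + 1   # minimum number of substrings allowed by the theorem
q = r - k // 2   # guaranteed number of substring pairs with distance <= 1

# Distance of each pair of corresponding substrings.
dists = [hamming(a, b) for a, b in zip(r_cut(alpha, r), r_cut(beta, r))]
close = sum(d <= 1 for d in dists)

print(hamming(alpha, beta), dists, close >= q)  # prints: 2 [1, 1] True
```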

The authors also introduce a second theorem:

Consider any string β ∈ T such that HD(α, β) ≤ k. Given any r ≥ ⌊k/2⌋ + 1, it follows that at least one β-signature matches its compatible α-signature.

The basic idea of the algorithm is to preprocess S to facilitate finding all strings β in S that satisfy the signature match property and then verify which of these strings actually are within Hamming distance k of α.

I suggest preprocessing the set S into subtables using the HEngine algorithm and splitting Q into partitions in the same way. Then search partition by partition, using the fact that any matching string is within Hamming distance 1 of the query in at least one corresponding partition.

For more details, I recommend reading the paper.
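To make the filter-and-verify idea concrete, here is a simplified sketch. It is not the paper's actual signature tables: it assumes all strings have the same length, uses a plain dictionary keyed by (partition index, substring), and probes it with every 1-bit variant of each query partition. All function names are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def r_cut(bits, r):
    """Split bits into r substrings; the longer substrings come last."""
    short, extra = divmod(len(bits), r)
    parts, pos = [], 0
    for i in range(r):
        length = short if i < r - extra else short + 1
        parts.append(bits[pos:pos + length])
        pos += length
    return parts


def one_neighbours(part):
    """Yield part itself and every string that differs from it in one bit."""
    yield part
    for i, c in enumerate(part):
        yield part[:i] + ("1" if c == "0" else "0") + part[i + 1:]


def build_index(strings, r):
    """Preprocess S: map (partition index, substring) to the strings having it."""
    index = defaultdict(set)
    for s in strings:
        for i, part in enumerate(r_cut(s, r)):
            index[(i, part)].add(s)
    return index


def query(index, alpha, k, r):
    """Return all indexed strings within Hamming distance k of alpha."""
    if r < k // 2 + 1:
        raise ValueError("the theorem requires r >= floor(k/2) + 1")
    candidates = set()
    for i, part in enumerate(r_cut(alpha, r)):
        # Filter: collect strings whose i-th partition is within distance 1.
        for variant in one_neighbours(part):
            candidates |= index.get((i, variant), set())
    # Verify: keep only candidates that really are within distance k.
    return sorted(s for s in candidates if hamming(alpha, s) <= k)


S = ["10100110", "01110001", "10001111", "00000000"]
index = build_index(S, r=2)
result = query(index, "10001110", k=2, r=2)
```

The filter step cannot miss a match, because the first theorem guarantees that at least one partition of any string within distance k is within distance 1 of the corresponding query partition. The verification step then removes the false positives.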
