Efficiently find binary strings with low Hamming distance in large set
Problem: Given a large (~100 million) list of unsigned 32-bit integers, an unsigned 32-bit integer input value, and a maximum Hamming Distance , return all list members that are within the specified Hamming Distance of the input value. Actual data structure to hold the list is open, performance requirements dictate an in-memory solution, cost to build the data structure is secondary, low cost to query the data structure is critical. Example: For a maximum Hamming Distance of 1 (values typically will be quite small) And input: 00001000100000000000000001111101 The values: