Question
I have a large collection (n = 20,000,000) of BigInteger values, each representing a bit array of length 225. Given a single BigInteger, I want to find the x BigIntegers in my collection whose Hamming distance to it is below a certain threshold.
Currently, I convert all BigIntegers to byte arrays:
bHashes = new byte[hashes.Length][];
for (int i = 0; i < hashes.Length; i++)
{
    bHashes[i] = hashes[i].ToByteArray();
}
I then create a Hamming distance lookup array:
int[][] lookup = new int[256][];
for (int i = 0; i < 256; i++)
{
    lookup[i] = new int[256];
    for (int j = 0; j < 256; j++)
    {
        lookup[i][j] = HammingDistance(i, j);
    }
}
static int HammingDistance(BigInteger a, BigInteger b)
{
    BigInteger n = a ^ b;
    int x = 0;
    while (n != 0)
    {
        n &= (n - 1); // clears the lowest set bit (Kernighan's bit count)
        x++;
    }
    return x;
}
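As a quick, purely illustrative sanity check of the table (assuming using System.Diagnostics for Debug.Assert):

Debug.Assert(lookup[0xF0][0x0F] == 8); // all eight bits differ
Debug.Assert(lookup[0xAA][0xAA] == 0); // identical bytes, distance 0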
Finally, I calculate the total Hamming distance as the sum of the Hamming distances between the corresponding bytes. My timing measurements showed that adding the distances "manually" (unrolled) is faster than using a loop:
static List<int> GetMatches(byte[] a)
{
    List<int> result = new List<int>();
    for (int i = 0; i < bHashes.Length; i++)
    {
        byte[] b = bHashes[i];
        int dist = lookup[a[0]][b[0]] +
                   lookup[a[1]][b[1]] +
                   lookup[a[2]][b[2]] +
                   lookup[a[3]][b[3]] +
                   lookup[a[4]][b[4]] +
                   lookup[a[5]][b[5]] +
                   lookup[a[6]][b[6]] +
                   lookup[a[7]][b[7]] +
                   lookup[a[8]][b[8]] +
                   lookup[a[9]][b[9]] +
                   lookup[a[10]][b[10]] +
                   lookup[a[11]][b[11]] +
                   lookup[a[12]][b[12]] +
                   lookup[a[13]][b[13]] +
                   lookup[a[14]][b[14]];
        if (dist < THRESHOLD) result.Add(i);
    }
    return result;
}
Preprocessing time is irrelevant; only the execution time of the GetMatches() function matters. With the method above, my system needs ~1.2 s, which, unfortunately, is way too long for my needs. Is there a faster way?
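For context, one direction I have sketched but not yet benchmarked is to pack each hash into four ulong words during preprocessing and then compare word by word with a SWAR popcount instead of byte-wise table lookups. Pack, Distance, PopCount64 and GetMatchesPacked below are my own illustrative helpers (THRESHOLD is the same constant as above), not code I am currently running:

using System.Collections.Generic;
using System.Numerics;

// Pack a 225-bit hash into four 64-bit words (done once during preprocessing).
static ulong[] Pack(BigInteger value)
{
    byte[] bytes = value.ToByteArray(); // little-endian, variable length
    ulong[] words = new ulong[4];       // 4 * 64 = 256 bits >= 225
    for (int i = 0; i < bytes.Length && i < 32; i++)
    {
        words[i / 8] |= (ulong)bytes[i] << (8 * (i % 8));
    }
    return words;
}

// Hamming distance of two packed hashes: XOR then popcount, word by word.
static int Distance(ulong[] a, ulong[] b)
{
    return PopCount64(a[0] ^ b[0]) + PopCount64(a[1] ^ b[1]) +
           PopCount64(a[2] ^ b[2]) + PopCount64(a[3] ^ b[3]);
}

// Standard SWAR population count for a 64-bit word.
static int PopCount64(ulong x)
{
    x -= (x >> 1) & 0x5555555555555555UL;
    x = (x & 0x3333333333333333UL) + ((x >> 2) & 0x3333333333333333UL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FUL;
    return (int)((x * 0x0101010101010101UL) >> 56);
}

// Hypothetical counterpart to GetMatches(), operating on pre-packed hashes.
static List<int> GetMatchesPacked(ulong[] a, ulong[][] packedHashes)
{
    List<int> result = new List<int>();
    for (int i = 0; i < packedHashes.Length; i++)
    {
        if (Distance(a, packedHashes[i]) < THRESHOLD) result.Add(i);
    }
    return result;
}

I do not know yet whether this would beat the byte-wise table, so any other suggestions are welcome.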
Source: https://stackoverflow.com/questions/40676129/fastest-way-to-calculate-hamming-distance-in-c-sharp