Fastest way to calculate Hamming Distance in C#

Submitted by 风格不统一 on 2019-12-11 04:13:05


I have a large collection (n = 20,000,000) of BigIntegers, each representing a bit array of length 225. Given a single BigInteger, I want to find the x BigIntegers in my collection whose Hamming distance to it is below a certain threshold.

Currently, I convert all BigIntegers to byte arrays:

bHashes = new byte[hashes.Length][];
for (int i = 0; i < hashes.Length; i++)
    bHashes[i] = hashes[i].ToByteArray();
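
One caveat with this step: `BigInteger.ToByteArray()` returns little-endian bytes using the fewest bytes possible, so the resulting arrays can differ in length, and indexing a fixed position may go out of range. A minimal sketch of padding each array to a fixed width is shown here. `HashBytes`, `ToFixedBytes` and the `length` parameter are hypothetical names, and the sketch assumes non-negative values that fit in `length` bytes.

```csharp
using System;
using System.Numerics;

static class HashBytes
{
    // Hypothetical helper: copy the little-endian bytes of a non-negative
    // BigInteger into a zero-padded array of exactly `length` bytes.
    // Bytes beyond `length` are dropped, so the caller must pick a
    // width that is large enough for every value.
    public static byte[] ToFixedBytes(BigInteger value, int length)
    {
        byte[] raw = value.ToByteArray();        // little-endian, minimal length
        byte[] fixedBytes = new byte[length];    // zero-initialised by the runtime
        Array.Copy(raw, fixedBytes, Math.Min(raw.Length, length));
        return fixedBytes;
    }
}
```

With this, every array has the same length, so the same byte positions can be compared across all entries.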

I then create a Hamming distance lookup array:

int[][] lookup = new int[256][];

for (int i = 0; i < 256; i++) {
    lookup[i] = new int[256];
    for (int j = 0; j < 256; j++)
        lookup[i][j] = HammingDistance(i, j);
}

static int HammingDistance(BigInteger a, BigInteger b) {
    BigInteger n = a ^ b;

    // Clear the lowest set bit until none remain, counting each one.
    int x = 0;
    while (n != 0) {
        n &= (n - 1);
        x++;
    }
    return x;
}
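
To make the table concrete, here is a small self-contained sketch that builds the same 256 x 256 table using plain `int` arithmetic, which is enough because only byte values are ever passed in. `LookupDemo`, `PopCount` and `BuildLookup` are hypothetical names used only for illustration.

```csharp
using System;

static class LookupDemo
{
    // Count set bits by repeatedly clearing the lowest set bit.
    public static int PopCount(int n)
    {
        int count = 0;
        while (n != 0)
        {
            n &= n - 1;
            count++;
        }
        return count;
    }

    // table[i][j] = number of bit positions in which bytes i and j differ.
    public static int[][] BuildLookup()
    {
        int[][] table = new int[256][];
        for (int i = 0; i < 256; i++)
        {
            table[i] = new int[256];
            for (int j = 0; j < 256; j++)
                table[i][j] = PopCount(i ^ j);
        }
        return table;
    }
}
```

For example, `LookupDemo.BuildLookup()[0x0A][0x06]` is 2, because 00001010 and 00000110 differ in two bit positions.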

Finally, I calculate the total Hamming distance by summing the Hamming distances between corresponding bytes. My timing measurements showed that adding the distances "manually" was faster than using a loop:

static List<int> GetMatches(byte[] a) {
    List<int> result = new List<int>();
    for (int i = 0; i < bHashes.Length; i++) {
        byte[] b = bHashes[i];
        // Unrolled sum of the per-byte distances.
        int dist = lookup[a[0]][b[0]] +
                   lookup[a[1]][b[1]] +
                   lookup[a[2]][b[2]] +
                   lookup[a[3]][b[3]] +
                   lookup[a[4]][b[4]] +
                   lookup[a[5]][b[5]] +
                   lookup[a[6]][b[6]] +
                   lookup[a[7]][b[7]] +
                   lookup[a[8]][b[8]] +
                   lookup[a[9]][b[9]] +
                   lookup[a[10]][b[10]] +
                   lookup[a[11]][b[11]] +
                   lookup[a[12]][b[12]] +
                   lookup[a[13]][b[13]];
        if (dist < THRESHOLD) result.Add(i);
    }
    return result;
}
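
For comparison, the loop form that the unrolled sum replaces can be sketched as follows. This is only an illustration under the assumption that both arrays have the same length. `ByteDistance` and `Distance` are hypothetical names, and the table is passed in as a parameter so the snippet stands on its own.

```csharp
using System;

static class ByteDistance
{
    // Sum the per-byte Hamming distances of two equal-length byte arrays,
    // where lookup[x][y] holds the number of differing bits between bytes x and y.
    public static int Distance(byte[] a, byte[] b, int[][] lookup)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int dist = 0;
        for (int k = 0; k < a.Length; k++)
            dist += lookup[a[k]][b[k]];
        return dist;
    }
}
```

Given a correctly filled table, the arrays { 0x0F, 0x00 } and { 0x00, 0xFF } are at distance 12: four differing bits in the first byte plus eight in the second.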

Preprocessing time is irrelevant; only the execution time of the GetMatches() function matters. Using the method above, my system needs ~1.2 s, which, unfortunately, is way too long for my needs. Is there a faster way?

