Optimizing Jaro-Winkler algorithm

孤独总比滥情好 2021-02-05 13:45

I have this code for the Jaro-Winkler algorithm, taken from this website. I need to run it 150,000 times to get the distances between different strings. It takes a long time, as I run it on an Android…

  •  北恋 (OP)
    2021-02-05 14:08

    Instead of returning the common characters with the GetCommonCharacters method, use a pair of flag arrays to record which positions matched, similar to the C version here: https://github.com/miguelvps/c/blob/master/jarowinkler.c

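    /* Calculate matching characters.
       a, s: the two strings; al, sl: their lengths;
       range: the match window (conventionally max(al, sl) / 2 - 1);
       aflags, sflags: zero-initialised match flags; m: match count;
       max/min: integer helpers. */
    for (i = 0; i < al; i++) {
        /* search s only inside the window around position i */
        for (j = max(i - range, 0), l = min(i + range + 1, sl); j < l; j++) {
            /* take the first equal character of s not already matched */
            if (a[i] == s[j] && !sflags[j]) {
                sflags[j] = 1;
                aflags[i] = 1;
                m++;
                break;
            }
        }
    }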
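
    For a complete picture, here is a sketch of a whole Jaro-Winkler function built around the flag arrays. It is an illustration, not the code from the linked repository. It assumes plain byte strings (no UTF-8 decoding), the commonly used Winkler parameters (prefix scale 0.1, common prefix capped at 4 characters), integer halving of the transposition count, and it applies the prefix boost unconditionally (some implementations only apply it above a threshold such as 0.7). The name jaro_winkler is hypothetical. The flags also make the transposition count cheap: both arrays are walked in order instead of comparing two extracted strings of common characters.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: Jaro-Winkler similarity using match-flag arrays.
   Assumptions (not from the original post): byte strings, prefix scale
   0.1, prefix capped at 4, boost applied unconditionally, transposition
   count halved with integer division.
   Returns -1.0 if the flag buffers cannot be allocated. */
double jaro_winkler(const char *a, const char *s)
{
    int al = (int)strlen(a), sl = (int)strlen(s);
    if (al == 0 && sl == 0) return 1.0;
    if (al == 0 || sl == 0) return 0.0;

    /* match window: max(al, sl) / 2 - 1, clamped at 0 */
    int range = (al > sl ? al : sl) / 2 - 1;
    if (range < 0) range = 0;

    /* calloc gives zeroed flags: nothing matched yet */
    char *aflags = calloc((size_t)al, 1);
    char *sflags = calloc((size_t)sl, 1);
    if (!aflags || !sflags) { free(aflags); free(sflags); return -1.0; }

    /* matching step, same logic as the fragment above */
    int m = 0;
    for (int i = 0; i < al; i++) {
        int lo = i - range > 0 ? i - range : 0;
        int hi = i + range + 1 < sl ? i + range + 1 : sl;
        for (int j = lo; j < hi; j++) {
            if (a[i] == s[j] && !sflags[j]) {
                sflags[j] = 1;
                aflags[i] = 1;
                m++;
                break;
            }
        }
    }

    double result = 0.0;
    if (m > 0) {
        /* transpositions: pair the k-th matched character of a with the
           k-th matched character of s and count the mismatches */
        int t = 0;
        for (int i = 0, k = 0; i < al; i++) {
            if (!aflags[i]) continue;
            while (!sflags[k]) k++;
            if (a[i] != s[k]) t++;
            k++;
        }
        t /= 2;
        double jaro = ((double)m / al + (double)m / sl + (double)(m - t) / m) / 3.0;

        /* Winkler boost for a shared prefix of at most 4 characters */
        int prefix = 0;
        while (prefix < 4 && prefix < al && prefix < sl && a[prefix] == s[prefix])
            prefix++;
        result = jaro + prefix * 0.1 * (1.0 - jaro);
    }
    free(aflags);
    free(sflags);
    return result;
}
```

    On the usual textbook pairs this gives about 0.9611 for ("MARTHA", "MARHTA"), 0.84 for ("DWAYNE", "DUANE") and 0.8133 for ("DIXON", "DICKSONX"). Allocating the flags on every call is the simplest choice; with 150,000 comparisons, reusing preallocated buffers may be worth measuring.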
    

    Another optimization is to precompute a bitmask of the characters present in each string. With it, you can check whether the current character of the first string occurs anywhere in the second using a cheap bitwise AND.

    For characters that do not occur in the second string at all, this skips the max/min window calculation and the inner loop entirely.
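
    The bitmask idea could be sketched as follows, assuming single-byte characters so that a 256-bit set (four uint64_t words) covers every possible value. The names charmask, make_charmask, charmask_has and count_matches_masked are hypothetical. The mask for s is built once and can be reused whenever the same string is compared again.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: a 256-bit set of the byte values occurring
   in a string, stored as four 64-bit words. */
typedef struct { uint64_t w[4]; } charmask;

/* Build the mask once per string and reuse it across comparisons. */
charmask make_charmask(const char *s)
{
    charmask m = {{0, 0, 0, 0}};
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        m.w[*p >> 6] |= (uint64_t)1 << (*p & 63);
    return m;
}

/* Nonzero if byte c occurs in the string the mask was built from. */
static int charmask_has(const charmask *m, unsigned char c)
{
    return (int)((m->w[c >> 6] >> (c & 63)) & 1u);
}

/* Matching step of the flag-array version with a bitmask pre-check:
   characters of a that never occur in s skip the window loop.
   aflags/sflags must be zeroed, sized strlen(a) and strlen(s). */
int count_matches_masked(const char *a, const char *s, const charmask *smask,
                         char *aflags, char *sflags)
{
    int al = (int)strlen(a), sl = (int)strlen(s);
    int range = (al > sl ? al : sl) / 2 - 1;
    if (range < 0) range = 0;

    int m = 0;
    for (int i = 0; i < al; i++) {
        if (!charmask_has(smask, (unsigned char)a[i]))
            continue; /* a[i] absent from s: no window bounds, no inner loop */
        int lo = i - range > 0 ? i - range : 0;
        int hi = i + range + 1 < sl ? i + range + 1 : sl;
        for (int j = lo; j < hi; j++) {
            if (a[i] == s[j] && !sflags[j]) {
                sflags[j] = 1;
                aflags[i] = 1;
                m++;
                break;
            }
        }
    }
    return m;
}
```

    The match count and flags come out the same as without the pre-check, since a character absent from s could never match inside the window anyway; only the work spent on such characters is saved.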
