Levenshtein distance: how to better handle words swapping positions?

忘掉有多难 2021-01-30 02:22

I've had some success comparing strings using the PHP levenshtein function.

However, for two strings which contain substrings that have swapped positions, the algorithm

9 Answers
  •  醉酒成梦
    2021-01-30 03:20

    I've been implementing levenshtein in a spell checker.

    What you're asking for is counting transpositions as 1 edit.

    This is easy if you only want to count transpositions of adjacent words. However, for transpositions of words two or more positions apart, the addition to the algorithm is, in the worst case, !(max(wordorder1.length(), wordorder2.length())). Adding a non-linear subalgorithm to an already quadratic algorithm is not a good idea.

    This is how it would work.

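    // x indexes wordorder1 and y indexes wordorder2 (workarray is 1-based)
    cost = (wordorder1[x-1] == wordorder2[y-1]) ? 0 : 1;
    best = min(workarray[x-1, y] + 1,        // deletion
               workarray[x, y-1] + 1,        // insertion
               workarray[x-1, y-1] + cost);  // match or substitution
    if (x > 1 && y > 1
        && wordorder1[x-1] == wordorder2[y-2]
        && wordorder1[x-2] == wordorder2[y-1])
    {
      best = min(best, workarray[x-2, y-2] + 1);  // adjacent transposition = 1 edit
    }
    workarray[x, y] = best;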
    

    This covers ONLY adjacent (touching) transpositions. If you want all transpositions, you'd have to work backwards from every position, comparing

    wordorder1[n] == wordorder2[n-2] ... wordorder1[n] == wordorder2[0]
    

    So you see why they don't include this in the standard method.
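
For anyone who wants to try the adjacent-swap rule described above, here is a minimal Python sketch. It is not the answerer's own code: it assumes words are split on whitespace, and the names `word_osa_distance`, `w1`, `w2` and `d` are made up for illustration. The variant shown is usually called the optimal string alignment distance (a restricted Damerau-Levenshtein), applied to words instead of characters.

```python
def word_osa_distance(a: str, b: str) -> int:
    """Word-level edit distance where swapping two adjacent words costs 1."""
    # Assumption: words are separated by whitespace.
    w1, w2 = a.split(), b.split()
    rows, cols = len(w1) + 1, len(w2) + 1

    # d[x][y] = distance between the first x words of w1
    # and the first y words of w2.
    d = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        d[x][0] = x  # delete all x words
    for y in range(cols):
        d[0][y] = y  # insert all y words

    for x in range(1, rows):
        for y in range(1, cols):
            cost = 0 if w1[x - 1] == w2[y - 1] else 1
            best = min(
                d[x - 1][y] + 1,         # deletion
                d[x][y - 1] + 1,         # insertion
                d[x - 1][y - 1] + cost,  # match or substitution
            )
            # Adjacent transposition: the last two words appear swapped.
            if (x > 1 and y > 1
                    and w1[x - 1] == w2[y - 2]
                    and w1[x - 2] == w2[y - 1]):
                best = min(best, d[x - 2][y - 2] + 1)
            d[x][y] = best

    return d[-1][-1]
```

With this sketch, `word_osa_distance("the quick brown fox", "the brown quick fox")` comes out as 1, where a plain word-level Levenshtein would count 2 substitutions. Words swapped across a larger gap, as in `"a b c"` versus `"c b a"`, still cost 2, which is the limitation the answer points out.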
