Is there an edit distance algorithm that takes “chunk transposition” into account?

被撕碎了的回忆 2021-02-04 14:54

I put "chunk transposition" in quotes because I don't know whether there is a technical term for it, or what that term would be. Just knowing if there is a technical term for the process would be ve

6 answers
  •  小蘑菇 (original poster)
    2021-02-04 14:59

    Have a look at the Jaccard distance metric (JDM). It's an oldie-but-goodie that handles token-level reorderings well, such as last name first versus first name last. For two strings, the calculation is simply the number of unique characters the two strings have in common divided by the total number of unique characters across both (in other words, the size of the intersection over the size of the union). Strictly speaking, this ratio is the Jaccard similarity; the Jaccard distance is 1 minus it. For example, given the two arguments "JEFFKTYZZER" and "TYZZERJEFF", the numerator is 7 and the denominator is 8, yielding a similarity of 0.875 (a distance of 0.125). Characters are not the only possible choice of token, BTW: n-grams are often used as well.
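
The intersection-over-union calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `jaccard_similarity` and its `n` parameter (the n-gram length, with `n=1` meaning single characters) are made up for this example and do not come from any particular library.

```python
def jaccard_similarity(a: str, b: str, n: int = 1) -> float:
    """Jaccard similarity of two strings over their sets of character n-grams.

    n=1 compares sets of unique characters; larger n compares sets of
    unique substrings of length n.
    """
    def ngrams(s):
        # Set of all distinct length-n substrings (empty if s is shorter than n).
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    set_a, set_b = ngrams(a), ngrams(b)
    union = set_a | set_b
    if not union:
        # Assumption: two inputs with no tokens at all are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# The example from the answer: 7 shared characters out of 8 distinct ones.
sim = jaccard_similarity("JEFFKTYZZER", "TYZZERJEFF")
print(sim)        # 0.875
print(1 - sim)    # 0.125, the corresponding Jaccard distance

# The same pair compared on bigrams instead of single characters.
print(jaccard_similarity("JEFFKTYZZER", "TYZZERJEFF", n=2))
```

Because the comparison works on sets, the order of the chunks does not matter, which is why swapping "JEFF" and "TYZZER" barely changes the score. The flip side is that any two strings built from the same characters score 1.0 at `n=1`, so longer n-grams are usually the safer choice when local order inside a chunk should still count.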
