Improving performance of fuzzy string matching against a dictionary

借酒劲吻你 2021-02-08 00:34

So I'm currently working with SecondString for fuzzy string matching, where I have a large dictionary to compare to (with each entry in the dictionary has an associat

3 answers
  •  既然无缘
    2021-02-08 01:09

    What you're looking for is a BK-tree combined with the Levenshtein distance algorithm. Lookup performance in a BK-tree depends on how "fuzzy" your search is, where fuzziness is the maximum edit distance you allow between the search word and its matches.

    Here is a good blog on the subject: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees

    Some notes on the performance: http://www.kafsemo.org/2010/08/03_bk-tree-performance-notes.html

    Notes on the Levenshtein distance algorithm: http://en.wikipedia.org/wiki/Levenshtein_distance

    Also, here is a BK-Tree written in Java. Should give you an idea of the interface: http://code.google.com/p/java-bk-tree/
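To make the idea concrete, here is a minimal sketch of a BK-tree over Levenshtein distance. It is not the code from the linked project and it does not use SecondString; the class name `BKTree` and its methods `add`, `search`, and `levenshtein` are made up for illustration. Each child is stored under its distance to the parent. A search only descends into children whose key lies within `maxDistance` of the query's distance to the current node. That pruning follows from the triangle inequality, and it is why a smaller tolerance means fewer nodes visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative BK-tree; all names here are hypothetical, not a library API.
final class BKTree {
    private static final class Node {
        final String word;
        // Child nodes keyed by their edit distance to this node's word.
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private Node root;

    // Two-row dynamic-programming Levenshtein distance.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Walk down the tree, following the edge labelled with the distance
    // to each node, until a free slot is found.
    public void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int d = levenshtein(word, node.word);
            if (d == 0) return; // word is already in the tree
            Node child = node.children.get(d);
            if (child == null) { node.children.put(d, new Node(word)); return; }
            node = child;
        }
    }

    // Return every stored word within maxDistance edits of query.
    public List<String> search(String query, int maxDistance) {
        List<String> matches = new ArrayList<>();
        if (root == null) return matches;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int d = levenshtein(query, node.word);
            if (d <= maxDistance) matches.add(node.word);
            // Only subtrees with key in [d - maxDistance, d + maxDistance]
            // can contain a match (triangle inequality).
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                int key = e.getKey();
                if (key >= d - maxDistance && key <= d + maxDistance) {
                    stack.push(e.getValue());
                }
            }
        }
        return matches;
    }
}
```

Typical use would be to call `add` once for every dictionary word up front and then call `search(query, maxDistance)` per lookup. The order of the returned matches is not specified in this sketch.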
