Improving performance of fuzzy string matching against a dictionary


借酒劲吻你 2021-02-08 00:34

So I'm currently working with SecondString for fuzzy string matching, where I have a large dictionary to compare against (each entry in the dictionary has an associat

3 answers

既然无缘 (original poster)

2021-02-08 01:09

What you're looking for is a BK-tree combined with the Levenshtein distance algorithm. The lookup performance of a BK-tree depends on how "fuzzy" your search is, where fuzziness is defined as the maximum edit distance (number of edits) allowed between the search word and its matches. The smaller that threshold, the more of the tree can be skipped on each lookup.
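To make the idea concrete, here is a minimal sketch of a BK-tree in Java. It is not taken from SecondString or from the java-bk-tree project; the class name `BkTree` and its methods `add`, `search` and `levenshtein` are names I made up for illustration, and a real implementation would probably also carry the value associated with each dictionary entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative BK-tree over strings, using Levenshtein distance as the metric. */
class BkTree {
    private static final class Node {
        final String word;
        // Child subtrees keyed by their edit distance to this node's word.
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private Node root;

    /** Dynamic-programming Levenshtein distance that keeps only two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Inserts a word; duplicates (distance 0) are ignored. */
    void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int d = levenshtein(word, node.word);
            if (d == 0) return;
            Node child = node.children.get(d);
            if (child == null) { node.children.put(d, new Node(word)); return; }
            node = child;
        }
    }

    /** Returns every stored word within maxDistance edits of the query. */
    List<String> search(String query, int maxDistance) {
        List<String> matches = new ArrayList<>();
        if (root != null) collect(root, query, maxDistance, matches);
        return matches;
    }

    private static void collect(Node node, String query, int maxDistance, List<String> out) {
        int d = levenshtein(query, node.word);
        if (d <= maxDistance) out.add(node.word);
        // Triangle inequality: only children whose key lies in
        // [d - maxDistance, d + maxDistance] can contain a match.
        for (int k = Math.max(1, d - maxDistance); k <= d + maxDistance; k++) {
            Node child = node.children.get(k);
            if (child != null) collect(child, query, maxDistance, out);
        }
    }

    public static void main(String[] args) {
        BkTree tree = new BkTree();
        for (String w : new String[] {"book", "books", "cake", "boo", "cape", "cart", "boon", "cook"}) {
            tree.add(w);
        }
        System.out.println(tree.search("bok", 1)); // prints [book, boo]
    }
}
```

The pruning loop in `collect` is where the speedup comes from: with a threshold of 1 only a narrow band of child keys is visited at each node, while a large threshold degrades towards comparing the query against every entry.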

Here is a good blog on the subject: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees

Some notes on the performance: http://www.kafsemo.org/2010/08/03_bk-tree-performance-notes.html

Notes on the Levenshtein distance algorithm: http://en.wikipedia.org/wiki/Levenshtein_distance

Also, here is a BK-Tree written in Java. Should give you an idea of the interface: http://code.google.com/p/java-bk-tree/
