Improving performance of fuzzy string matching against a dictionary

借酒劲吻你 2021-02-08 00:34

So I'm currently using SecondString for fuzzy string matching, where I have a large dictionary to compare against (where each entry in the dictionary has an associat…

3 Answers
  • 2021-02-08 01:07

    See this excellent article for an explanation and comparison of different fuzzy string matching methods: http://ntz-develop.blogspot.com/2011/03/fuzzy-string-search.html

    Java source code is available at https://code.google.com/p/fuzzy-search-tools/

  • 2021-02-08 01:09

    What you're looking for is a BK-tree combined with the Levenshtein distance algorithm. Lookup performance in a BK-tree depends on how "fuzzy" your search is, where fuzziness is the maximum edit distance (number of edits) allowed between the search word and its matches.
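    A minimal Java sketch of the idea, assuming a plain in-memory dictionary of strings; the names BKTree, add, search and levenshtein are hypothetical and not taken from any of the libraries linked in this thread. Each node keeps its children keyed by their edit distance to the node's word. At query time, if the query is d edits from a node, the triangle inequality means only children whose key lies in [d - n, d + n] can contain words within n edits, so a smaller n prunes more of the tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative BK-tree over strings using Levenshtein distance (hypothetical sketch).
final class BKTree {
    // Each node stores a word and its children keyed by their distance to that word.
    private static final class Node {
        final String word;
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private Node root;

    // Standard dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // the finished row is now in prev
        }
        return prev[b.length()];
    }

    // Insert by walking down the edge labelled with the word's distance to each node.
    void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int d = levenshtein(word, node.word);
            if (d == 0) return; // word already present
            Node child = node.children.get(d);
            if (child == null) { node.children.put(d, new Node(word)); return; }
            node = child;
        }
    }

    // Return every stored word within maxDistance edits of the query.
    List<String> search(String query, int maxDistance) {
        List<String> results = new ArrayList<>();
        if (root == null) return results;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int d = levenshtein(query, node.word);
            if (d <= maxDistance) results.add(node.word);
            // Triangle inequality: only subtrees whose edge label k satisfies
            // d - maxDistance <= k <= d + maxDistance can hold matches.
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                int k = e.getKey();
                if (k >= d - maxDistance && k <= d + maxDistance) stack.push(e.getValue());
            }
        }
        return results;
    }
}
```

    For example, after adding book, books, cake, boo, boon, cook, cape and cart in that order, search("bool", 1) returns book, boo and boon (in traversal order), and the whole subtree under cake is skipped without computing any distances in it.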

    Here is a good blog post on the subject: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees

    Some notes on BK-tree performance: http://www.kafsemo.org/2010/08/03_bk-tree-performance-notes.html

    Notes on the Levenshtein distance algorithm: http://en.wikipedia.org/wiki/Levenshtein_distance
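    For reference, this is the standard dynamic-programming recurrence for the Levenshtein distance between strings a and b, where lev(i, j) is the distance between the first i characters of a and the first j characters of b:

```latex
\operatorname{lev}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0,\\[4pt]
\min\bigl(\operatorname{lev}(i-1, j) + 1,\;
          \operatorname{lev}(i, j-1) + 1,\;
          \operatorname{lev}(i-1, j-1) + [a_i \neq b_j]\bigr) & \text{otherwise.}
\end{cases}
```

    Here [a_i ≠ b_j] is 1 when the characters differ and 0 when they match; the three terms correspond to a deletion, an insertion, and a substitution (or match). For example, lev("kitten", "sitting") = 3: substitute k with s, substitute e with i, and insert g. Filling the full table costs O(|a|·|b|) time per comparison, which is why reducing the number of dictionary entries compared against matters for large dictionaries.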

    Also, here is a BK-tree written in Java; it should give you an idea of the interface: http://code.google.com/p/java-bk-tree/

  • 2021-02-08 01:18

    Or you could use Java Fuzzy HashMap, an extension of Java's HashMap that allows fuzzy search (http://sourceforge.net/projects/fuzzyhashmap/). I think it is exactly what you need. A complete description of the data structure is available here: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5565628
