How to recognize similar words with differences in spelling

逝去的感伤 2020-12-02 02:05

I want to filter out duplicate customer names from a database. A single customer may have more than one entry in the system with the same name but with little difference in

8 answers
  • 2020-12-02 02:32

    I would consider writing something like the "famous" Python spell checker.

    http://norvig.com/spell-correct.html

    It takes a word and generates all possible alternatives by deleting letters, inserting letters, swapping adjacent letters, etc.
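As a rough illustration of that idea, here is a minimal sketch of the candidate-generation step, adapted from the approach described in the linked essay. The helper `near_duplicates` and the sample names are hypothetical and only show how the candidates might be matched against existing customer names; it assumes lowercase ASCII names and a single edit.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def near_duplicates(name, known_names):
    """Hypothetical helper: known names exactly one edit away from `name`."""
    target = name.lower()
    candidates = edits1(target)
    return sorted(n for n in known_names
                  if n.lower() in candidates and n.lower() != target)


print(near_duplicates("jonh", ["john", "joan", "jon", "mary"]))
```

For real customer data the candidate set grows quickly with name length, so this is probably only practical for short fields or as a pre-filter.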

  • 2020-12-02 02:36

    The Double Metaphone algorithm, published in 2000, is an improved successor to the Soundex algorithm, which was patented in 1918.

    The article has links to Double Metaphone implementations in many languages.

  • 2020-12-02 02:37

    I would recommend Soundex and its derived algorithms over Levenshtein distance for this problem. Levenshtein distance is more appropriate for spell-checking solutions, IMHO.
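For comparison, this is a minimal sketch of the edit distance mentioned in this answer, using the standard dynamic-programming formulation. It counts character edits only and knows nothing about pronunciation, which is the limitation the answer points at. The function name is an illustrative choice.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


print(levenshtein("Smith", "Smyth"))
print(levenshtein("kitten", "sitting"))
```

A small distance flags likely typos, but two names that sound alike can still be several edits apart, so a threshold on this value alone may miss phonetic variants.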

  • 2020-12-02 02:38

    Look into Soundex. It is available as a standard library function in many languages and does what you require, i.e. it algorithmically identifies phonetic similarity. http://en.wikipedia.org/wiki/Soundex
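To show what such a function does, here is a minimal sketch of American Soundex as commonly described: keep the first letter, map the remaining consonants to digits, collapse repeated codes, and pad to four characters. It assumes ASCII names and ignores any other characters. `group_by_soundex` is a hypothetical helper for bucketing candidate duplicates, not part of any library.

```python
import string
from collections import defaultdict

# Consonant groups share one digit; vowels and y, h, w have no code.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name):
    """Return the four-character Soundex key for `name` ("" if no letters)."""
    letters = [ch for ch in name.lower() if ch in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    last_code = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != last_code:
            result += code
        last_code = code  # a vowel resets last_code, allowing a repeat
    return (result + "000")[:4]


def group_by_soundex(names):
    """Hypothetical helper: bucket names that share a Soundex key."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)


print(group_by_soundex(["Smith", "Smyth", "Robert", "Rupert", "Jones"]))
```

Names that land in the same bucket are only candidates for being duplicates; Soundex is coarse, so a second check on the full record is usually needed.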

  • 2020-12-02 02:41

    There is a very nice R (just search for "R" in Google) package for Record Linkage. The standard examples target exactly your problem: R RecordLinkage

    The C code for Soundex etc. is taken directly from PostgreSQL!

  • 2020-12-02 02:42

    You might want to search for "phonetic similarity algorithm"; you'll find plenty of information about this, including this article on CodeProject about implementing a solution in C#.
