soundex

Levenshtein distance based methods Vs Soundex

北城以北 posted on 2019-11-27 12:53:35
As per this comment in a related thread, I'd like to know why Levenshtein distance based methods are better than Soundex. Soundex is rather primitive: it was originally designed to be calculated by hand, and it produces a key that can be compared. Soundex works well with Western names, as it was originally developed for US census data. It is intended for phonetic comparison. Levenshtein distance takes two values and produces a number based on their similarity; it counts missing, inserted, or substituted letters. Basically Soundex is better for finding that "Schmidt" and "Smith" might be the same
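To make the contrast above concrete, here is a minimal sketch of Levenshtein distance in Python. The function name `levenshtein` is chosen here for illustration and does not come from the thread; the code is the textbook dynamic-programming formulation, not necessarily what any answer in the thread used.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each row stays short

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein("Schmidt", "Smith"))
```

The result is a count of edits rather than a comparable key, so it suits typo detection; it says nothing about how the two strings are pronounced.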

Finding similar sounding text in VBA [closed]

戏子无情 posted on 2019-11-27 10:37:06
Question: My manager tells me that there is a way to evaluate names that are spelled differently but sound similar when pronounced. Ideally, we want to be able to evaluate a user-entered search name and return exact matches as well as "similar sounding" names. He called the process "Soundits" but I cannot

Soundex algorithm in Python (homework help request)

坚强是说给别人听的谎言 posted on 2019-11-27 07:30:45
Question: The US Census Bureau uses a special encoding called “soundex” to locate information about a person. The soundex is an encoding of surnames (last names) based on the way a surname sounds rather than the way it is spelled. Surnames that sound the same but are spelled differently, like SMITH and SMYTH, have the same code and are filed together. The soundex coding system was developed so that you can find a surname even though it may have been recorded under various spellings. In this lab you
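The encoding described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the commonly documented American Soundex rules (first letter kept, three digits, H and W ignored, vowels separating repeated codes); the names `soundex` and `_CODES` are chosen here and are not taken from the lab assignment, whose exact rules are not shown in the excerpt.

```python
# Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name: str) -> str:
    """Return a four-character Soundex code, or "" if name has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    previous = _CODES.get(first, "")  # the first letter's code also suppresses repeats
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = _CODES.get(c, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a repeated code after it counts again

    # Pad with zeros, then truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for surname in ("SMITH", "SMYTH", "Schmidt"):
        print(surname, soundex(surname))
```

Because the output is a fixed-length key, names can be grouped or indexed by it directly, which is what lets differently spelled surnames be filed together.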

Enabling soundex/metaphone for non-English characters

送分小仙女□ posted on 2019-11-27 05:49:32
Question: I've been studying soundex, metaphone and other string search techniques for the past few days, and as I understand it both algorithms work well on non-English words transliterated to English. However, my requirement is for such a search to work in the original, untransliterated languages, accommodating alphabets such as German, Norwegian, and even Cyrillic. Are there any search algorithms capable of handling these alphabets completely? Or am I better off using