Enabling soundex/metaphone for non-English characters

Submitted by 大兔子大兔子 on 2019-11-28 09:09:05
ire_and_curses

I'm not an expert in this area, but your requirements look quite hard to meet. Soundex was designed specifically around English pronunciation and the English alphabet, so I don't think it will perform well for non-English languages. See, for example, the responses to this related question.
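To make the point concrete, here is a minimal sketch of American Soundex in Python. It is my own illustration, not code from any particular library. Its coding table only knows the 26 letters A-Z, so every other character has to be dropped (as here) or mapped to Latin letters beforehand:

```python
# American Soundex letter codes. Note the table covers only the 26 English letters.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if name has no A-Z letters."""
    # Anything outside A-Z (accented Latin, Cyrillic, CJK, ...) is silently dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]            # the first letter is kept as-is
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:    # skip uncoded letters and repeated codes
            result.append(code)
        if c not in "HW":            # vowels reset 'prev'; H and W do not
            prev = code
    return ("".join(result) + "000")[:4]   # pad with zeros, truncate to 4 chars


for word in ["Robert", "Rupert", "Oberg", "Øberg", "Чехов"]:
    print(word, repr(soundex(word)))
```

"Robert" and "Rupert" both come out as R163, as intended, but "Øberg" loses its first letter and codes as B620 instead of O162, and the Cyrillic name gets no code at all. You would need a transliteration step per language, or a different algorithm, before this is useful for non-English input.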

Double Metaphone tries to deal with much more complex spelling variations than Soundex or the original Metaphone, and was designed to handle irregularities from a range of languages. It might be sufficient for your needs; the linked page lists several library implementations.

Support for other languages in Lucene is based on the concept of Analyzers. Lucene ships with analyzers for a number of languages (although I couldn't find a definitive list), but their quality may vary considerably.
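The analyzer idea is essentially a pipeline: a tokenizer followed by a chain of filters, with a different chain chosen per language. Below is a toy Python sketch of that idea. All names are hypothetical and this is not Lucene's API; the NFKD-based accent folding is only a rough stand-in for real accent folding. It shows why a language-specific chain can normalize the same text differently from a generic one:

```python
import re
import unicodedata
from typing import Callable, Iterable, Iterator, List

# A "filter" takes a token stream and returns a new token stream.
TokenFilter = Callable[[Iterable[str]], Iterable[str]]


def tokenize(text: str) -> List[str]:
    # \w is Unicode-aware for str patterns, so 'Müller' stays one token.
    return re.findall(r"\w+", text)


def lowercase(tokens: Iterable[str]) -> Iterator[str]:
    return (t.lower() for t in tokens)


def fold_accents(tokens: Iterable[str]) -> Iterator[str]:
    # NFKD splits 'ü' into 'u' + a combining diaeresis; drop the combining marks.
    for t in tokens:
        decomposed = unicodedata.normalize("NFKD", t)
        yield "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical German-specific rule: spell out umlauts and sharp s.
GERMAN_SPELLINGS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def german_spellings(tokens: Iterable[str]) -> Iterator[str]:
    return (t.translate(GERMAN_SPELLINGS) for t in tokens)


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    tokens: Iterable[str] = tokenize(text)
    for f in filters:          # apply each filter in order, like an analyzer chain
        tokens = f(tokens)
    return list(tokens)


GENERIC = [lowercase, fold_accents]
GERMAN = [lowercase, german_spellings, fold_accents]

print(analyze("Müller Straße", GENERIC))   # ['muller', 'straße']
print(analyze("Müller Straße", GERMAN))    # ['mueller', 'strasse']
```

The generic chain leaves "ß" untouched because Unicode defines no decomposition for it, while the German chain rewrites it. A phonetic encoder would typically run as the last filter, so whatever it receives depends heavily on which language chain came before it.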

There are some good references on Wikipedia, starting from the Soundex article. I don't know whether there are existing libraries designed to handle such a wide variety of languages.
