machine learning algorithm for spelling check

Question

I have a list of medicine names (regular_list) and a list of new names (new_list). I want to check whether the names in new_list are already present in regular_list. The issue is that the names in new_list may contain typos, and I want those names to still be considered a match to the regular list. I know that using stringdist is a solution to the problem, but I need a machine learning algorithm.

Answer 1:

As was already mentioned in "machine learning to overcome typo errors", machine learning tools are overkill for such a task. If you do want to use them, the simplest possibility is to combine the two approaches.

On one hand, you can compute the edit distance between a given word x and each of the dictionary words d_i. Additionally, you can train a per-word classifier

c(d_i, distance(x,d_i))

returning True (class 1) if the given edit distance has been learned to be small enough to consider x a misspelled version of d_i. This can give you a more general model than one without machine learning, because you can have a different threshold for each dictionary word (some words are misspelled more often than others). Obviously, you have to prepare a training set in the form of (misspelled_word, correct_one) pairs, and also add (correct_one, correct_one) pairs.

You can use any type of binary classifier for this task, as long as it can work on real-valued input data.
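A minimal sketch of the idea in Python, under a few assumptions that are not part of the answer: the classifier c(d_i, distance(x, d_i)) is taken to be the simplest possible one, a single learned distance threshold per dictionary word (a decision stump); negative examples are generated by pairing each training word with every dictionary word it does not belong to; and the training pairs are invented purely for illustration. The names `edit_distance`, `train_thresholds`, and `match` are hypothetical. Any other binary classifier could take the place of the threshold.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def train_thresholds(dictionary, pairs):
    """Learn one distance threshold per dictionary word.

    `pairs` holds (observed_word, correct_word) tuples. For dictionary word d,
    an observed word is a positive example if its correct word is d, and a
    negative example otherwise (an assumption of this sketch).
    """
    samples = defaultdict(list)
    for observed, correct in pairs:
        for d in dictionary:
            samples[d].append((edit_distance(observed, d), d == correct))

    thresholds = {}
    for d in dictionary:
        candidates = sorted({0} | {dist for dist, _ in samples[d]})
        best_t, best_hits = 0, -1
        for t in candidates:
            # Number of training examples classified correctly by "dist <= t".
            hits = sum((dist <= t) == label for dist, label in samples[d])
            if hits > best_hits:  # ties keep the smaller threshold
                best_t, best_hits = t, hits
        thresholds[d] = best_t
    return thresholds


def match(word, thresholds):
    """Return the closest dictionary word whose threshold accepts `word`."""
    accepted = []
    for d, t in thresholds.items():
        dist = edit_distance(word, d)
        if dist <= t:
            accepted.append((dist, d))
    return min(accepted)[1] if accepted else None


# Illustrative data only: the misspellings below are made up for this sketch.
regular_list = ["aspirin", "ibuprofen", "paracetamol"]
training_pairs = [
    ("aspirin", "aspirin"), ("asprin", "aspirin"), ("aspirine", "aspirin"),
    ("ibuprofen", "ibuprofen"), ("ibuprofin", "ibuprofen"),
    ("ibuprophen", "ibuprofen"),
    ("paracetamol", "paracetamol"), ("paracetemol", "paracetamol"),
    ("parcetamol", "paracetamol"),
]
thresholds = train_thresholds(regular_list, training_pairs)
new_list = ["asprin", "ibuprophen", "amoxicillin"]
matches = {name: match(name, thresholds) for name in new_list}
```

With this toy data the learned threshold differs between words, which is the point of the per-word classifier: a word whose training misspellings lie further away tolerates a larger distance. A name that no threshold accepts is reported as `None`, meaning it is treated as new.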

Source: https://stackoverflow.com/questions/18374749/machine-learning-algorithm-for-spelling-check

Tags

text

machine-learning

stringdist
