How do speech recognition algorithms recognize homophones?

十年热恋 submitted on 2020-01-05 15:19:08

Question


I was pondering this question earlier. What clues do modern algorithms (specifically those that convert voice to text) use to determine which homophone was said (e.g. to, too, or two)?

Do they use contextual clues? Sentence structure? Perhaps there are slight differences in the way each word is usually pronounced (for example, I usually hold the o sound longer in two than in to). A combination of the first two seems most plausible.


Answer 1:


Do they use contextual clues?

Yes, ASR systems use cross-word context. For example, if the previous word is "going", the next word is likely to be "to", not "two". ASR systems assign probabilities to the candidate decodings and select the most probable one.
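As a rough illustration of the cross-word context idea, here is a minimal sketch of a bigram model choosing between homophones based on the previous word. The counts are invented for the example and do not come from a real corpus, and the helper names `bigram_prob` and `pick_homophone` are hypothetical. Real ASR systems use far larger n-gram or neural language models.

```python
from collections import Counter

# Toy bigram counts, made up for illustration (not from a real corpus).
BIGRAM_COUNTS = Counter({
    ("going", "to"): 50, ("going", "too"): 2, ("going", "two"): 1,
    ("me", "too"): 30, ("me", "to"): 5, ("me", "two"): 1,
    ("have", "to"): 60, ("have", "two"): 40, ("have", "too"): 3,
})

HOMOPHONES = ("to", "too", "two")


def bigram_prob(prev, word, candidates=HOMOPHONES):
    """P(word | prev), restricted to the candidates, with add-one smoothing."""
    total = sum(BIGRAM_COUNTS[(prev, w)] for w in candidates)
    return (BIGRAM_COUNTS[(prev, word)] + 1) / (total + len(candidates))


def pick_homophone(prev):
    """Return the candidate that is most probable after the previous word."""
    return max(HOMOPHONES, key=lambda w: bigram_prob(prev, w))


if __name__ == "__main__":
    for prev in ("going", "me"):
        print(prev, "->", pick_homophone(prev))
```

With these toy counts, "going" favors "to" and "me" favors "too", which matches the intuition in the answer.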

Sentence structure?

Yes, ASR systems also use more advanced language models to predict probable words given the context.

Perhaps there are slight differences in the way each word is usually pronounced (for example, I usually hold the o sound longer in two than in to).

That too. In fact, "too" and "to" are pronounced quite differently: "to" is often reduced to a schwa.
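The two kinds of evidence are usually combined into a single score. The sketch below assumes the common formulation in which a decoder adds an acoustic log-likelihood to a weighted language-model log-probability and keeps the best-scoring word. All numbers, and the names `decode` and `lm_weight`, are assumptions made for illustration rather than the API of any particular toolkit.

```python
import math


def decode(acoustic_logp, lm_logp, lm_weight=1.0):
    """Pick the word maximizing acoustic score + lm_weight * LM score."""
    scores = {
        word: acoustic_logp[word] + lm_weight * lm_logp[word]
        for word in acoustic_logp
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical acoustic evidence: the audio sounds slightly more like "too".
acoustic = {"to": math.log(0.30), "too": math.log(0.40), "two": math.log(0.30)}

# Hypothetical language-model evidence after the word "going".
lm_after_going = {"to": math.log(0.90), "too": math.log(0.05), "two": math.log(0.05)}

if __name__ == "__main__":
    best, _ = decode(acoustic, lm_after_going)
    print("with context:", best)
    best, _ = decode(acoustic, lm_after_going, lm_weight=0.0)
    print("acoustics only:", best)
```

In this toy setup the acoustics alone prefer "too", but once the context after "going" is taken into account the decoder settles on "to".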

If you are interested in speech recognition algorithms, it may make sense to read a book on ASR or take an online course. For details, see:

https://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/3ea89abf/



Source: https://stackoverflow.com/questions/14684594/how-do-speech-recognition-algorithms-recognize-homophones
