Can CMU Sphinx be set up to recognize ~200 words?

Posted by 倖福魔咒の on 2019-12-17 19:25:28

Question


I have a client who needs an Android app that can recognize spoken commands. From what I understand, the built-in voice-to-text functionality sends the audio to Google's servers, which then send back a text transcription. This is a major problem, as the voice data is extremely sensitive (unless the data is encrypted in transit to and from Google, which I doubt).

There are two options that I can think of. The first is to convert speech to text on the Android device itself, though this seems like it would be an extremely expensive operation. The second is to have a local server do the conversion for me (I could encrypt both the voice data and the resulting text in transit). Is this something CMU Sphinx could pull off? It may be worth noting that I will also have access to an Asterisk server, which could possibly assist with this (I don't know).

In reality, only ~200 words will need to be recognized. I would prefer an open-source/free software solution, but I am also open to a commercial one (perhaps FlexT9). Ideally, I can send the audio stream somewhere, get back a String containing the text, and then parse that String and act on it.

I haven't done much Android development, or any speech recognition development, in the past, so I'm hoping someone can at least point me in the right direction. Thanks!


Answer 1:


CMUSphinx is an open-source speech recognition toolkit you can use to build your application. It contains the tools, libraries, and data you need to build a speech application. You can learn more about CMUSphinx on the project website.

On Android you have several options for using CMUSphinx:

  1. Recognize audio on the device. For that you can compile the Pocketsphinx engine for Android. For details see this blog post.

  2. Recognize audio on a server. As the server-side engine you can use either Pocketsphinx or Sphinx4. You can send the audio in compressed FLAC format, or extract the speech recognition features on the device and send the feature stream to the server.
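With either option, a vocabulary of roughly 200 words is small enough to describe with a grammar. Both Pocketsphinx and Sphinx4 can load grammars in JSGF format, so one possible approach is to generate the grammar from your word list. The following is only a minimal sketch: the helper name `build_jsgf`, the grammar and rule names, and the sample words are all made up for illustration, and it assumes that every word already has an entry in your pronunciation dictionary and that the dictionary uses lowercase entries.

```python
def build_jsgf(words, grammar_name="commands", rule_name="command"):
    """Return a JSGF grammar whose public rule matches exactly one of `words`.

    Hypothetical helper for illustration; not part of CMUSphinx.
    """
    # Normalize: trim whitespace, lowercase (assumes a lowercase dictionary),
    # skip blanks and duplicates, and keep the first-seen order.
    unique = []
    for word in words:
        token = word.strip().lower()
        if token and token not in unique:
            unique.append(token)
    if not unique:
        raise ValueError("need at least one word")

    # A JSGF rule lists its alternatives separated by '|'.
    alternatives = " | ".join(unique)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Sample words only; replace with your real command vocabulary.
    print(build_jsgf(["open", "close", "start", "stop"]))
```

The generated text can be saved to a `.gram` file and passed to the recognizer as its grammar. A flat list of alternatives like this recognizes one word per utterance; multi-word commands would need additional rules.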

CMUSphinx provides several acoustic models that let you recognize audio in a number of languages, including English, French, Mandarin, German, Dutch, and Russian.

You can also improve the recognition result with adaptation tools.

If you have any questions about CMUSphinx, you are welcome to ask in our community forums.




Answer 2:


The Microsoft speech engines are closed source, but free. For some background, see "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?". For more background, you can try https://stackoverflow.com/a/4217638/90236

The complete SDK for the Microsoft Server Speech Platform 11 is available at http://www.microsoft.com/download/en/details.aspx?id=27226. The speech engine is a free download.



Source: https://stackoverflow.com/questions/9073856/can-cmu-sphinx-be-set-up-to-recognize-200-words
