How to recognize a phrase from a voice file

被撕碎了的回忆 2020-12-18 10:45

How do I get the engine to successfully recognize a phrase from a voice file (WAV, MP3, etc.)?

For example, suppose I have a voice file and the written text of its content.

3 Answers
  • 2020-12-18 11:10

    According to the MSDN article Getting Started with Speech Recognition, these are the steps you need to perform (quoted from the article). Note the "create a recognition grammar" step: the article goes on to suggest using the GrammarBuilder or Choices classes.

    A speech recognition application will typically perform the following basic operations:
    - Start the speech recognizer.
    - Create a recognition grammar.
    - Load the grammar into the speech recognizer.
    - Register for speech recognition event notification.
    - Create a handler for the speech recognition event.

  • 2020-12-18 11:16

    If you are trying to convert audio files using the Microsoft speech engines, you have to take some care. First, the only file format supported is WAV (it can be encoded as PCM, ALaw, or uLaw), and you must verify that your file uses an encoding supported by your recognizer. You must also verify the sample rate, because the recognizers only support a fixed set of sample rates. On my machine, the following format works well:

    • 8 bits per sample
    • single channel (mono)
    • 22,050 samples per second
    • PCM encoding

    See https://stackoverflow.com/a/6203533/90236 for more information. You may have to resample or re-encode the WAV files using a tool like Audacity; see https://stackoverflow.com/a/9467044/90236.
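
    As a quick way to check a file before handing it to the recognizer, here is a minimal sketch in Python using only the standard-library `wave` module. The target values are simply the format listed above, which is an assumption that may not hold for your recognizer, and the helper names `describe_wav` and `format_mismatches` are made up for this illustration. Note that Python's `wave` module only reads uncompressed PCM, so an ALaw or uLaw file will raise `wave.Error` instead of being described.

```python
import wave

# Assumed target format, copied from the list above; adjust for your recognizer.
TARGET = {"channels": 1, "sample_width_bytes": 1, "frame_rate": 22050}


def describe_wav(path):
    """Return the basic header fields of a PCM WAV file as a dict."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 1 byte = 8 bits per sample
            "frame_rate": wav.getframerate(),          # samples per second
        }


def format_mismatches(path, target=TARGET):
    """Map each differing field to an (actual, expected) pair."""
    actual = describe_wav(path)
    return {
        field: (actual[field], expected)
        for field, expected in target.items()
        if actual[field] != expected
    }
```

    Calling `format_mismatches("speech.wav")` (the file name is a placeholder) returns an empty dict when the file already matches the target format. Any entry in the result names a field that probably needs resampling or re-encoding first.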

    A simple example to get you started is in the question "SAPI and Windows 7 Problem".

    Lastly (I always repeat this point, sorry), there is a great article about programming speech recognition in .NET on Windows: http://msdn.microsoft.com/en-us/magazine/cc163663.aspx. It is a little out of date, but it is a great introduction.

  • 2020-12-18 11:18

    It seems you need to look for a specific word in a long file. This technique is called "keyword spotting". It is quite different from full speech recognition and far more efficient: you do not need to transcribe the whole file to search for a word in it, so you can scan through the file quickly. The Microsoft speech recognition engine has very limited support for keyword spotting.

    Open source engines like CMUSphinx can be used to implement keyword spotting efficiently. For further reference, see the information on how to implement wake-up listening with pocketsphinx.

    For more information on the underlying algorithms, see "Acoustic Keyword Spotting in Speech with Applications to Data Mining".
