Question
I'm using pocketsphinx on a Raspberry Pi for home automation. I've written a simple JSGF grammar file with the supported commands. Now I want to require an activation phrase such as "hey computer" before the commands, to avoid false detections and only run speech recognition once the activation phrase has been spoken.
If I'm not getting this wrong, pocketsphinx supports two modes for speech recognition: keyword spotting mode, and language model / JSGF grammar mode.
The pocketsphinx FAQ, when addressing how to reject out-of-grammar words, says:
If you want to recognize several commands, you can use keyword spotting mode or keyword activation mode combined with the switch to grammar to perform actual operation.
My question is, how exactly is this "switching" from keyword spotting mode to grammar mode implemented? (what should I do to achieve it?). Related to that, what's the difference between "keyword spotting mode" and "keyword activation mode"?
Thanks!
Answer 1:
A quote from the tutorial:
Developer can configure several “search” objects with different grammars and language models and switch them in runtime to provide interactive experience for the user.
There are different possible search modes:
- keyword - efficiently looks for a keyphrase and ignores other speech. Allows you to configure the detection threshold.
- grammar - recognizes speech according to a JSGF grammar. Unlike keyphrase search, grammar search doesn't ignore words which are not in the grammar but tries to recognize them.
- ngram/lm - recognizes natural speech with a language model.
- allphone - recognizes phonemes with a phonetic language model.
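For the keyword search, the keyphrases and their detection thresholds are usually kept in a plain-text list file, one keyphrase per line with the threshold between slashes. Below is a minimal sketch that builds the contents of such a file. The helper name format_keyword_list, the phrase "hey computer", and the threshold 1e-20 are illustrative assumptions. Thresholds generally need tuning per phrase, and every word of the phrase has to be in the dictionary.

```python
def format_keyword_list(keyphrases):
    """Return keyword-list text: one 'phrase /threshold/' entry per line."""
    return ''.join('{} /{}/\n'.format(phrase, threshold)
                   for phrase, threshold in keyphrases)


# Hypothetical activation phrase and threshold; tune the threshold on real audio.
text = format_keyword_list([('hey computer', '1e-20')])
print(text, end='')
```

Saving this text as keyword.list gives a file that could be passed to set_kws in the code further down.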
Each search has a name and can be referenced by a name, names are application-specific. The function ps_set_search allows to activate the search previously added by a name.
To add the search one needs to point to the grammar/language model describing the search. The location of the grammar is specific to the application. If only a simple recognition is required it is sufficient to add a single search or just configure the required mode with configuration options.
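Applied to the question's setup (an activation phrase plus a JSGF command grammar), adding the searches could look roughly like the sketch below. It takes an already constructed decoder and uses the same older Python binding style as the code in this answer. set_jsgf_file is assumed to be available alongside set_kws in that binding (newer pocketsphinx releases use different method names), and the search names 'keyword' and 'grammar' as well as the helper name register_searches are arbitrary choices.

```python
def register_searches(decoder, kws_path, jsgf_path):
    """Add a keyword search and a grammar search, then start in keyword mode."""
    # Keyword search: spots the activation phrase(s) listed in kws_path.
    decoder.set_kws('keyword', kws_path)
    # Grammar search: recognizes commands from the JSGF file
    # (assumed method of the older Python binding).
    decoder.set_jsgf_file('grammar', jsgf_path)
    # Begin by listening only for the activation phrase.
    decoder.set_search('keyword')
```

It would be called once after creating the decoder, for example register_searches(decoder, 'keyword.list', 'commands.gram'), where both file names are placeholders.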
The exact design of the searches depends on your application. For example, you might want to listen for an activation keyword first and, once the keyword is recognized, switch to ngram search to recognize the actual command. Once you have recognized the command you can switch to grammar search to recognize the confirmation and then switch back to keyword listening mode to wait for another command.
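The keyword, command, confirmation cycle described above can be written as a small transition function that is independent of pocketsphinx. This is only a sketch: the search names and the rule of falling back to keyword listening when nothing was recognized are assumptions, not something the tutorial prescribes.

```python
# Search names are application-specific; these three are illustrative.
KEYWORD, COMMAND, CONFIRM = 'keyword', 'lm', 'confirm'


def next_search(current, hypothesis):
    """Return the name of the search to activate after an utterance ends.

    hypothesis is the recognized text, or None if nothing was recognized.
    """
    if hypothesis is None:
        # Nothing recognized: stay in, or fall back to, keyword listening.
        return KEYWORD
    if current == KEYWORD:
        return COMMAND   # activation phrase heard, recognize the command next
    if current == COMMAND:
        return CONFIRM   # command heard, recognize the confirmation next
    return KEYWORD       # confirmation handled, wait for the next activation
```

With a real decoder, the result would be passed to set_search after each end_utt, for example decoder.set_search(next_search(decoder.get_search(), text)).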
The code to switch searches in Python looks like this:
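from os import path

import pyaudio
# Import path of the older pocketsphinx Python binding used by this snippet
from pocketsphinx.pocketsphinx import Decoder

# MODELDIR is not defined in the original snippet; it must be set to the
# directory that contains the acoustic model and the dictionary.

# Init decoder
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-dict', path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))
decoder = Decoder(config)

# Add searches
decoder.set_kws('keyword', 'keyword.list')
decoder.set_lm_file('lm', 'query.lm')
decoder.set_search('keyword')

# Open the microphone: 16 kHz, 16-bit, mono
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()

in_speech_bf = False
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        # React only when the speech/silence state changes
        if decoder.get_in_speech() != in_speech_bf:
            in_speech_bf = decoder.get_in_speech()
            if not in_speech_bf:
                # Speech has just ended: close the utterance
                decoder.end_utt()

                # Print hypothesis and switch search to another mode
                hyp = decoder.hyp()  # None if nothing was recognized
                print('Result:', hyp.hypstr if hyp is not None else None)

                if decoder.get_search() == 'keyword':
                    # Leave keyword mode only if the keyphrase was detected
                    if hyp is not None:
                        decoder.set_search('lm')
                else:
                    decoder.set_search('keyword')

                decoder.start_utt()
    else:
        break
decoder.end_utt()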
Source: https://stackoverflow.com/questions/39069614/pocketsphinx-how-to-switch-from-keyword-spotting-to-grammar-mode