Yes, I'm aware that speech recognition is fairly complicated (as an understatement). What I'm looking for is a method for distinguishing between maybe 20-30 phras
There are some open source projects in speech recognition:
Both have decoder, training, and language model toolkits: everything needed to build a complete and robust speech recognizer. Voxforge has acoustic and language models for both open source speech recognition toolkits.
Some time ago, I read a whitepaper about a limited-vocabulary system that used a simple recognition process. The system divided each utterance into a small number of bins (6 in time and 4 in magnitude, if I remember correctly, for 24 total), and all it did was count the number of audio samples that fell into each bin. A fuzzy logic rule base then interpreted each utterance's 24 bin counts and generated an interpretation.
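To make the binning idea concrete, here is a rough Python sketch. It is not the whitepaper's method, which I don't remember in detail. The function name `bin_counts`, the equal-width bins, and the choice to measure magnitude as absolute amplitude relative to the utterance's peak are all my own assumptions.

```python
def bin_counts(samples, time_bins=6, mag_bins=4):
    """Count audio samples per (time, magnitude) cell.

    Assumptions (not from the whitepaper): equal-width bins, and
    magnitude taken as |sample| scaled by the utterance's peak.
    Returns a flat list of time_bins * mag_bins counts.
    """
    counts = [0] * (time_bins * mag_bins)
    if not samples:
        return counts
    # Avoid dividing by zero for an all-silent utterance.
    peak = max(abs(s) for s in samples) or 1.0
    n = len(samples)
    for i, s in enumerate(samples):
        # Which slice of the utterance (in time) this sample belongs to.
        t = min(i * time_bins // n, time_bins - 1)
        # Which magnitude band; the peak itself is clamped into the top band.
        m = min(int(abs(s) / peak * mag_bins), mag_bins - 1)
        counts[t * mag_bins + m] += 1
    return counts
```

Because the counts depend on the utterance length, you would probably want to divide them by the total number of samples before comparing utterances of different durations.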
I imagine that (for some applications) a simple matching process might work just as well, in which the 24 bin counts of the current utterance are simply matched against those of each of your stored prototypes, and the one with the least overall difference is the winner.
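A minimal sketch of that matching step in Python, assuming each stored prototype is just a labelled list of bin counts. The name `closest_prototype` is made up for illustration, and reading "least overall difference" as the sum of absolute differences is my assumption; a squared difference would be an equally reasonable choice.

```python
def closest_prototype(counts, prototypes):
    """Return the label of the stored prototype nearest to `counts`.

    `prototypes` maps a phrase label to its list of bin counts.
    Distance is the sum of absolute per-bin differences (an assumed
    reading of "least overall difference").
    """
    if not prototypes:
        raise ValueError("no prototypes stored")

    def distance(a, b):
        if len(a) != len(b):
            raise ValueError("bin count vectors differ in length")
        return sum(abs(x - y) for x, y in zip(a, b))

    # Pick the label whose prototype has the smallest distance.
    return min(prototypes, key=lambda label: distance(counts, prototypes[label]))


# Toy usage with 3 bins instead of 24, just to show the shape of the data.
stored = {"yes": [5, 0, 1], "no": [0, 5, 1]}
best = closest_prototype([4, 1, 1], stored)
```

In practice you would likely record several examples of each phrase and either average them into one prototype or keep them all and take the nearest.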