Continuous Speech Recognition on browser like “ok google” or “hey siri”

Question

I am doing a POC, and my requirement is to implement a feature like "OK Google" or "Hey Siri" in the browser.

I am using the Chrome browser's Web Speech API. I noticed that I can't keep recognition running continuously: it terminates automatically after a certain period of time, and I know this is because of security concerns. As a workaround, when the SpeechRecognition terminates, I start it again from its end event. This is not the best way to implement such a solution, though. If I open two instances of the same application in different browser tabs, it doesn't work, and if another application in my browser also uses speech recognition, neither application behaves as expected. I am looking for a better approach to solve this problem.

Thanks in advance.

Answer 1:

Since your problem is that you can't run the SpeechRecognition continuously for long periods of time, one way would be to start the SpeechRecognition only when you get some input from the mic.

This way, the SR starts only when there is some input, and it listens for your magic_word.
If the magic_word is found, you can then use the SR normally for your other tasks.
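
The two-stage flow described above (ignore everything until the magic word, then treat what follows as commands) can be sketched as a small state holder that does not depend on the recognition engine. This is only an illustration under assumptions: `createWakeWordGate` and the shape of the objects it returns are hypothetical names, not part of the Web Speech API.

```javascript
// Hypothetical helper: remembers whether the magic word has been heard yet.
function createWakeWordGate(magicWord) {
  const needle = magicWord.trim().toLowerCase();
  let awake = false;
  return {
    // Feed each recognized transcript here.
    handle(transcript) {
      const text = transcript.trim().toLowerCase();
      if (!awake) {
        const idx = text.indexOf(needle);
        if (idx === -1) return { type: 'ignored' };
        awake = true;
        // Anything spoken after the magic word in the same phrase is a command.
        const rest = text.slice(idx + needle.length).trim();
        return rest ? { type: 'command', text: rest } : { type: 'wake' };
      }
      return { type: 'command', text };
    },
    // Go back to waiting for the magic word.
    reset() { awake = false; },
    isAwake() { return awake; }
  };
}
```

In an onresult handler you could pass each transcript to `handle` and act only on results whose type is 'command', calling `reset` once the command has been processed.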

Mic input can be detected with the Web Audio API, which is not subject to the time restriction the SR suffers from. You can feed it a MediaStream from MediaDevices.getUserMedia.

For more information on the script below, see this answer.

Here is how you could attach it to a SpeechRecognition:

const magic_word = '##YOUR_MAGIC_WORD##'; // placeholder: replace with your own magic word

// initialize our SpeechRecognition object
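// use the unprefixed constructor when available, else Chrome's prefixed one
let recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();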
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;
recognition.continuous = true;

// detect the magic word
recognition.onresult = e => {
  // extract the transcript of every alternative of every result
  const transcripts = [...e.results]
    .flatMap(res => [...res].map(alt => alt.transcript));
  if (transcripts.some(t => t.includes(magic_word))) {
    // do something awesome, like starting your own command listeners
  }
  else {
    // didn't understand...
  }
};
// called when we detect silence
function stopSpeech() {
  recognition.stop();
}
// called when we detect sound
function startSpeech() {
  try { // calling start() while already started will throw...
    recognition.start();
  }
  catch (e) {}
}
// request a MediaStream from the microphone
navigator.mediaDevices.getUserMedia({ audio: true })
  // add our listeners
  .then(stream => detectSilence(stream, stopSpeech, startSpeech))
  .catch(e => console.error(e.message));


function detectSilence(
  stream,
  onSoundEnd = _=>{},
  onSoundStart = _=>{},
  silence_delay = 500,
  min_decibels = -80
  ) {
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  const streamNode = ctx.createMediaStreamSource(stream);
  streamNode.connect(analyser);
  analyser.minDecibels = min_decibels;

  const data = new Uint8Array(analyser.frequencyBinCount); // will hold our data
  let silence_start = performance.now();
  let triggered = false; // trigger only once per silence event

  function loop(time) {
    requestAnimationFrame(loop); // check once per animation frame (about 60 times per second)
    analyser.getByteFrequencyData(data); // get current data
    if (data.some(v => v)) { // if there is data above the given dB limit
      if (triggered) {
        triggered = false;
        onSoundStart();
      }
      silence_start = time; // set it to now
    }
    if (!triggered && time - silence_start > silence_delay) {
      onSoundEnd();
      triggered = true;
    }
  }
  requestAnimationFrame(loop); // start through rAF so that `time` is always defined
}
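
To make the timing logic of detectSilence easier to follow and to test, the same state machine can be written without any audio or animation-frame dependency. This is a minimal sketch, not part of the original answer: `createSilenceTracker` is a hypothetical name, and the caller is assumed to supply a "has sound" flag and a timestamp in milliseconds on every tick.

```javascript
// Hypothetical helper: the decision logic of detectSilence, isolated from the Web Audio API.
function createSilenceTracker(onSoundEnd, onSoundStart, silenceDelay = 500, startTime = 0) {
  let silenceStart = startTime; // last time any sound was seen
  let triggered = false;        // true while a silence has been reported
  // Call once per tick with whether sound was detected and the current time in ms.
  return function update(hasSound, time) {
    if (hasSound) {
      if (triggered) {
        triggered = false;
        onSoundStart();
      }
      silenceStart = time;
    }
    if (!triggered && time - silenceStart > silenceDelay) {
      triggered = true;
      onSoundEnd();
    }
  };
}
```

As in the script above, the sound-start callback fires only after a silence has been reported, so the recognition is first started after the initial silent period has been detected and then broken by sound.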

This is available as a plunker, since neither StackSnippets' nor jsfiddle's iframes will allow getUserMedia (gUM) in two versions...

Source: https://stackoverflow.com/questions/47277211/continuous-speech-recognition-on-browser-like-ok-google-or-hey-siri

Tags

javascript

html5

google-chrome

speech-recognition

webspeech-api