How to end Google Speech-to-Text streamingRecognize gracefully and get back the pending text results?

心在旅途 2020-12-20 20:58

I'd like to be able to end a Google Speech-to-Text stream (created with streamingRecognize) and get back the pending SR (speech recognition) results.

In a nutshell,

3 Answers
  • 2020-12-20 21:30

    Regarding "I'm looking for a potential workaround": have you considered extending SpeechClient as a base class? I don't have credentials to test this, but you could derive your own class from SpeechClient and then call the inherited close() method as needed. The close() method shuts down the SpeechClient and resolves the outstanding Promise.

    Alternatively, you could wrap the SpeechClient in a Proxy and intercept or respond to calls as needed. But since your intent is to shut it down, the option below might be your workaround.

    const speech = require('@google-cloud/speech');
    
    class ClientProxy extends speech.SpeechClient {
      constructor() {
        super();
      }
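      // Return the Promise from close() so callers can wait for shutdown.
      myCustomFunction() {
        return this.close();
      }
    }
    
    const clientProxy = new ClientProxy();
    // close() is asynchronous, so handle failures on the returned Promise
    // rather than with a synchronous try/catch.
    clientProxy.myCustomFunction().catch((err) => {
      console.log("myCustomFunction generated error: ", err);
    });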
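
    The Proxy alternative mentioned above could look roughly like the sketch below. It is untested against the real library: `fakeClient` is a hypothetical stand-in object (a real SpeechClient instance would take its place), and `wrapClient` is an illustrative helper name, not part of any API. It only shows how a Proxy can intercept method calls before forwarding them.

```javascript
// Hypothetical stand-in for a client object; a real SpeechClient would go here.
const fakeClient = {
  closed: false,
  streamingRecognize(request) {
    return { request };
  },
  close() {
    this.closed = true;
    return Promise.resolve();
  },
};

// Wrap a client so that every method call is recorded before being forwarded.
function wrapClient(target, log) {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof value !== 'function') return value;
      return (...args) => {
        log.push(String(prop)); // intercept: record the method name
        return value.apply(obj, args); // forward to the original object
      };
    },
  });
}

const calls = [];
const proxied = wrapClient(fakeClient, calls);
proxied.streamingRecognize({ config: {} });
proxied.close().then(() => console.log('intercepted calls:', calls));
```

    The same `get` trap could decide to short-circuit a call or add cleanup around close() instead of merely logging it.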
    
  • 2020-12-20 21:30

    Since it's a bug, I don't know whether this will work for you, but I have used this.recognizeStream.end(); several times in different situations and it worked. However, my code was a bit different...

    This thread may be useful to you: https://groups.google.com/g/cloud-speech-discuss/c/lPaTGmEcZQk/m/Kl4fbHK2BQAJ
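
    As a rough illustration of the end() approach, here is a minimal sketch that ends the writable side of a duplex stream and keeps collecting whatever still arrives on the readable side. It uses a Node.js PassThrough as a stand-in for the recognize stream, and `endAndCollect` is a hypothetical helper name. It assumes the stream emits 'end' after its last 'data' event, which I have not verified for the stream returned by streamingRecognize.

```javascript
const { PassThrough } = require('node:stream');

// Ends the writable side of a duplex stream, then resolves with every
// 'data' payload emitted before the readable side ends.
function endAndCollect(stream) {
  return new Promise((resolve, reject) => {
    const results = [];
    stream.on('data', (chunk) => results.push(chunk));
    stream.once('error', reject);
    stream.once('end', () => resolve(results));
    stream.end(); // signal "no more input"; pending results may still arrive
  });
}

// Stand-in for a recognize stream: PassThrough echoes what is written to it.
const recognizeStream = new PassThrough({ objectMode: true });
recognizeStream.write({ transcript: 'hello' });
recognizeStream.write({ transcript: 'world' });

const pending = endAndCollect(recognizeStream);
pending.then((results) => console.log(results.map((r) => r.transcript)));
```

    The point is to avoid destroying the stream right after end(), so that results still in flight are not dropped.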

  • 2020-12-20 21:37

    My bad: unsurprisingly, this turned out to be an obscure race condition in my code.

    I've put together a self-contained sample that works as expected (gist). It helped me track down the issue. Hopefully, it will help others and my future self:

    // A simple streamingRecognize workflow,
    // tested with Node v15.0.1, by @noseratio
    
    import fs from 'fs';
    import path from "path";
    import url from 'url'; 
    import util from "util";
    import timers from 'timers/promises';
    import speech from '@google-cloud/speech';
    
    export {}
    
    // requires 16-bit, 16 kHz raw PCM audio
    const filename = path.join(path.dirname(url.fileURLToPath(import.meta.url)), "sample.raw");
    const encoding = 'LINEAR16';
    const sampleRateHertz = 16000;
    const languageCode = 'en-US';
    
    const request = {
      config: {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
      },
      interimResults: false // If you want interim results, set this to true
    };
    
    // init SpeechClient
    const client = new speech.v1p1beta1.SpeechClient();
    await client.initialize();
    
    // Stream the audio to the Google Cloud Speech API
    const stream = client.streamingRecognize(request);
    
    // log all data
    stream.on('data', data => {
      const result = data.results[0];
      console.log(`SR results, final: ${result.isFinal}, text: ${result.alternatives[0].transcript}`);
    });
    
    // log all errors
    stream.on('error', error => {
      console.warn(`SR error: ${error.message}`);
    });
    
    // observe data event
    const dataPromise = new Promise(resolve => stream.once('data', resolve));
    
    // observe error event
    const errorPromise = new Promise((resolve, reject) => stream.once('error', reject));
    
    // observe finish event
    const finishPromise = new Promise(resolve => stream.once('finish', resolve));
    
    // observe close event
    const closePromise = new Promise(resolve => stream.once('close', resolve));
    
    // we could just pipe it: 
    // fs.createReadStream(filename).pipe(stream);
    // but we want to simulate the web socket data
    
    // read RAW audio as Buffer
    const data = await fs.promises.readFile(filename, null);
    
    // simulate multiple audio chunks
    console.log("Writing...");
    const chunkSize = 4096;
    for (let i = 0; i < data.length; i += chunkSize) {
      stream.write(data.slice(i, i + chunkSize));
      await timers.setTimeout(50);
    }
    console.log("Done writing.");
    
    console.log("Before ending...");
    await util.promisify(c => stream.end(c))();
    console.log("After ending.");
    
    // race for events
    await Promise.race([
      errorPromise.catch(() => console.log("error")), 
      dataPromise.then(() => console.log("data")),
      closePromise.then(() => console.log("close")),
      finishPromise.then(() => console.log("finish"))
    ]);
    
    console.log("Destroying...");
    stream.destroy();
    console.log("Final timeout...");
    await timers.setTimeout(1000);
    console.log("Exiting.");
    

    The output:

    Writing...
    Done writing.
    Before ending...
    SR results, final: true, text:  this is a test I'm testing voice recognition This Is the End
    After ending.
    data
    finish
    Destroying...
    Final timeout...
    close
    Exiting.
    

    To test it, a 16-bit, 16 kHz raw PCM audio file is required. An arbitrary WAV file won't work as is because it contains a header with metadata.
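
    If only a WAV file is at hand, one possible way to get at the raw samples is to walk the RIFF chunks and return the contents of the "data" chunk. The sketch below is not part of the original sample; `extractPcmFromWav` and `makeWav` are hypothetical helper names, and the code does no resampling or format conversion, so the WAV would already have to be 16-bit, 16 kHz PCM.

```javascript
// Returns the raw PCM bytes from a RIFF/WAVE buffer by walking its chunks.
function extractPcmFromWav(wav) {
  if (wav.toString('ascii', 0, 4) !== 'RIFF' || wav.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE buffer');
  }
  let offset = 12; // first chunk starts after "RIFF" <size> "WAVE"
  while (offset + 8 <= wav.length) {
    const id = wav.toString('ascii', offset, offset + 4);
    const size = wav.readUInt32LE(offset + 4);
    const start = offset + 8;
    if (id === 'data') return wav.subarray(start, Math.min(start + size, wav.length));
    offset = start + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No "data" chunk found');
}

// Builds a tiny synthetic WAV around the given PCM bytes, for demonstration only.
function makeWav(pcm) {
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4);
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16); // fmt chunk size
  header.writeUInt16LE(1, 20); // audio format: PCM
  header.writeUInt16LE(1, 22); // channels: mono
  header.writeUInt32LE(16000, 24); // sample rate
  header.writeUInt32LE(32000, 28); // byte rate = rate * channels * 2
  header.writeUInt16LE(2, 32); // block align
  header.writeUInt16LE(16, 34); // bits per sample
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}

const pcm = Buffer.from([1, 2, 3, 4]);
const extracted = extractPcmFromWav(makeWav(pcm));
console.log(extracted.equals(pcm));
```

    Walking the chunks rather than skipping a fixed 44 bytes matters because some WAV files carry extra chunks before the audio data.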
