Improving accuracy of Google Cloud Speech API

Submitted by 天涯浪子 on 2019-12-13 14:12:32

Question


I am currently recording audio from a web page on my Mac OS computer and running it through the Cloud Speech API to produce a transcript. However, the results aren't very accurate, and chunks of words are missing from the transcript.

Are there any steps that would help me yield more accurate results?

Here are the steps I am taking to convert audio to text:

  1. Use Soundflower to channel audio output from my sound card to the microphone input.
  2. Play the audio from the website.
  3. Use QuickTime Player to record the audio, which is saved as a .m4a file.
  4. Use the command-line tool ffmpeg to convert the .m4a file to .flac, and also downmix the two audio channels (stereo) to one channel (mono); see the sketch after this list.
  5. Upload the .flac file to Google Cloud Storage. The file has a sample rate of 44,100 Hz and 24 bits per sample.
  6. Call the longRunningRecognize API via the Node.js client library, pointing it at the file in Google Cloud Storage.
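For step 4, a minimal ffmpeg invocation along these lines handles both the format conversion and the stereo-to-mono downmix; the file names here are placeholders:

```sh
# Convert the QuickTime recording to FLAC and downmix stereo to mono (-ac 1).
ffmpeg -i recording.m4a -ac 1 recording.flac
```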

Answer 1:


From the Speech-to-Text API side, I would suggest verifying that you are following the Best Practices recommendations, such as avoiding excessive background noise and multiple people talking at the same time, since these factors can degrade recognition.

I think you have a good sample rate and a lossless codec; however, keep in mind that audio pre-processing can affect audio quality. It is preferable to avoid re-sampling; nevertheless, you can experiment with different audio formats to see which yields the most accurate results.

Additionally, you can set the languageCode and phrase hints (speechContexts) properties in the API request, which are commonly used to boost recognition accuracy, as in the sketch below.
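As a rough sketch of how those properties fit into the longRunningRecognize call from step 6, using the official @google-cloud/speech Node.js client (the bucket URI and phrase list are placeholders you would replace with your own):

```js
// Minimal sketch using the @google-cloud/speech Node.js client.
const speech = require('@google-cloud/speech');

async function transcribe() {
  const client = new speech.SpeechClient();

  const request = {
    audio: {
      // Placeholder: point this at your FLAC file in Cloud Storage.
      uri: 'gs://your-bucket/recording.flac',
    },
    config: {
      encoding: 'FLAC',
      sampleRateHertz: 44100,    // match the file's native rate; avoid re-sampling
      languageCode: 'en-US',     // set to the language actually spoken
      // Phrase hints: terms likely to appear in the audio (placeholders).
      speechContexts: [{ phrases: ['Cloud Speech API', 'transcript'] }],
    },
  };

  // longRunningRecognize returns a long-running operation; await its completion.
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();

  const transcript = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(transcript);
}

transcribe().catch(console.error);
```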



Source: https://stackoverflow.com/questions/51885317/improving-accuracy-of-google-cloud-speech-api
