Question:
I would like to extract utterances of a person from a stream of telephone audio. The phone audio is routed to my server, which then creates a streaming recognition request. How can I tell whether a word is part of a complete utterance or part of an utterance currently being transcribed? Should I compare timestamps between words? Will the API continue to return interim results even if there is no speech in the streaming phone audio for a certain amount of time? And how can I exceed the 1-minute limit on streaming audio?
Answer 1:
Regarding your first three questions:
You don't need to compare timestamps between words. You can tell whether a word is part of a complete utterance (a final result) by looking at the is_final flag
in the StreamingRecognitionResult. If the flag is set to true, the response corresponds to a completed transcription; otherwise, it is an interim result. More on this here.
Once you get the final results, no interim results should be generated until new utterances are streamed.
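The is_final handling described above can be sketched as follows. The classes here are simple stand-ins for the API's StreamingRecognitionResult and its alternatives (the real objects are protobufs from google.cloud.speech, which this sketch only imitates):

```python
from dataclasses import dataclass, field
from typing import List

# Stand-ins for the API's StreamingRecognitionResult / SpeechRecognitionAlternative.
@dataclass
class Alternative:
    transcript: str

@dataclass
class Result:
    is_final: bool
    alternatives: List[Alternative] = field(default_factory=list)

def collect_final_transcripts(results):
    """Keep only completed utterances; interim results are skipped
    (a live client might instead display them as a running preview)."""
    finals = []
    for result in results:
        if result.is_final:
            finals.append(result.alternatives[0].transcript)
    return finals

# Example: two interim hypotheses followed by the finalized utterance.
stream = [
    Result(False, [Alternative("hello")]),
    Result(False, [Alternative("hello wor")]),
    Result(True,  [Alternative("hello world")]),
]
print(collect_final_transcripts(stream))  # ['hello world']
```

With a real streaming response, the same loop would iterate over `response.results` from each message returned by `streaming_recognize`.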
Regarding your last question: you can't exceed the 1-minute limit; you need to send multiple streaming requests instead.
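One way to organize the multiple-request workaround is to close and reopen the streaming request before the limit is reached. A minimal sketch of the bookkeeping, where the 55-second cutoff and per-chunk duration are illustrative assumptions, not values from the API:

```python
STREAM_LIMIT_S = 55.0  # restart safely before the 60 s cap (assumed margin)

def split_into_streams(chunks, chunk_duration_s, limit_s=STREAM_LIMIT_S):
    """Group audio chunks so each group stays within the streaming limit;
    each group would then be sent as its own streaming_recognize request."""
    streams, current, elapsed = [], [], 0.0
    for chunk in chunks:
        # Start a new request once the next chunk would cross the limit.
        if elapsed + chunk_duration_s > limit_s and current:
            streams.append(current)
            current, elapsed = [], 0.0
        current.append(chunk)
        elapsed += chunk_duration_s
    if current:
        streams.append(current)
    return streams

# 120 one-second chunks -> three requests (55 s + 55 s + 10 s).
groups = split_into_streams(list(range(120)), chunk_duration_s=1.0)
print([len(g) for g in groups])  # [55, 55, 10]
```

Note that restarting a stream mid-utterance can clip a word at the boundary, so clients often restart during silence or carry a little overlapping audio into the next request.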
Source: https://stackoverflow.com/questions/52175187/google-speech-api-streaming-audio-exceeding-1-minute