Question
I ran into this problem using the Google Speech to Text engine. I am streaming 16-bit, 16 kHz audio in real time in 32 kB chunks, but there is an average latency of 25 seconds between sending audio and receiving transcripts, which defeats the purpose of real-time transcription.
Why is there such high latency?
Answer 1:
The Google Speech to Text documentation recommends using a 100 ms frame size to minimize latency.
32 kB * (8 bits / 1 byte) * (1 sample / 16 bits) * (1 sec / 16000 samples) = 1 sec, so each 32 kB chunk holds a full second of audio, ten times the recommended frame size.
So try sending 3.2 kB chunks (100 ms of audio) instead. That dropped the average latency from 25 s to about 4 s.
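The arithmetic above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `chunk_size_bytes` and `iter_chunks` are hypothetical, the audio is assumed to be mono 16-bit PCM, and nothing here calls the Speech to Text API itself.

```python
def chunk_size_bytes(sample_rate_hz=16000, bits_per_sample=16,
                     frame_ms=100, channels=1):
    """Return the number of bytes in one frame of raw PCM audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * frame_ms // 1000


def iter_chunks(audio, size):
    """Yield consecutive slices of `audio` that are at most `size` bytes long."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


# 16 kHz, 16-bit mono audio at 100 ms per frame works out to 3200 bytes.
CHUNK_BYTES = chunk_size_bytes()

# Hypothetical stand-in for one second of captured audio (32000 bytes of silence).
one_second = bytes(32000)
chunks = list(iter_chunks(one_second, CHUNK_BYTES))
```

Each element of `chunks` would then be sent as its own streaming request instead of sending the whole 32 kB buffer at once.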
Source: https://stackoverflow.com/questions/51545598/25s-latency-in-google-speech-to-text