25s Latency in Google Speech to Text

Question

I ran into this problem using the Google Speech to Text engine. I am streaming 16-bit, 16 kHz audio in real time in 32 kB chunks, but there is an average 25-second latency between sending audio and receiving transcripts, which defeats the purpose of real-time transcription.

Why is there such high latency?

Answer 1:

The Google Speech to Text documentation recommends using a 100 ms frame size to minimize latency.

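At 16 bits per sample and 16 kHz, a 32 kB chunk holds a full second of audio, ten times the recommended frame size:

32 kB * (8 bits / 1 byte) * (1 sample / 16 bits) * (1 sec / 16000 samples) = 1 sec.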
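The same arithmetic can be checked in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes mono, uncompressed PCM audio with 1 kB taken as 1000 bytes, as in the calculation above.

```python
def chunk_duration_s(chunk_bytes, sample_rate_hz=16000, bits_per_sample=16, channels=1):
    """Seconds of audio held by a chunk of raw PCM data (hypothetical helper)."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return chunk_bytes / bytes_per_second


def bytes_for_frame(frame_ms, sample_rate_hz=16000, bits_per_sample=16, channels=1):
    """Chunk size in bytes for a frame of the given length (hypothetical helper)."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * frame_ms // 1000


print(chunk_duration_s(32_000))  # 1.0 second per 32 kB chunk
print(bytes_for_frame(100))      # 3200 bytes for a 100 ms frame
```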

So try sending 3.2 kB chunks (100 ms of audio each) instead. That dropped average latency from 25 s to ~4 s.
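If the audio source hands over large buffers, one way to get smaller chunks is to re-slice them before sending. The generator below is a minimal sketch of that idea, not part of the original answer: `rechunk` is a hypothetical name, it does not call the Speech to Text API itself, and it assumes the input is an iterable of raw PCM byte buffers.

```python
def rechunk(buffers, chunk_bytes=3200):
    """Re-slice an iterable of byte buffers into fixed-size chunks.

    3200 bytes is 100 ms of 16-bit, 16 kHz mono audio. The final chunk
    may be shorter if the total length is not a multiple of chunk_bytes.
    """
    pending = bytearray()
    for buf in buffers:
        pending.extend(buf)
        # Emit as many full chunks as the accumulated data allows.
        while len(pending) >= chunk_bytes:
            yield bytes(pending[:chunk_bytes])
            del pending[:chunk_bytes]
    if pending:
        # Flush whatever is left at the end of the stream.
        yield bytes(pending)


# One 32 kB buffer (1 s of silence) becomes ten 100 ms chunks.
chunks = list(rechunk([bytes(32_000)]))
print(len(chunks), len(chunks[0]))
```

Each yielded chunk would then go out as the audio payload of one streaming request, so the recognizer receives data every 100 ms instead of once per second.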

Source: https://stackoverflow.com/questions/51545598/25s-latency-in-google-speech-to-text

Tags

streaming

speech-to-text

google-cloud-speech
