How to translate (transcribe) live streaming audio using the Google Speech API?

Submitted by 限于喜欢 on 2019-12-03 16:59:24

Question


This is the Google Speech API documentation: https://cloud.google.com/speech/docs/sync-recognize

I have been trying this API for 2 weeks, but I still can't achieve my main goal: translating a live stream.

I'm using PHP (suggestions in other languages are welcome; I'll work out the details myself).

What I have managed to do in those 2 weeks:

  1. Synchronous speech recognition (<= 1 min)

  2. Asynchronous speech recognition (> 1 min and <= 80 min). Note: I can modify this to accept a 3-hour video.

  3. Live speech recognition from a microphone: https://www.google.com/intl/en/chrome/demos/speech.html

  4. UPDATE: Use the streaming API with audio shorter than 6 seconds.

What I can't do:

  1. Translate a live stream, e.g. a radio stream (a delay is acceptable).

  2. Translate while a video/audio file is playing (a delay is acceptable).

UPDATE:

I also asked this question on Google's GitHub, but since nobody answered, I'm asking here.

Summary:

I can perform streaming recognition, but only with 6 seconds of audio. That is not what I need: I want to recognize audio of unlimited duration, because there is no way to know when a radio stream will end.
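One common workaround for a stream with no known end is not to keep a single streaming request open forever, but to close it before a per-stream duration limit and continue the same audio in a fresh streaming session. The Python sketch below (Python because the question welcomes other languages) lazily groups an endless iterator of fixed-duration audio chunks into consecutive sessions, each of which would feed one streaming recognition call. `MAX_SESSION_SECONDS`, `CHUNK_MS`, `chunks_per_session` and `split_into_sessions` are hypothetical names, and the limit value is an assumption for illustration only; check the current Cloud Speech quotas for the real one.

```python
from itertools import chain, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Illustrative values only (assumptions, not official limits).
MAX_SESSION_SECONDS = 240  # hypothetical per-session cap, kept below the API's stream limit
CHUNK_MS = 100             # assumed duration of each audio chunk


def chunks_per_session(max_session_s: int = MAX_SESSION_SECONDS,
                       chunk_ms: int = CHUNK_MS) -> int:
    """Number of fixed-duration chunks that fit into one session."""
    return max(1, (max_session_s * 1000) // chunk_ms)


def split_into_sessions(chunks: Iterable[T], per_session: int) -> Iterator[Iterator[T]]:
    """Lazily split a possibly endless chunk stream into consecutive sessions.

    Each yielded iterator covers at most `per_session` chunks and is meant to feed
    one streaming recognition call. Consume each session fully before requesting
    the next one; otherwise its unread chunks spill into the following session.
    """
    if per_session < 1:
        raise ValueError("per_session must be >= 1")
    it = iter(chunks)
    for first in it:  # ends only when the source itself is exhausted
        yield chain([first], islice(it, per_session - 1))


if __name__ == "__main__":
    fake_chunks = (b"\x00" * 3200 for _ in range(5))  # 5 chunks standing in for live audio
    for n, session in enumerate(split_into_sessions(fake_chunks, per_session=2)):
        print(n, sum(len(c) for c in session))  # bytes sent in this session
```

Each session is consumed lazily, so audio can be forwarded as it arrives instead of being buffered for minutes. This sketch does not deal with a word cut in half at a session boundary; re-sending a short overlap of audio at the start of each new session is one possible refinement.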

Thanks for any help, I really appreciate it.

UPDATE:

To demonstrate that I can't use audio longer than 6 seconds, here is what I did:

I took this video, interview.mp4, and converted it to interview.flac with ffmpeg:

ffmpeg -i interview.mp4 -c:a flac -ar 16000 -ac 1 -sample_fmt s16 interview.flac
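FLAC is compressed, so its byte count per second varies. If the audio is instead decoded to raw 16-bit little-endian PCM (for example with ffmpeg's `-f s16le` output format), the size of a chunk covering a given duration follows directly from the parameters used in the command above (16000 Hz, 1 channel, 16-bit samples): bytes = sample rate × channels × 2 × milliseconds / 1000. The helper below is a small Python sketch of that arithmetic; `pcm_chunk_bytes` is a hypothetical name, and the 100 ms default is an assumed frame size, a common trade-off between latency and per-request overhead.

```python
def pcm_chunk_bytes(chunk_ms: int = 100, sample_rate_hz: int = 16000,
                    channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Bytes of raw PCM needed to cover chunk_ms milliseconds of audio."""
    # Number of sample frames in the chunk; it must be a whole number.
    frames, remainder = divmod(sample_rate_hz * chunk_ms, 1000)
    if remainder:
        raise ValueError("chunk_ms must cover a whole number of sample frames")
    return frames * channels * bytes_per_sample


if __name__ == "__main__":
    print(pcm_chunk_bytes())           # 3200 bytes per 100 ms chunk
    print(pcm_chunk_bytes(1000) * 48)  # 1536000 bytes for the 48 s clip as raw PCM
```

With these settings one second of raw audio is 32000 bytes, which makes it easy to size chunks by time when reading from a live source.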

Then I used this library to transcribe it with this command:

php speech.php transcribe --encoding FLAC --language-code en-US --sample-rate 16000 --stream interview.flac

The result is:

  [Google\GAX\ApiException]
  Invalid 'audio_content': too long.

It can't be too long, because the video is only 48 seconds long. This is the metadata from the ffmpeg output:

Output #0, flac, to 'interview.flac':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.72.101
    Stream #0:0(und): Audio: flac, 16000 Hz, mono, s16, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      encoder         : Lavc57.92.100 flac
size=     810kB time=00:00:48.01 bitrate= 138.1kbits/s speed= 108x
video:0kB audio:801kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.019650%
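The error text names `audio_content`, which is a field of a single request message, so it appears to be about the size of one message rather than the duration of the audio. A plausible cause (an assumption, not confirmed in the question) is that the whole 801 kB file was sent in one streaming message, whereas streaming recognition expects the audio split across many requests. The Python sketch below reads any binary stream in small fixed-size pieces; `read_chunks` is a hypothetical helper and `CHUNK_BYTES` is an assumed value (100 ms of 16 kHz mono 16-bit PCM), not an official limit.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_BYTES = 3200  # assumption: 100 ms of 16 kHz mono 16-bit PCM; not an API constant


def read_chunks(stream: BinaryIO, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield successive pieces of at most chunk_bytes from a binary stream until EOF."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:  # b"" signals end of stream
            return
        yield chunk


if __name__ == "__main__":
    demo = io.BytesIO(b"\x00" * 10000)  # stand-in for interview.flac or an ffmpeg pipe
    print([len(c) for c in read_chunks(demo)])  # [3200, 3200, 3200, 400]
```

The same function works on `open("interview.flac", "rb")` and, for a live source such as a radio stream, on `sys.stdin.buffer` while ffmpeg writes raw audio to standard output (e.g. `ffmpeg -i <stream URL> -f s16le -ac 1 -ar 16000 -`), where `<stream URL>` is a placeholder.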

Answer 1:


You need to use the StreamingRecognize API call. You can find an example of doing that in PHP here.
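In the StreamingRecognize protocol, the first request carries only the streaming configuration, and every later request carries only a piece of audio in `audio_content`. The Python sketch below builds that sequence as plain dicts mirroring the API's field names (`streaming_config`, `config`, `interim_results`, `audio_content`) to show the message order; `build_streaming_requests` is a hypothetical helper, not part of the Google client library, whose own request classes and call signatures should be taken from its documentation.

```python
from typing import Any, Dict, Iterable, Iterator


def build_streaming_requests(streaming_config: Dict[str, Any],
                             chunks: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield StreamingRecognize-style messages: config first, then audio only."""
    # First message: configuration only, no audio.
    yield {"streaming_config": streaming_config}
    # Every following message: one audio chunk only, no configuration.
    for chunk in chunks:
        if chunk:  # skip empty reads
            yield {"audio_content": chunk}


if __name__ == "__main__":
    config = {
        "config": {"encoding": "FLAC", "sample_rate_hertz": 16000, "language_code": "en-US"},
        "interim_results": True,
    }
    requests = list(build_streaming_requests(config, [b"abc", b"", b"def"]))
    print([list(r) for r in requests])
    # [['streaming_config'], ['audio_content'], ['audio_content']]
```

Because the generator is lazy, the chunk iterable can be an endless live source, and the requests are produced only as audio arrives.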



Source: https://stackoverflow.com/questions/44177012/how-to-translate-live-streaming-using-google-speech-api
