Using GStreamer with the Google Speech API (Streaming Transcribe) in C++

强颜欢笑 submitted on 2020-02-02 13:38:30


I am using the Google Speech API on Google Cloud Platform to get speech-to-text for streaming audio. I have already made REST API calls with curl POST requests for a short audio file on GCP.

I have read the documentation for Google's Streaming Recognize, which says "Streaming speech recognition is available via gRPC only."

I have gRPC (and protobuf) installed on openSUSE Leap 15.0. Here is a screenshot of the directory.

Next, I tried to run the streaming_transcribe example from this link. I found that the sample program reads a local file as input but simulates microphone input (reading 64 KB chunks sequentially) and then sends the data to the Google server.
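The chunked "simulated microphone" reading described above can be sketched without gRPC. This is a minimal illustration under stated assumptions, not the sample's actual code: SimulateMicrophone and its send callback are hypothetical names, the callback stands in for writing a StreamingRecognizeRequest to the stream, and the pacing assumes 16-bit mono LINEAR16 audio (2 bytes per sample).

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the sample's "simulated microphone": reads a raw
// audio file in fixed-size chunks and hands each chunk to `send`, which here
// plays the role of writing a StreamingRecognizeRequest to the gRPC stream.
// Returns the total number of bytes handed to `send`.
std::size_t SimulateMicrophone(
    const std::string& path, int sample_rate_hz,
    const std::function<void(const char*, std::size_t)>& send,
    std::size_t chunk_size = 64 * 1024, bool pace = true) {
  std::ifstream in(path, std::ios::binary);
  std::vector<char> chunk(chunk_size);
  // Assumes 16-bit mono LINEAR16, i.e. 2 bytes per sample.
  const double bytes_per_second = sample_rate_hz * 2.0;
  std::size_t total = 0;
  while (in) {
    in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
    const auto n = static_cast<std::size_t>(in.gcount());
    if (n == 0) break;  // nothing left to send
    send(chunk.data(), n);
    total += n;
    // Wait roughly as long as this chunk would take to record live, so the
    // data arrives at about real-time speed. The final partial chunk is not
    // followed by a wait.
    if (pace && n == chunk.size()) {
      std::this_thread::sleep_for(
          std::chrono::duration<double>(n / bytes_per_second));
    }
  }
  return total;
}
```

For 16 kHz mono audio, one 64 KB chunk corresponds to 65536 / 32000 ≈ 2.05 s of speech, which is why pacing by chunk duration approximates a live microphone.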

As an initial test to check that gRPC is correctly set up on my system, I ran make run_tests. I changed the Makefile as follows:

...Some text as original Makefile
.PHONY: all
all: streaming_transcribe $( 
      ar r $@ $?
streaming_transcribe: streaming_transcribe.o parse_arguments.o
      $(CXX) $^ $(LDFLAGS) -o $@
      ./streaming_transcribe -b 16000 resources/audio.raw
      ./streaming_transcribe --bitrate 16000 resources/audio2.raw
      ./streaming_transcribe resources/audio.flac
      ./streaming_transcribe resources/quit.raw
clean:
      rm -f *.o streaming_transcribe \ \

This does not work (neither does the original Makefile), but the streaming_transcribe.o file is created after running make. So I ran the program manually and got the following responses:

Any suggestions on how to run the tests, and how to use GStreamer instead of the function that simulates microphone audio?


how to run the test

Follow the instructions in cpp-docs-samples. Prerequisites: install gRPC, protobuf, and googleapis, and set up the environment as described in the links above.

GStreamer instead of the function used for simulating the microphone audio

For this program I created the following pipelines:

gst-launch-1.0 filesrc location=/path/to/file/FOO.wav ! wavparse ! audioconvert ! audioresample ! audio/x-raw,format=S16BE,channels=1,rate=44100 ! rtpL16pay ! udpsink port=yyyy

The audio file can be changed to FLAC or MP3 by changing the appropriate elements in the pipeline.
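As a hedged illustration of which elements change per format, the hypothetical helper below builds the sender pipeline description for WAV, FLAC, or MP3 input. It assumes the flacparse/flacdec and mpegaudioparse/mpg123audiodec plugins are installed, and it includes audioresample and requests format=S16BE (the sample format rtpL16pay accepts) so files at other sample rates can still negotiate. The returned string uses the same syntax as the gst-launch-1.0 arguments shown here, which is also what gst_parse_launch() accepts.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns a pipeline description that reads `path`,
// parses/decodes it according to its file extension, and sends mono 16-bit
// big-endian PCM as RTP L16 to `port` on localhost (udpsink's default host).
std::string BuildSenderPipeline(const std::string& path, int port) {
  auto ends_with = [&](const std::string& ext) {
    return path.size() >= ext.size() &&
           path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
  };
  std::string decode;
  if (ends_with(".wav")) {
    decode = "wavparse";
  } else if (ends_with(".flac")) {
    decode = "flacparse ! flacdec";
  } else if (ends_with(".mp3")) {
    decode = "mpegaudioparse ! mpg123audiodec";
  } else {
    throw std::invalid_argument("unsupported file extension: " + path);
  }
  // Everything after the decoder is the same for every input format.
  return "filesrc location=" + path + " ! " + decode +
         " ! audioconvert ! audioresample"
         " ! audio/x-raw,format=S16BE,channels=1,rate=44100"
         " ! rtpL16pay ! udpsink port=" + std::to_string(port);
}
```

Only the parse/decode stage differs between formats; the conversion, RTP payloading, and UDP output stay the same, so the receiver pipeline does not need to change.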

gst-launch-1.0 udpsrc port=yyyy ! "application/x-rtp,media=(string)audio,clock-rate=(int)44100,encoding-name=(string)L16,encoding-params=(string)1,channels=(int)1,payload=(int)96" ! rtpL16depay ! audioconvert ! audio/x-raw,format=S16LE ! filesink location=/path/to/where/you/want/to/dump/the/rtp/payloads/ABC.raw
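The audio/x-raw,format=S16LE step matters because RTP L16 carries samples in network (big-endian) byte order, while the Speech API's LINEAR16 encoding expects little-endian samples. For 16-bit mono audio, the conversion audioconvert performs at that step amounts to swapping each pair of bytes, as this small sketch shows (S16BeToLe is a hypothetical name used only for illustration).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: converts 16-bit signed big-endian samples (the
// RTP L16 payload layout) to 16-bit signed little-endian samples (LINEAR16).
// A trailing odd byte, which cannot form a whole sample, is dropped.
std::vector<std::uint8_t> S16BeToLe(const std::vector<std::uint8_t>& be) {
  std::vector<std::uint8_t> le(be.size() & ~static_cast<std::size_t>(1));
  for (std::size_t i = 0; i + 1 < be.size(); i += 2) {
    le[i] = be[i + 1];  // low-order byte goes first in little-endian
    le[i + 1] = be[i];  // high-order byte goes second
  }
  return le;
}
```

For example, the big-endian bytes FF FE (the sample value -2) become FE FF, which a little-endian reader decodes back to -2.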

Taking the payloads from the RTP stream and writing them to the file is done in a different thread from the one that sends the data to Google and reads the responses.
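One way to implement the sending side of this split is to follow the file that filesink is still writing, in the spirit of tail -f. The sketch below is an assumption about how that thread might look, not the answerer's code: FollowGrowingFile is hypothetical, send stands in for writing each block to the gRPC stream, and it stops after a configurable number of idle polls rather than on an explicit end-of-stream signal.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical reader for the "send to Google" thread: follows a raw file
// that another process (e.g. the gst-launch-1.0 receiver ending in filesink)
// is still appending to, and hands every new block of bytes to `send`, which
// stands in for writing a StreamingRecognizeRequest to the gRPC stream.
// Stops after `max_idle_polls` consecutive polls find no new data and
// returns the total number of bytes handed to `send`.
std::size_t FollowGrowingFile(
    const std::string& path,
    const std::function<void(const char*, std::size_t)>& send,
    std::size_t chunk_size = 64 * 1024, int max_idle_polls = 50,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(100)) {
  std::ifstream in(path, std::ios::binary);
  std::vector<char> buf(chunk_size);
  std::size_t total = 0;
  int idle_polls = 0;
  while (idle_polls < max_idle_polls) {
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    const auto n = static_cast<std::size_t>(in.gcount());
    if (n > 0) {
      send(buf.data(), n);
      total += n;
      idle_polls = 0;
    } else {
      ++idle_polls;  // caught up with the writer; wait for more data
      std::this_thread::sleep_for(poll_interval);
    }
    // Reaching the current end of file sets eofbit/failbit; clear them so
    // the next read can pick up bytes appended since then.
    if (!in) in.clear();
  }
  return total;
}
```

This would run on its own std::thread while the receiver pipeline fills ABC.raw. It assumes the file already exists when the thread starts, and that samples are already little-endian (the S16LE caps in the receiver pipeline take care of that).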


Maybe a dedicated sound card could listen to the RTSP stream, with:

try (SpeechClient speechClient = SpeechClient.create

RecognitionConfig config =

