Question
I am using the Google Cloud Speech API to get speech-to-text for streaming audio. I have already made the REST API calls using curl POST requests for a short audio file on GCP.
I have seen the documentation for Google Streaming Recognize, which says "Streaming speech recognition is available via gRPC only."
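For context, the gRPC side looks roughly like the sketch below. This is a reconstruction based on the generated cloud_speech stubs, not the actual sample code; the endpoint, header path, and the LINEAR16/16000 Hz config are assumptions (the sample rate matches the -b 16000 flag used in run_tests further down).

#include <memory>

#include <grpcpp/grpcpp.h>
#include "google/cloud/speech/v1/cloud_speech.grpc.pb.h"

using google::cloud::speech::v1::RecognitionConfig;
using google::cloud::speech::v1::Speech;
using google::cloud::speech::v1::StreamingRecognizeRequest;

int main() {
  // Authenticated channel to the Speech endpoint.
  auto creds = grpc::GoogleDefaultCredentials();
  auto channel = grpc::CreateChannel("speech.googleapis.com", creds);
  std::unique_ptr<Speech::Stub> speech(Speech::NewStub(channel));

  // Open the bidirectional stream; the first write must carry the config.
  grpc::ClientContext context;
  auto streamer = speech->StreamingRecognize(&context);
  StreamingRecognizeRequest request;
  auto* config = request.mutable_streaming_config()->mutable_config();
  config->set_encoding(RecognitionConfig::LINEAR16);
  config->set_sample_rate_hertz(16000);  // assumption, matches -b 16000
  config->set_language_code("en-US");
  streamer->Write(request);

  // Audio chunks follow on the same stream (see the loop sketched below),
  // then the client half-closes and collects the final status.
  streamer->WritesDone();
  grpc::Status status = streamer->Finish();
  return status.ok() ? 0 : 1;
}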
I have gRPC (and also protobuf) installed on my openSUSE Leap 15.0 system. Here is a screenshot of the directory.
Next I am trying to run the streaming_transcribe example from this link. I found that the sample program uses a local file as the input but simulates it as microphone input (capturing 64K chunks sequentially) and then sends the data to the Google server, roughly as in the sketch below.
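The mic simulation is essentially the following loop. This is a sketch of the idea rather than the sample's exact code; SimulateMicrophone is a made-up name, and Stream stands for the ClientReaderWriter returned by StreamingRecognize():

#include <chrono>
#include <fstream>
#include <string>
#include <thread>

#include "google/cloud/speech/v1/cloud_speech.grpc.pb.h"

using google::cloud::speech::v1::StreamingRecognizeRequest;

// Read the local file in 64K chunks and write each one to the gRPC stream,
// sleeping between chunks so the upload paces like live microphone input.
template <typename Stream>
void SimulateMicrophone(Stream& streamer, const std::string& path) {
  std::ifstream file(path, std::ios::binary);
  char chunk[64 * 1024];
  while (file.read(chunk, sizeof(chunk)) || file.gcount() > 0) {
    StreamingRecognizeRequest request;
    request.set_audio_content(chunk, file.gcount());
    streamer.Write(request);
    std::this_thread::sleep_for(std::chrono::seconds(1));  // pacing
  }
  streamer.WritesDone();  // tell the server the audio is finished
}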
For initial tests, to check that gRPC is correctly set up on my system, I ran make run_tests. I have changed the Makefile as follows:
...
...Some text as original Makefile
...
.PHONY: all
all: streaming_transcribe

googleapis.ar: $(GOOGLEAPIS_CCS:.cc=.o)
	ar r $@ $?

streaming_transcribe: streaming_transcribe.o parse_arguments.o googleapis.ar
	$(CXX) $^ $(LDFLAGS) -o $@

run_tests:
	./streaming_transcribe -b 16000 resources/audio.raw
	./streaming_transcribe --bitrate 16000 resources/audio2.raw
	./streaming_transcribe resources/audio.flac
	./streaming_transcribe resources/quit.raw

clean:
	rm -f *.o streaming_transcribe \
		googleapis.ar \
		$(GOOGLEAPIS_CCS:.cc=.o)
This does not work well (and neither does the original Makefile). But the streaming_transcribe.o file is created after running the Makefile, so I ran the file manually and got the following responses.
Any suggestions on how to run the tests, and how to use GStreamer instead of the function used to simulate the microphone audio?
Answer 1:
How to run the test:
Follow the instructions in cpp-docs-samples. Prerequisite: install gRPC, protobuf, and googleapis, and set up the environment as described in the links above.
How to use GStreamer instead of the function used to simulate the microphone audio:
For this program I created two pipelines. The sending side, which reads an audio file and streams it as RTP over UDP:
gst-launch-1.0 filesrc location=/path/to/file/FOO.wav ! wavparse ! audioconvert ! audio/x-raw,channels=1,depth=16,width=16,rate=44100 ! rtpL16pay ! udpsink host=xxx.xxx.xxx.xxx port=yyyy
The audio file can be changed to FLAC or MP3 by swapping in the appropriate elements (e.g. flacparse ! flacdec in place of wavparse for FLAC). The receiving side, which depayloads the RTP stream and dumps the raw audio to a file:
gst-launch-1.0 udpsrc port=yyyy ! "application/x-rtp,media=(string)audio, clock-rate=(int)44100, width=16, height=16, encoding-name=(string)L16, encoding-params=(string)1, channels=(int)1, channel-positions=(int)1, payload=(int)96" ! rtpL16depay ! audioconvert ! audio/x-raw,format=S16LE ! filesink location=/path/to/where/you/want/to/dump/the/rtp/payloads/ABC.raw
Taking the payloads from the RTP stream and writing them to the file is done in a different thread from the one that sends the data to Google and reads the responses.
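A minimal sketch of that two-thread split, assuming the receiver pipeline above keeps appending raw audio to ABC.raw while we stream. PumpRtpDump and PrintResponses are illustrative names, not from the sample, and the file-tailing handoff is an assumption about how the threads share data:

#include <chrono>
#include <fstream>
#include <iostream>
#include <thread>

#include "google/cloud/speech/v1/cloud_speech.grpc.pb.h"

using google::cloud::speech::v1::StreamingRecognizeRequest;
using google::cloud::speech::v1::StreamingRecognizeResponse;

// Writer thread: tail the file GStreamer's filesink is appending to and
// forward whatever audio has arrived to the gRPC stream.
template <typename Stream>
void PumpRtpDump(Stream& streamer, const char* path) {
  std::ifstream file(path, std::ios::binary);
  char chunk[64 * 1024];
  for (;;) {
    file.read(chunk, sizeof(chunk));
    if (file.gcount() > 0) {
      StreamingRecognizeRequest request;
      request.set_audio_content(chunk, file.gcount());
      streamer.Write(request);
    }
    if (file.eof()) {
      file.clear();  // the dump keeps growing; poll again shortly
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
  }
}

// Reader thread: print interim and final transcripts as they arrive.
template <typename Stream>
void PrintResponses(Stream& streamer) {
  StreamingRecognizeResponse response;
  while (streamer.Read(&response)) {
    for (const auto& result : response.results()) {
      if (result.alternatives_size() > 0) {
        std::cout << result.alternatives(0).transcript() << std::endl;
      }
    }
  }
}

PumpRtpDump runs on its own std::thread while the main thread sits in PrintResponses, mirroring the split described above; gRPC allows one thread to write and another to read on the same stream.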
Answer 2:
Maybe a dedicated sound card can listen to the RTSP stream? With something like:
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.SpeechClient;

try (SpeechClient speechClient = SpeechClient.create()) {
  // LINEAR16, 44.1 kHz, stereo, with a separate transcript per channel.
  RecognitionConfig config =
      RecognitionConfig.newBuilder()
          .setEncoding(AudioEncoding.LINEAR16)
          .setLanguageCode("en-US")
          .setSampleRateHertz(44100)
          .setAudioChannelCount(2)
          .setEnableSeparateRecognitionPerChannel(true)
          .build();
  // ... pass the config in a recognize/streaming request here ...
}
Source: https://stackoverflow.com/questions/54514814/using-gstreamer-with-google-speech-api-streaming-transcribe-in-c