Question
I am using the Google Cloud Speech API to get speech-to-text for streaming audio. I have already made the REST API calls using curl POST requests for a short audio file on GCP.
I have seen the documentation for Google Streaming Recognize, which says "Streaming speech recognition is available via gRPC only."
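For context, the gRPC side looks roughly like the sketch below. This is a reconstruction based on the generated cloud_speech stubs, not the actual sample code; the endpoint, header path, and the LINEAR16/16000 Hz config are assumptions (the sample rate matches the -b 16000 flag used in run_tests further down).

#include <memory>

#include <grpcpp/grpcpp.h>
#include "google/cloud/speech/v1/cloud_speech.grpc.pb.h"

using google::cloud::speech::v1::RecognitionConfig;
using google::cloud::speech::v1::Speech;
using google::cloud::speech::v1::StreamingRecognizeRequest;

int main() {
  // Authenticated channel to the Speech endpoint.
  auto creds = grpc::GoogleDefaultCredentials();
  auto channel = grpc::CreateChannel("speech.googleapis.com", creds);
  std::unique_ptr<Speech::Stub> speech(Speech::NewStub(channel));

  // Open the bidirectional stream; the first write must carry the config.
  grpc::ClientContext context;
  auto streamer = speech->StreamingRecognize(&context);
  StreamingRecognizeRequest request;
  auto* config = request.mutable_streaming_config()->mutable_config();
  config->set_encoding(RecognitionConfig::LINEAR16);
  config->set_sample_rate_hertz(16000);  // assumption, matches -b 16000
  config->set_language_code("en-US");
  streamer->Write(request);

  // Audio chunks follow on the same stream (see the loop sketched below),
  // then the client half-closes and collects the final status.
  streamer->WritesDone();
  grpc::Status status = streamer->Finish();
  return status.ok() ? 0 : 1;
}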
I have gRPC (and also protobuf) installed on my openSUSE Leap 15.0 system. Here is a screenshot of the directory.
Next I am trying to run the streaming_transcribe example from this link. I found that the sample program uses a local file as the input but simulates it as microphone input (capturing 64K chunks sequentially) and then sends the data to the Google server, roughly as in the sketch below.
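The mic simulation is essentially the following loop. This is a sketch of the idea rather than the sample's exact code; SimulateMicrophone is a made-up name, and Stream stands for the ClientReaderWriter returned by StreamingRecognize():

#include <chrono>
#include <fstream>
#include <string>
#include <thread>

#include "google/cloud/speech/v1/cloud_speech.grpc.pb.h"

using google::cloud::speech::v1::StreamingRecognizeRequest;

// Read the local file in 64K chunks and write each one to the gRPC stream,
// sleeping between chunks so the upload paces like live microphone input.
template <typename Stream>
void SimulateMicrophone(Stream& streamer, const std::string& path) {
  std::ifstream file(path, std::ios::binary);
  char chunk[64 * 1024];
  while (file.read(chunk, sizeof(chunk)) || file.gcount() > 0) {
    StreamingRecognizeRequest request;
    request.set_audio_content(chunk, file.gcount());
    streamer.Write(request);
    std::this_thread::sleep_for(std::chrono::seconds(1));  // pacing
  }
  streamer.WritesDone();  // tell the server the audio is finished
}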
For initial tests, to check that gRPC is correctly set up on my system, I ran make run_tests. I have changed the Makefile as follows:
...
...Some text as original Makefile
...
.PHONY: all
all: streaming_transcribe

googleapis.ar: $(GOOGLEAPIS_CCS:.cc=.o)
	ar r $@ $?

streaming_transcribe: streaming_transcribe.o parse_arguments.o googleapis.ar
	$(CXX) $^ $(LDFLAGS) -o $@

run_tests:
	./streaming_transcribe -b 16000 resources/audio.raw
	./streaming_transcribe --bitrate 16000 resources/audio2.raw
	./streaming_transcribe resources/audio.flac
	./streaming_transcribe resources/quit.raw

clean:
	rm -f *.o streaming_transcribe \
		googleapis.ar \
		$(GOOGLEAPIS_CCS:.cc=.o)
This does not work well (and neither does the original Makefile). But the streaming_transcribe.o file is created after running the Makefile, so I ran the file manually and got the following responses.
Any suggestions on how to run the tests, and how to use GStreamer instead of the function used to simulate the microphone audio?
Answer 1:
How to run the test:
Follow the instructions in cpp-docs-samples. Prerequisite: install gRPC, protobuf, and googleapis, and set up the environment as described in the links above.
How to use GStreamer instead of the function used to simulate the microphone audio:
For this program I created two pipelines. The sending side, which reads an audio file and streams it as RTP over UDP:
gst-launch-1.0 filesrc location=/path/to/file/FOO.wav ! wavparse ! audioconvert ! audio/x-raw,channels=1,depth=16,width=16,rate=44100 ! rtpL16pay ! udpsink host=xxx.xxx.xxx.xxx port=yyyy
The audio file can be changed to FLAC or MP3 by swapping in the appropriate elements (e.g. flacparse ! flacdec in place of wavparse for FLAC). The receiving side, which depayloads the RTP stream and dumps the raw audio to a file:
gst-launch-1.0 udpsrc port=yyyy ! "application/x-rtp,media=(string)audio, clock-rate=(int)44100, width=16, height=16, encoding-name=(string)L16, encoding-params=(string)1, channels=(int)1, channel-positions=(int)1, payload=(int)96" ! rtpL16depay ! audioconvert ! audio/x-raw,format=S16LE ! filesink location=/path/to/where/you/want/to/dump/the/rtp/payloads/ABC.raw
Taking the payloads from the RTP stream and writing them to the file is done in a different thread from the one that sends the data to Google and reads the responses.
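A minimal sketch of that two-thread split, assuming the receiver pipeline above keeps appending raw audio to ABC.raw while we stream. PumpRtpDump and PrintResponses are illustrative names, not from the sample, and the file-tailing handoff is an assumption about how the threads share data:

#include <chrono>
#include <fstream>
#include <iostream>
#include <thread>

#include "google/cloud/speech/v1/cloud_speech.grpc.pb.h"

using google::cloud::speech::v1::StreamingRecognizeRequest;
using google::cloud::speech::v1::StreamingRecognizeResponse;

// Writer thread: tail the file GStreamer's filesink is appending to and
// forward whatever audio has arrived to the gRPC stream.
template <typename Stream>
void PumpRtpDump(Stream& streamer, const char* path) {
  std::ifstream file(path, std::ios::binary);
  char chunk[64 * 1024];
  for (;;) {
    file.read(chunk, sizeof(chunk));
    if (file.gcount() > 0) {
      StreamingRecognizeRequest request;
      request.set_audio_content(chunk, file.gcount());
      streamer.Write(request);
    }
    if (file.eof()) {
      file.clear();  // the dump keeps growing; poll again shortly
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
  }
}

// Reader thread: print interim and final transcripts as they arrive.
template <typename Stream>
void PrintResponses(Stream& streamer) {
  StreamingRecognizeResponse response;
  while (streamer.Read(&response)) {
    for (const auto& result : response.results()) {
      if (result.alternatives_size() > 0) {
        std::cout << result.alternatives(0).transcript() << std::endl;
      }
    }
  }
}

PumpRtpDump runs on its own std::thread while the main thread sits in PrintResponses, mirroring the split described above; gRPC allows one thread to write and another to read on the same stream.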
Answer 2:
Maybe a dedicated sound card can listen to the RTSP stream? With something like:
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.SpeechClient;

try (SpeechClient speechClient = SpeechClient.create()) {
  // LINEAR16, 44.1 kHz, stereo, with a separate transcript per channel.
  RecognitionConfig config =
      RecognitionConfig.newBuilder()
          .setEncoding(AudioEncoding.LINEAR16)
          .setLanguageCode("en-US")
          .setSampleRateHertz(44100)
          .setAudioChannelCount(2)
          .setEnableSeparateRecognitionPerChannel(true)
          .build();
  // ... pass the config in a recognize/streaming request here ...
}
Source: https://stackoverflow.com/questions/54514814/using-gstreamer-with-google-speech-api-streaming-transcribe-in-c