Google Speech Recognition API Result is Empty

Question

I'm making an asynchronous request to the Google Cloud Speech API, and I don't know how to get the result of the operation:

Request POST: https://speech.googleapis.com/v1beta1/speech:asyncrecognize

Body:

{
    "config": {
        "languageCode": "pt-BR",
        "encoding": "LINEAR16",
        "sampleRate": 16000
    },
    "audio": {
        "uri": "gs://bucket/audio.flac"
    }
}

Which returns:

{ "name": "469432517" }

So, I do a POST: https://speech.googleapis.com/v1beta1/operations/469432517

Which returns:

{
    "name": "469432517",
    "metadata": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
        "progressPercent": 100,
        "startTime": "2016-08-11T21:18:29.985053Z",
        "lastUpdateTime": "2016-08-11T21:18:31.888412Z"
    },
    "done": true,
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    }
}

I need to get the result of the operation: the transcribed text.

How can I do that?

Answer 1:

You already have the result of the operation, and it is empty. The cause is a format mismatch: the config declares "LINEAR16" (uncompressed PCM data, essentially a WAV file), but the file you submitted is FLAC (a compressed format).
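To make "the result is empty" concrete, here is a minimal sketch that pulls the transcripts out of a finished operation body. The helper name `extract_transcripts` is my own, and the `results` / `alternatives` / `transcript` field names are an assumption based on the usual shape of a populated recognize response; the `operation` dict is the body posted in the question.

```python
def extract_transcripts(operation):
    """Return the top transcript of each result in a finished operation body."""
    if not operation.get("done"):
        raise ValueError("operation is still running; poll again later")
    if "error" in operation:
        raise RuntimeError(operation["error"].get("message", "unknown error"))
    # A response without a "results" key means nothing was recognized.
    results = operation.get("response", {}).get("results", [])
    return [r["alternatives"][0]["transcript"]
            for r in results if r.get("alternatives")]


# The operation body from the question: done, but with no "results" key.
operation = {
    "name": "469432517",
    "metadata": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
        "progressPercent": 100,
    },
    "done": True,
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    },
}

print(extract_transcripts(operation))  # prints []
```

An empty list here means the request completed and nothing was recognized, so the thing to fix is the request config, not the way the operation is polled.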

Other possible causes of an empty result are an incorrect sample rate, an incorrect number of channels, and so on.

Finally, a file that contains only silence will also produce an empty response.
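If the audio really is a WAV file, the sample rate and channel count can be read from its header before sending the request. The following is only a sketch using Python's standard `wave` module; the helper names `describe_wav` and `config_problems` are made up for illustration, and the check assumes the request expects mono 16-bit audio at the rate given in the config.

```python
import io
import wave


def describe_wav(source):
    """Read the header fields that have to agree with the recognition config."""
    with wave.open(source, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def config_problems(info, expected_rate):
    """List mismatches between the file and a mono LINEAR16 config (assumed)."""
    problems = []
    if info["sample_rate"] != expected_rate:
        problems.append("sample rate is %d, config says %d"
                        % (info["sample_rate"], expected_rate))
    if info["channels"] != 1:
        problems.append("file has %d channels, expected mono" % info["channels"])
    if info["bits_per_sample"] != 16:
        problems.append("file is %d-bit, LINEAR16 is 16-bit" % info["bits_per_sample"])
    return problems


# Build a tiny in-memory stereo 44.1 kHz WAV so the sketch runs without a file.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 2 * 100)
buf.seek(0)

info = describe_wav(buf)
for problem in config_problems(info, expected_rate=16000):
    print(problem)
```

`wave.open` also accepts a file path, so `describe_wav("audio.wav")` works the same way for a file on disk (the path is hypothetical).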

Answer 2:

I ran into this issue too. The problem can be the encoding and the sample rate. Here is how I found the appropriate encoding and rate:

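from google.cloud import speech
from google.cloud.speech import enums, types

# Written against the pre-2.0 google-cloud-speech client (enums/types modules).
CLIENT = speech.SpeechClient()

# Read the raw audio bytes; the path is a placeholder for your own file.
with open('audio.flac', 'rb') as audio_file:
    content = audio_file.read()

audio = types.RecognitionAudio(content=content)

ENCODING = [
    enums.RecognitionConfig.AudioEncoding.LINEAR16,
    enums.RecognitionConfig.AudioEncoding.FLAC,
    enums.RecognitionConfig.AudioEncoding.MULAW,
    enums.RecognitionConfig.AudioEncoding.AMR,
    enums.RecognitionConfig.AudioEncoding.AMR_WB,
    enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
    enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE,
]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

# Try every encoding/rate combination and print what each one returns.
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='fa-IR')

        # Detects speech in the audio file
        response = []
        try:
            response = CLIENT.recognize(config, audio)
        except Exception as exc:
            # Combinations the API rejects are reported and skipped.
            print("error: ", exc)
        print("-----------------------------------------------------")
        print(str(rate) + "   " + str(enco))
        print("response: ", str(response))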
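A possible variation on the loop above, as a sketch only: stop at the first combination that returns something instead of printing every attempt. To keep it runnable without credentials, the API call is passed in as a plain callable; `first_working_config` and `fake_recognize` are hypothetical names, and in real use the callable would wrap `CLIENT.recognize` and return its list of results.

```python
from itertools import product


def first_working_config(encodings, rates, recognize):
    """Return the first (encoding, rate) pair whose recognize() call yields results."""
    for encoding, rate in product(encodings, rates):
        try:
            results = recognize(encoding, rate)
        except Exception:
            # Treat a rejected combination like an empty result and keep looking.
            continue
        if results:
            return encoding, rate
    return None


def fake_recognize(encoding, rate):
    """Stand-in for the API: only FLAC at 44100 Hz 'recognizes' anything."""
    if encoding == "LINEAR16":
        raise ValueError("bad encoding")
    return ["hola"] if (encoding, rate) == ("FLAC", 44100) else []


print(first_working_config(["LINEAR16", "FLAC"], [16000, 44100], fake_recognize))
# prints ('FLAC', 44100)
```

Trying combinations blindly costs one API call per attempt, so reading the real properties of the file first is usually cheaper when that is possible.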

Answer 3:

The Google Speech Recognition API result can be empty because the request parameters are incorrect. My suggestion is to first analyze the audio properties, for instance with a command-line tool such as ffmpeg.

Audio encoding formats list

Language codes info

My complete example:

$ ffmpeg -i 1515244791.flac -hide_banner

Input #0, flac, from '1515244791.flac':
  Metadata:
    ARTIST          : artist
    YEAR            : year
  Duration: 00:00:59.98, start: 0.000000, bitrate: 363 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, mono, s16
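The values needed for the config can also be picked out of that output programmatically. Below is a rough sketch that parses the `Stream ... Audio:` line with a regular expression; `parse_audio_stream` is a name I made up, and the pattern is only checked against the simple form shown above, so other ffmpeg outputs may need adjustments.

```python
import re

# Matches lines such as "Stream #0:0: Audio: flac, 44100 Hz, mono, s16".
STREAM_RE = re.compile(
    r"Stream #\d+:\d+.*?: Audio: (?P<codec>\w+)[^,]*, "
    r"(?P<rate>\d+) Hz, (?P<layout>[\w.]+)"
)


def parse_audio_stream(ffmpeg_output):
    """Extract codec, sample rate and channel layout from ffmpeg's stream line."""
    match = STREAM_RE.search(ffmpeg_output)
    if match is None:
        raise ValueError("no audio stream line found")
    return {
        "codec": match["codec"],
        "sample_rate_hertz": int(match["rate"]),
        "channel_layout": match["layout"],
    }


sample = """Input #0, flac, from '1515244791.flac':
  Duration: 00:00:59.98, start: 0.000000, bitrate: 363 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, mono, s16
"""

print(parse_audio_stream(sample))
# prints {'codec': 'flac', 'sample_rate_hertz': 44100, 'channel_layout': 'mono'}
```

For this file that gives FLAC at 44100 Hz, mono.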

Then use a config that matches those properties:

import io
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

LANG = "es-MX"
RATE = 44100
ENC = enums.RecognitionConfig.AudioEncoding.FLAC


def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""

    client = speech.SpeechClient()

    with io.open(stream_file, 'rb') as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]
    requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in stream)

    config = types.RecognitionConfig(
        encoding=ENC,
        sample_rate_hertz=RATE,
        language_code=LANG)
    streaming_config = types.StreamingRecognitionConfig(config=config)

    # Show the config that is being sent.
    print(streaming_config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(streaming_config, requests)

    for response in responses:
        print(response)
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print('Finished: {}'.format(result.is_final))
            print('Stability: {}'.format(result.stability))
            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print('Confidence: {}'.format(alternative.confidence))
                print('Transcript: {}'.format(alternative.transcript))

So the transcription service works:

config {
  encoding: FLAC
  sample_rate_hertz: 44100
  language_code: "es-MX"
}

results {
  alternatives {
    transcript: "lo tienes que saber tienes derecho a recibir informaci\303\263n de todas las instituciones que reciben recursos p\303\272blicos M\303\251xico 4324 plataformadetransparencia.org.mx derecho Porque adem\303\241s de defender tu voto te atiende si no se respetan tus derechos pol\303\255tico-electorales imparten justicia cuando existen inconformidades en elecciones internas de partidos pol\303\255ticos comit\303\251s ciudadanos y consejos de los pueblos resuelve controversias en elecciones de autoridades en la Ciudad de M\303\251xico y en consulta ciudadana en tu elecci\303\263n MVS 102.5 espacio a las nuevas voces de la radio continuamos"
    confidence: 0.9409132599830627
  }
  is_final: true
}

Finished: True
Stability: 0.0
Confidence: 0.9409132599830627
Transcript: lo tienes que saber tienes derecho a recibir información de todas las instituciones que reciben recursos públicos México 4324 plataformadetransparencia.org.mx derecho Porque además de defender tu voto te atiende si no se respetan tus derechos político-electorales imparten justicia cuando existen inconformidades en elecciones internas de partidos políticos comités ciudadanos y consejos de los pueblos resuelve controversias en elecciones de autoridades en la Ciudad de México y en consulta ciudadana en tu elección MVS 102.5 espacio a las nuevas voces de la radio continuamos

Source: https://stackoverflow.com/questions/38906527/google-speech-recognition-api-result-is-empty

Tags

google-api

speech-recognition

google-cloud-speech