Can the Google Speech API be configured to return only numbers / letters?

Question

Can the Google Speech API be configured to only return numbers and letters, as opposed to full words?

The use case is transcribing spoken Canadian postal codes. For example, for "M 1 B 0 R 3", Google may return "Em 1 Be 0 Are 3".
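To make the mismatch concrete, here is a minimal post-processing sketch in Python. It is not something from this thread and not a feature of the Speech API: the lookup table `SPOKEN_TO_CHAR` is hypothetical and contains only the three words from the example above, and the shape check is deliberately loose (it does not enforce which letters Canada Post actually allows).

```python
import re

# Hypothetical lookup built only from the example above ("Em", "Be", "Are").
# A real table would need every letter and more spelling variants.
SPOKEN_TO_CHAR = {"em": "M", "be": "B", "are": "R"}

# Loose letter-digit-letter-digit-letter-digit shape check (an assumption;
# it does not apply Canada Post's restrictions on allowed letters).
POSTAL_SHAPE = re.compile(r"^[A-Z][0-9][A-Z][0-9][A-Z][0-9]$")


def normalize(transcript: str) -> str:
    """Map known spoken-letter words to letters and join all tokens."""
    chars = []
    for token in transcript.split():
        # Unknown tokens (digits, bare letters) are kept, upper-cased.
        chars.append(SPOKEN_TO_CHAR.get(token.lower(), token.upper()))
    return "".join(chars)


def looks_like_postal_code(code: str) -> bool:
    """Return True if the string has the six-character postal code shape."""
    return POSTAL_SHAPE.match(code) is not None


if __name__ == "__main__":
    code = normalize("Em 1 Be 0 Are 3")
    print(code, looks_like_postal_code(code))
```

A table like this only papers over the problem, which is why a recognizer-side option would be preferable.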

We have tried:

Using speechContexts and feeding in the letters A-Z as individual phrases. This improved accuracy for us. We did not have much success passing in individual digits (e.g. 1, 2, 3).
Specifying the codec and sample rate of our WAV file using the encoding and sampleRateHertz configuration options. We saw no improvement from this; we believe Google already does a good job of auto-detecting the sample rate and encoding.
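For reference, the two attempts above correspond roughly to the following request body for the v1 `speech:recognize` REST method. This is a sketch, not our exact code: the helper name `build_recognize_request` is made up for illustration, the `en-CA` language code is an assumption, and the audio content is a placeholder.

```python
import json
import string


def build_recognize_request(audio_b64: str) -> dict:
    """Build a JSON body for the v1 speech:recognize REST method."""
    # One phrase per letter, A through Z, as described above.
    letter_phrases = list(string.ascii_uppercase)
    return {
        "config": {
            "encoding": "MULAW",       # explicit codec
            "sampleRateHertz": 8000,   # explicit sample rate
            "languageCode": "en-CA",   # assumption, not stated in the question
            "speechContexts": [{"phrases": letter_phrases}],
        },
        # Base64-encoded audio bytes (placeholder supplied by the caller).
        "audio": {"content": audio_b64},
    }


if __name__ == "__main__":
    body = build_recognize_request("<base64-encoded audio>")
    print(json.dumps(body["config"], indent=2))
```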

Our audio file is sampled at 8000 Hz and encoded as "M-ULAW". We have no flexibility to change the sample rate or encoding.

Is there a way to get a more accurate response from Google for this use case? Even ideas for better speechContexts phrases are welcome.

Thank you

Answer 1:

We are seeing the same results. We would love to have a syntax-based "context" suggestion, or a parameter that forces digit-only output.

Changing the API version does not fix the way digits are recognised, and neither does using model: phone_call.

What actually worked better for recognising some kinds of numbers was switching to the en_US locale, which in turn made the recognition engine treat a list of numbers as a phone number. The result was returned in phone-like syntax, +XXX-XXX-XXX-XXXX, and this made detection really good.
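A rough Python sketch of that workaround: set the language code to US English and strip the phone-style punctuation from the transcript afterwards. The helper names are made up for illustration, the encoding and sample rate are taken from the question rather than from our own setup, and the API expects the language code in the hyphenated form `en-US`.

```python
import re


def build_us_english_config() -> dict:
    """Recognition config using the US English locale."""
    return {
        "encoding": "MULAW",       # from the question, an assumption here
        "sampleRateHertz": 8000,   # from the question, an assumption here
        "languageCode": "en-US",
    }


def digits_only(transcript: str) -> str:
    """Drop everything except digits from a phone-formatted transcript."""
    return re.sub(r"[^0-9]", "", transcript)


if __name__ == "__main__":
    # Example string in the +XXX-XXX-XXX-XXXX shape described above.
    print(digits_only("+123-456-789-0123"))
```

This only helps when the spoken input is a run of digits; it does nothing for mixed letter and digit strings such as postal codes.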

So I don't understand why Google has syntax matching behind the scenes but doesn't make it available through their API.

Source: https://stackoverflow.com/questions/45310657/can-the-google-speech-api-be-configured-to-return-only-numbers-letters

Tags

google-api

google-cloud-platform

voice-recognition

google-speech-api

google-cloud-speech