Question
Can the Google Speech API be configured to only return numbers and letters, as opposed to full words?
The use case is transcribing Canadian postal codes. For example, for "M 1 B 0 R 3", Google may return "Em 1 Be 0 Are 3".
We have tried:
- Using speechContexts and feeding in the letters A - Z as individual phrases. This improved the accuracy for us. We did not have much success passing in individual numbers (e.g. 1, 2, 3).
- Specifying the codec and sample rate of our WAV file using the encoding and sampleRateHertz configuration options. We saw no improvement from doing this, as we believe Google already does a great job of auto-detecting the sample rate and encoding.
Our audio file is sampled at 8000 Hz and encoded with "M-ULAW". We have no flexibility to change the sample rate or encoding.
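For reference, the configuration described above could look roughly like the request body below. This is only a sketch: it assumes the v1 REST speech:recognize JSON shape, the language code en-CA is a guess (the question does not state it), and the audio URI is a hypothetical placeholder.

```python
import json
import string

# Letters A-Z and digits 0-9, each passed as its own hint phrase.
letters = list(string.ascii_uppercase)
digits = [str(d) for d in range(10)]

request_body = {
    "config": {
        "encoding": "MULAW",        # mu-law audio, as described above
        "sampleRateHertz": 8000,    # sample rate of the WAV file
        "languageCode": "en-CA",    # assumption: not stated in the question
        "speechContexts": [{"phrases": letters + digits}],
    },
    # Hypothetical placeholder; replace with the real audio location.
    "audio": {"uri": "gs://example-bucket/postal-code.wav"},
}

print(json.dumps(request_body, indent=2))
```

The digits are included only for completeness; as noted above, hinting individual numbers did not help much in our tests.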
Is there a way to get a more accurate response from Google for this use case? Even ideas for better speechContexts phrases are welcome.
Thank you
Answer 1:
We are seeing the same results. We would love to have a syntax-based "context" suggestion, or a parameter that forces the API to return only digits.
Changing the API version does not fix the way digits are recognised, not even when using model: phone_call.
What actually worked better for recognising some kinds of numbers was switching to the en-US locale. That in turn made the recognition engine identify a list of numbers as a phone number, so it was returned in a phone-like syntax (+XXX-XXX-XXX-XXXX), and this made detection really good.
So I don't understand why Google has syntax matching behind the curtains but doesn't make it available through their API.
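Since the API does not expose such an option, one possible client-side workaround is to normalise the transcript afterwards. The sketch below is not part of the Speech API: the homophone table and the function name normalize_postal_code are hypothetical, and the pattern only checks the letter-digit shape of a Canadian postal code, not whether the code actually exists.

```python
import re

# Hypothetical homophone table; extend it with whatever the recognizer returns.
SPOKEN_TO_CHAR = {
    "em": "M", "be": "B", "bee": "B", "are": "R", "see": "C", "sea": "C",
    "why": "Y", "oh": "0", "zero": "0", "one": "1", "two": "2", "to": "2",
    "too": "2", "three": "3", "four": "4", "for": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "ate": "8", "nine": "9",
}

# Shape check only: letter, digit, letter, digit, letter, digit.
POSTAL_SHAPE = re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d")


def normalize_postal_code(transcript):
    """Map a transcript to a postal code string, or None if it does not fit."""
    chars = []
    for token in transcript.split():
        key = token.lower().strip(".,")
        if key in SPOKEN_TO_CHAR:
            chars.append(SPOKEN_TO_CHAR[key])
        elif len(key) == 1 and key.isalnum():
            # Already a single letter or digit.
            chars.append(key.upper())
        else:
            # Unknown word: refuse rather than guess.
            return None
    candidate = "".join(chars)
    return candidate if POSTAL_SHAPE.fullmatch(candidate) else None


print(normalize_postal_code("Em 1 Be 0 Are 3"))
```

Returning None for anything unrecognised keeps the mapping conservative, so a caller can fall back to asking the speaker to repeat the code.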
Source: https://stackoverflow.com/questions/45310657/can-the-google-speech-api-be-configured-to-return-only-numbers-letters