Can Microsoft Bing Speech be configured to return only numbers / letters?

Submitted by 懵懂的女人 on 2019-12-10 19:29:35

Question


Can the Microsoft Bing Speech API be configured to only return numbers and letters, as opposed to full words?

The use case is transcribing Canadian postal codes, for example M 1 B 0 R 3. Microsoft may return "Em 1 Be 0 Are 3".

Our audio file is sampled at 8000 Hz and encoded with "M-ULAW". We have no flexibility to change the sample rate or encoding. We are using the "SMD" scenario, but I can't find any documentation on what it does. Base request URI:

https://speech.platform.bing.com/recognize?scenarios=smd&appid=D4D52672-91D7-4C74-8AD8-42B1D98141A5&device.os=your_device_os&version=3.0

Is there a way to get a more accurate response from Microsoft for this use case?

Thank you


Answer 1:


You could try using Microsoft's Custom Speech Service (previously known as the Custom Recognition Intelligent Service, or CRIS) to create and use a custom language model.

The guidelines for transcription of custom language models say "Common acronyms can be left as a single entity without periods or spaces between the letters, but all other acronyms should be written out in separate letters, with each letter separated by a single space" and include this example:

Original text               After normalization
-----------------------     ---------------------------
play OU812 by Van Halen     play O U 8 1 2 by Van Halen

So following their guidelines, your custom language model will be a file where each line looks something like this:

M 1 B 0 R 3
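
If you already have a list of postal codes in their usual written form, a small helper can put them into that one-character-per-token layout. This is only a sketch: the function name normalize_postal_code is hypothetical, and it assumes the only separators to strip are spaces and hyphens.

```python
def normalize_postal_code(code: str) -> str:
    """Return the code as upper-case characters separated by single spaces."""
    # Keep only letters and digits, dropping spaces or hyphens inside the code.
    compact = "".join(ch for ch in code if ch.isalnum()).upper()
    # Joining a string with " " puts one space between every character.
    return " ".join(compact)


print(normalize_postal_code("m1b 0r3"))  # M 1 B 0 R 3
```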

You can easily generate a file containing thousands of examples of Canadian postal codes based on the structure of the codes, which in regular expression format looks like this:

[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ][0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]

(The above expression is taken from this answer about validating postal codes.)
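
One way to generate such a file is sketched below in Python. The letter sets are copied from the expression above; the function names, the seed, and the output file name are assumptions made for illustration. The codes are random strings that match the structure, not necessarily postal codes that are actually in use.

```python
import random
import re

# Letter sets taken from the regular expression above.
FIRST_LETTERS = "ABCEGHJKLMNPRSTVXY"
OTHER_LETTERS = "ABCEGHJKLMNPRSTVWXYZ"
DIGITS = "0123456789"

# Same structure as the expression above, used here only as a sanity check.
PATTERN = re.compile(
    r"[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ][0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]"
)


def random_postal_code(rng):
    """Return one structurally valid code with a space between characters."""
    chars = [
        rng.choice(FIRST_LETTERS), rng.choice(DIGITS), rng.choice(OTHER_LETTERS),
        rng.choice(DIGITS), rng.choice(OTHER_LETTERS), rng.choice(DIGITS),
    ]
    return " ".join(chars)


def build_corpus(count, seed=0):
    """Return `count` distinct codes, one per future line of the training file."""
    rng = random.Random(seed)
    codes = set()
    while len(codes) < count:
        codes.add(random_postal_code(rng))
    return sorted(codes)


def write_corpus(path, count):
    """Write the corpus to `path`, one normalized code per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(build_corpus(count)) + "\n")
```

Calling write_corpus("postal_codes.txt", 5000) would produce a 5000-line text file in the format shown earlier. How that file is then supplied to the Custom Speech Service is not covered here.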

By doing this you're telling the recognizer what sort of things you're expecting people to say, and helping it choose when there are multiple possibilities for a sound (e.g. "U" vs. "you"). I think it will make a huge difference in the results you get.



Source: https://stackoverflow.com/questions/45312110/can-microsoft-bing-speech-be-configured-to-return-only-numbers-letters
