What audio formats are supported by Azure Cognitive Services' Speech Service (STT)?

Question

Bearing in mind that the Microsoft/Azure Cognitive Services' "Speech Service" is currently going through a rationalisation exercise, as far as I can tell from looking at

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-apis#speech-to-text

https://docs.microsoft.com/en-us/azure/cognitive-services/speech/home

only .wav binaries are acceptable, with anything else giving the response:

{"Message":"Unsupported audio format"}

Is there any other way to discover the acceptable audio formats/encodings/etc., or is this it?

[Bonus points for tips on preprocessing arbitrary audio formats (e.g. .m4a) in Python with pydub so that they meet the bar; this currently works for .mp3 but not for .m4a.]

Thanks!

Answer 1:

The currently supported format is single-channel (mono) WAV / PCM with a sampling rate of 16 kHz. More format and codec support will be added in the future.
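As a rough sketch of the preprocessing the question asks about, the snippet below re-encodes an input file to mono 16 kHz WAV with pydub and then checks the result with the standard-library `wave` module. The function names `convert_to_speech_wav` and `is_speech_wav` are made up for illustration, and the 16-bit sample width is an assumption (the answer only says "PCM"). The conversion also assumes pydub is installed and that the local ffmpeg build can decode the input, which may be the sticking point for .m4a files.

```python
import wave

TARGET_RATE_HZ = 16000    # 16 kHz, as stated in the answer
TARGET_CHANNELS = 1       # mono
TARGET_SAMPLE_WIDTH = 2   # bytes per sample (16-bit PCM); an assumption


def convert_to_speech_wav(src_path, dst_path):
    """Re-encode a file that ffmpeg can read (e.g. .mp3, .m4a) as mono 16 kHz WAV."""
    # Imported lazily so the checker below still works without pydub installed.
    from pydub import AudioSegment

    # No explicit format is passed, so ffmpeg is left to detect the container.
    audio = AudioSegment.from_file(src_path)
    audio = (
        audio.set_channels(TARGET_CHANNELS)
        .set_frame_rate(TARGET_RATE_HZ)
        .set_sample_width(TARGET_SAMPLE_WIDTH)
    )
    audio.export(dst_path, format="wav")
    return dst_path


def is_speech_wav(path):
    """Return True if the file is an uncompressed WAV with the target parameters."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getnchannels() == TARGET_CHANNELS
                and wav.getframerate() == TARGET_RATE_HZ
                and wav.getsampwidth() == TARGET_SAMPLE_WIDTH
                and wav.getcomptype() == "NONE"
            )
    except (wave.Error, EOFError):
        # Not a readable RIFF/WAVE file at all.
        return False
```

A typical use would be `convert_to_speech_wav("clip.m4a", "clip.wav")` followed by `is_speech_wav("clip.wav")` before uploading; the file names here are placeholders. Checking locally first makes it easier to tell a format problem apart from some other cause of the "Unsupported audio format" response.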

Source: https://stackoverflow.com/questions/51614216/what-audio-formats-are-supported-by-azure-cognitive-services-speech-service-ss

Tags

python

speech-to-text

microsoft-cognitive

azure-cognitive-services

pydub
