Question
I want to use SSML markers through the Google Cloud text-to-speech API to request the timing of these markers in the audio stream. These timestamps are necessary in order to provide cues for effects, word/section highlighting and feedback to the user.
I found this question, which is relevant, although it asks about timestamps for each word rather than for the SSML <mark> tag.
The following API request returns OK, but the response lacks the requested marker data. This uses the Cloud Text-to-Speech API v1.
{
"voice": {
"languageCode": "en-US"
},
"input": {
"ssml": "<speak>First, <mark name=\"a\"/> second, <mark name=\"b\"/> third.</speak>"
},
"audioConfig": {
"audioEncoding": "mp3"
}
}
Response:
{
"audioContent":"//NExAAAAANIAAAAABcFAThYGJqMWA..."
}
This only provides the synthesized audio, without any contextual information.
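To make the problem easier to reproduce, here is a minimal Python sketch that builds the same request body and checks a response for anything besides the audio payload. The helper names `build_request` and `extra_fields` are my own, for illustration only, and no network call is made: the response is simply the one shown above.

```python
import json


def build_request(ssml: str, language_code: str = "en-US", encoding: str = "mp3") -> dict:
    """Build a JSON body mirroring the request shown above (hypothetical helper)."""
    return {
        "voice": {"languageCode": language_code},
        "input": {"ssml": ssml},
        "audioConfig": {"audioEncoding": encoding},
    }


def extra_fields(response: dict) -> list:
    """Return every response key other than the audio payload, sorted (hypothetical helper)."""
    return sorted(key for key in response if key != "audioContent")


ssml = '<speak>First, <mark name="a"/> second, <mark name="b"/> third.</speak>'
body = build_request(ssml)
print(json.dumps(body, indent=2))

# The response copied from above: it carries audio only, so nothing extra is found.
response = {"audioContent": "//NExAAAAANIAAAAABcFAThYGJqMWA..."}
print(extra_fields(response))
```

If the API returned marker timing, I would expect `extra_fields` to list at least one additional key here.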
Is there an API request that I am overlooking which can expose information about these markers such as is the case with IBM Watson and Amazon Polly?
Source: https://stackoverflow.com/questions/57381977/how-to-get-ssml-mark-timestamps-from-google-cloud-text-to-speech-api