I want to use SSML markers through the Google Cloud text-to-speech API to request the timing of these markers in the audio stream. These timestamps are necessary in order to pro
Looks like this is supported in Cloud Text-to-Speech API v1beta1: https://cloud.google.com/text-to-speech/docs/reference/rest/v1beta1/text/synthesize#TimepointType
You can use https://texttospeech.googleapis.com/v1beta1/text:synthesize. In the request body, set the enableTimePointing field (a list of TimepointType values) to SSML_MARK. If this field is not set, timepoints are not returned by default.
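A minimal sketch of what the request body and the timepoint handling could look like, in Python. The helper names `build_request_body` and `extract_timepoints` are hypothetical, and the `en-US` language code, the `MP3` encoding and the sample SSML are assumptions for illustration. The field names (`enableTimePointing`, `timepoints`, `markName`, `timeSeconds`) follow the v1beta1 REST reference linked above. Sending the request and authenticating are not shown.

```python
import json

# Endpoint from the answer above (v1beta1, not v1).
ENDPOINT = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"


def build_request_body(ssml, language_code="en-US"):
    """Build a text:synthesize request body that asks for SSML mark timepoints.

    language_code and the MP3 encoding are illustrative assumptions.
    """
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
        # Without this field no timepoints are returned.
        "enableTimePointing": ["SSML_MARK"],
    }


def extract_timepoints(response_json):
    """Map each mark name to its offset in seconds from the start of the audio."""
    return {
        tp["markName"]: tp["timeSeconds"]
        for tp in response_json.get("timepoints", [])
    }


# Sample SSML with two named marks (names are arbitrary).
ssml = '<speak>Hello <mark name="m1"/> world <mark name="m2"/> again.</speak>'
body = build_request_body(ssml)
print(json.dumps(body, indent=2))
```

POST `body` as JSON to `ENDPOINT` with your usual credentials, then pass the parsed JSON response to `extract_timepoints` to get a `{mark name: seconds}` mapping.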