I'm generating speech through Google Cloud's Text-to-Speech API and I'd like to highlight words as they are spoken.
Is there a way of getting timestamps for spoken words or sentences?
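
For context, here is a rough sketch of the kind of thing I have in mind, not a confirmed solution. It assumes the API can report the time at which each SSML `<mark>` is reached; as far as I can tell the `v1beta1` API does this via `enable_time_pointing` with `SSML_MARK`, returning `timepoints` that carry `mark_name` and `time_seconds`, but treat that as an assumption to verify. The helper names `build_marked_ssml` and `align_words` and the `w0`, `w1`, ... mark naming are made up for illustration.

```python
from html import escape


def build_marked_ssml(text):
    """Return (ssml, words) with a <mark> placed before every word.

    Marks are named w0, w1, ... (a hypothetical convention) so that
    each mark can later be matched back to the word it precedes.
    """
    words = text.split()
    parts = [
        f'<mark name="w{i}"/>{escape(word)}'  # escape &, <, > inside words
        for i, word in enumerate(words)
    ]
    return "<speak>" + " ".join(parts) + "</speak>", words


def align_words(words, timepoints):
    """Pair each word with the start time reported for its mark.

    `timepoints` is assumed to be an iterable of
    (mark_name, time_seconds) pairs taken from the synthesis response.
    Words whose mark was not reported get None.
    """
    starts = dict(timepoints)
    return [(word, starts.get(f"w{i}")) for i, word in enumerate(words)]
```

The SSML string would be sent as the synthesis input, and the returned mark times fed to `align_words` to drive the highlighting. Sentence-level timestamps would work the same way with one mark per sentence.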
Source: https://stackoverflow.com/questions/55320826/google-cloud-text-to-speech-word-timestamps