Synchronize video subtitle with text-to-speech voice

Submitted by 亡梦爱人 on 2019-12-10 02:56:32

Question


I am trying to create a video of a text in which the text is narrated by text-to-speech.

To create the video file, I use the VideoFileWriter of AForge.NET as follows:

VideoWriter = new VideoFileWriter();

VideoWriter.Open(CurVideoFile, (int)(Properties.Settings.Default.VideoWidth),
    (int)(Properties.Settings.Default.VideoHeight), 25, VideoCodec.MPEG4, 800000);

To read the text aloud, I use the SpeechSynthesizer class and write the output to a wave stream:

AudioStream = new FileStream(CurAudioFile, FileMode.Create);
synth.SetOutputToWaveStream(AudioStream);

I want to highlight the word being spoken in the video, so I synchronize the two in the SpeakProgress event:

void synth_SpeakProgress(object sender, SpeakProgressEventArgs e)
{

    curAuidoPosition = e.AudioPosition;
    using (Graphics g = Graphics.FromImage(Screen))
    {
         g.DrawString(e.Text,....); 
    }                    
    VideoWriter.WriteVideoFrame(Screen, curAuidoPosition);
}
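
To check whether the timestamps passed to WriteVideoFrame are plausible, it can help to convert each e.AudioPosition into the frame index it corresponds to at the 25 fps used when opening the writer. The following is only a minimal sketch: FrameTiming and FrameIndexAt are hypothetical names that are not part of AForge.NET or System.Speech, and it assumes a constant frame rate.

```csharp
using System;

// Hypothetical helper, not part of AForge.NET or System.Speech.
static class FrameTiming
{
    // Returns the zero-based index of the frame that is on screen at the
    // given audio position, assuming a constant frame rate.
    public static long FrameIndexAt(TimeSpan audioPosition, int framesPerSecond)
    {
        if (audioPosition < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(audioPosition));
        if (framesPerSecond <= 0)
            throw new ArgumentOutOfRangeException(nameof(framesPerSecond));

        // Integer tick arithmetic avoids floating-point rounding at frame boundaries.
        return audioPosition.Ticks * framesPerSecond / TimeSpan.TicksPerSecond;
    }
}
```

Logging this index for every SpeakProgress event and comparing it with the position at which the word is actually heard in the merged file shows how far the reported positions drift for a given voice.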

Finally, I merge the video and audio using ffmpeg:

using (Process process = new Process())
{
        process.StartInfo.FileName = exe_path;
        process.StartInfo.Arguments = 
            string.Format(@"-i ""{0}"" -i ""{1}"" -y -acodec copy -vcodec copy ""{2}""", avi_path, mp3_path, output_file);

        // ...
}

The problem is that for some voices, such as Microsoft Hazel, Zira and David on Windows 8.1, the video is not synchronized with the audio: the audio runs much faster than the displayed subtitle. With the voices on Windows 7, however, it works.

How can I synchronize them so that it works for any text-to-speech voices on any operating system?

It seems that e.AudioPosition is inaccurate, as mentioned in "Are the SpeakProgressEventArgs of the SpeechSynthesizer inaccurate?". I ran the same experiment and got the same result.

I have noticed that if I adjust the audio format, I get closer to the actual time; however, this does not work for every voice.

var formats = CurVoice.VoiceInfo.SupportedAudioFormats;
if (formats.Count > 0)
{
    var format = formats[0];
    reader.SetOutputToWaveFile(CurAudioFile, format);
}
else
{
    AudioStream = new FileStream(CurAudioFile, FileMode.Create);
    reader.SelectVoice(CurVoice.VoiceInfo.Name);
    var fmt = new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
    // This gets closer to the actual time, but it is still not precise.
    MemStream = new MemoryStream();
    // SetOutputStream is a non-public method of SpeechSynthesizer, so it is invoked via reflection.
    var mi = reader.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
    mi.Invoke(reader, new object[] { MemStream, fmt, true, true });
}
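
Since the reported position appears to depend on the audio format, one way to cross-check e.AudioPosition is to derive the elapsed time from the number of PCM bytes written so far. The sketch below is illustrative only: PcmTiming and DurationOfPcm are hypothetical names, and it assumes uncompressed PCM data with any WAV header bytes already subtracted from the byte count.

```csharp
using System;

// Hypothetical helper, not part of System.Speech.
static class PcmTiming
{
    // Converts a count of raw PCM bytes into playback time.
    // Assumes uncompressed PCM and that header bytes are not included in byteCount.
    public static TimeSpan DurationOfPcm(long byteCount, int samplesPerSecond,
                                         int bitsPerSample, int channels)
    {
        long bytesPerSecond = (long)samplesPerSecond * (bitsPerSample / 8) * channels;
        if (byteCount < 0 || bytesPerSecond <= 0)
            throw new ArgumentOutOfRangeException();

        // Integer tick arithmetic keeps the result exact to 100 ns.
        return TimeSpan.FromTicks(byteCount * TimeSpan.TicksPerSecond / bytesPerSecond);
    }
}
```

For the 16000 Hz, 16-bit, mono format used above, this gives 32000 bytes per second, so DurationOfPcm(MemStream.Length, 16000, 16, 1) could be compared against e.AudioPosition inside the event handler, under the assumption that the stream holds only sample data at that point.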

Source: https://stackoverflow.com/questions/33932390/synchronize-video-subtitle-with-text-to-speech-voice
