Question
I'm looking for code that processes a media file and works out who said what and when, in other words speaker-by-speaker segmentation with the timing of each segment. Answers that require any manual processing of the media file don't qualify. Thanks!
Answer 1:
You can use speaker diarization from Kaldi. It is not easy to set up, but the results are great.
There are many other libraries too, such as LIUM and bob.
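Whichever toolkit you pick, the diarization step usually ends with a list of time-stamped speaker segments. Here is a minimal sketch of turning that into a "who spoke when" timeline, assuming the tool writes its result in the NIST RTTM text format (check what your setup actually emits). The sample lines, the `Turn` type, the helper names and the 0.5 s merge gap are all made up for illustration. Note that this only covers "who" and "when"; the "what" needs a separate speech-to-text pass.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    speaker: str
    start: float  # seconds from the start of the recording
    end: float


def parse_rttm(text: str) -> List[Turn]:
    """Parse RTTM 'SPEAKER' lines into turns sorted by start time.

    Assumed field layout (whitespace separated):
    SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> ...
    """
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and other record types
        start = float(fields[3])
        duration = float(fields[4])
        turns.append(Turn(fields[7], start, start + duration))
    return sorted(turns, key=lambda t: t.start)


def merge_turns(turns: List[Turn], max_gap: float = 0.5) -> List[Turn]:
    """Join consecutive turns of the same speaker separated by a short pause."""
    merged: List[Turn] = []
    for turn in turns:
        last = merged[-1] if merged else None
        if last and last.speaker == turn.speaker and turn.start - last.end <= max_gap:
            merged[-1] = Turn(last.speaker, last.start, max(last.end, turn.end))
        else:
            merged.append(turn)
    return merged


# Hypothetical diarization output, not produced by any real run.
SAMPLE_RTTM = """\
SPEAKER meeting 1 0.00 4.20 <NA> <NA> spk1 <NA> <NA>
SPEAKER meeting 1 4.30 2.10 <NA> <NA> spk1 <NA> <NA>
SPEAKER meeting 1 6.80 5.00 <NA> <NA> spk2 <NA> <NA>
"""

if __name__ == "__main__":
    for t in merge_turns(parse_rttm(SAMPLE_RTTM)):
        print(f"{t.start:7.2f} - {t.end:7.2f}  {t.speaker}")
```

Merging short same-speaker gaps keeps the timeline readable; set `max_gap` to 0 if you want every raw segment kept separate.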
Source: https://stackoverflow.com/questions/24457722/speaker-recognition-and-segmentation