Open source code for voice detection and discrimination

前端 未结 8 2118
囚心锁ツ
囚心锁ツ 2021-01-31 17:49

I have 15 audio tapes, one of which I believe contains an old recording of my grandmother and myself talking. A quick attempt to find the right place didn\'t turn it up. I don

8条回答
  •  梦谈多话
    2021-01-31 18:47

    What you probably save you most of the time is speaker diarization. This works by annotating the recording with speaker IDs, which you can then manually map to real people with very little effort. The errors rates are typically at about 10-15% of record length, which sounds awful, but this includes detecting too many speakers and mapping two IDs to same person, which isn't that hard to mend.

    One such good tool is SHoUT toolkit (C++), even though it's a bit picky about input format. See usage for this tool from author. It outputs voice/speech activity detection metadata AND speaker diarization, meaning you get 1st and 2nd point (VAD/SAD) and a bit extra, since it annotates when is the same speaker active in a recording.

    The other useful tool is LIUM spkdiarization (Java), which basically does the same, except I haven't put enough effort in yet to figure how to get VAD metadata. It features a nice ready to use downloadable package.

    With a little bit of compiling, this should work in under an hour.

提交回复
热议问题