I need to find some literature on how to compare a real-time recorded voice (from a mic) against a database of pre-recorded voices. After comparing I would then need to output a
I'm no expert in this field (so take this accordingly), but you should look at:
How to approach it:
1. Filter the voices
The minimum band for recognizable speech is roughly 0.3-3.4 kHz (which is why that band was used in old telephone filters). The human voice usually extends up to about 12.7 kHz, so if you are sure you have unfiltered recordings, then band-pass up to 12.7 kHz, and also take out the 50 Hz or 60 Hz hum from power lines.
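For illustration, here is a minimal SciPy sketch of that filtering step (the filter order, notch Q, and exact cutoffs are assumptions, and it expects a mono signal sampled above ~25.4 kHz so the 12.7 kHz cutoff stays below Nyquist):

```python
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

def clean_voice(samples, fs, mains_hz=50.0):
    """Band-pass to the voice band (0.3..12.7 kHz) and notch mains hum."""
    # band-pass to the voice band discussed above
    sos = butter(4, [300.0, 12700.0], btype="bandpass", fs=fs, output="sos")
    voice = sosfiltfilt(sos, samples)
    # notch the 50 Hz (or 60 Hz) power-line hum; largely redundant after
    # the 300 Hz high-pass edge, shown for completeness per the text above
    b, a = iirnotch(mains_hz, Q=30.0, fs=fs)
    return filtfilt(b, a, voice)
```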
2. Make the dataset
If you have recordings of the same sentence to compare, then you can just compute the spectrum of the same tone/letter (for example at the start, middle, and end) via the DFT/FFT or DCT, filter out the unused frequency areas, and build a voice-print dataset from the data. If not, then you first need to find similar tones/letters in the recordings; for that you need speech recognition to be sure, or you find parts of the recordings that have similar properties. What those properties are you have to learn (by trial, or by researching speech-recognition papers); here are some hints: tempo, dynamic volume range, frequency ranges. A voice-print sketch follows below.
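A minimal sketch of one way such a voice print could be computed (the windowing choice, band count, and normalization are illustrative assumptions, not part of the answer above):

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

def voice_print(segment, fs, bands=32, f_lo=300.0, f_hi=12700.0):
    """Reduce one tone/letter segment to a coarse spectral fingerprint:
    magnitude spectrum, keep only the voice band, average into `bands` bins."""
    window = np.hanning(len(segment))          # reduce spectral leakage
    spec = np.abs(rfft(segment * window))
    freqs = rfftfreq(len(segment), 1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)   # filter out unused areas
    spec = spec[keep]
    # average into fixed-size bins so prints are comparable across segments
    chunks = np.array_split(spec, bands)
    fp = np.array([c.mean() for c in chunks])
    return fp / (np.linalg.norm(fp) + 1e-12)   # normalize away volume
```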
3. Compare the datasets
Numeric comparison is done with the correlation coefficient, which is pretty straightforward (and my favorite). You can also use a neural network for this (even for step 2), and there may also be some fuzzy-logic approach. I recommend correlation because its output is similar to what you want and it is deterministic, so there are no problems with over/under-training, invalid architecture, etc. (see the sketch below).
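A minimal sketch of that correlation-based matching, assuming the prints from step 2 are stored per speaker (the dict layout is an assumption for illustration):

```python
import numpy as np

def best_match(query_fp, database):
    """Score a query print against each stored print by Pearson correlation;
    `database` maps speaker name -> fingerprint vector."""
    scores = {}
    for name, fp in database.items():
        r = np.corrcoef(query_fp, fp)[0, 1]  # correlation coefficient in [-1, 1]
        scores[name] = r
    best = max(scores, key=scores.get)
    return best, scores[best], scores
```

With normalized prints, a score close to 1 suggests a match; where to put the accept threshold depends on your data.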
[edit1]
People also use formant filters to generate vowels and speech. Their properties mimic the human vocal tract, and the math behind them can also be used in speech recognition: by inspecting the major frequencies of the filter you can detect vowels, intonation, tempo, and so on, which might be used for speech detection directly. However, that is way outside my field of expertise, but there are many papers about this out there, so just google for them.
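As an illustration of inspecting those major frequencies, here is a rough LPC-based formant sketch (librosa is an assumed dependency, and the model order of 12 is a rule of thumb for speech downsampled to roughly 10 kHz):

```python
import numpy as np
import librosa  # assumed available; librosa.lpc fits the all-pole model

def formants(frame, fs, order=12):
    """Rough formant estimate for one voiced frame via LPC: poles of the
    all-pole filter near the unit circle mark the vocal-tract resonances."""
    a = librosa.lpc(frame.astype(np.float64), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]             # one of each conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)  # pole angle -> Hz
    return np.sort(freqs[freqs > 90.0])           # drop near-DC poles
```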