Scream detection

Posted by 我的梦境 on 2020-02-08 09:49:47

Question


I'm working on a project that needs to detect certain voice patterns, for example "someone is screaming". I don't know in advance who the person is (a child, a man, a woman, ...), and each has their own voice.

So I'm looking for a way to detect screaming by, for example, saving as many fingerprints of screams as possible. Then, when I need to check whether a voice is screaming, I could create a fingerprint for it and search the list of scream fingerprints I already have for a similar one.

My approach is to use something like the following projects:

  • https://github.com/AddictedCS/soundfingerprinting
  • https://github.com/spotify/echoprint-codegen

Each will give me a unique fingerprint of a specific voice, right? My question is: how can I search the list of "screaming" fingerprints for similarity? Is there a way to generate a score, or a percentage of similarity to each fingerprint, so that I can decide whether the voice I'm testing is a scream?

Thanks, J.B


Answer 1:


My approach is to use something like the following projects:

That is not a very good idea. Screaming is usually a fairly stable sound, while all those libraries look for irregularities in the sound instead, so they will not detect anything. It is better to use a simple DNN-LSTM classifier. You can train it with TensorFlow or any other DNN framework. You can find a description of the algorithm here:

Deep Recurrent Neural Network-based Autoencoders for Acoustic Novelty Detection

or here:

Deep Neural Networks for Automatic Detection of Screams and Shouted Speech In Subway Trains
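As a rough illustration of what a "simple DNN-LSTM classifier" could look like, here is a minimal Keras sketch. It is not the architecture from either paper above. It assumes each audio clip has already been converted to a sequence of feature frames (for instance 40 log-mel or MFCC coefficients per frame, which is an assumption, not something stated in the answer), and the function name, layer sizes, and random placeholder data are all hypothetical.

```python
import numpy as np
import tensorflow as tf


def build_scream_classifier(n_features: int = 40, lstm_units: int = 32) -> tf.keras.Model:
    """Binary sequence classifier: feature frames in, P(scream) out.

    n_features and lstm_units are illustrative defaults, not tuned values.
    """
    model = tf.keras.Sequential([
        # Variable-length sequences of per-frame feature vectors.
        tf.keras.Input(shape=(None, n_features)),
        # The LSTM summarizes the whole clip into one vector.
        tf.keras.layers.LSTM(lstm_units),
        # A single sigmoid unit gives a scream probability in [0, 1].
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    # Placeholder data only: 8 random "clips" of 50 frames each with random labels.
    # Real training needs labeled scream / non-scream recordings.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 50, 40)).astype("float32")
    y = rng.integers(0, 2, size=(8, 1)).astype("float32")

    model = build_scream_classifier()
    model.fit(x, y, epochs=1, batch_size=4, verbose=0)
    probs = model.predict(x, verbose=0)
    print(probs.shape)  # one probability per clip
```

Unlike a fingerprint lookup, the sigmoid output here is a per-clip score that can be thresholded directly, which is the kind of "percentage" the question asks about.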

How can I search the list of "screaming" fingerprints for similarity? Is there a way to generate a score, or a percentage of similarity to each fingerprint, so that I can decide whether the voice I'm testing is a scream?

In the first library, you can use queryResult.BestMatch.Confidence, for example:

Confidence - returns a value between [0, 1]. A value below 0.15 is most probably a false positive. A value bigger than 0.15 is very likely to be an exact match. For good audio quality queries you can expect getting a confidence > 0.5.
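To show how such a confidence value might be turned into a decision, here is a small Python sketch. It does not call the soundfingerprinting library (which is a .NET project). It assumes the confidence values have already been collected as plain numbers in [0, 1], and the names best_scream_match, as_percent, and the reference ids are hypothetical. Only the 0.15 cutoff comes from the description quoted above.

```python
from typing import Iterable, Optional, Tuple

# Cutoff quoted above: values below 0.15 are most probably false positives.
FALSE_POSITIVE_THRESHOLD = 0.15


def best_scream_match(
    matches: Iterable[Tuple[str, float]],
    threshold: float = FALSE_POSITIVE_THRESHOLD,
) -> Optional[Tuple[str, float]]:
    """Return the (reference_id, confidence) pair with the highest confidence
    above the threshold, or None if no match clears it."""
    best: Optional[Tuple[str, float]] = None
    for ref_id, confidence in matches:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range for {ref_id}: {confidence}")
        if confidence > threshold and (best is None or confidence > best[1]):
            best = (ref_id, confidence)
    return best


def as_percent(confidence: float) -> str:
    """Format a confidence in [0, 1] as a whole-number percentage."""
    return f"{confidence:.0%}"


if __name__ == "__main__":
    # Hypothetical confidences for one query against three stored screams.
    results = [("scream_01", 0.08), ("scream_02", 0.62), ("scream_03", 0.31)]
    match = best_scream_match(results)
    if match is None:
        print("no stored scream matched")
    else:
        print(f"closest stored scream: {match[0]} ({as_percent(match[1])})")
```

Keep in mind the first point of this answer: a high confidence here means the query resembles one specific stored recording, not that it is a scream in general.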



Source: https://stackoverflow.com/questions/44760992/scream-detection
