Scream detection | 易学教程

Question

I'm working on a project that needs to detect certain voice patterns, for example "someone is screaming". I do not know who the person is (a child, a man, a woman...), and each has their own voice.

So I'm looking for a way to detect screaming by, for example, saving as many fingerprints of screaming as possible. Then, when I need to check whether a voice is screaming, I can create a fingerprint for it and search the list of "screaming" fingerprints I already have for a similar one.

My approach is to use something like the following projects:

https://github.com/AddictedCS/soundfingerprinting
https://github.com/spotify/echoprint-codegen

Each will give me a unique fingerprint of a specific voice, right? My question is: how can I search for similarity against the list of "screaming" fingerprints? Is there any way to generate a score or return a % similarity to each fingerprint, so I can decide whether the voice I'm testing is a scream?

Thanks, J.B

Answer 1:

> My approach is to use something like the following projects:

Not a very good idea. Screaming is usually a fairly stable sound, while all those libraries look for irregularities in the sound instead. They will not detect anything. It is better to use a simple DNN-LSTM classifier. You can train it with TensorFlow or any other DNN framework. You can find a description of the algorithm here:

Deep Recurrent Neural Network-based Autoencoders for Acoustic Novelty Detection

or here:

Deep Neural Networks for Automatic Detection of Screams and Shouted Speech In Subway Trains
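To make the shape of such a recurrent classifier more concrete, here is a minimal sketch of the forward pass of a single LSTM layer followed by a sigmoid output, written in plain Python. It is not the architecture from either paper: the class name `TinyLSTMClassifier`, the layer sizes, and the random (untrained) weights are all assumptions for illustration, and biases are omitted for brevity. In practice you would build and train the equivalent model in TensorFlow or another framework on labelled scream / non-scream feature sequences (for example per-frame spectral features).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, vec):
    return sum(w * v for w, v in zip(weights, vec))


class TinyLSTMClassifier:
    """Hypothetical, untrained single-layer LSTM + sigmoid output unit."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        n_in = n_features + n_hidden  # each gate sees [frame, previous hidden state]

        def matrix():
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                    for _ in range(n_hidden)]

        # One weight matrix per gate: input, forget, output, candidate.
        self.w_i, self.w_f, self.w_o, self.w_g = (matrix(), matrix(),
                                                  matrix(), matrix())
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def predict_proba(self, frames):
        """frames: list of per-frame feature vectors; returns a value in (0, 1)."""
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        for x in frames:
            z = list(x) + h
            i = [sigmoid(dot(row, z)) for row in self.w_i]
            f = [sigmoid(dot(row, z)) for row in self.w_f]
            o = [sigmoid(dot(row, z)) for row in self.w_o]
            g = [math.tanh(dot(row, z)) for row in self.w_g]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # The final hidden state summarises the whole clip.
        return sigmoid(dot(self.w_out, h))


if __name__ == "__main__":
    model = TinyLSTMClassifier(n_features=3, n_hidden=4)
    clip = [[0.1, 0.2, 0.3]] * 5  # 5 made-up frames of 3 features each
    print(model.predict_proba(clip))
```

With untrained weights the output is meaningless; the point is only that the classifier consumes a sequence of frames and produces a single scream probability, which is the score the question asks for.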

> How can I search for similarity against the list of "screaming" fingerprints? Is there any way to generate a score or return a % similarity to each fingerprint, so I can decide whether the voice I'm testing is a scream?

With the first library (soundfingerprinting), you can use queryResult.BestMatch.Confidence, for example:

Confidence - returns a value between [0, 1]. A value below 0.15 is most probably a false positive. A value bigger than 0.15 is very likely to be an exact match. For good audio quality queries you can expect getting a confidence > 0.5.
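As a rough sketch of how the thresholds quoted above could be turned into a decision and a percentage, the following Python helper maps a confidence value in [0, 1] to a label. The function name `interpret_confidence` and the label strings are hypothetical, and the code does not call the soundfingerprinting library itself (which is a .NET library); it assumes you have already read the confidence value from the query result.

```python
def interpret_confidence(confidence,
                         false_positive_below=0.15,
                         good_quality_above=0.5):
    """Map a match confidence in [0, 1] to (label, percent).

    The two thresholds come from the quoted description; the labels
    are illustrative names, not part of any library.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    if confidence < false_positive_below:
        label = "probable false positive"
    elif confidence > good_quality_above:
        label = "strong match"
    else:
        label = "likely match"
    return label, confidence * 100.0


if __name__ == "__main__":
    for value in (0.05, 0.3, 0.8):
        label, percent = interpret_confidence(value)
        print(f"{percent:.0f}% -> {label}")
```

Keep in mind the caveat from the answer: a fingerprint match score measures similarity to a specific stored recording, so a low score does not by itself show that the sound is not a scream.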

Source: https://stackoverflow.com/questions/44760992/scream-detection

Tags

speech-recognition

similarity

audio-fingerprinting