Question
I'm working on a project that needs to detect certain voice patterns, for example "someone is screaming". I do not know who the person is (a child, a man, a woman, ...), and each has their own voice.
So I'm looking for a way to detect "screaming", for example by saving as many fingerprints of "screaming" as possible. Then, when I need to check whether a voice is a "screaming" voice, I would create a fingerprint for it and search the list of "screaming" fingerprints I already have for a similar one.
My approach is to use something like the following projects:
- https://github.com/AddictedCS/soundfingerprinting
- https://github.com/spotify/echoprint-codegen
Each will give me a unique fingerprint of a specific voice, right? My question is: how can I search for a similarity in the list of "screaming" fingerprints? Is there any way to generate a score, or return a % similarity to each fingerprint, so I can decide by that % whether the voice I'm testing contains a scream?
Thanks, J.B
Answer 1:
> My approach is to use something like the following projects:
That is not a very good idea. Screaming is usually a fairly stable sound, while those libraries look for irregularities in the sound instead, so they will not detect anything. It is better to use a simple DNN-LSTM classifier. You can train it with TensorFlow or any other DNN framework. You can find a description of the algorithm here:
Deep Recurrent Neural Network-based Autoencoders for Acoustic Novelty Detection
or here:
Deep Neural Networks for Automatic Detection of Screams and Shouted Speech In Subway Trains
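To make the classifier idea concrete, here is a minimal sketch of what an LSTM-based scream detector computes at inference time: it reads a sequence of per-frame audio features and emits a single probability, which is the kind of "% score" the question asks for. Everything in it is an assumption for illustration, not the method from the papers above: the class name `TinyLSTMClassifier`, the feature and hidden sizes, the omission of bias terms, and above all the random, untrained weights. In practice you would build and train the model with TensorFlow or another framework on labeled scream and non-scream recordings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


class TinyLSTMClassifier:
    """One LSTM layer plus a sigmoid output unit (hypothetical, untrained)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        n_in = n_features + n_hidden

        def matrix():
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                    for _ in range(n_hidden)]

        # One weight matrix per gate: input, forget, output, candidate.
        # Bias terms are left out to keep the sketch short.
        self.w_i, self.w_f, self.w_o, self.w_g = (
            matrix(), matrix(), matrix(), matrix())
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def predict(self, frames):
        """Return a value in (0, 1) for a sequence of feature vectors."""
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        for x in frames:
            z = list(x) + h  # current frame concatenated with previous state
            i = [sigmoid(v) for v in matvec(self.w_i, z)]
            f = [sigmoid(v) for v in matvec(self.w_f, z)]
            o = [sigmoid(v) for v in matvec(self.w_o, z)]
            g = [math.tanh(v) for v in matvec(self.w_g, z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # The final hidden state summarizes the clip; squash it to a score.
        return sigmoid(sum(w * hj for w, hj in zip(self.w_out, h)))


# Placeholder features: 20 frames of 3 made-up values each.
frames = [[0.1 * t, 0.5, -0.2] for t in range(20)]
model = TinyLSTMClassifier(n_features=3, n_hidden=4)
score = model.predict(frames)
print(f"scream probability (untrained, meaningless): {score:.2%}")
```

With real features (for example spectral features per frame) and trained weights, you would compare the returned probability against a threshold chosen on validation data.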
> How can I search for a similarity in the list of "screaming" fingerprints? Is there any way to generate a score, or return a % similarity to each fingerprint, so I can decide by that % whether the voice I'm testing contains a scream?
In the first library you listed (soundfingerprinting), you can use queryResult.BestMatch.Confidence, which is described as follows:
Confidence - returns a value between [0, 1]. A value below 0.15 is most probably a false positive. A value bigger than 0.15 is very likely to be an exact match. For good audio quality queries you can expect getting a confidence > 0.5.
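As a rough illustration of how those thresholds could be applied, here is a small Python sketch. It does not call the soundfingerprinting library (which is a .NET library); the function name `interpret_confidence` and the three labels are hypothetical, and only the 0.15 and 0.5 cut-offs come from the description quoted above. The quoted text does not say how to treat a value of exactly 0.15, so treating it as a likely match is an assumption.

```python
def interpret_confidence(confidence, false_positive_below=0.15,
                         good_quality_above=0.5):
    """Map a confidence in [0, 1] to a coarse, hypothetical label."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < false_positive_below:
        return "probable false positive"
    if confidence > good_quality_above:
        return "match at the level expected for good-quality audio"
    # Assumption: values from 0.15 up to and including 0.5 land here.
    return "likely match"


for value in (0.05, 0.3, 0.8):
    # Format the score as a percentage, as the question asks for.
    print(f"{value:.0%}: {interpret_confidence(value)}")
```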
Source: https://stackoverflow.com/questions/44760992/scream-detection