Which algorithm should I use for signal (sound) one class classification?


Question


Update: this question was previously titled "Give me the name of a simple algorithm for signal (sound) pattern detection".

  1. My objective is to detect the presence of a given pattern in a noisy signal. I want to detect the presence of a species of insect by recording sounds with a microphone. I have previously recorded the sound of the insect in digital format.
  2. I am not trying to do voice recognition.
  3. I am already using convolution between the input signal and the pattern to determine their similarity level. But I think this technique is better suited to discrete-time settings (e.g. digital communications, where signals occur at fixed intervals) and to deciding which of two given patterns an input signal matches (I have only one pattern).
  4. I am wary of using neural networks, because I have never used them and I don't know whether I could embed that code.

Could you please point me to some other approaches, or try to convince me that my current approach is still a good idea, or that neural networks may be a feasible way?

Update: I already have two good answers, but another would be welcome, and even rewarded.


Answer 1:


A step up from convolution is dynamic time warping, which can be thought of as a convolution operator that stretches and shrinks one signal to optimally match another.
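
As a rough illustration (not from the answer itself), a minimal DTW distance in Python might look like the sketch below; it assumes you compare short 1-D feature sequences (e.g. per-frame energies), not raw audio samples, since classic DTW costs O(len(a) * len(b)):

```python
# Minimal dynamic-time-warping distance sketch (assumes short 1-D feature sequences).
import numpy as np

def dtw_distance(a, b):
    """Classic DTW with a squared-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```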

Perhaps a simpler approach would be to do an FFT of the sample and determine whether your insect has any particular frequencies that can be filtered on.

On the more complex side, but not quite a neural network, are SVM toolkits like libsvm and svmlight that you can throw your data at.
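
If you go the SVM route, a hedged sketch using scikit-learn's OneClassSVM (used here as a stand-in for libsvm/svmlight; the feature matrix, kernel, and parameters are illustrative assumptions) could look like this:

```python
# One-class SVM sketch: train only on feature vectors of the clean insect sound,
# then flag new windows as "insect-like" (+1) or outliers (-1).
import numpy as np
from sklearn.svm import OneClassSVM

features_of_clean_calls = np.random.rand(50, 16)   # placeholder (n_windows, n_features)
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
model.fit(features_of_clean_calls)

new_window_features = np.random.rand(1, 16)        # placeholder test window
print(model.predict(new_window_features))          # +1 = looks like the insect, -1 = outlier
```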

Regardless of the path you attempt, I would spend time exploring the nature of the sound your insect makes using tools like FFT. After all, it will be easier teaching a computer to classify the sound if you can do it yourself.




Answer 2:


Sounds like a typical one-class classification problem, i.e. you want to search for one thing in a large pool of other things you don't care about.

What you want to do is find a set of features or descriptors that you can calculate for every short piece of your raw recording and then match against the features your clean recording produces. I don't think convolution is necessarily bad, though it is rather sensitive to noise, so it might not be optimal for your case.

What might actually work in your case is pattern matching on a binned Fourier transform. You take the Fourier transform of your signal, giving you a power vs. frequency graph (rather than a power vs. time graph), then you divide the frequency axis into bands and take the average power in each band as a feature. If your data contains mostly white noise, the pattern you get from a raw insect sound of similar length will very closely match the pattern of your reference sound. This last trick has been used successfully (with some windowing) to crack audio CAPTCHAs as used by Google et al. to make their sites accessible to the blind.
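
A minimal sketch of this binned-Fourier-transform feature idea; the number of bands and the cosine-similarity comparison are assumptions, not part of the answer:

```python
# Binned FFT features: average spectral power in equal-width frequency bands.
import numpy as np

def band_energies(signal, n_bands=16):
    """Average power in n_bands equal-width frequency bands of the signal."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(power, n_bands)
    return np.array([band.mean() for band in bands])

def band_similarity(sig_a, sig_b, n_bands=16):
    """Cosine similarity between the two band-energy patterns."""
    a, b = band_energies(sig_a, n_bands), band_energies(sig_b, n_bands)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```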

By the way, because your raw audio signal is digital (otherwise processing with a computer would not work ;-)), convolution is appropriate. You should perform the convolution between your reference signal and a segment of equal length from the raw input, starting at each sample. So, if your reference signal has length N and your raw sample has length M, where M >= N, then you should perform M - N + 1 = P convolutions between your reference signal and the P segments of your raw input starting at 1..P. The best candidate for the location of the reference sound in the raw sample is the segment with the highest convolution score. Note that this becomes insanely time-consuming very quickly.
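
A minimal sketch of that sliding comparison, using NumPy's correlate (which slides the reference over the raw signal without time-reversing it) in 'valid' mode so you get exactly the P = M - N + 1 scores described above:

```python
# Score every alignment of the reference against the raw recording.
import numpy as np

def best_alignment(reference, raw):
    """Return the start index with the highest score, plus all M - N + 1 scores."""
    scores = np.correlate(raw, reference, mode="valid")  # one score per alignment
    return int(np.argmax(scores)), scores
```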

Fourier-transform-based matching as explained above, using 50%-overlapping samples from your raw data of twice the length of your reference sample, would at least be faster (though not necessarily better).




Answer 3:


Some more information is needed.

When you say noisy signal, what is the background noise? Is it, to a first approximation, stationary (in a statistical sense, i.e. constant) or is it non-stationary (i.e. likely to contain other sounds, such as other animal calls, etc.)?

If the background noise is non-stationary then your best bet might be to use something called Independent Component Analysis (ICA), which attempts to separate a given sound mixture into its component sources; you wouldn't even need the original recording of the insect itself. Lots of ICA software is linked from the Wikipedia page.

(Edit: ICA is a case of Blind Source Separation (BSS), there are many other ways of doing BSS and it might help to search for those as well.)
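
If you want to experiment with ICA, here is a hedged sketch using scikit-learn's FastICA; it assumes you have at least as many synchronized microphone channels as sources, and the channel variables are placeholders:

```python
# Blind source separation sketch with FastICA on two synchronized channels.
import numpy as np
from sklearn.decomposition import FastICA

mic1 = np.random.randn(44100)       # placeholder: recording from microphone 1
mic2 = np.random.randn(44100)       # placeholder: recording from microphone 2
X = np.column_stack([mic1, mic2])   # shape (n_samples, n_channels)

ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(X)      # estimated independent sources, one per column
```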

If however, the background noise is stationary then the problem is much easier (though still very hard):

In this case the approach I would use is as follows. Analyse the amplitude spectrum of a bit of the noise and the amplitude spectrum of your insect call. If you're lucky, the insect call may, in general, be in a different frequency band to the noise. If so, filter the incoming signal with a suitable high-, low-, or band-pass filter.
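
A minimal band-pass filtering sketch with SciPy, assuming you have already identified the insect's band from the spectra; the cut-off frequencies are placeholders you would choose yourself:

```python
# Zero-phase band-pass filtering with a Butterworth filter.
from scipy.signal import butter, sosfiltfilt

def bandpass(signal, sample_rate, low_hz, high_hz, order=4):
    """Keep only the band [low_hz, high_hz] of the signal."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=sample_rate, output="sos")
    return sosfiltfilt(sos, signal)
```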

You can then try comparing sections of your filtered signal that contain "more energy" than average with your (filtered) insect call, possibly using the image-similarity algorithms suggested by A. Rex.

Edit: Since your background noise is non-stationary, I can only suggest that searching for blind source separation of non-Gaussian sources may lead you to some more algorithms. I'm afraid the answer is that there is no simple algorithm that will do what you want.




Answer 4:


If I were you, I would start by reading a little about window functions such as the Hamming window; this is a good starting point for sound recognition. (This is, of course, combined with the Fourier transform.)
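
A minimal sketch of the idea, assuming NumPy: multiply each frame by a Hamming window before taking its FFT, which reduces spectral leakage at the frame edges.

```python
# Hamming-windowed amplitude spectrum of one frame.
import numpy as np

def frame_spectrum(frame):
    """Amplitude spectrum of a single frame, windowed to reduce leakage."""
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed))
```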




Answer 5:


You can try a Matched Filter. Although I've never actually used one, I've heard good things.
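
For a known template in roughly white noise, a matched filter amounts to convolving the signal with the time-reversed, normalized reference; a minimal sketch, assuming NumPy (not a definitive implementation):

```python
# Matched-filter sketch: peaks in the output mark likely occurrences of the reference.
import numpy as np

def matched_filter(raw, reference):
    """Convolve the raw signal with the time-reversed, normalized template."""
    template = reference[::-1] / (np.linalg.norm(reference) + 1e-12)
    return np.convolve(raw, template, mode="valid")
```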

Also, although not simple, I think a Hidden Markov Model (HMM; I know you said no speech recognition, but hear me out!) would provide the best results for you. Again, I've never actually used one, but there are open-source implementations available all over the place. You would just need to train it using your existing "clean" insect recording. Here is one open-source implementation: General Hidden Markov Model Library.




Answer 6:


Admittedly this is not my area of expertise, but my first thought is a recursive least squares filter - it performs autocorrelation. It's similar to the convolution filter you're using now but a bit more advanced. Kalman filtering is an extension of this - it's used to regenerate a signal from multiple noisy measurements, so it's probably not useful in this case. I would not reject neural networks offhand - they're very useful at this sort of thing (provided you train them properly).

Thinking about this more in depth, I would probably recommend using an FFT. Chances are the signal you're looking for is very band-limited, and you'd probably have more luck applying a bandpass filter to the data, then an FFT, and finally using your simple convolution filter on that data instead of on the time-domain data points. Or do both and have twice the data. I'm not heavy into math, so I can't tell you whether you'll get significant (not linearly dependent) results using this method, but the only thing you're losing is time.




Answer 7:


You may be interested in the MA Toolbox, a Matlab implementation of similarity measures.

I personally found this paper, General sound classification and similarity in MPEG-7, interesting. However, it might be behind a paywall (I don't know) and it might not be that useful in practice.

The GPL-ed framework Marsyas has a tool for machine learning classification, called kea. My guess is that this probably does not do what you want or is too much effort to hook up to.

My only idea otherwise is to take Fourier transforms, effectively transforming your sounds into grayscale images. Then use one of the many image similarity algorithms.
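
A minimal sketch of that idea with SciPy: compute a spectrogram and treat the log-power matrix as a grayscale image for whatever image-similarity method you pick (the FFT segment length here is an assumption):

```python
# Turn a sound into a spectrogram "image" (rows = frequency bins, columns = time frames).
import numpy as np
from scipy.signal import spectrogram

def sound_to_image(signal, sample_rate):
    freqs, times, power = spectrogram(signal, fs=sample_rate, nperseg=512)
    return 10 * np.log10(power + 1e-12)   # log power, usable as a grayscale image
```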




Answer 8:


A Naive Bayes Classifier may be worthwhile here, classifying sound samples into ones which contain your species of interest and ones which do not. It works quite well for complex phenomena; I once used it to decide if a given millimeter-wave RADAR data set contained an obstacle such as brush, a tank trap, etc. As for how to break up your continuous data into discrete chunks for the Bayesian classifier, you might just slide along the continuous data set and break off chunks equal in length to your insect sample. For example, if the sample you're comparing against is 2 seconds long, you might feed the discriminator 0-2s, 0.5-2.5s, 1-3s, etc. You'll need to train the discriminator, but that is a common requirement of any machine learning-based solution.
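
A hedged sketch of that pipeline with scikit-learn's GaussianNB; the sliding-window helper, the band-energy-style features, and the placeholder training data are all assumptions, not part of the answer:

```python
# Naive Bayes over feature vectors of overlapping windows of the recording.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def sliding_windows(signal, win_len, hop):
    """Overlapping chunks, e.g. win_len = 2 s and hop = 0.5 s worth of samples."""
    return [signal[i:i + win_len] for i in range(0, len(signal) - win_len + 1, hop)]

# X_train: one feature vector (e.g. band energies) per labelled window;
# y_train: 1 = insect present, 0 = absent.  Placeholder data shown here.
X_train = np.random.rand(200, 16)
y_train = np.random.randint(0, 2, 200)
clf = GaussianNB().fit(X_train, y_train)

X_new = np.random.rand(5, 16)              # features of new, unlabelled windows
print(clf.predict(X_new))                  # predicted class per window
```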

These sorts of approaches are about the only way to go if your insect species doesn't have a single, relatively distinct sound that you're looking for. Cross-correlation/convolution are of limited utility if you're looking for something more complex than a single sound which may be at higher or lower volume.

There are naive Bayes classifier implementations for several languages, such as nbc.




Answer 9:


You may want a Wiener filter approach.




Answer 10:


Google the FastICA algorithm. Some use ICA and blind source signal separation interchangeably. The author of the algorithm wrote a fantastic book on ICA that is around $40-$60 used on Amazon.




Answer 11:


Goertzel algorithm - you can use it both for simple pattern detection and for separating complicated frequencies. You can see a sample of my implementation in C#.
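
To illustrate the idea independently of the C# sample, here is a minimal Goertzel power computation in Python (a standard textbook form, not the answerer's code):

```python
# Goertzel sketch: power of the signal at one target frequency without a full FFT.
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Power of `samples` at (approximately) target_hz."""
    n = len(samples)
    k = int(0.5 + n * target_hz / sample_rate)  # nearest DFT bin
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2
```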



Source: https://stackoverflow.com/questions/441438/which-algorithm-should-i-use-for-signal-sound-one-class-classification
