Wav comparison, same file

人走茶凉 提交于 2019-12-04 07:54:39

Have you looked at MARF? It is a well-documented Java library used for audio recognition.

It is used to recognize speakers (for transcription or securing software) but the same features should be able to be used to classify audio samples. I'm not familiar with it but it looks like you'd want to use the FeatureExtraction class to extract an array of features from each audio sample and then create a unique id.

For 16-bit audio, 3e-05 isn't really that different from zero. So a file of zeros is pretty much the same as a file of zeros (maybe missing equality by some tiny rounding errors.)

ADDED: For your comparison, read in and plot, using some Java plotting library, a portion of each of the two waveforms when they get past the portion that's mostly (close to) zero.

I think for debugging you better try use matlab to plot out. Since matlab is much more powerful in dealing with this problem.

You use "wavread" to the file, and "stft" to get the short time Fourier Transformation which is a complex number Matrix. Then simply abs(Matrix) to get the magnitude of each complex number. Show the image with imshow(abs(Matrix),[]).

I don't know how do you compare the whole file and 30s clip (by looking at the stft image?)

I don't know how are you comparing both audio files, but, seeing some service that offer music recognition (like TrackId or MotoID), these services take a small sample of the music you're hearing (10-20 secs), then process them in their server, i theorize that they have samples that long or less and that they have a database of (or calculate it on the fly) patterns of that samples (in your case Fourier Transforms), in your case, you may need to break your long audio file in chunks of or smaller size than your sample data, in the first case you may find a specific chunk that resembles more the pattern in your sample data, in the second case your smaller chunks may resamble a part of your sample data and you can calculate the probability that the sample data belongs to a respective audio file.

I think you are looking at Acoustic Fingerprinting It's hard, and there are libraries to do it. If you want to implement it yourself, this is a whitepaper on the shazam algorithm.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!