Question
I am writing a program to help people learn guitar. To do this, I need to be able to look at a sample of time and see what note(s) they played. I looked at FFTW but I don't understand how to get this to work. I also tried to figure out the Goertzel algorithm but it seems like that is just for single-frequency notes like dial tones (not sure about that though). To be clear, I do need to be able to detect multiple notes (to see if a chord is played), but it doesn't matter too much if a few harmonics get in there too.
I'm coding this in C++, and would prefer a solution that is cross-platform.
UPDATE: I've realized it isn't so important to detect specific notes; what I really need is to check that certain frequencies are present, and that others aren't. For example, if someone plays a C, I want to check that a C frequency is present (about 262 Hz), as well as probably 524 Hz and 786 Hz, and check that nearby notes that are not near in the overtone series (like B and D) are not present.
Answer 1:
Notes are not present in a wav file. Sampled sound is.
Humans might perceive some notes that might have been played to create the sound in some wav file, but doing automatic polyphonic pitch estimation/recognition from recorded sound into transcribed music for rich and complex waveforms, such as produced by guitars, still appears to be an advanced research topic.
Even where it is possible, for certain very restricted types of musical sound, some non-trivial DSP will be involved. FFTW might be useful for a small part of the more sophisticated DSP processing needed for pitch estimation, Goertzel filtering less so.
Answer 2:
I can't point you to specifics, but I believe what you need is a Fourier transform to detect the frequencies you're looking for. There's also a similar question here.
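For the narrower goal in the question's update (checking that specific frequencies are present and others are absent), one option is to measure the signal's energy at just those frequencies. Below is a minimal sketch using the Goertzel recurrence, which computes a single DFT-like term per target frequency. The names `goertzel_power` and `sine_wave` are made up for illustration, the samples are assumed to be mono `double` values already decoded from the wav file, and real guitar audio would likely need windowing and a tuned threshold.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: generates `count` samples of a unit-amplitude sine.
std::vector<double> sine_wave(double freq_hz, double sample_rate_hz, std::size_t count) {
    const double pi = std::acos(-1.0);
    std::vector<double> out(count);
    for (std::size_t n = 0; n < count; ++n) {
        out[n] = std::sin(2.0 * pi * freq_hz * static_cast<double>(n) / sample_rate_hz);
    }
    return out;
}

// Hypothetical helper: Goertzel estimate of the power at `target_hz`.
// The result is divided by N^2, so a sine of amplitude A that fits a whole
// number of cycles into the buffer gives roughly A*A/4.
double goertzel_power(const std::vector<double>& samples,
                      double sample_rate_hz, double target_hz) {
    if (samples.empty()) return 0.0;
    const double pi = std::acos(-1.0);
    const double omega = 2.0 * pi * target_hz / sample_rate_hz;
    const double coeff = 2.0 * std::cos(omega);
    double s_prev = 0.0;   // s[n-1]
    double s_prev2 = 0.0;  // s[n-2]
    for (double x : samples) {
        const double s = x + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }
    // Squared magnitude of the frequency term after the last sample.
    const double power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2;
    const double n = static_cast<double>(samples.size());
    return power / (n * n);
}
```

With a 0.5 s test tone at 262 Hz sampled at 8000 Hz (`sine_wave(262.0, 8000.0, 4000)`), `goertzel_power` should come out near 0.25 at 262 Hz and close to zero at 247 Hz (B) and 294 Hz (D). It only measures energy at the frequencies you choose; it does not solve the general polyphonic case.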
Answer 3:
What about this pdf? http://miracle.otago.ac.nz/tartini/papers/A_Smarter_Way_to_Find_Pitch.pdf
The problem with the FFT is that if you do a 256-sample FFT, you will get only 256 outputs. Essentially, this means it divides your frequency space, which contains an infinite number of frequencies, into a limited set of frequencies.
This is because if you only look at 256 samples (256 can be replaced by N, the number of samples used for the FFT), any two frequencies that differ by a multiple of 256 will look the same.
In other words, suppose you take 256 evenly spaced samples at times 0, 1/256, 2/256, 3/256, ..., 255/256. Then the two signals sin(2 pi 80 x), which has frequency 80 cycles/sec, and sin(2 pi (80 + 9*256) x), which has frequency 80 + 9*256 cycles/sec, will have the same samples.
Here, 9 can be replaced by k, the multiple to use. You could replace 9 with 1,2,3,4,5, etc. You can replace 256 (N) with any value as well.
As an example, evaluating both at one of the sample times, 200/256, we have: sin(2 pi (80 + 9*256) (200/256)) = sin(2 pi 80 (200/256) + 2 pi * 9 * 200)
Because multiples of 2 pi don't affect sin, this is the same as sin(2 pi 80 (200/256)).
More generally, sin(2 pi (M + k*N) (j/N)) = sin(2 pi M (j/N) + 2 pi k*j) = sin(2 pi M (j/N)), where j is any integer 0, ..., N - 1, N is the number of samples, j/N is the sample time, M is the number of cycles per second, and k is any integer ..., -2, -1, 0, 1, 2, ...
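The identity above can be checked numerically. This is a small sketch, not part of the original answer: `sample_sine` and `max_abs_difference` are illustrative names, and the comparison uses a tolerance because the two sines are evaluated in floating point.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Samples sin(2*pi*M*t) at the times t = j/n for j = 0, ..., n-1,
// i.e. n evenly spaced samples over one second.
std::vector<double> sample_sine(long cycles_per_sec, std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<double> out(n);
    for (std::size_t j = 0; j < n; ++j) {
        const double t = static_cast<double>(j) / static_cast<double>(n);
        out[j] = std::sin(2.0 * pi * static_cast<double>(cycles_per_sec) * t);
    }
    return out;
}

// Largest absolute sample-by-sample difference over the common length.
double max_abs_difference(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t len = std::min(a.size(), b.size());
    double worst = 0.0;
    for (std::size_t j = 0; j < len; ++j) {
        worst = std::max(worst, std::fabs(a[j] - b[j]));
    }
    return worst;
}
```

Comparing `sample_sine(80, 256)` with `sample_sine(80 + 9 * 256, 256)` should give a difference that is zero up to rounding error, while comparing 80 with 81 cycles/sec gives clearly different samples.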
From Nyquist sampling, if you want to distinguish -128, -127, -126, -125, ..., 125, 126, 127 cycles per second, you would take 256 samples/sec. Taking 256 samples/sec means you can distinguish 256 frequencies. However, 0 cycles/sec, 256 cycles/sec, 512 cycles/sec, and 1024 cycles/sec would all look the same.
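To relate the "limited set of frequencies" point back to the guitar question: for N samples taken at a given sample rate, adjacent FFT outputs are spaced (sample rate)/N apart in Hz. The sketch below assumes a 44100 Hz recording, which the question does not state, and uses made-up helper names. B (about 246.94 Hz) and C (about 261.63 Hz) are roughly 14.7 Hz apart, so a 256-sample FFT is far too coarse to separate them.

```cpp
#include <cstddef>

// Spacing in Hz between adjacent FFT outputs for a given sample rate and FFT size.
double bin_spacing_hz(double sample_rate_hz, std::size_t fft_size) {
    return sample_rate_hz / static_cast<double>(fft_size);
}

// Smallest power-of-two FFT size whose bin spacing is at most `max_spacing_hz`.
// Returns 0 for non-positive inputs instead of looping forever.
std::size_t min_fft_size(double sample_rate_hz, double max_spacing_hz) {
    if (sample_rate_hz <= 0.0 || max_spacing_hz <= 0.0) return 0;
    std::size_t n = 1;
    while (sample_rate_hz / static_cast<double>(n) > max_spacing_hz) {
        n *= 2;
    }
    return n;
}
```

Under that assumption, `bin_spacing_hz(44100.0, 256)` is about 172 Hz, and `min_fft_size(44100.0, 14.7)` gives 4096. In practice you would want bins noticeably finer than the note spacing, so this is only a lower bound.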
Source: https://stackoverflow.com/questions/11388053/how-to-get-a-list-of-notes-present-in-a-wav-file