What are you trying to do?
I haven't used computeSpectrum() before, but I spent the first half of my career as a DSP engineer.
If it does what the docs say, then you don't need to autocorrelate the results.
In your byte array, the index represents the frequency bin, and the value at that index represents the magnitude of that particular frequency.
If by pitch detection you mean finding the strongest frequency, then you need to loop through the byte array and calculate sqrt(left*left + right*right)
for each index. Find the maximum of these values; the index of the maximum represents the strongest frequency.
Assuming fs = 44.1 kHz, 256 bins per channel, and i is your index, the strongest frequency is

f = (i / 256) * (44100 / 2);
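As a rough sketch of the whole loop in Python (numpy) rather than ActionScript, where strongest_frequency() is a made-up name and left/right are assumed to be the per-channel magnitude arrays you've already read out of the computeSpectrum() bytes:

import numpy as np

def strongest_frequency(left, right, fs=44100):
    # left/right: per-channel FFT magnitude arrays (256 values each, assumed)
    magnitude = np.sqrt(np.asarray(left, float) ** 2 +
                        np.asarray(right, float) ** 2)
    i = int(np.argmax(magnitude))            # strongest frequency bin
    return (i / len(magnitude)) * (fs / 2)   # bin index -> Hz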
Keep in mind that you are limited by the bin spacing for frequency resolution. If you need higher resolution, you need to interpolate the data.
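One common way to interpolate is a quadratic (parabolic) fit through the peak bin and its two neighbors. A minimal sketch, assuming mag is the magnitude array and i is the integer peak bin (parabolic_interp is a made-up name):

def parabolic_interp(mag, i):
    # Fit a parabola through (i-1, i, i+1) and return its vertex:
    # the fractional bin position and the interpolated peak height.
    a, b, c = mag[i - 1], mag[i], mag[i + 1]
    offset = 0.5 * (a - c) / (a - 2 * b + c)
    return i + offset, b - 0.25 * (a - c) * offset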
Sounds like you already understand how to get an FFT spectrum, right?
http://flic.kr/p/7notw6
But if you're looking for the fundamental (green dot), you can't just use the highest peak. It's not necessarily the fundamental. In my example, the actual fundamental is 100 Hz, but the highest peak is 300 Hz.
There are a lot of different ways you could find the true fundamental, and each works better in different contexts. One thread on comp.dsp mentions "FFT, cepstrum, auto/cross-correlation, AMDF/ASDF".
For a simple example, each of the red dots is 100 Hz away from its neighbor, so if you used a peak-finding algorithm and then averaged the distance between each harmonic and the next, you'd find the fundamental. But this would fail if any peaks were missed, extra peaks were included, or the signal was symmetrical and contained only odd harmonics (1f, 3f, 5f). You'd need to find the mode of the spacings, throw away the outliers, and then average, which makes this a rather error-prone method.
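If you want to try the spacing idea anyway, here's a quick sketch using scipy's find_peaks and a median as a crude stand-in for "find the mode and throw away outliers"; fundamental_from_spacing is a made-up name, and magnitude is assumed to be a numpy magnitude spectrum of an nfft-point FFT:

import numpy as np
from scipy.signal import find_peaks

def fundamental_from_spacing(magnitude, fs, nfft):
    # Only consider peaks at least 10% as tall as the biggest one.
    peaks, _ = find_peaks(magnitude, height=0.1 * magnitude.max())
    spacings = np.diff(peaks)                # gaps between harmonics, in bins
    return np.median(spacings) * fs / nfft   # typical spacing -> Hz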
You could also do an autocorrelation of the original waveform. Conceptually, this means sliding a copy of the waveform past itself and finding the delay at which it best lines up with itself (which will be one complete cycle). In a normal implementation we use the FFT, though, to speed it up. Autocorrelation is basically the signal convolved with a time-reversed copy of itself:

R(t) = x(t) ∗ x*(-t)

where * means complex conjugate, or time reversal. In Python, for instance:
from scipy.signal import fftconvolve
correlation = fftconvolve(sig, sig[::-1], mode='full')
and the source for fftconvolve() is relatively simple: https://github.com/scipy/scipy/blob/master/scipy/signal/signaltools.py#L133
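To turn that correlation into a pitch estimate, here's a naive sketch, assuming sig is a real-valued mono signal (pitch_from_autocorrelation is a made-up name, and a serious implementation needs more care to avoid octave errors):

import numpy as np
from scipy.signal import fftconvolve

def pitch_from_autocorrelation(sig, fs):
    corr = fftconvolve(sig, sig[::-1], mode='full')
    corr = corr[len(corr) // 2:]           # keep non-negative lags only
    d = np.diff(corr)
    start = np.nonzero(d > 0)[0][0]        # skip past the zero-lag lobe
    lag = start + np.argmax(corr[start:])  # lag of one full cycle
    return fs / lag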
You can use the Harmonic Product Spectrum method to estimate the distance (frequency difference) between overtone peaks in a frequency spectrum (FFT results), even if some peaks are missing, as long as there are not too many spurious frequency peaks (noise).
To do a Harmonic Product Spectrum, print the FFT out on semi-transparent paper and roll it up into a cylinder (or do the equivalent in software). Wrap the cylinder tighter and tighter until the greatest number of peaks overlap. The circumference will then be a good estimate of the pitch. This works for any musical sound that has lots of harmonics, even if the fundamental pitch frequency peak is missing or weak.
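The usual software formulation of HPS is to downsample the magnitude spectrum by 2, 3, 4, ... and multiply the copies together, so every harmonic lands on the fundamental bin. A minimal sketch, where harmonic_product_spectrum is a made-up name and magnitude is assumed to be a numpy magnitude spectrum of length nfft//2 + 1:

import numpy as np

def harmonic_product_spectrum(magnitude, fs, nfft, n_harmonics=5):
    mag = np.asarray(magnitude, float)
    hps = mag.copy()
    for h in range(2, n_harmonics + 1):
        decimated = mag[::h]                # spectrum compressed by factor h
        hps[:len(decimated)] *= decimated
    limit = len(mag) // n_harmonics         # region where all copies overlap
    i = 1 + int(np.argmax(hps[1:limit]))    # skip the DC bin
    return i * fs / nfft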