Question
I'm trying to detect some echoes in sound coming from the microphone. The echoes will be periodic and at one of two possible offsets. I've heard I need to auto-correlate the cepstrum of the signal in order to detect the presence of these echoes. Can you provide code using the Accelerate framework that shows how to detect echoes in the audio data?
Answer 1:
I'm not entirely sure why you'd auto-correlate the cepstrum. Auto-correlation, though, gives you a representation that is related to the cepstrum, so I assume you simply want to auto-correlate your signal.
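Since the question does ask about the cepstrum, one reference point: the real cepstrum is the inverse FFT of the log-magnitude spectrum, and an echo at delay D shows up in it as a peak at quefrency D. Here is a minimal portable sketch of that; the function names are mine, and a plain O(N^2) DFT stands in for Accelerate's vDSP_fft_zrip purely so the example is self-contained. It illustrates the relationship, it is not production code.

```cpp
#include <complex>
#include <vector>
#include <cmath>
#include <cstddef>

// Direct DFT (sign = -1) / inverse DFT (sign = +1), O(N^2).
// A real app would use vDSP_fft_zrip instead.
static std::vector<std::complex<double>>
dft(const std::vector<std::complex<double>>& in, int sign)
{
    const double kPi = 3.14159265358979323846;
    const std::size_t n = in.size();
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double ang = sign * 2.0 * kPi * double(k) * double(t) / double(n);
            acc += in[t] * std::complex<double>(std::cos(ang), std::sin(ang));
        }
        out[k] = (sign > 0) ? acc / double(n) : acc;   // scale the inverse
    }
    return out;
}

// Real cepstrum: inverse DFT of the log-magnitude spectrum.
std::vector<double> realCepstrum(const std::vector<double>& x)
{
    std::vector<std::complex<double>> buf(x.begin(), x.end());
    std::vector<std::complex<double>> spec = dft(buf, -1);
    for (std::complex<double>& c : spec)
        c = std::log(std::abs(c) + 1e-12);             // small bias avoids log(0)
    std::vector<std::complex<double>> ceps = dft(spec, +1);
    std::vector<double> out(ceps.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = ceps[i].real();
    return out;
}
```

An impulse plus a half-amplitude copy of itself at offset D produces a cepstral peak of about 0.5 at index D, which is exactly the property people exploit for echo detection.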
In its simplest form it is performed as follows:
int sample = 0;
const int sampleMax = inSize;
// pOutput must be (2 * inSize) - 1 floats long and zeroed beforehand.
while( sample < sampleMax )
{
    // Scale the whole input by the current sample's value...
    vDSP_vsmul( pInput, 1, &pInput[sample], tempBuffer, 1, sampleMax );
    // ...and accumulate it into the output at the matching lag position.
    const size_t kAutoCorrWritePos = outSize - sampleMax - sample;
    vDSP_vadd( &pOutput[kAutoCorrWritePos], 1, tempBuffer, 1, &pOutput[kAutoCorrWritePos], 1, sampleMax );
    sample++;
}
This is, however, a very slow operation. Thankfully, correlation can be performed in several different ways. The fastest method is to perform an FFT, multiply the complex values by their own conjugates, and then inverse-FFT the result.
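As a portable sketch of that FFT route (a plain O(N^2) DFT stands in for Accelerate's vDSP_fft_zrip so it runs anywhere; the names are illustrative). The input is zero-padded to 2N so that the circular correlation the transform computes matches the linear one:

```cpp
#include <complex>
#include <vector>
#include <cmath>
#include <cstddef>

// O(N^2) direct DFT (sign = -1) / inverse DFT (sign = +1).
// Placeholder for a real FFT such as vDSP_fft_zrip.
static std::vector<std::complex<float>>
dft(const std::vector<std::complex<float>>& in, int sign)
{
    const double kPi = 3.14159265358979323846;
    const std::size_t n = in.size();
    std::vector<std::complex<float>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double ang = sign * 2.0 * kPi * double(k) * double(t) / double(n);
            acc += std::complex<double>(in[t].real(), in[t].imag())
                 * std::complex<double>(std::cos(ang), std::sin(ang));
        }
        if (sign > 0) acc /= double(n);   // scale the inverse
        out[k] = std::complex<float>(float(acc.real()), float(acc.imag()));
    }
    return out;
}

// Autocorrelation by the spectral route: pad to 2N, forward transform,
// multiply each bin by its own conjugate, inverse transform.
std::vector<float> autoCorrFFT(const std::vector<float>& x)
{
    const std::size_t n = x.size();
    std::vector<std::complex<float>> padded(2 * n);   // zero-initialised
    for (std::size_t i = 0; i < n; ++i) padded[i] = x[i];
    std::vector<std::complex<float>> spec = dft(padded, -1);
    for (std::complex<float>& c : spec)
        c *= std::conj(c);                            // power spectrum |X|^2
    std::vector<std::complex<float>> corr = dft(spec, +1);
    std::vector<float> out(n);                        // lags 0 .. n-1
    for (std::size_t i = 0; i < n; ++i) out[i] = corr[i].real();
    return out;
}
```

The result at index k equals the direct sum over i of x[i] * x[i + k], which is the right-hand half of the full autocorrelation.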
Or in iOS you have the nicely optimised vDSP_conv function:
std::vector< float > paddedBuffer( (inSize + inSize) - 1 ); // zero-initialised
memcpy( &paddedBuffer.front(), pInput, sizeof( float ) * inSize );
// A positive filter stride asks vDSP_conv for correlation rather than convolution.
vDSP_conv( &paddedBuffer.front(), 1, (float*)pInput, 1, (float*)pOutput + (inSize - 1), 1, inSize, inSize );
// Reflect the auto correlation for the true output.
int posWrite = (inSize - 1);
int posRead = (inSize - 1);
while( posWrite > 0 )
{
posWrite--;
posRead++;
pOutput[posWrite] = pOutput[posRead];
}
So now you have your auto correlation; what do you do with it?
Well, firstly, right in the middle you will have the highest peak. This is the zero-lag point. What you then want to do is scan to the right of this central peak to identify the secondary peaks. If you are looking for a specific peak at a specific offset, you can simply check the position that many samples on from the central peak. If there is no peak there, the signal you are looking for is not present; if there is, it is.
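A sketch of that scan, assuming the full (2 * inSize) - 1 sample autocorrelation built above with the zero-lag value at index inSize - 1. The function name and the relative threshold are my own tuning assumptions, not something prescribed by the answer:

```cpp
#include <vector>
#include <cstddef>

// True if a sufficiently strong local peak sits `lag` samples to the
// right of the zero-lag centre of a full autocorrelation buffer.
// `threshold` is relative to the zero-lag value (0..1) and must be tuned.
bool hasPeakAtLag(const std::vector<float>& autoCorr, int inSize, int lag, float threshold)
{
    const int zeroLag = inSize - 1;            // centre of a 2*inSize-1 buffer
    const int pos = zeroLag + lag;
    if (lag < 1 || pos + 1 >= (int)autoCorr.size()) return false;
    const float norm = autoCorr[zeroLag];      // the highest peak by definition
    if (norm <= 0.0f) return false;
    // Require a local maximum as well as sufficient relative strength.
    const bool localMax = autoCorr[pos] >= autoCorr[pos - 1] &&
                          autoCorr[pos] >= autoCorr[pos + 1];
    return localMax && autoCorr[pos] / norm >= threshold;
}
```

With two candidate echo offsets, you would call this once per offset and act on whichever (if either) reports a peak.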
Edit: It's worth noting that with a 512-sample-wide window, if the lag you are looking at is beyond about 128 it may not accumulate enough correlation energy to be detectable. The correlation works by producing peaks at the periods of repetitive content in the sample data. At a lag of 128 you have enough data for that period to repeat 4 times; at 256 you can only see it repeat twice. This affects the height of the correlation peak. Beyond 256 you may not be able to distinguish the peak from random background correlation at all. That said, experiment with different window sizes to see what gives you the most reliable results.
Answer 2:
Auto-correlation is simply the cross-correlation of a signal with itself: it measures how similar the signal is to a copy of itself shifted by a given time delay, which makes it a good tool for finding echoes. While I cannot give you a complete, production-ready solution, you should be able to write your own using the information in the links below.
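The linked answers all come down to the same operation. As a sketch of the underlying idea, cross-correlating a known fragment against a longer recording at every lag and taking the argmax recovers the delay; the brute-force loops below are what vDSP_conv computes in a single optimised call (the function name is illustrative):

```cpp
#include <vector>
#include <cstddef>

// Cross-correlate `needle` against `haystack` at every non-negative lag
// and return the lag with the largest correlation value.
int bestLag(const std::vector<float>& haystack, const std::vector<float>& needle)
{
    int best = 0;
    float bestVal = -1e30f;
    for (std::size_t lag = 0; lag + needle.size() <= haystack.size(); ++lag) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < needle.size(); ++i)
            acc += haystack[lag + i] * needle[i];
        if (acc > bestVal) { bestVal = acc; best = (int)lag; }
    }
    return best;
}
```

For auto-correlation, needle and haystack are the same buffer; for delay estimation between two signals (as in the linked questions), they differ.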
There are some answers on this already:
- Objective C - Cross-correlation for audio delay estimation
- https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar
- Perform autocorrelation with vDSP_conv from Apple Accelerate Framework
There is source code available for vDSP (Accelerate framework) on Github by Kunal Kandekar. It could be a good starting point.
https://github.com/kunalkandekar/vDSPxcorr
Answer 3:
If you know the echo delay length you may be able to construct a far more efficient filter:
https://dsp.stackexchange.com/questions/14951/trivial-echo-separation
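Along the same lines: since the question says the echo can only sit at one of two known offsets, you don't need the full autocorrelation at all. One dot product per candidate lag (a single vDSP_dotpr call in Accelerate) gives the correlation at exactly that lag. A sketch, with an illustrative name:

```cpp
#include <vector>
#include <cstddef>

// Correlation of a signal with itself at one known lag: the dot product
// of x[0..N-lag) with x[lag..N). Equivalent to one vDSP_dotpr call on
// the buffer and an offset view of itself, and far cheaper than
// computing every lag when only two candidates matter.
float correlationAtLag(const std::vector<float>& x, std::size_t lag)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i + lag < x.size(); ++i)
        acc += x[i] * x[i + lag];
    return acc;
}
```

Normalise each candidate by correlationAtLag(x, 0) and compare the two values against a tuned threshold to decide which echo, if either, is present.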
Where did you read that you should use the cepstrum?
Source: https://stackoverflow.com/questions/22305359/auto-correlating-the-cepstrum