Question
I've been stuck on this problem for days and have read nearly every related StackOverflow page. As a result, I now have a much better understanding of what an FFT is and how it works. Despite this, I'm having great difficulty implementing it in my application.
In short, I am trying to make a spectrum visualizer for my application (similar to this). From what I've gathered, I need to use the magnitudes of the sound as the heights of my bars. Currently I am able to analyze an entire .caf file all at once, using the following code:
let audioFile = try! AVAudioFile(forReading: soundURL!)
let frameCount = UInt32(audioFile.length)
let buffer = AVAudioPCMBuffer(PCMFormat: audioFile.processingFormat, frameCapacity: frameCount)
do {
    try audioFile.readIntoBuffer(buffer, frameCount: frameCount)
} catch {
    print("Could not read audio file: \(error)")
}
// Round down so the FFT never reads past the frames that were loaded
let log2n = UInt(floor(log2(Double(frameCount))))
let bufferSize = Int(1 << log2n)
let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))
var realp = [Float](count: bufferSize/2, repeatedValue: 0)
var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
var output = DSPSplitComplex(realp: &realp, imagp: &imagp)
// Pack the real samples into split-complex form, then transform in place
vDSP_ctoz(UnsafePointer<DSPComplex>(buffer.floatChannelData.memory), 2, &output, 1, UInt(bufferSize / 2))
vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))
var fft = [Float](count:Int(bufferSize / 2), repeatedValue:0.0)
let bufferOver2: vDSP_Length = vDSP_Length(bufferSize / 2)
// vDSP_zvmags yields squared magnitudes, one per frequency bin
vDSP_zvmags(&output, 1, &fft, 1, bufferOver2)
// Release the FFT setup once it is no longer needed
vDSP_destroy_fftsetup(fftSetup)
This works fine and outputs a long array of data. However, the problem with this code is it analyzes the entire audio file at once. What I need is to be analyzing the audio file as it is playing, very similar to this video: Spectrum visualizer.
So I guess my question is this: How do you perform FFT analysis while the audio is playing?
On top of this, how do I convert the output of an FFT analysis to actual heights for a bar? One of the outputs I received for an audio file using the FFT analysis code above was this: http://pastebin.com/RBLTuGx7. I used pastebin only because the output is so long. I'm assuming I average all these numbers together and use those values instead? (For reference, I got that array by printing out the 'fft' variable in the code above.)
I've tried reading through the EZAudio code, but I am unable to find how it reads in samples of audio in real time. Any help is greatly appreciated.
Answer 1:
Here's how it is done in AudioKit, using EZAudio's FFT tools:
Create a class for your FFT that will hold the data:
@objc public class AKFFT: NSObject, EZAudioFFTDelegate {
    internal let bufferSize: UInt32 = 512
    internal var fft: EZAudioFFT?
    /// Array of FFT data
    public var fftData = [Double](count: 512, repeatedValue: 0.0)
    ...
}
Initialize the class and set up the FFT. Also install the tap on the appropriate node.
public init(_ input: AKNode) {
    super.init()
    fft = EZAudioFFT.fftWithMaximumBufferSize(vDSP_Length(bufferSize), sampleRate: 44100.0, delegate: self)
    input.avAudioNode.installTapOnBus(0, bufferSize: bufferSize, format: AKManager.format) { [weak self] (buffer, time) -> Void in
        if let strongSelf = self {
            buffer.frameLength = strongSelf.bufferSize
            let offset: Int = Int(buffer.frameCapacity - buffer.frameLength)
            let tail = buffer.floatChannelData[0]
            strongSelf.fft!.computeFFTWithBuffer(&tail[offset], withBufferSize: strongSelf.bufferSize)
        }
    }
}
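To make the idea behind the tap concrete: instead of transforming the whole file once, each small buffer of samples is transformed on its own as it arrives, giving one spectrum per buffer. The following is only a minimal sketch of that idea, written in current Swift syntax rather than the Swift 2 syntax used in this thread. The helper names `frames(from:size:)` and `magnitudeSpectrum(of:)` are hypothetical, and a naive O(n²) DFT stands in for vDSP and EZAudioFFT so that the sketch has no framework dependencies.

```swift
import Foundation

/// Splits `samples` into consecutive frames of `size` samples.
/// A trailing partial frame is dropped. (Hypothetical helper.)
func frames(from samples: [Float], size: Int) -> [[Float]] {
    guard size > 0 else { return [] }
    return stride(from: 0, through: samples.count - size, by: size).map {
        Array(samples[$0 ..< $0 + size])
    }
}

/// Returns the magnitudes of the first n/2 DFT bins of one frame.
/// Naive O(n^2) DFT: a stand-in for vDSP_fft_zrip / EZAudioFFT, not a replacement.
func magnitudeSpectrum(of frame: [Float]) -> [Float] {
    let n = frame.count
    guard n > 1 else { return [] }
    return (0 ..< n / 2).map { (k: Int) -> Float in
        var re = 0.0
        var im = 0.0
        for (t, x) in frame.enumerated() {
            let angle = -2.0 * Double.pi * Double(k) * Double(t) / Double(n)
            re += Double(x) * cos(angle)
            im += Double(x) * sin(angle)
        }
        return Float((re * re + im * im).squareRoot())
    }
}

// Example: two 64-sample frames of a sine wave with exactly 8 cycles per frame.
let frameSize = 64
let tone: [Float] = (0 ..< frameSize * 2).map { i in
    Float(sin(2.0 * Double.pi * 8.0 * Double(i) / Double(frameSize)))
}
// One spectrum per frame; in a live app each tap buffer plays the role of a frame.
let spectra = frames(from: tone, size: frameSize).map(magnitudeSpectrum(of:))
```

In a real application the tap callback supplies the frames, so only the per-frame transform is needed, and the vDSP or EZAudio FFT should be preferred for speed.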
Then implement the callback to load your internal fftData array:
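@objc public func fft(fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
    dispatch_async(dispatch_get_main_queue()) { () -> Void in
        // Copy only as many values as EZAudio reports, capped at the array size
        let count = min(Int(bufferSize), self.fftData.count)
        for i in 0..<count {
            self.fftData[i] = Double(fftData[i])
        }
    }
}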
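The question also asks how to turn the FFT output into bar heights. One common approach, sketched here under stated assumptions, is to group neighbouring bins into bands, average each band, convert to decibels, and map a fixed decibel range onto 0...1. The function name `barHeights`, its parameters, the equal-width bands, and the -60 dB floor are all illustrative choices, not something taken from AudioKit or EZAudio. The input is assumed to hold plain magnitudes; `vDSP_zvmags` in the question's code returns squared magnitudes, so take the square root first or use 10 instead of 20 in the decibel conversion.

```swift
import Foundation

/// Averages `magnitudes` into `barCount` equal-width bands and maps each band
/// from the range [floorDB, 0] dB (relative to `reference`) onto [0, 1].
/// Hypothetical helper; leftover bins at the end are ignored.
func barHeights(magnitudes: [Float],
                barCount: Int,
                reference: Float,
                floorDB: Float = -60) -> [Float] {
    guard barCount > 0, magnitudes.count >= barCount,
          reference > 0, floorDB < 0 else { return [] }
    let binsPerBar = magnitudes.count / barCount
    return (0 ..< barCount).map { (bar: Int) -> Float in
        let band = magnitudes[bar * binsPerBar ..< (bar + 1) * binsPerBar]
        let mean = band.reduce(0, +) / Float(band.count)
        // Decibels relative to `reference`; the max() avoids log10(0)
        let db = 20 * log10(max(mean / reference, Float.leastNormalMagnitude))
        // Clamp to [floorDB, 0] so every height lands in [0, 1]
        let clamped = min(max(db, floorDB), 0)
        return 1 - clamped / floorDB
    }
}

// Example: six bins grouped into three bars
let heights = barHeights(magnitudes: [1, 1, 0.1, 0.1, 0, 0], barCount: 3, reference: 1)
```

Multiply each height by the maximum bar height in points to draw it. Equal-width bands are the simplest choice; many visualizers use logarithmically spaced bands instead, because pitch perception is roughly logarithmic.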
AudioKit's implementation may change, so check https://github.com/audiokit/AudioKit/ to see whether any improvements have been made. EZAudio is at https://github.com/syedhali/EZAudio
Source: https://stackoverflow.com/questions/34712707/perform-audio-analysis-with-fft