I am building a tool which is supposed to run on a server and analyze sound files. I want to do this in Ruby as all my other tools are written in Ruby as well. But I am hav
Here's the final solution to what I was trying to achieve, thanks in large part to Randall Cook's helpful advice. The code below extracts the sound wave and the FFT of a wav file in Ruby:
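require "ruby-audio"
require "narray"   # NArray is the array type that FFTW3.fft operates on
require "fftw3"

fname = ARGV[0]
window_size = 1024
wave = []
# One independent array per frequency bin. Array.new(n, []) would make
# every bin point at the same single array.
fft = Array.new(window_size / 2) { [] }

begin
  buf = RubyAudio::Buffer.float(window_size)
  RubyAudio::Sound.open(fname) do |snd|
    while snd.read(buf) != 0
      wave.concat(buf.to_a)
      na = NArray.to_na(buf.to_a)
      # The input is real, so only the lower half of the spectrum is kept
      fft_slice = FFTW3.fft(na).to_a[0, window_size / 2]
      fft_slice.each_with_index { |x, j| fft[j] << x }
    end
  end
rescue => err
  # String + exception raises a TypeError, so interpolate the message instead
  warn "error reading audio file: #{err.message}"
  exit 1
end
# now I can work on analyzing the "fft" and "wave" arrays...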
I think there are two problems here. One is getting the samples, the other is performing the FFT.
To get the samples, there are two main steps: decoding and downmixing. To decode wav files, you just need to parse the header so you know how to interpret the samples. For mp3 files, you'll need to do a full decode. Once the audio has been decoded, if you are not interested in processing the stereo channels separately, you may need to downmix it into mono, since the FFT expects a single channel as input. If you don't mind venturing outside of Ruby, the sox tool makes this easy. For example,
sox song.mp3 -b 16 song.raw channels 1
should convert an mp3 to a mono file of pure PCM samples (i.e. 16-bit integers). BTW, a quick search revealed the ruby/audio library (perhaps it is the one mentioned in your post). It looks pretty good, especially since it wraps libsndfile.
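If you would rather handle the raw samples in Ruby yourself, the decoding and downmixing steps can be sketched roughly as below. This is only an illustration: the helper names `decode_pcm16` and `downmix_to_mono` are made up for this example, and it assumes the raw file holds little-endian signed 16-bit samples with channels interleaved (check what your sox build actually writes).

```ruby
# Decode raw signed 16-bit little-endian PCM bytes into floats in [-1.0, 1.0).
# ASSUMPTION: little-endian byte order; use "s>*" for big-endian data.
def decode_pcm16(bytes)
  bytes.unpack("s<*").map { |s| s / 32768.0 }
end

# Average interleaved channels (L, R, L, R, ...) into one mono channel.
def downmix_to_mono(samples, channels)
  samples.each_slice(channels).map { |frame| frame.sum / frame.size }
end

# Hypothetical stereo data: two frames, packed the way a raw PCM file stores them.
# For a real file you would start from File.binread("song.raw") instead.
raw    = [16384, -16384, 8192, 8192].pack("s<*")
stereo = decode_pcm16(raw)
mono   = downmix_to_mono(stereo, 2)
p mono
```

If sox has already been told `channels 1`, the downmix step is unnecessary and `decode_pcm16` alone gives you the mono samples.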
To perform the FFT, I see three options. One is to use this snippet of code that performs an FFT. I'm no Ruby expert, but it looks like it might be OK. The second option is to use NArray. It has a ton of mathematical methods, including FFTW, available in a separate module, a tarball for which is linked in the middle of the NArray page. The third option is to write your own FFT code. It's not an especially complicated algorithm, and could give you great experience with numerical processing in Ruby (if you need that).
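For the third option, a minimal sketch of a recursive radix-2 Cooley-Tukey FFT in plain Ruby might look like the following. It is not the snippet or the NArray/FFTW code mentioned above, just one common textbook formulation. It assumes the input length is a power of two and makes no attempt to be fast.

```ruby
# Recursive radix-2 Cooley-Tukey FFT.
# ASSUMPTION: samples.size is a power of two (1, 2, 4, 8, ...).
def fft(samples)
  n = samples.size
  unless n > 0 && (n & (n - 1)).zero?
    raise ArgumentError, "length must be a power of two"
  end
  return [Complex(samples[0])] if n == 1

  # Split into even-indexed and odd-indexed samples and transform each half.
  even = fft(samples.each_slice(2).map(&:first))
  odd  = fft(samples.each_slice(2).map(&:last))

  result = Array.new(n)
  (n / 2).times do |k|
    # Twiddle factor e^(-2*pi*i*k/n) applied to the odd half.
    twiddle = Complex.polar(1.0, -2.0 * Math::PI * k / n) * odd[k]
    result[k]         = even[k] + twiddle
    result[k + n / 2] = even[k] - twiddle
  end
  result
end

spectrum = fft([1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0])
spectrum.each_with_index { |c, k| puts "bin #{k}: #{c.abs.round(6)}" }
```

Real inputs are converted to `Complex` values with a zero imaginary part at the bottom of the recursion, so you can pass plain floats.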
You are probably aware of this, but the FFT expects complex input and generates complex output. Audio signals are real, of course, so the imaginary component of the input should always be zero (a + 0*i). Since your input is real, the output will be symmetrical about the midpoint of the output array (the upper half mirrors the lower half as complex conjugates), so you can safely ignore the upper half. If you want the energy in a particular frequency bin (they are spaced linearly up to half the sample rate), you'll need to compute the magnitude of the complex value (sqrt(real*real + imag*imag)).
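To make the bin spacing and the magnitude calculation concrete, here is a small sketch. It uses a naive O(n^2) DFT purely so the example has no dependencies; in practice you would feed in the output of whichever FFT you chose. The helper names and the 8000 Hz sample rate are assumptions made for the illustration.

```ruby
# Naive DFT, used here only to keep the example self-contained.
def dft(samples)
  n = samples.size
  Array.new(n) do |k|
    (0...n).sum(Complex(0, 0)) do |t|
      samples[t] * Complex.polar(1.0, -2.0 * Math::PI * k * t / n)
    end
  end
end

# Magnitude sqrt(real*real + imag*imag) of each bin in the lower half.
def magnitudes(spectrum)
  spectrum.first(spectrum.size / 2).map(&:abs)
end

# Centre frequency in Hz of bin k for an n-point transform.
def bin_frequency(k, n, sample_rate)
  k * sample_rate.to_f / n
end

sample_rate = 8000 # hypothetical sample rate
n = 64
# A 1000 Hz sine that fits exactly 8 cycles into the 64-sample window.
signal = Array.new(n) { |t| Math.sin(2 * Math::PI * 1000 * t / sample_rate) }

mags = magnitudes(dft(signal))
peak = mags.each_with_index.max[1]
puts "peak at bin #{peak} (#{bin_frequency(peak, n, sample_rate)} Hz)"
# prints: peak at bin 8 (1000.0 Hz)
```

With these numbers each bin is 8000 / 64 = 125 Hz wide, which is why the 1000 Hz tone lands in bin 8.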
One more thing: Because frequency zero (the DC offset of the signal) and the Nyquist frequency (half the sample rate) have no phase components, some FFT implementations put them together into the same complex bin (one in the real component, one in the imaginary component, typically of the first bin). You can create some simple signals (all 1s for just a DC signal, and alternating +1, -1 for a Nyquist signal) and see what the FFT output looks like.
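The two test signals can be tried out with something like the sketch below. It again uses a naive full complex DFT to stay dependency-free, so DC shows up in bin 0 and Nyquist in bin n/2; a real-input FFT that packs the two together, as described above, would lay its output out differently.

```ruby
# Naive DFT, used here only to keep the example self-contained.
def dft(samples)
  n = samples.size
  Array.new(n) do |k|
    (0...n).sum(Complex(0, 0)) do |t|
      samples[t] * Complex.polar(1.0, -2.0 * Math::PI * k * t / n)
    end
  end
end

n = 8
dc      = Array.new(n, 1.0)                         # all 1s
nyquist = Array.new(n) { |t| t.even? ? 1.0 : -1.0 } # alternating +1, -1

dc_mags = dft(dc).map { |c| c.abs.round(6) }
ny_mags = dft(nyquist).map { |c| c.abs.round(6) }

p dc_mags # => [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
p ny_mags # => [0.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0]
```

Running the same two signals through your actual FFT library and comparing against this layout is a quick way to find out where it stores the DC and Nyquist terms.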