How to get BPM and tempo audio features in Python [closed]

问题

I am involved in a project which requires me to extract song features like beats per minute (BPM), tempo, etc. However, I have not found a suitable Python library that can accurately detect these features.

Does anyone have any advice?

(In Matlab, I do know of a project called Mirtoolbox, which can give the BPM and tempo information after processing the local mp3 file.)

回答1:

Echo Nest API is what you are looking for:

http://echonest.github.io/remix/

Python bindings are rich, though installing Echo Nest can be pain as the team does not seem to be able to build solid installers.

However it does not do local processing. Instead, it calculates audio fingerprint and uploads the song for Echo Nest servers for the information extraction using algorithms they don't expose.

回答2:

This answer comes a year later, but anyway, for the record. I found three audio libraries with python bindings that extract features from audio. They are not that easy to install since they are really in C and you need to properly compile the python bindings and add them to the path to import, but here they are:

Yaafe
Aubio
LibXtract
Essentia

回答3:

i've found this code by @scaperot here that could help you:

import wave, array, math, time, argparse, sys
import numpy, pywt
from scipy import signal
import pdb
import matplotlib.pyplot as plt

def read_wav(filename):

    #open file, get metadata for audio
    try:
        wf = wave.open(filename,'rb')
    except IOError, e:
        print e
        return

    # typ = choose_type( wf.getsampwidth() ) #TODO: implement choose_type
    nsamps = wf.getnframes();
    assert(nsamps > 0);

    fs = wf.getframerate()
    assert(fs > 0)

    # read entire file and make into an array
    samps = list(array.array('i',wf.readframes(nsamps)))
    #print 'Read', nsamps,'samples from', filename
    try:
        assert(nsamps == len(samps))
    except AssertionError, e:
        print  nsamps, "not equal to", len(samps)

    return samps, fs

# print an error when no data can be found
def no_audio_data():
    print "No audio data for sample, skipping..."
    return None, None

# simple peak detection
def peak_detect(data):
    max_val = numpy.amax(abs(data)) 
    peak_ndx = numpy.where(data==max_val)
    if len(peak_ndx[0]) == 0: #if nothing found then the max must be negative
        peak_ndx = numpy.where(data==-max_val)
    return peak_ndx

def bpm_detector(data,fs):
    cA = [] 
    cD = []
    correl = []
    cD_sum = []
    levels = 4
    max_decimation = 2**(levels-1);
    min_ndx = 60./ 220 * (fs/max_decimation)
    max_ndx = 60./ 40 * (fs/max_decimation)

    for loop in range(0,levels):
        cD = []
        # 1) DWT
        if loop == 0:
            [cA,cD] = pywt.dwt(data,'db4');
            cD_minlen = len(cD)/max_decimation+1;
            cD_sum = numpy.zeros(cD_minlen);
        else:
            [cA,cD] = pywt.dwt(cA,'db4');
        # 2) Filter
        cD = signal.lfilter([0.01],[1 -0.99],cD);

        # 4) Subtractargs.filename out the mean.

        # 5) Decimate for reconstruction later.
        cD = abs(cD[::(2**(levels-loop-1))]);
        cD = cD - numpy.mean(cD);
        # 6) Recombine the signal before ACF
        #    essentially, each level I concatenate 
        #    the detail coefs (i.e. the HPF values)
        #    to the beginning of the array
        cD_sum = cD[0:cD_minlen] + cD_sum;

    if [b for b in cA if b != 0.0] == []:
        return no_audio_data()
    # adding in the approximate data as well...    
    cA = signal.lfilter([0.01],[1 -0.99],cA);
    cA = abs(cA);
    cA = cA - numpy.mean(cA);
    cD_sum = cA[0:cD_minlen] + cD_sum;

    # ACF
    correl = numpy.correlate(cD_sum,cD_sum,'full') 

    midpoint = len(correl) / 2
    correl_midpoint_tmp = correl[midpoint:]
    peak_ndx = peak_detect(correl_midpoint_tmp[min_ndx:max_ndx]);
    if len(peak_ndx) > 1:
        return no_audio_data()

    peak_ndx_adjusted = peak_ndx[0]+min_ndx;
    bpm = 60./ peak_ndx_adjusted * (fs/max_decimation)
    print bpm
    return bpm,correl


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process .wav file to determine the Beats Per Minute.')
    parser.add_argument('--filename', required=True,
                   help='.wav file for processing')
    parser.add_argument('--window', type=float, default=3,
                   help='size of the the window (seconds) that will be scanned to determine the bpm.  Typically less than 10 seconds. [3]')

    args = parser.parse_args()
    samps,fs = read_wav(args.filename)

    data = []
    correl=[]
    bpm = 0
    n=0;
    nsamps = len(samps)
    window_samps = int(args.window*fs)         
    samps_ndx = 0;  #first sample in window_ndx 
    max_window_ndx = nsamps / window_samps;
    bpms = numpy.zeros(max_window_ndx)

    #iterate through all windows
    for window_ndx in xrange(0,max_window_ndx):

        #get a new set of samples
        #print n,":",len(bpms),":",max_window_ndx,":",fs,":",nsamps,":",samps_ndx
        data = samps[samps_ndx:samps_ndx+window_samps]
        if not ((len(data) % window_samps) == 0):
            raise AssertionError( str(len(data) ) ) 

        bpm, correl_temp = bpm_detector(data,fs)
        if bpm == None:
            continue
        bpms[window_ndx] = bpm
        correl = correl_temp

        #iterate at the end of the loop
        samps_ndx = samps_ndx+window_samps;
        n=n+1; #counter for debug...

    bpm = numpy.median(bpms)
    print 'Completed.  Estimated Beats Per Minute:', bpm

    n = range(0,len(correl))
    plt.plot(n,abs(correl)); 
    plt.show(False); #plot non-blocking
    time.sleep(10);
plt.close();

回答4:

Librosa has the librosa.beat.beat_track() method, but you need to supply an estimate of the BMP as the "start_bpm" parameter. Not sure how accurate it is, but perhaps worth a shot.

回答5:

Well i recently came across Vampy which is wrapper plugin that enables you to use Vamp plugins written in Python in any Vamp host. Vamp is an audio processing plugin system for plugins that extract descriptive information from audio data. Hope it helps.

回答6:

librosa is the package you are looking for. It contains extensive range of functions for audio analysis. librosa.beat.beat_track() and librosa.beat.tempo() functions will extract the required features for you.

Spectral features like chroma, MFCC, Zero-crossing rate, and rhythm features such as tempogram can also be obtained using the functions available in librosa.

来源：https://stackoverflow.com/questions/8635063/how-to-get-bpm-and-tempo-audio-features-in-python

标签

python

audio

tempo