Find sound effect inside an audio file

攒了一身酷 2021-01-15 03:12

I have a load of 3-hour MP3 files, and roughly every 15 minutes a distinct 1-second sound effect is played, which signals the beginning of a new chapter.

Is it possible to

4 answers
  •  梦毁少年i
    2021-01-15 03:46

    This is an Audio Event Detection problem. If the sound is always the same and no other sounds play at the same time, it can probably be solved with a template matching approach, at least as long as the recordings contain no similar-sounding effects that carry a different meaning.

    The simplest kind of template matching is to compute the cross-correlation between your input signal and the template.

    1. Cut out an example of the sound to detect (for example with Audacity). Take as much of the sound as possible, but avoid its very start and end. Store this as a .wav file.
    2. Load the .wav template using librosa.load(). Load the input file the same way, so that both share one sample rate.
    3. Chop the input file into a series of overlapping frames, each the same length as the template. This can be done with librosa.util.frame.
    4. Iterate over the frames and compute the cross-correlation between each frame and the template using numpy.correlate.
    5. High cross-correlation values indicate a good match. Apply a threshold to decide what counts as an event. The frame number can then be converted to the time of the event: frame index times hop length, divided by the sample rate.
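The steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned detector. To keep it self-contained it uses NumPy only and assumes `signal` and `template` are mono float arrays at the same sample rate (for example as returned by librosa.load()). The function name `find_events`, the threshold of 0.6, and the choice of making each frame slightly longer than the template (so that every sample offset gets scored) are assumptions made for this example, as is the division by local energy that turns the raw correlation into a score between -1 and 1.

```python
import numpy as np

def find_events(signal, template, sr, threshold=0.6):
    """Return start times (seconds) where `template` matches `signal`.

    Hypothetical helper: scores every sample offset with an
    energy-normalized cross-correlation, one frame at a time.
    """
    n = len(template)
    hop = n                      # assumed hop: one template length
    frame_len = n + hop - 1      # each frame covers `hop` candidate offsets
    t = template / (np.linalg.norm(template) + 1e-12)

    # Zero-pad the end so the last offsets are also covered by a frame.
    padded = np.pad(signal, (0, frame_len))
    events = []
    for start in range(0, len(padded) - frame_len + 1, hop):
        frame = padded[start:start + frame_len]
        # Raw cross-correlation for each of the `hop` offsets in this frame.
        corr = np.correlate(frame, t, mode="valid")
        # Energy of each n-sample window, via a cumulative sum of squares.
        csum = np.concatenate(([0.0], np.cumsum(frame ** 2)))
        energy = np.sqrt(csum[n:] - csum[:-n]) + 1e-12
        score = corr / energy
        k = int(np.argmax(score))
        if score[k] >= threshold:
            events.append((start + k) / sr)
    return events
```

With real recordings the score usually stays high for several neighbouring offsets, so detections closer together than one template length probably need to be merged. numpy.correlate is also slow on multi-hour files; downsampling first, or an FFT-based correlation such as scipy.signal.correlate with method="fft", is likely to help.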

    You should probably prepare some shorter test files that contain both examples of the sound to detect and other typical sounds.

    If the volume of the recordings is inconsistent, you'll want to normalize it before running detection.
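One simple way to do that is RMS normalization, sketched below. The helper name `rms_normalize` and the target level of 0.1 are assumptions made for this example. Peak normalization or a proper loudness measure may suit some recordings better.

```python
import numpy as np

def rms_normalize(x, target_rms=0.1):
    """Scale `x` so its root-mean-square level equals `target_rms`."""
    rms = np.sqrt(np.mean(x ** 2))
    if rms == 0:
        return x.copy()          # silent input: nothing to scale
    return x * (target_rms / rms)
```

Applying the same function to both the template and each input file puts them on a comparable scale before thresholding.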

    If cross-correlation in the time domain does not work, you can compute mel-spectrogram or MFCC features and cross-correlate those instead. If this does not yield acceptable results either, a machine learning model can be trained using supervised learning, but this requires labeling a good amount of data as event/not-event.
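A rough sketch of the feature-domain variant is below. To stay NumPy-only it uses a plain log-magnitude spectrogram as a stand-in for the mel-spectrogram. With librosa installed, the output of librosa.feature.melspectrogram could be substituted, transposed so that time is the first axis. The names `log_spectrogram` and `feature_match` and the FFT size and hop length are assumptions made for this example, and the signal is assumed to be longer than the template.

```python
import numpy as np

def log_spectrogram(x, n_fft=512, hop=256):
    """Log-magnitude STFT, shape (n_frames, n_fft // 2 + 1)."""
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    return np.log1p(mag)

def feature_match(signal, template, n_fft=512, hop=256):
    """Similarity score per spectrogram frame offset (1.0 = perfect match)."""
    S = log_spectrogram(signal, n_fft, hop)
    T = log_spectrogram(template, n_fft, hop)
    T = T - T.mean()
    T = T / (np.linalg.norm(T) + 1e-12)
    m = len(T)
    scores = np.empty(len(S) - m + 1)
    for i in range(len(scores)):
        W = S[i:i + m]
        W = W - W.mean()         # remove the overall level of this window
        scores[i] = np.sum(W * T) / (np.linalg.norm(W) + 1e-12)
    return scores
```

Offset `i` in the returned array corresponds to time `i * hop / sr` in the input, so a threshold on `scores` plays the same role as in the time-domain version.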
