Find sound effect inside an audio file

攒了一身酷 2021-01-15 03:12

I have a load of 3-hour MP3 files, and roughly every 15 minutes a distinct 1-second sound effect is played, which signals the beginning of a new chapter.

Is it possible to

4 Answers
  •  心在旅途
    2021-01-15 04:01

    This might not be an answer; it's just where I got to before I started researching the answers by @jonnor and @paul-john-leonard.

    I was looking at the spectrograms you can get from librosa's stft and amplitude_to_db, and thinking that if I take the data that goes into the graphs and round it a bit, I could potentially find the one sound effect being played:

    https://librosa.github.io/librosa/generated/librosa.display.specshow.html
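
To give a sense of what the rounded data looks like, here is a minimal pure-Python sketch of an amplitude-to-decibel conversion followed by rounding. It is a stand-in for illustration, not librosa's implementation: the `to_db` helper is hypothetical, and the `amin` floor of 1e-5 and the 80 dB range are assumptions based on my reading of the amplitude_to_db defaults.

```python
import math


def to_db(magnitudes, amin=1e-5, top_db=80.0):
    """Convert linear magnitudes to dB relative to 1.0.

    Values are clamped to `amin` before the log (assumed default), and the
    result is floored at (loudest value - top_db).
    """
    db = [20.0 * math.log10(max(m, amin)) for m in magnitudes]
    floor = max(db) - top_db
    return [max(v, floor) for v in db]


# One made-up spectrogram column: magnitudes for four frequency bins.
frame = [1.0, 0.5, 0.001, 0.0]

# Round to 2 decimal places, as the script does with each value.
rounded = [round(v, 2) for v in to_db(frame)]
print(rounded)
```

Rounding does not make two recordings of the same effect identical, which is why the script compares values within a tolerance rather than testing for equality.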

    The code I've written below kind of works, although:

    1. It returns quite a few false positives, which might be fixed by tweaking the parameters for what is considered a match.

    2. I would need to replace the librosa functions with something that can parse, round, and do the match checks in one pass: a 3-hour audio file caused Python to run out of memory on a computer with 16 GB of RAM after ~30 minutes, before it even got to the rounding step.
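
A back-of-the-envelope estimate suggests why point 2 happens. The sketch below assumes librosa's defaults as I understand them (22050 Hz after loading, `n_fft=2048`, so 1025 frequency bins, and a complex64 result at 8 bytes per value); `stft_bytes` is a hypothetical helper, not a librosa function.

```python
def stft_bytes(duration_s, sample_rate=22050, hop_length=64,
               n_fft=2048, bytes_per_value=8):
    """Estimate the size in bytes of a complex STFT matrix (bins x frames)."""
    n_samples = int(duration_s * sample_rate)
    n_frames = 1 + n_samples // hop_length  # assumes centered framing
    n_bins = 1 + n_fft // 2                 # 1025 bins for n_fft=2048
    return n_bins * n_frames * bytes_per_value


three_hours = 3 * 60 * 60  # seconds

for hop in (64, 512, 2048):
    gib = stft_bytes(three_hours, hop_length=hop) / 2**30
    print(f"hop_length={hop}: about {gib:.1f} GiB")
```

Under these assumptions the complex matrix alone comes to roughly 28 GiB at `hop_length=64`, before the magnitude and decibel copies are made, so a 16 GB machine cannot hold it. A larger hop length or processing the file in chunks would reduce this.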


    import sys
    import numpy
    import librosa
    
    #--------------------------------------------------
    
    if len(sys.argv) == 3:
        source_path = sys.argv[1]
        sample_path = sys.argv[2]
    else:
        print('Missing source and sample files as arguments')
        sys.exit()
    
    #--------------------------------------------------
    
    print('Load files')
    
    source_series, source_rate = librosa.load(source_path) # The 3 hour file
    sample_series, sample_rate = librosa.load(sample_path) # The 1 second file
    
    source_time_total = float(len(source_series) / source_rate) # Source duration in seconds
    
    #--------------------------------------------------
    
    print('Parse Data')
    
    source_data_raw = librosa.amplitude_to_db(abs(librosa.stft(source_series, hop_length=64)))
    sample_data_raw = librosa.amplitude_to_db(abs(librosa.stft(sample_series, hop_length=64)))
    
    sample_height = sample_data_raw.shape[0]
    
    #--------------------------------------------------
    
    print('Round Data') # Also switches X and Y indexes, so X becomes time.
    
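    def round_data(raw, height):
        # Returns a list of time frames, each a list of dB values rounded to
        # 2 decimal places. The first and last frame and bin are skipped.
    
        length = raw.shape[1]
    
        data = []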
    
        range_length = range(1, (length - 1))
        range_height = range(1, (height - 1))
    
        for x in range_length:
    
            x_data = []
    
            for y in range_height:
    
                # neighbours = []
                # for a in [(x - 1), x, (x + 1)]:
                #     for b in [(y - 1), y, (y + 1)]:
                #         neighbours.append(raw[b][a])
                #
                # neighbours = (sum(neighbours) / len(neighbours));
                #
                # x_data.append(round(((raw[y][x] + raw[y][x] + neighbours) / 3), 2))
    
                x_data.append(round(raw[y][x], 2))
    
            data.append(x_data)
    
        return data
    
    source_data = round_data(source_data_raw, sample_height)
    sample_data = round_data(sample_data_raw, sample_height)
    
    #--------------------------------------------------
    
    sample_data = sample_data[50:268] # Temp: Crop the sample_data (318 to 218)
    
    #--------------------------------------------------
    
    source_length = len(source_data)
    sample_length = len(sample_data)
    sample_height -= 2 # round_data() skipped the first and last frequency bins
    
    source_timing = float(source_time_total / source_length) # Seconds per frame
    
    #--------------------------------------------------
    
    print('Process series')
    
    hz_diff_match = 18 # For every comparison, how much of a difference is still considered a match - With the Source, using Sample 2, the maximum diff was 66.06, with an average of ~9.9
    
    hz_match_required_switch = 30 # After matching "start" for X, drop to the lower "end" requirement
    hz_match_required_start = 850 # Out of a maximum match value of 1023
    hz_match_required_end = 650
    hz_match_required = hz_match_required_start
    
    source_start = 0
    sample_matched = 0
    
    x = 0
    while x < source_length:
    
        hz_matched = 0
        for y in range(0, sample_height):
            diff = abs(source_data[x][y] - sample_data[sample_matched][y])
            if diff < hz_diff_match:
                hz_matched += 1
    
        # print('  {} Matches - {} @ {}'.format(sample_matched, hz_matched, (x * source_timing)))
    
        if hz_matched >= hz_match_required:
    
            sample_matched += 1
    
            if sample_matched >= sample_length:
    
                print('      Found @ {}'.format(source_start * source_timing))
    
                sample_matched = 0 # Prep for next match
    
                hz_match_required = hz_match_required_start
    
            elif sample_matched == 1: # First match, record where we started
    
                source_start = x
    
            if sample_matched > hz_match_required_switch:
    
                hz_match_required = hz_match_required_end # Go to a weaker match requirement
    
        elif sample_matched > 0:
    
            # print('  Reset {} / {} @ {}'.format(sample_matched, hz_matched, (source_start * source_timing)))
    
            x = source_start # Matched something, so try again with x+1
    
            sample_matched = 0 # Prep for next match
    
            hz_match_required = hz_match_required_start
    
        x += 1
    
    #--------------------------------------------------
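
The "one pass" idea from point 2 could look something like the sketch below. It is a simplification of the loop above, not a drop-in replacement: it uses a single `required` threshold instead of the start/end pair, it can report overlapping hits, and `count_matching_bins` and `find_matches` are hypothetical names. Because `frames` can be a generator, only as many frames as the sample is long are held in memory at once.

```python
from collections import deque


def count_matching_bins(source_frame, sample_frame, tolerance):
    """Count bins whose dB values differ by less than `tolerance`."""
    return sum(1 for s, t in zip(source_frame, sample_frame)
               if abs(s - t) < tolerance)


def find_matches(frames, sample, tolerance, required):
    """Yield the start index of every run of frames that matches `sample`.

    A frame matches when at least `required` of its bins are within
    `tolerance` of the corresponding sample frame.
    """
    window = deque(maxlen=len(sample))  # oldest frame drops off automatically
    for index, frame in enumerate(frames):
        window.append(frame)
        if len(window) < len(sample):
            continue  # not enough frames seen yet
        if all(count_matching_bins(w, s, tolerance) >= required
               for w, s in zip(window, sample)):
            yield index - len(sample) + 1


# Toy data: 3 bins per frame, dB values made up for illustration.
sample = [[-10.0, -20.0, -30.0], [-12.0, -18.0, -33.0]]
source = [
    [-60.0, -60.0, -60.0],
    [-11.0, -21.0, -29.0],  # close to sample[0]
    [-12.5, -17.0, -34.0],  # close to sample[1]
    [-60.0, -60.0, -60.0],
]

hits = list(find_matches(iter(source), sample, tolerance=2.0, required=3))
print(hits)
```

A hit at frame `n` corresponds to a time of `n * hop_length / sample_rate` seconds, which with the values assumed in the script would be `n * 64 / 22050`.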
    
