Find sound effect inside an audio file

后端未结

关注

 4  997

攒了一身酷 2021-01-15 03:12

I have a load of 3 hour MP3 files, and every ~15 minutes a distinct 1 second sound effect is played, which signals the beginning of a new chapter.

Is it possible to

4条回答

心在旅途 (楼主)

2021-01-15 04:01

This might not be an answer, it's just where I got to before I start researching the answers by @jonnor and @paul-john-leonard.

I was looking at the Spectrograms you can get by using librosa stft and amplitude_to_db, and thinking that if I take the data that goes in to the graphs, with a bit of rounding, I could potentially find the 1 sound effect being played:

https://librosa.github.io/librosa/generated/librosa.display.specshow.html

The code I've written below kind of works; although it:

Does return quite a few false positives, which might be fixed by tweaking the parameters of what is considered a match.
I would need to replace the librosa functions with something that can parse, round, and do the match checks in one pass; as a 3 hour audio file causes python to run out of memory on a computer with 16GB of RAM after ~30 minutes before it even got to the rounding bit.

import sys
import numpy
import librosa

#--------------------------------------------------

if len(sys.argv) == 3:
    source_path = sys.argv[1]
    sample_path = sys.argv[2]
else:
    print('Missing source and sample files as arguments');
    sys.exit()

#--------------------------------------------------

print('Load files')

source_series, source_rate = librosa.load(source_path) # The 3 hour file
sample_series, sample_rate = librosa.load(sample_path) # The 1 second file

source_time_total = float(len(source_series) / source_rate);

#--------------------------------------------------

print('Parse Data')

source_data_raw = librosa.amplitude_to_db(abs(librosa.stft(source_series, hop_length=64)))
sample_data_raw = librosa.amplitude_to_db(abs(librosa.stft(sample_series, hop_length=64)))

sample_height = sample_data_raw.shape[0]

#--------------------------------------------------

print('Round Data') # Also switches X and Y indexes, so X becomes time.

def round_data(raw, height):

    length = raw.shape[1]

    data = [];

    range_length = range(1, (length - 1))
    range_height = range(1, (height - 1))

    for x in range_length:

        x_data = []

        for y in range_height:

            # neighbours = []
            # for a in [(x - 1), x, (x + 1)]:
            #     for b in [(y - 1), y, (y + 1)]:
            #         neighbours.append(raw[b][a])
            #
            # neighbours = (sum(neighbours) / len(neighbours));
            #
            # x_data.append(round(((raw[y][x] + raw[y][x] + neighbours) / 3), 2))

            x_data.append(round(raw[y][x], 2))

        data.append(x_data)

    return data

source_data = round_data(source_data_raw, sample_height)
sample_data = round_data(sample_data_raw, sample_height)

#--------------------------------------------------

sample_data = sample_data[50:268] # Temp: Crop the sample_data (318 to 218)

#--------------------------------------------------

source_length = len(source_data)
sample_length = len(sample_data)
sample_height -= 2;

source_timing = float(source_time_total / source_length);

#--------------------------------------------------

print('Process series')

hz_diff_match = 18 # For every comparison, how much of a difference is still considered a match - With the Source, using Sample 2, the maximum diff was 66.06, with an average of ~9.9

hz_match_required_switch = 30 # After matching "start" for X, drop to the lower "end" requirement
hz_match_required_start = 850 # Out of a maximum match value of 1023
hz_match_required_end = 650
hz_match_required = hz_match_required_start

source_start = 0
sample_matched = 0

x = 0;
while x < source_length:

    hz_matched = 0
    for y in range(0, sample_height):
        diff = source_data[x][y] - sample_data[sample_matched][y];
        if diff < 0:
            diff = 0 - diff
        if diff < hz_diff_match:
            hz_matched += 1

    # print('  {} Matches - {} @ {}'.format(sample_matched, hz_matched, (x * source_timing)))

    if hz_matched >= hz_match_required:

        sample_matched += 1

        if sample_matched >= sample_length:

            print('      Found @ {}'.format(source_start * source_timing))

            sample_matched = 0 # Prep for next match

            hz_match_required = hz_match_required_start

        elif sample_matched == 1: # First match, record where we started

            source_start = x;

        if sample_matched > hz_match_required_switch:

            hz_match_required = hz_match_required_end # Go to a weaker match requirement

    elif sample_matched > 0:

        # print('  Reset {} / {} @ {}'.format(sample_matched, hz_matched, (source_start * source_timing)))

        x = source_start # Matched something, so try again with x+1

        sample_matched = 0 # Prep for next match

        hz_match_required = hz_match_required_start

    x += 1

#--------------------------------------------------

0 讨论(0)

查看其它4个回答