Split audio files using silence detection

无人共我 2020-12-08 03:21

I have more than 200 MP3 files and I need to split each one of them using silence detection. I tried Audacity and WavePad, but they do not have batch processes and it's ve

3 Answers
  • 2020-12-08 03:44

    You can use this pydub snippet to split audio on silence without having to experiment with values for the silence threshold:

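    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    def split(filepath):
        # Load the WAV file; AudioSegment.from_file() handles other formats.
        sound = AudioSegment.from_wav(filepath)
        # Average loudness of the whole file in dBFS.
        dBFS = sound.dBFS
        chunks = split_on_silence(sound,
            min_silence_len = 500,       # a pause must last at least 500 ms
            silence_thresh = dBFS - 16,  # 16 dB below the file's average loudness
            keep_silence = 250           # optional: keep 250 ms of silence at chunk edges
        )
        return chunks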
    

    Note that the silence_thresh value need not be adjusted per file after this, because the threshold is set relative to each file's own average loudness (dBFS - 16).
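
To illustrate why a relative threshold adapts to each file, here is a minimal sketch of the arithmetic in plain Python. It assumes 16-bit samples with a full scale of 32768; the helper names `dbfs` and `relative_threshold` are made up for this example and are not part of pydub.

```python
import math

def dbfs(samples, full_scale=32768):
    """Average loudness of integer samples in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / full_scale)

def relative_threshold(samples, margin_db=16):
    """Silence threshold placed margin_db below the clip's average loudness."""
    return dbfs(samples) - margin_db

# Two synthetic clips: one quiet recording, one loud recording.
quiet_clip = [1000, -1000] * 100
loud_clip = [16384, -16384] * 100

print(round(relative_threshold(quiet_clip), 1))  # -46.3
print(round(relative_threshold(loud_clip), 1))   # -22.0
```

The quiet clip gets a much lower threshold than the loud one, which is the effect the `dBFS-16` expression aims for.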

    Additionally, if you want to enforce a minimum length for each audio chunk, you can add this after the code above.

    # chunks is the list returned by split_on_silence
    target_length = 25 * 1000  # minimum length of each chunk: 25 seconds, in milliseconds
    output_chunks = [chunks[0]]
    for chunk in chunks[1:]:
        if len(output_chunks[-1]) < target_length:
            output_chunks[-1] += chunk
        else:
            # if the last output chunk is longer than the target length,
            # we can start a new one
            output_chunks.append(chunk)
    

    Now use output_chunks for further processing.
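
The merging loop can also be wrapped in a function that tolerates an empty chunk list. This is only a sketch: `merge_short_chunks` is a hypothetical name, and plain lists stand in for audio segments so the example runs without pydub (with pydub, `len()` of a segment is its duration in milliseconds).

```python
def merge_short_chunks(chunks, target_length):
    """Join neighbouring chunks until each merged chunk reaches target_length."""
    merged = []
    for chunk in chunks:
        if merged and len(merged[-1]) < target_length:
            # The previous chunk is still too short: extend it.
            merged[-1] = merged[-1] + chunk
        else:
            # First chunk, or the previous one is long enough: start a new one.
            merged.append(chunk)
    return merged

# Lists stand in for audio here: each element represents one second.
parts = [[0] * 10, [0] * 10, [0] * 10, [0] * 30, [0] * 5]
lengths = [len(p) for p in merge_short_chunks(parts, 25)]
print(lengths)  # [30, 30, 5]
```

Note that the final chunk can still end up shorter than the target, because nothing follows it to be merged in.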

  • 2020-12-08 04:01

    I tested all of these solutions and none of them worked for me, so I ended up with a solution of my own that works and is relatively fast.

    Prerequisites:

    1. It works with ffmpeg
    2. It is based on code by Vincent Berthiaume from this post (https://stackoverflow.com/a/37573133/2747626)
    3. It requires numpy (although it doesn't need much from numpy and a solution without numpy would probably be relatively easy to write and further increase speed)

    Mode of operation, rationale:

    1. The solutions provided here were based on AI, were extremely slow, or loaded the entire audio into memory, which was not feasible for my purposes. I wanted to split a recording of all of Bach's Brandenburg Concertos into particular songs; the 2 LPs are 2 hours long, which at 44 kHz 16-bit stereo is 1.4 GB in memory and very slow to process. From the moment I stumbled upon this post I kept telling myself that there must be a simple way, since this is a mere threshold filter operation that needs little overhead and could be done on tiny chunks of audio at a time. A couple of months later I stumbled upon https://stackoverflow.com/a/37573133/2747626, which gave me the idea for splitting audio relatively efficiently.
    2. The command line arguments give the source mp3 (or whatever ffmpeg can read), the silence duration and the noise threshold value. For my Bach LP recording, a silence duration of 1 second and a threshold of 0.01 of full amplitude did the trick.
    3. It lets ffmpeg convert the input to raw 16-bit 22 kHz mono PCM and pass it back via subprocess.Popen, with the advantage that ffmpeg does so very fast and in small chunks that do not occupy much memory.
    4. Back in Python, 2 temporary numpy arrays holding the last and the second-to-last buffer are concatenated and checked against the given threshold. If the peak stays at or below it, the block is silence, and the program (naively, I admit) simply counts the time during which there is "silence". If that time is at least as long as the given minimum silence duration, the middle of the current interval is (again naively) taken as the splitting moment.
    5. The program does not actually modify the source file; instead it creates a batch file that, when run, tells ffmpeg to take the segments bounded by these "silences" and save them into separate files.
    6. The user can then run the output batch file, perhaps after filtering out some repeated micro-intervals with tiny chunks of silence in case there are long pauses between songs.
    7. This solution works and is fast (none of the other solutions in this thread worked for me).
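
The block-wise idea from the list above can be sketched without ffmpeg or numpy. This is a simplified variant, not the author's program: it checks absolute peaks, emits one split point per silent stretch, and `find_split_points` with its parameters is a hypothetical helper introduced for illustration.

```python
def find_split_points(blocks, block_seconds, threshold, min_silence):
    """Return split times in seconds, one at the middle of each long silence.

    blocks is an iterable of sample sequences, each covering block_seconds
    of audio. A block counts as silent when its absolute peak is at or
    below threshold.
    """
    splits = []
    silence_start = None
    pos = 0.0
    for block in blocks:
        peak = max((abs(s) for s in block), default=0)
        if peak <= threshold:
            if silence_start is None:
                silence_start = pos  # a silent stretch begins here
        else:
            # Sound resumed: split if the silence lasted long enough.
            if silence_start is not None and pos - silence_start >= min_silence:
                splits.append((silence_start + pos) / 2.0)
            silence_start = None
        pos += block_seconds
    return splits

loud, quiet = [5000, -4000], [3, -2]
blocks = [loud, quiet, quiet, loud, quiet, loud]
print(find_split_points(blocks, 1.0, 100, 2.0))  # [2.0]
```

Only one block is held at a time, so memory use stays small regardless of the length of the recording. Trailing silence at the end of the input produces no split point in this sketch.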

    The little code:

    import subprocess as sp
    import sys
    import numpy
    
    FFMPEG_BIN = "ffmpeg.exe"  # on Linux/macOS this is usually just "ffmpeg"
    
    print('ASplit.py <src.mp3> <silence duration in seconds> <threshold amplitude 0.0 .. 1.0>')
    
    src = sys.argv[1]
    dur = float(sys.argv[2])
    # The fraction is scaled by 65535, so 0.01 becomes a sample value of 655.
    thr = int(float(sys.argv[3]) * 65535)
    
    # newline='' writes the explicit \r\n line endings below unchanged
    f = open('%s-out.bat' % src, 'w', newline='')
    
    tmprate = 22050
    len2 = int(dur * tmprate)   # samples per read
    buflen = len2 * 2           # bytes per read: 2 bytes per 16-bit mono sample
    
    oarr = numpy.arange(1, dtype='int16')
    # just a dummy array for the first chunk
    
    command = [ FFMPEG_BIN,
            '-i', src,
            '-f', 's16le',
            '-acodec', 'pcm_s16le',
            '-ar', str(tmprate), # output sampling rate
            '-ac', '1', # '1' for mono
            '-']        # - output to stdout
    
    pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
    
    pos = 0
    opos = 0
    part = 0
    
    while True:
    
        raw = pipe.stdout.read(buflen)
        if not raw:
            # empty read: ffmpeg has finished writing
            break
    
        arr = numpy.frombuffer(raw, dtype="int16")
    
        # look at the previous and the current buffer together
        rng = numpy.concatenate([oarr, arr])
        mx = numpy.amax(rng)
        if mx <= thr:
            # the peak in this range is at or below the threshold value
            trng = (rng <= thr) * 1
            # mask with samples <= thr set to 1 and samples > thr set to 0
            sm = numpy.sum(trng)
            # i.e. simply (naively) count how many samples were at or below the threshold
            if sm >= len2:
                part += 1
                apos = pos + dur * 0.5
                print(mx, sm, len2, apos)
                f.write('ffmpeg -i "%s" -ss %f -to %f -c copy -y "%s-p%04d.mp3"\r\n' % (src, opos, apos, src, part))
                opos = apos
    
        pos += dur
    
        oarr = arr
    
    # write the command for the final segment
    part += 1
    f.write('ffmpeg -i "%s" -ss %f -to %f -c copy -y "%s-p%04d.mp3"\r\n' % (src, opos, pos, src, part))
    f.close()
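
The script above writes a Windows batch file. On other platforms, one possible alternative is to build the same ffmpeg calls as argument lists and run them from Python. The sketch below assumes the split points and the total duration are already known; `build_cut_commands` and the file name `album.mp3` are hypothetical, while the ffmpeg options are the ones used in the batch lines above.

```python
def build_cut_commands(src, split_points, total_seconds, ffmpeg="ffmpeg"):
    """Return one ffmpeg argument list per segment between split points."""
    bounds = [0.0] + list(split_points) + [total_seconds]
    commands = []
    for part, (start, end) in enumerate(zip(bounds, bounds[1:]), start=1):
        out = "%s-p%04d.mp3" % (src, part)
        commands.append([ffmpeg, "-i", src, "-ss", "%f" % start,
                         "-to", "%f" % end, "-c", "copy", "-y", out])
    return commands

for cmd in build_cut_commands("album.mp3", [61.5, 130.0], 200.0):
    print(" ".join(cmd))
    # To actually cut the file: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.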
    
  • 2020-12-08 04:11

    I found pydub to be the easiest tool for this kind of audio manipulation, in simple ways and with compact code.

    You can install pydub with

    pip install pydub
    

    You may also need to install ffmpeg or libav. See this link for more details.

    Here is a snippet that does what you asked. Some of the parameters, such as silence_thresh and target_dBFS, may need some tuning to match your requirements. Overall, I was able to split mp3 files, although I had to try different values for silence_thresh.

    Snippet

    # Import the AudioSegment class for processing audio and the 
    # split_on_silence function for separating out silent chunks.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence
    
    # Define a function to normalize a chunk to a target amplitude.
    def match_target_amplitude(aChunk, target_dBFS):
        ''' Normalize given audio chunk '''
        change_in_dBFS = target_dBFS - aChunk.dBFS
        return aChunk.apply_gain(change_in_dBFS)
    
    # Load your audio.
    song = AudioSegment.from_mp3("your_audio.mp3")
    
    # Split track where the silence is 2 seconds or more and get chunks using 
    # the imported function.
    chunks = split_on_silence(
        # Use the loaded audio.
        song, 
        # Specify that a silent chunk must be at least 2 seconds or 2000 ms long.
        min_silence_len = 2000,
        # Consider a chunk silent if it's quieter than -16 dBFS.
        # (You may want to adjust this parameter.)
        silence_thresh = -16
    )
    
    # Process each chunk with your parameters
    for i, chunk in enumerate(chunks):
        # Create a silence chunk that's 0.5 seconds (or 500 ms) long for padding.
        silence_chunk = AudioSegment.silent(duration=500)
    
        # Add the padding chunk to beginning and end of the entire chunk.
        audio_chunk = silence_chunk + chunk + silence_chunk
    
        # Normalize the entire chunk.
        normalized_chunk = match_target_amplitude(audio_chunk, -20.0)
    
        # Export the audio chunk with new bitrate.
        print("Exporting chunk{0}.mp3.".format(i))
        normalized_chunk.export(
            "./chunk{0}.mp3".format(i),
            bitrate = "192k",
            format = "mp3"
        )
    

    If your original audio is stereo (2-channel), your chunks will also be stereo. You can check the original audio like this:

    >>> song.channels
    2
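
Since the question is about more than 200 files, the same pydub calls can be wrapped in a loop over a folder. This is a rough sketch rather than a tested batch tool: the function names, the output naming scheme and the parameter values are assumptions, and pydub is imported inside `split_file` so that the file listing works even where pydub is not installed.

```python
from pathlib import Path

def find_mp3_files(folder):
    """Return the MP3 files directly inside folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".mp3")

def split_file(path, out_dir):
    """Split one MP3 on silence and export the pieces into out_dir."""
    # Imported here so the file listing above works without pydub installed.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    sound = AudioSegment.from_mp3(str(path))
    chunks = split_on_silence(
        sound,
        min_silence_len=2000,            # assumed value; tune per recording
        silence_thresh=sound.dBFS - 16,  # relative to this file's loudness
    )
    for i, chunk in enumerate(chunks):
        target = Path(out_dir) / "{0}_chunk{1:03d}.mp3".format(path.stem, i)
        chunk.export(str(target), format="mp3", bitrate="192k")
    return len(chunks)

def split_folder(folder, out_dir):
    """Run split_file over every MP3 in folder; return chunk counts per file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return {p.name: split_file(p, out_dir) for p in find_mp3_files(folder)}
```

A call such as `split_folder("input_mp3s", "output_chunks")` (both folder names are placeholders) would then process every file in turn.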
    