How do I use a context window to segment a whole log-Mel spectrogram (ensuring the same number of segments for all audio files)?

Asked by 渐次进展 on 2021-01-16 17:34

I have several audio files with different durations, so I don't know how to ensure the same number N of segments for every file. I'm trying to implement an existing paper, so it'…

1 answer
  • Answered 2021-01-16 17:48

    Loop over the frames along the time axis, moving forward 30 frames at a time and extracting a window of the last 64 frames. At the start and end, you need to either truncate or pad the data to get full windows.
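
    A minimal NumPy-only sketch of the same sliding-window idea, assuming the log-mel spectrogram is already an array of shape (n_mels, n_frames). The helper name `extract_windows` and the random toy input are illustrative only. Unlike the loop below, this sketch keeps every window that fits completely, so it can return one more window at the end.

```python
import numpy as np


def extract_windows(frames: np.ndarray, window_size: int = 64, window_hop: int = 30) -> np.ndarray:
    """Slice a (n_mels, n_frames) spectrogram into overlapping windows.

    Returns an array of shape (n_windows, n_mels, window_size). Only windows
    that lie completely inside the spectrogram are kept; a partial window
    at the end is dropped rather than padded.
    """
    n_mels, n_frames = frames.shape
    if n_frames < window_size:
        # Not even one full window fits: return an empty stack
        return np.empty((0, n_mels, window_size), dtype=frames.dtype)
    # Window k covers frames [k * window_hop, k * window_hop + window_size)
    starts = range(0, n_frames - window_size + 1, window_hop)
    return np.stack([frames[:, s:s + window_size] for s in starts])


# Toy "spectrogram": 64 mel bands x 497 frames of random values
rng = np.random.default_rng(0)
frames = rng.standard_normal((64, 497))
windows = extract_windows(frames)
print(windows.shape)  # (15, 64, 64)
```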

    import librosa
    import numpy as np
    import math
    
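    audio_file = librosa.ex('trumpet')  # bundled example clip, downloaded on first use; librosa.util.example_audio_file() is deprecated since librosa 0.8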
    y, sr = librosa.load(audio_file, sr=None, duration=5.0) # only load 5 seconds
    
    n_mels = 64
    n_fft = int(np.ceil(0.025*sr))
    win_length = int(np.ceil(0.025*sr))
    hop_length = int(np.ceil(0.010*sr))
    window = 'hamming'
    
    fmin = 20
    fmax = 8000
    
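    # Power spectrogram |STFT|^2: melspectrogram() expects a real-valued power spectrogram,
    # passing the complex STFT directly would give a complex-valued result
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length, win_length=win_length, window=window, center=False)) ** 2
    # Log-mel spectrogram of shape (n_mels, n_frames); the 1e-6 offset avoids log(0)
    frames = np.log(librosa.feature.melspectrogram(S=S, sr=sr, n_fft=n_fft, n_mels=n_mels, fmin=fmin, fmax=fmax) + 1e-6)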
    
    
    window_size = 64
    window_hop = 30
    
    # Truncate at the start and end so that every window is filled with real data;
    # the alternative would be to zero-pad
    start_frame = window_size
    end_frame = window_hop * math.floor(float(frames.shape[1]) / window_hop)
    
    for frame_idx in range(start_frame, end_frame, window_hop):
    
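        # The window_size frames ending at frame_idx (named `segment` so it
        # does not shadow the STFT window type defined above)
        segment = frames[:, frame_idx-window_size:frame_idx]
        assert segment.shape == (n_mels, window_size)
        print('classify window', frame_idx, segment.shape)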
    

    will output

    classify window 64 (64, 64)
    classify window 94 (64, 64)
    classify window 124 (64, 64)
    ...
    classify window 454 (64, 64)
    

    However, the number of windows depends on the length of the audio clip. If you need exactly the same number of windows N for every clip, you first have to make all clips the same length, e.g. by trimming or padding each one to a fixed duration.
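
    One way to get the same N for every clip is to fix the number of spectrogram frames before windowing. This is a sketch under stated assumptions, not the paper's method: it assumes a window is taken every `window_hop` frames wherever a full window fits, so a spectrogram of T frames gives 1 + (T - 64) // 30 windows and N windows need (N - 1) * 30 + 64 frames. Longer spectrograms are trimmed and shorter ones are padded with log(1e-6), i.e. a silent frame under the answer's log offset (an assumed choice). `fix_num_frames`, `count_windows` and N = 15 are hypothetical. Trimming or padding the waveform instead (for example with `librosa.util.fix_length`) before computing the spectrogram would work as well.

```python
import numpy as np

# Log of the 1e-6 offset used above: what a silent frame looks like after
# np.log(mel + 1e-6). Using it as the pad value is an assumption.
SILENCE = np.log(1e-6)


def fix_num_frames(frames: np.ndarray, n_frames: int, pad_value: float = SILENCE) -> np.ndarray:
    """Trim or right-pad a (n_mels, T) log-mel spectrogram to exactly n_frames frames."""
    T = frames.shape[1]
    if T >= n_frames:
        return frames[:, :n_frames]  # drop the extra frames at the end
    pad = ((0, 0), (0, n_frames - T))  # pad only the time axis, on the right
    return np.pad(frames, pad, mode="constant", constant_values=pad_value)


def count_windows(n_frames: int, window_size: int, window_hop: int) -> int:
    """Number of full windows that fit when taking one every window_hop frames."""
    return 0 if n_frames < window_size else 1 + (n_frames - window_size) // window_hop


window_size, window_hop, N = 64, 30, 15
# Frames needed for exactly N full windows: (N - 1) * hop + window size
target = (N - 1) * window_hop + window_size  # 484

for T in (200, 497, 900):  # spectrograms of clips with different durations
    fixed = fix_num_frames(np.zeros((64, T)), target)
    print(T, fixed.shape, count_windows(fixed.shape[1], window_size, window_hop))
```

    Every clip ends up with 484 frames and therefore 15 windows. Note that the loop in the answer also rounds its end frame down to a multiple of window_hop, so for the same number of frames it can yield one window fewer than `count_windows` reports.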
