I am implementing a deep learning model for sound event detection. The flow is as follows:
I will input the mel-spectrogram features with dim