I have two audio files: one is sampled at fs = 8 kHz, the other at 16 kHz. According to the scipy.signal.stft documentation, the last axis of the resulting spectrogram corresponds to the time axis.
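To make the setup concrete, here is a minimal sketch of what the two spectrograms look like. It assumes synthetic one-second sine tones in place of the real files, and `nperseg=256` is an arbitrary choice for illustration; the `tone` helper is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy import signal


def tone(fs, duration=1.0, freq=440.0):
    """Hypothetical stand-in for a loaded audio file: a pure sine tone."""
    n = np.arange(int(fs * duration))
    return np.sin(2 * np.pi * freq * n / fs)


nperseg = 256  # assumed window length in samples, same for both signals

results = {}
for fs in (8000, 16000):
    # f: frequency bins in Hz, t: segment times in seconds,
    # Zxx: complex STFT with shape (len(f), len(t)) for a 1-D input
    f, t, Zxx = signal.stft(tone(fs), fs=fs, nperseg=nperseg)
    results[fs] = (f, t, Zxx)
    print(f"fs={fs}: Zxx.shape={Zxx.shape}, "
          f"max freq={f[-1]} Hz, frame step={t[1] - t[0]} s")
```

With the same `nperseg` for both signals, the number of frequency bins is identical, but the bins span 0 to fs/2, so they cover different frequency ranges. The time step between frames is the hop size divided by fs, so the 16 kHz signal produces more frames along the last axis for the same duration.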