Question
I have an audio sample of about 14 seconds at an 8 kHz sample rate. I'm using librosa to extract some features from this audio file.
import librosa
import librosa.display
import numpy as np

n_fft = 2048  # librosa default
y, sr = librosa.load(file_name)  # resampled to the default sr=22050
stft = np.abs(librosa.stft(y, n_fft=n_fft))
# file_length = 14.650022675736961  # seconds
# defaults:
# hop_length = 512  # win_length / 4 = n_fft / 4 (win_length defaults to n_fft)
# windowsTime = n_fft * Ts  # Ts = 1 / sr
stft.shape
# (1025, 631)
Specshow:
librosa.display.specshow(stft, x_axis='time', y_axis='log')
[![stft sr = 22050][1]][1]
Now, I can understand the shape of the STFT:
631 time bins ≈ 4 * (file_length / windowsTime), because consecutive windows overlap by 3/4.
1025 frequency bins, spaced sr/n_fft apart,
so there are 1025 frequencies from 0 to sr/2 (Nyquist).
What I can't understand is why the plots differ for two sample rates when the ratios are kept the same: (1) 22050 Hz, the librosa default, and (2) 8 kHz, the file's native sampling rate.
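y2, sr2 = librosa.load(file_name, sr=None)  # sr=None keeps the file's native 8 kHz rate
n_fft2 = 743  # same ratio, to get the same visuals for comparison
hop_length = n_fft2 // 4  # 185: the default hop is win_length // 4, and win_length defaults to n_fft
stft2 = np.abs(librosa.stft(y2, n_fft=n_fft2))
So of course the shape of the STFT will be different: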
stft2.shape
# (372, 634)
[![stft sr = 743][2]][2]
1. But why are the absolute frequencies not the same? It's the same signal, just not oversampled, so each sample is unique. What am I missing? Is it the static y axis?
2. I couldn't understand the time bin values. I'm expecting one bin per frame, where the first bin is at the hop length and each following bin is one windowTime further on, until the end of the file. But the units are weird?
I want to be able to extract the magnitude of a specific frequency bin at a specific time (frame), and additionally to sum some of those to get the magnitude over a time RANGE.
Therefore, if I take stft[fBin index], which is one row out of the 1025 frequency bins, and look at its contents, e.g. stft[0], it contains 631 points, which are exactly the time points for that frequency, so each of the 1025 rows will have the same time points.
So if I take one row that suits all the other frequency bins as well (same time points), namely stft[0], I should be able to choose a time frame and a frequency bin and get the specific magnitude:
times = librosa.core.frames_to_time(stft2[0], sr=sr2, n_fft=n_fft2, hop_length=hop_length)
fft_bin = 6
time_idx = 10
print('freq (Hz)', freqs[fft_bin])
print('time (s)', times[time_idx])
print('amplitude', stft[fft_bin, time_idx])
array([0.047375, 0.047625, 0.04825 , 0.04825 , 0.046875, 0.04675 , 0.05 , 0.051625, 0.051 , 0.048 , 0.05225 , 0.050375, 0.04925 , 0.04725 , 0.051625, 0.0465 , 0.05225 , 0.05 , 0.053 , 0.053875, 0.048 , 0.0485 , 0.047875, 0.04775 , 0.0485 , 0.049 , 0.051375, 0.047125, 0.051125, 0.047125, 0.04725 , 0.05025 , 0.05425 , 0.05475 , 0.051375, 0.060375, 0.050625, 0.04875 , 0.054125, 0.048 , 0.05025 , 0.052375, 0.04975 , 0.054125, 0.055625, 0.047125, 0.0475 , 0.047 , 0.049875, 0.05025 , 0.048375, 0.047 , 0.050625, 0.05 , 0.046625, 0.04925 , 0.048 , 0.049125, 0.05375 , 0.0545 , 0.04925 , 0.049125, 0.049125, 0.049625, 0.047 , 0.047625, 0.0535 , 0.051875, 0.05075 , 0.04975 , 0.047375, 0.049 , 0.0485 , 0.050125, 0.048 , 0.05475 , 0.05175 , 0.050125, 0.04725 , 0.0575 , 0.056875, 0.047 , 0.0485 , 0.055375, 0.04975 , 0.047 , 0.0495 , 0.051375, 0.04675 , 0.04925 , 0.052125, 0.04825 , 0.048125, 0.046875, 0.047 , 0.048625, 0.050875, 0.05125 , 0.04825 , 0.052125, 0.052375, 0.05125 , 0.049875, 0.048625, 0.04825 , 0.0475 , 0.048375, 0.050875, 0.052875, 0.0475 , 0.0485 , 0.05225 , 0.053625, 0.05075 , 0.0525 , 0.047125, 0.0485 , 0.048875, 0.049 , 0.0515 , 0.055875, 0.0515 , 0.05025 , 0.05125 , 0.054625, 0.05525 , 0.047 , 0.0545 , 0.052375, 0.049875, 0.051 , 0.048625, 0.0475 , 0.048 , 0.048875, 0.050625, 0.05375 , 0.051875, 0.048125, 0.052125, 0.048125, 0.051 , 0.052625, 0.048375, 0.047625, 0.05 , 0.048125, 0.050375, 0.049125, 0.053125, 0.053875, 0.05075 , 0.052375, 0.048875, 0.05325 , 0.05825 , 0.055625, 0.0465 , 0.05475 , 0.051125, 0.048375, 0.0505 , 0.04675 , 0.0495 , 0.04725 , 0.046625, 0.049625, 0.054 , 0.056125, 0.05175 , 0.050625, 0.050375, 0.047875, 0.047 , 0.048125, 0.048875, 0.050625, 0.049875, 0.047 , 0.0505 , 0.047 , 0.053125, 0.047625, 0.05025 , 0.04825 , 0.05275 , 0.051625, 0.05 , 0.051625, 0.05425 , 0.052 , 0.04775 , 0.047 , 0.049125, 0.05375 , 0.0535 , 0.04925 , 0.05125 , 0.046375, 0.04775 , 0.04775 , 0.0465 , 0.047 , 0.04675 , 0.04675 , 0.04925 , 0.05125 , 0.046375, 
0.04825 , 0.0525 , 0.057875, 0.056375, 0.054375, 0.04825 , 0.0535 , 0.05475 , 0.0485 , 0.048875, 0.048625, 0.0485 , 0.047625, 0.046875, 0.0465 , 0.05125 , 0.054 , 0.05 , 0.048 , 0.047875, 0.0515 , 0.048125, 0.055875, 0.054875, 0.051625, 0.048125, 0.047625, 0.048375, 0.052875, 0.0485 , 0.0475 , 0.0495 , 0.05025 , 0.05675 , 0.0585 , 0.051625, 0.05625 , 0.0605 , 0.052125, 0.0495 , 0.049 , 0.047875, 0.051375, 0.054125, 0.0525 , 0.0515 , 0.057875, 0.055 , 0.05375 , 0.046375, 0.04775 , 0.0485 , 0.050125, 0.050875, 0.04925 , 0.049125, 0.0465 , 0.04975 , 0.053375, 0.05225 , 0.0475 , 0.046375, 0.05375 , 0.049875, 0.049875, 0.047375, 0.049125, 0.049375, 0.04875 , 0.048125, 0.05075 , 0.0505 , 0.046375, 0.047375, 0.048625, 0.0485 , 0.047125, 0.052625, 0.051125, 0.04725 , 0.050875, 0.053875, 0.0475 , 0.0495 , 0.051 , 0.055 , 0.053 , 0.050125, 0.04675 , 0.05375 , 0.054375, 0.04725 , 0.046875, 0.04925 , 0.04725 , 0.0495 , 0.05075 , 0.050875, 0.04775 , 0.05125 , 0.050125, 0.047875, 0.04825 , 0.046625, 0.0475 , 0.046375, 0.04775 , 0.05075 , 0.048125, 0.046375, 0.049625, 0.0495 , 0.04675 , 0.046625, 0.0475 , 0.04825 , 0.053 , 0.050875, 0.049 , 0.057875, 0.058875, 0.049875, 0.049125, 0.0475 , 0.05225 , 0.055 , 0.055375, 0.053875, 0.051125, 0.049875, 0.05025 , 0.050875, 0.049 , 0.0575 , 0.051875, 0.049375, 0.04775 , 0.051125, 0.050375, 0.0465 , 0.047375, 0.0465 , 0.046375, 0.048875, 0.051875, 0.047 , 0.047125, 0.047125, 0.046875, 0.049625, 0.048625, 0.051 , 0.049 , 0.046375, 0.049 , 0.056125, 0.054625, 0.047625, 0.046625, 0.0475 , 0.051875, 0.05175 , 0.047625, 0.050375, 0.055125, 0.05275 , 0.047125, 0.05325 , 0.060125, 0.056625, 0.053 , 0.052125, 0.047125, 0.04825 , 0.050375, 0.05025 , 0.048 , 0.046625, 0.047125, 0.04875 , 0.047 , 0.05525 , 0.0535 , 0.047 , 0.0495 , 0.0535 , 0.05125 , 0.046625, 0.0495 , 0.04675 , 0.04875 , 0.047125, 0.04975 , 0.047 , 0.049875, 0.046875, 0.047125, 0.048 , 0.046375, 0.0495 , 0.04975 , 0.05125 , 0.048375, 0.049125, 0.0515 , 0.048375, 0.052375, 0.051125, 
0.046375, 0.047125, 0.050375, 0.0465 , 0.052375, 0.05375 , 0.04925 , 0.05025 , 0.0565 , 0.054875, 0.048 , 0.049375, 0.052625, 0.055375, 0.053375, 0.05075 , 0.048875, 0.05475 , 0.05075 , 0.0485 , 0.049125, 0.0475 , 0.047375, 0.047375, 0.047 , 0.052125, 0.053875, 0.049 , 0.052625, 0.0485 , 0.04675 , 0.04875 , 0.05 , 0.0545 , 0.05025 , 0.0495 , 0.0515 , 0.0485 , 0.05025 , 0.0465 , 0.0465 , 0.048375, 0.06375 , 0.10175 , 0.11975 , 0.118375, 0.121375, 0.12675 , 0.123 , 0.095375, 0.055 , 0.05525 , 0.04775 , 0.053125, 0.052375, 0.056625, 0.0565 , 0.046875, 0.048 , 0.05175 , 0.048 , 0.052 , 0.048 , 0.048 , 0.05175 , 0.05025 , 0.049625, 0.049625, 0.047375, 0.046625, 0.052375, 0.0555 , 0.051375, 0.050625, 0.052375, 0.050125, 0.048 , 0.052125, 0.052125, 0.0495 , 0.048875, 0.048 , 0.049875, 0.051125, 0.050625, 0.048 , 0.0465 , 0.048 , 0.04675 , 0.050875, 0.048 , 0.046625, 0.0495 , 0.050375, 0.046625, 0.0515 , 0.049875, 0.049625, 0.04675 , 0.049125, 0.05025 , 0.050375, 0.04725 , 0.047625, 0.047 , 0.051625, 0.0485 , 0.05225 , 0.046875, 0.0475 , 0.04825 , 0.050375, 0.05725 , 0.052375, 0.048 , 0.046375, 0.0475 , 0.0495 , 0.047875, 0.046375, 0.049875, 0.046875, 0.048 , 0.046875, 0.048625, 0.047125, 0.046625, 0.05 , 0.048875, 0.04675 , 0.050125, 0.05425 , 0.051375, 0.050125, 0.053375, 0.052 , 0.053875, 0.048 , 0.05575 , 0.049875, 0.052125, 0.048875, 0.047375, 0.048875, 0.049125, 0.047375, 0.047375, 0.047625, 0.0495 , 0.04825 , 0.047875, 0.04875 , 0.054 , 0.052125, 0.051 , 0.046625, 0.04925 , 0.05075 , 0.054375, 0.0555 , 0.051625, 0.046625, 0.052125, 0.055875, 0.047 , 0.053875, 0.050875, 0.0505 , 0.0465 , 0.053125, 0.050875, 0.050625, 0.051125, 0.050875, 0.056875, 0.04925 , 0.050625, 0.054125, 0.056625, 0.05025 , 0.0465 , 0.04675 , 0.049625, 0.047 , 0.048375, 0.047125, 0.04875 , 0.048375, 0.048875, 0.04775 , 0.04775 , 0.047 , 0.052125, 0.050875, 0.054 , 0.058375, 0.054 , 0.049125, 0.04675 , 0.051875, 0.05425 , 0.050125, 0.04675 , 0.047625, 0.046375, 0.05275 , 0.053 , 0.04875 , 
0.049125, 0.047125, 0.049375, 0.0475 , 0.051125, 0.0495 , 0.052375, 0.047 , 0.047125, 0.050875])
[1]: https://i.imgur.com/OeKzvrb.png
[2]: https://i.imgur.com/ALtba5F.png
Answer 1:
Question 1:
You need to specify the sampling rate when using specshow:
librosa.display.specshow(stft, x_axis='time', y_axis='log', sr=sr)
Otherwise the default value (22,050 Hz) will be used (see docs).
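To see how much the axis labels shift, here is a minimal sketch of the bin-to-frequency arithmetic (bin k sits at k * sr / n_fft). The helper name bin_frequency is made up for illustration and is not part of librosa; it only assumes that the plot derives its frequency labels from whatever sr it is given.

```python
def bin_frequency(k, sr, n_fft):
    """Centre frequency in Hz of FFT bin k for a signal sampled at sr (hypothetical helper)."""
    return k * sr / n_fft

n_fft = 743               # value used in the question for the 8 kHz load
top_bin = n_fft // 2      # index of the last STFT row (371)

# Labelled with the file's real rate, the top row sits just under 4 kHz.
true_top = bin_frequency(top_bin, sr=8000, n_fft=n_fft)
# Labelled with the 22,050 Hz fallback, the very same row lands near 11 kHz.
wrong_top = bin_frequency(top_bin, sr=22050, n_fft=n_fft)

print(round(true_top, 1), round(wrong_top, 1))
```

The data is identical in both cases; only the labels are stretched by a factor of 22050 / 8000.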
Question 2:
librosa.core.frames_to_time does not take stft[0] as its argument; that row holds the magnitudes of the first frequency bin across all frames. Instead, it takes frame indices (a single frame number or an array of them) as its first argument.
Imagine you have an audio signal with sr=10000 Hz. Then you run an STFT over it using n_fft=2000 and hop_length=1000. Then you get one frame per hop, and each hop is 0.1 s long, because 10000 samples correspond to 1 s and 1000 samples (1 hop) therefore correspond to 0.1 s.
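The hop arithmetic from this example can be sketched in a few lines of plain Python. The function frame_to_seconds is a hypothetical stand-in that mirrors the conversion frame * hop_length / sr; it ignores the optional n_fft offset and is not librosa's own code.

```python
def frame_to_seconds(frame, sr, hop_length):
    """Timestamp in seconds of a frame index, without any n_fft offset (hypothetical helper)."""
    return frame * hop_length / sr

sr = 10000         # samples per second, as in the example above
hop_length = 1000  # samples between consecutive frames

one_hop = frame_to_seconds(1, sr, hop_length)
first_times = [frame_to_seconds(f, sr, hop_length) for f in range(4)]
print(one_hop, first_times)
```

Note that the input is a frame index (0, 1, 2, ...), never a magnitude value taken from the spectrogram.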
stft[0] is not a frame number. Instead, stft is of shape (1 + n_fft/2, t) (see here). This means the first dimension is the frequency bin and the second dimension is the frame number (t). The total number of frames in stft is therefore stft.shape[1].
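As a sanity check, both shapes reported in the question can be reproduced with integer arithmetic. This sketch assumes the default centred STFT, for which the frame count is 1 + n_samples // hop_length. The sample counts (323033 at 22,050 Hz and 117200 at 8 kHz) are inferred from the 14.65 s duration quoted in the question rather than read from the file, and stft_shape is a made-up helper name.

```python
def stft_shape(n_samples, n_fft, hop_length):
    """Expected (n_bins, n_frames) of a centred STFT (hypothetical helper)."""
    n_bins = 1 + n_fft // 2                 # frequency rows from 0 Hz up to Nyquist
    n_frames = 1 + n_samples // hop_length  # one column per hop, plus the first frame
    return n_bins, n_frames

# 22,050 Hz load with the defaults n_fft=2048, hop_length=512
shape_22k = stft_shape(323033, n_fft=2048, hop_length=512)
# native 8 kHz load with n_fft=743 and the default hop of n_fft // 4 = 185
shape_8k = stft_shape(117200, n_fft=743, hop_length=743 // 4)
print(shape_22k, shape_8k)
```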
To get the approximate length of the source audio in seconds, you could do:
time = librosa.core.frames_to_time(stft.shape[1], sr=sr, hop_length=hop_length, n_fft=n_fft)
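For the goal stated in the question (reading the magnitude of one frequency bin at one time, or summing it over a time range), the same index arithmetic can be combined into a lookup. The following is only a sketch on a toy nested list; all three helper names are hypothetical, and the toy parameters (sr=8, n_fft=4, hop_length=2) are chosen purely so the numbers are easy to follow.

```python
def nearest_bin(freq_hz, sr, n_fft):
    """Index of the STFT row whose centre frequency is closest to freq_hz."""
    k = round(freq_hz * n_fft / sr)
    return max(0, min(k, n_fft // 2))  # clamp to the valid row range

def frames_in_range(t_start, t_end, sr, hop_length, n_frames):
    """Frame indices whose timestamp frame * hop_length / sr lies in [t_start, t_end]."""
    return [f for f in range(n_frames) if t_start <= f * hop_length / sr <= t_end]

def magnitude_sum(spec, freq_hz, t_start, t_end, sr, n_fft, hop_length):
    """Sum the magnitudes of one frequency row over a time range."""
    row = spec[nearest_bin(freq_hz, sr, n_fft)]
    return sum(row[f] for f in frames_in_range(t_start, t_end, sr, hop_length, len(row)))

# Toy spectrogram: 3 bins (0, 2, 4 Hz) x 4 frames (0, 0.25, 0.5, 0.75 s).
spec = [
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 3.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
total = magnitude_sum(spec, freq_hz=2, t_start=0.25, t_end=0.5, sr=8, n_fft=4, hop_length=2)
print(total)
```

With a real NumPy spectrogram, the inner sum would normally become a slice such as stft[fft_bin, start:stop].sum().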
Source: https://stackoverflow.com/questions/57058875/stft-understanding-using-librosa