Question
What does the output of play $file stat -freq mean?
I recently ran the command, here's a sample of the output:
$ play 44100Hz/3660/6517/3660-6517-0024.flac stat -freq
44100Hz/3660/6517/3660-6517-0024.flac:
File Size: 214k Bit Rate: 325k
Encoding: FLAC Info: Processed by SoX
Channels: 1 @ 16-bit
Samplerate: 44100Hz
Replaygain: off
Duration: 00:00:05.28
In:0.00% 00:00:00.00 [00:00:05.28] Out:0 [ | ] Clip:0 0.000000 0.412632
10.766602 0.430416
21.533203 0.750785
32.299805 0.839694
43.066406 0.989763
53.833008 0.435572
64.599609 0.404773
75.366211 0.048392
86.132812 0.025195
96.899414 0.011314
...
In:3.52% 00:00:00.19 [00:00:05.09] Out:4.10k [ | ] Clip:0 0.000000 0.889006
10.766602 0.092675
21.533203 0.785106
32.299805 1.693663
43.066406 0.990839
53.833008 0.044969
64.599609 0.096066
75.366211 0.121797
86.132812 0.256809
96.899414 0.122486
107.666016 0.019195
...
How am I meant to understand this?
I hope that this is some Fourier transform and the above output represents a table like
Frequency | Level
But I don't know if that's really the case, or what the level would be measured in if it were.
And what do the lines starting with In:% and ending with Clip:0 ... mean?
Please can someone explain the output of this command to me.
Answer 1:
From the man page:
The −freq option calculates the input’s power spectrum (4096 point DFT) instead of the statistics listed above. This should only be used with a single channel audio file.
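As you guessed, it is a frequency / level table: the first column is the frequency of each DFT bin in Hz, and the second is the level at that frequency. The last frequency in each table is therefore roughly half your sampling rate. I tried it with a pure tone (generated in Audacity) and it works quite well.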
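To check the frequency column, here is a minimal sketch of the bin spacing. It assumes the 44100 Hz sample rate from the question and the 4096-point DFT quoted from the man page; `bin_frequency` is a hypothetical helper, not part of SoX.

```python
# Assumed values: 44100 Hz is the sample rate shown in the question's output,
# 4096 is the DFT size quoted from the man page.
SAMPLE_RATE_HZ = 44100
DFT_SIZE = 4096


def bin_frequency(k, sample_rate_hz=SAMPLE_RATE_HZ, dft_size=DFT_SIZE):
    """Frequency in Hz of DFT bin k (hypothetical helper for illustration)."""
    return k * sample_rate_hz / dft_size


# Format the first few bins the same way the stat output prints them.
first_rows = [f"{bin_frequency(k):.6f}" for k in range(4)]
# Bin DFT_SIZE / 2 sits exactly at half the sample rate.
nyquist_hz = bin_frequency(DFT_SIZE // 2)

print(first_rows)
print(nyquist_hz)
```

The first four values come out as 0.000000, 10.766602, 21.533203 and 32.299805, matching the first column of the sample output, which supports reading that column as frequency in Hz.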
Be careful: if the file is longer than 4096 samples per channel, you will see several tables concatenated, one per DFT window, because each DFT window is 4096 samples long.
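If you capture the output to a file, the concatenated tables can be separated again. The following is only a heuristic sketch based on the sample output in the question: it assumes each table starts at frequency 0 and that a progress line may be fused with the first row. `split_spectra` is a hypothetical name.

```python
def split_spectra(stderr_text):
    """Split captured `stat -freq` output into one table per DFT window.

    Heuristic: a row is any line whose last two whitespace-separated
    tokens parse as floats; a frequency of 0 marks the start of a new table.
    """
    tables = []
    for line in stderr_text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        try:
            freq, level = float(tokens[-2]), float(tokens[-1])
        except ValueError:
            continue  # header or progress line without spectrum data
        if freq == 0.0 or not tables:
            tables.append([])
        tables[-1].append((freq, level))
    return tables


# Shortened excerpt of the output shown in the question.
sample = """Duration: 00:00:05.28
In:0.00% 00:00:00.00 [00:00:05.28] Out:0 [ | ] Clip:0 0.000000 0.412632
10.766602 0.430416
21.533203 0.750785
In:3.52% 00:00:00.19 [00:00:05.09] Out:4.10k [ | ] Clip:0 0.000000 0.889006
10.766602 0.092675
"""

tables = split_spectra(sample)
print(len(tables))
```

Each entry in `tables` is then a list of (frequency, level) pairs for one 4096-sample window.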
I don't get any lines with '%' in my output. Did you convert your audio file to mono, as the documentation says?
Answer 2:
From the man page:
stat [-s scale] [-rms] [-freq] [-v] [-d]
Display time and frequency domain statistical information about the audio. Audio is passed unmodified through the SoX processing chain. The information is output to the 'standard error' (stderr) stream and is calculated, where n is the duration of the audio in samples, c is the number of audio channels, r is the audio sample rate, and x_k represents the PCM value (in the range -1 to +1 by default) of each successive sample in the audio, as follows:
...
The -freq option calculates the input's power spectrum (4096 point DFT) instead of the statistics listed above.
...
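To illustrate what a power spectrum from a DFT looks like, here is a rough sketch using a pure tone, similar to the test described in Answer 1. It is not SoX's implementation: it uses a naive 64-point DFT instead of 4096 points to stay fast, and any windowing or scaling SoX applies is not reproduced, so the levels will not match SoX's numbers. All names here are hypothetical.

```python
import cmath
import math


def power_spectrum(samples):
    """Naive DFT power |X_k|^2 for bins 0 .. n/2 - 1 (illustration only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        x_k = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# Assumed demo parameters, chosen so the tone falls exactly on bin 5.
N = 64
SAMPLE_RATE_HZ = 44100
TONE_BIN = 5
tone_hz = TONE_BIN * SAMPLE_RATE_HZ / N

samples = [
    math.sin(2 * math.pi * tone_hz * i / SAMPLE_RATE_HZ) for i in range(N)
]
spectrum = power_spectrum(samples)

# The strongest bin should be the one the tone was placed on.
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin, peak_bin * SAMPLE_RATE_HZ / N)
```

The peak lands on bin 5, which is how a pure tone shows up as a single large value in the level column at the row for its frequency.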
Source: https://stackoverflow.com/questions/47452888/play-stat-freq-what-does-the-output-mean