问题
I know that questions about .wav files in Python have been just about beaten to death, but I am extremely frustrated as no one's answer seems to be working for me. What I'm trying to do seems relatively simple to me: I want to know exactly what frequencies there are in a .wav file at given times. I want to know, for example, "from the time n milliseconds to n + 10 milliseconds, the average frequency of the sound was x hertz". I have seen people talking about Fourier transforms and Goertzel algorithms, as well as various modules, that I can't seem to figure out how to get to do what I've described. I've tried looking up such things as "find frequency of a wav file in python" about twenty times to no avail. Can someone please help me?
What I'm looking for is a solution like this pseudocode, or at least one that will do something like what the pseudocode is getting at:
import some_module_that_can_help_me_do_this as freq
file = 'output.wav'
start_time = 1000 # Start 1000 milliseconds into the file
end_time = 1010 # End 10 milliseconds thereafter
print("Average frequency = " + str(freq.average(start_time, end_time)) + " hz")
Please assume (as I'm sure you can tell) that I'm an idiot at math. This is my first question here so be gentle
回答1:
If you'd like to detect pitch of a sound (and it seems you do), then in terms of Python libraries your best bet is aubio. Please consult this example for implementation.
import sys
from aubio import source, pitch
win_s = 4096
hop_s = 512
s = source(your_file, samplerate, hop_s)
samplerate = s.samplerate
tolerance = 0.8
pitch_o = pitch("yin", win_s, hop_s, samplerate)
pitch_o.set_unit("midi")
pitch_o.set_tolerance(tolerance)
pitches = []
confidences = []
total_frames = 0
while True:
samples, read = s()
pitch = pitch_o(samples)[0]
pitches += [pitch]
confidence = pitch_o.get_confidence()
confidences += [confidence]
total_frames += read
if read < hop_s: break
print("Average frequency = " + str(np.array(pitches).mean()) + " hz")
Be sure to check docs on pitch detection methods.
I also thought you might be interested in estimation of mean frequency and some other audio parameters without using any special libraries. Let's just use numpy! This should give you much better insight into how such audio features can be calculated. It's based off specprop from seewave package. Check docs for meaning of computed features.
import numpy as np
def spectral_properties(y: np.ndarray, fs: int) -> dict:
spec = np.abs(np.fft.rfft(y))
freq = np.fft.rfftfreq(len(y), d=1 / fs)
spec = np.abs(spec)
amp = spec / spec.sum()
mean = (freq * amp).sum()
sd = np.sqrt(np.sum(amp * ((freq - mean) ** 2)))
amp_cumsum = np.cumsum(amp)
median = freq[len(amp_cumsum[amp_cumsum <= 0.5]) + 1]
mode = freq[amp.argmax()]
Q25 = freq[len(amp_cumsum[amp_cumsum <= 0.25]) + 1]
Q75 = freq[len(amp_cumsum[amp_cumsum <= 0.75]) + 1]
IQR = Q75 - Q25
z = amp - amp.mean()
w = amp.std()
skew = ((z ** 3).sum() / (len(spec) - 1)) / w ** 3
kurt = ((z ** 4).sum() / (len(spec) - 1)) / w ** 4
result_d = {
'mean': mean,
'sd': sd,
'median': median,
'mode': mode,
'Q25': Q25,
'Q75': Q75,
'IQR': IQR,
'skew': skew,
'kurt': kurt
}
return result_d
回答2:
Try something along the below, it worked for me with a sine wave file with a freq of 1234 I generated from this page.
from scipy.io import wavfile
def freq(file, start_time, end_time):
sample_rate, data = wavfile.read(file)
start_point = int(sample_rate * start_time / 1000)
end_point = int(sample_rate * end_time / 1000)
length = (end_time - start_time) / 1000
counter = 0
for i in range(start_point, end_point):
if data[i] < 0 and data[i+1] > 0:
counter += 1
return counter/length
freq("sin.wav", 1000 ,2100)
1231.8181818181818
edited: cleaned up for loop a bit
来源:https://stackoverflow.com/questions/54612204/trying-to-get-the-frequencies-of-a-wav-file-in-python