How to handle in_data in Pyaudio callback mode?

[愿得一人] 2020-12-18 04:50

I'm doing a project on signal processing in Python. So far I've had a little success with the nonblocking mode, but it gave a considerable amount of delay and clipping to t

2 answers
  • 2020-12-18 05:20

    I had a similar issue trying to work with the PyAudio callback mode, but my requirements were:

    • Working with stereo output (2 channels).
    • Processing in real time.
    • Processing the input signal using an arbitrary impulse response, that could change in the middle of the process.

    I succeeded after a few tries, and here are fragments of my code (based on the callback-mode example from the PyAudio documentation):

    import time  # needed for time.sleep() in the main loop

    import librosa
    import numpy as np
    import pyaudio
    import scipy.signal as ss

    track1_data, track1_rate = librosa.load('path/to/wav/track1', sr=44.1e3, dtype=np.float64)
    track2_data, track2_rate = librosa.load('path/to/wav/track2', sr=44.1e3, dtype=np.float64)
    track3_data, track3_rate = librosa.load('path/to/wav/track3', sr=44.1e3, dtype=np.float64)
    
    # instantiate PyAudio (1)
    p = pyaudio.PyAudio()
    count = 0
    IR_left = first_IR_left # Replace for actual IR
    IR_right = first_IR_right # Replace for actual IR
    
    # define callback (2)
    def callback(in_data, frame_count, time_info, status):
        global count
    
        track1_frame = track1_data[frame_count*count : frame_count*(count+1)]
        track2_frame = track2_data[frame_count*count : frame_count*(count+1)]
        track3_frame = track3_data[frame_count*count : frame_count*(count+1)]
    
        track1_left = ss.fftconvolve(track1_frame, IR_left)
        track1_right = ss.fftconvolve(track1_frame, IR_right)
        track2_left = ss.fftconvolve(track2_frame, IR_left)
        track2_right = ss.fftconvolve(track2_frame, IR_right)
        track3_left = ss.fftconvolve(track3_frame, IR_left)
        track3_right = ss.fftconvolve(track3_frame, IR_right)
    
        track_left = 1/3 * track1_left + 1/3 * track2_left + 1/3 * track3_left
        track_right = 1/3 * track1_right + 1/3 * track2_right + 1/3 * track3_right
    
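        # interleave the two channels into a single buffer
        ret_data = np.empty((track_left.size + track_right.size), dtype=track1_left.dtype)
        ret_data[1::2] = track_left
        ret_data[0::2] = track_right
        # cast to float32 to match format=pyaudio.paFloat32, then hand over raw bytes
        ret_data = ret_data.astype(np.float32).tobytes()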
        count += 1
        return (ret_data, pyaudio.paContinue)
    
    # open stream using callback (3)
    stream = p.open(format=pyaudio.paFloat32,
                    channels=2,
                    rate=int(track1_rate),
                    output=True,
                    stream_callback=callback,
                    frames_per_buffer=2**16)
    
    # start the stream (4)
    stream.start_stream()
    
    # wait for stream to finish (5)
    while_count = 0
    while stream.is_active():
        while_count += 1
        if while_count % 3 == 0:
            IR_left = first_IR_left # Replace for actual IR
            IR_right = first_IR_right # Replace for actual IR
        elif while_count % 3 == 1:
            IR_left = second_IR_left # Replace for actual IR
            IR_right = second_IR_right # Replace for actual IR
        elif while_count % 3 == 2:
            IR_left = third_IR_left # Replace for actual IR
            IR_right = third_IR_right # Replace for actual IR
    
        time.sleep(10)
    
    # stop stream (6)
    stream.stop_stream()
    stream.close()
    
    # close PyAudio (7)
    p.terminate()
    

    Here are some important reflections about the code above:

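    • Working with librosa instead of wave lets me use NumPy arrays for processing, which is much more convenient than the raw byte chunks returned by wave.readframes.
    • The data type you set in p.open(format=...) must match the format of the ret_data bytes. PyAudio's widest sample formats are 32-bit (paFloat32, paInt32), so float64 arrays have to be cast down before they are returned.
    • The two channels are interleaved sample by sample (not byte by byte). On my setup, the even-index samples of ret_data went to the right headphone and the odd-index samples to the left one; interleaved stereo is conventionally ordered left first, so check the order on your own hardware.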
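
    The last two points can be checked without any audio hardware. This is a minimal sketch that assumes NumPy only; interleave_stereo is a hypothetical helper (not part of PyAudio), and which ear each channel reaches still depends on the output device:

```python
import numpy as np

def interleave_stereo(ch0, ch1):
    """Pack two equal-length mono blocks into interleaved float32 bytes.

    ch0 lands on the even sample indices, ch1 on the odd ones. The float32
    cast matches a stream opened with format=pyaudio.paFloat32.
    """
    ch0 = np.asarray(ch0, dtype=np.float32)
    ch1 = np.asarray(ch1, dtype=np.float32)
    if ch0.ndim != 1 or ch0.shape != ch1.shape:
        raise ValueError("both channels must be 1-D and the same length")
    out = np.empty(ch0.size * 2, dtype=np.float32)
    out[0::2] = ch0
    out[1::2] = ch1
    return out.tobytes()

# Three stereo frames: 3 frames * 2 channels * 4 bytes per float32 sample
block = interleave_stereo([0.1, 0.2, 0.3], [-0.1, -0.2, -0.3])
decoded = np.frombuffer(block, dtype=np.float32)
```

    Decoding the bytes again with np.frombuffer, as in the last line, is a quick way to confirm that the buffer size and channel order are what the stream expects.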

    Just to clarify, this code sends the mix of three tracks to the audio output in stereo, and every 10 seconds it changes the impulse response and thus the filter being applied. I used this for testing a 3D audio app I'm developing, so the impulse responses were Head Related Impulse Responses (HRIRs) that changed the position of the sound every 10 seconds.


    EDIT:
    This code had a problem: the output contained a periodic noise whose frequency depended on the frame size (the smaller the frames, the higher the frequency). I fixed that by manually doing an overlap-add of the frames. Basically, ss.oaconvolve returned an array of size track_frame.size + IR.size - 1, so I split that array into its first track_frame.size elements (which were used for ret_data) and its last IR.size - 1 elements, which I saved for later. Those saved elements were then added to the first IR.size - 1 elements of the next frame. The first frame adds zeros.
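
    The overlap-add bookkeeping described in the EDIT can be sketched roughly as follows. This is not my exact code: OverlapAddConvolver is a hypothetical name, np.convolve stands in for the SciPy convolution so the example needs only NumPy, and the impulse response is assumed to keep a fixed length for the whole stream:

```python
import numpy as np

class OverlapAddConvolver:
    """Convolve a stream block by block, carrying the tail between blocks."""

    def __init__(self, ir):
        self.ir = np.asarray(ir, dtype=np.float64)
        # Tail saved from the previous block; all zeros for the first block.
        self.tail = np.zeros(self.ir.size - 1)

    def process(self, frame):
        """Return exactly frame.size output samples for a non-empty block."""
        frame = np.asarray(frame, dtype=np.float64)
        full = np.convolve(frame, self.ir)  # size: frame.size + ir.size - 1
        full[:self.tail.size] += self.tail  # add what the last block left over
        self.tail = full[frame.size:]       # last ir.size - 1 samples, kept for later
        return full[:frame.size]            # this part goes to the output

# Compare block-wise processing with one convolution over the whole signal.
rng = np.random.default_rng(0)
signal_in = rng.standard_normal(1000)
ir = rng.standard_normal(64)
conv = OverlapAddConvolver(ir)
blocks = [conv.process(signal_in[i:i + 256]) for i in range(0, 1000, 256)]
streamed = np.concatenate(blocks)
reference = np.convolve(signal_in, ir)[:signal_in.size]
```

    In a callback, one such object per track and per ear would replace the direct convolution call, and each returned block already has frame_count samples.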

  • 2020-12-18 05:25

    Found the answer to my question in the meantime, the callback looks like this:

    def callback(in_data, frame_count, time_info, flag):
        global b, a, fulldata  # filter coefficients and the array of saved output
        # in_data is raw bytes; view it as float32 samples (must match the stream format)
        audio_data = np.frombuffer(in_data, dtype=np.float32)
        # do whatever with the data; in my case I want to hear it filtered in real time
        filtered = signal.filtfilt(b, a, audio_data, padlen=200).astype(np.float32)
        fulldata = np.append(fulldata, filtered)  # save the filtered samples, not the bytes
        return (filtered.tobytes(), pyaudio.paContinue)
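
    For an input stream, in_data arrives as a bytes object holding frame_count * channels interleaved samples. A minimal sketch of the round trip, assuming a float32 stream; decode_in_data is a hypothetical helper, and the fake buffer stands in for what PyAudio would pass to the callback:

```python
import numpy as np

def decode_in_data(in_data, channels):
    """View raw callback bytes as a (frames, channels) float32 array."""
    samples = np.frombuffer(in_data, dtype=np.float32)
    return samples.reshape(-1, channels)

# Fake stereo buffer with 4 frames: samples 0..7, interleaved
fake = np.arange(8, dtype=np.float32).tobytes()
frames = decode_in_data(fake, channels=2)

# Process (here: halve the volume) and convert back to bytes for the return value
out_bytes = (frames * 0.5).astype(np.float32).tobytes()
```

    np.frombuffer returns a read-only view of the bytes, so any processing has to produce a new array, as the multiplication does here.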
    