Question
I put together the following code, which uses PyAudio to play a wav file and follows the official code examples, but I consistently fail to get clean playback with it. The problem is not specific to certain files: it occurs with wav files from several different sources and with different (standard) sample rates alike.
Details
With the synchronous (blocking) stream-writing variant shown in the code below, the wav file usually does not play normally. The data is written to the stream and accepted by PyAudio without any noticeable sound, as if the stream's consumption were not synchronized with the writes from Python and the data were not really being processed. A wav file of ~1.5 seconds finishes processing in less than 1/10 of a second, producing sound only during that proportionally small span of time, and the same happens with wav files containing many seconds of audio.
With the callback-based (explicitly evented) variant, also included in the code below, the problem is similar yet different: the first second or so of audio sounds as if there are discontinuities in the playback, after which playback becomes audibly correct (continuous and properly sounding) for the remainder of the wav file. Before I tweaked the sleep call in my code, this problem was even more pronounced and frequent.
The severity of both issues varies (for example, the discontinuities in the first ~1 second of a file may be worse on some runs than on others over the same wav file), but both are consistently reproducible.
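One way to put a number on the first symptom is to compare the time the write loop takes against the duration implied by the wav header. The following is a minimal sketch using only the standard wave module; expected_duration_seconds and playback_ratio are hypothetical helper names, not part of the application code or of PyAudio.

```python
import wave


def expected_duration_seconds(path):
    """Return the playback duration implied by the wav header."""
    with wave.open(path, 'rb') as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def playback_ratio(elapsed_seconds, path):
    """Ratio of measured write-loop time to the expected duration.

    A value near 1.0 means the blocking writes were paced by the audio
    device; a value far below 1.0 means the data was accepted much
    faster than real time.
    """
    return elapsed_seconds / expected_duration_seconds(path)
```

With the figures quoted above, a ~1.5 second file whose write loop ends in under 0.1 seconds would give a ratio below roughly 0.07.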
The Code
Here is the code I'm using; the flag variable direct_feed inside it controls which of the two approaches described above is taken.
import logging
import pyaudio
import wave
import time
from threading import Event


class Speaking(ControllableThread):  # nascent thread wrapper

    def __init__(self, **kwds):
        super().__init__(**kwds)
        self.audio = pyaudio.PyAudio()
        self.chunk_size = 1024  # number of samples per output stream push

    def run(self):
        self.start_actions()
        while not self.shutdown_ask.is_set():
            inbound_event = self.in_q.get(block=True)
            if inbound_event == 'speak':
                self._speak_pre_recorded('OSR_us_000_0010_8k.wav', Event())
                #self._speak_pre_recorded('transcribing.downsampled.wav', Event())
                #self._speak_pre_recorded('long_speech.downsampled.wav', Event())
                print("returned from _speak_pre_recorded")

    def _speak_pre_recorded(self, filename, stop_speach_ask):
        direct_feed = False  # whether to use the synchronous v.s. the evented API approach

        logging.info("speaking")
        audio_file = wave.open(f'nui/user_solicitation/speech/{filename}', 'rb')
        print(audio_file.getnframes())

        self.event_emitter.emit({
            'time': time.time(),
            'event': 'started speaking to user'
        })

        now = time.time()

        if direct_feed:
            stream = self.audio.open(
                format=self.audio.get_format_from_width(audio_file.getsampwidth()),
                channels=audio_file.getnchannels(),
                rate=audio_file.getframerate(),
                frames_per_buffer=self.chunk_size,
                output=True  # indicates that the sound will be played rather than recorded
            )

            # Read data in chunks and push to output audio stream
            data = audio_file.readframes(self.chunk_size)
            while len(data) > 0 and not stop_speach_ask.is_set():
                print(f'speaking {len(data)} bytes, elapsed {time.time() - now}')
                stream.write(data)
                data = audio_file.readframes(self.chunk_size)
        else:
            # this approach works, but tuning the sleep time was crucial!
            def callback(in_data, frame_count, time_info, status):
                data = audio_file.readframes(self.chunk_size)
                return data, pyaudio.paContinue

            stream = self.audio.open(
                format=self.audio.get_format_from_width(audio_file.getsampwidth()),
                channels=audio_file.getnchannels(),
                rate=audio_file.getframerate(),
                frames_per_buffer=self.chunk_size,
                stream_callback=callback,
                output=True)

            stream.start_stream()
            while stream.is_active():
                time.sleep(0.05)

        stream.stop_stream()
        stream.close()
        audio_file.close()
        print("done speaking")

        self.event_emitter.emit({
            'time': time.time(),
            'event': 'finished speaking to user'
        })

    def finalize(self):
        self.audio.terminate()
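For comparison, the callback in the official PyAudio wav playback example reads frame_count frames per call rather than a fixed chunk size. Below is a sketch of that variant, written as a factory so that it can be exercised without an audio device. make_callback is a hypothetical name, and the two flag values are supplied by the caller (in real use they would be pyaudio.paContinue and pyaudio.paComplete). Whether this difference has any bearing on the discontinuities is untested here.

```python
def make_callback(wav_file, continue_flag, complete_flag):
    """Build a PyAudio-style output callback that serves `frame_count` frames.

    `continue_flag` / `complete_flag` stand in for pyaudio.paContinue and
    pyaudio.paComplete so the sketch does not need PyAudio to be imported.
    `wav_file` is an object opened with wave.open(..., 'rb').
    """
    bytes_per_frame = wav_file.getsampwidth() * wav_file.getnchannels()

    def callback(in_data, frame_count, time_info, status):
        # Serve exactly as many frames as the stream asks for.
        data = wav_file.readframes(frame_count)
        # A short read means the file is exhausted: signal completion
        # so the stream stops after this last buffer.
        if len(data) < frame_count * bytes_per_frame:
            return data, complete_flag
        return data, continue_flag

    return callback
```

In the class above, this could be wired in as stream_callback=make_callback(audio_file, pyaudio.paContinue, pyaudio.paComplete).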
Is this trouble typical of any version of PortAudio? Is there an officially patched version that is free of it?
These playback issues are not inherent in any of the wav files I tested before coming here; they all sound perfect with Ubuntu's aplay command-line tool.
Why I'm using a patched PortAudio
My version of PortAudio is, however, patched. I patched it because the one shipped with my Ubuntu 18.04 seems to crash on recording (which I need to do elsewhere in my application). After patching, my earlier issues with audio recording were solved, give or take a frequent "input overflow" warning, which suggests that the recording buffer is perhaps filling faster than my code reads from it. Here's how I patched it, inspired by this solution to that recording-related problem:
sudo apt-get remove libportaudio2
sudo apt-get install libasound2-dev
git clone -b alsapatch https://github.com/gglockner/portaudio
mv portaudio portaudio-patched
cd portaudio-patched
./configure --prefix="$CONDA_PREFIX"
make
make install
sudo ldconfig
cd ..
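After installing a patched build into a Conda prefix, it may be worth confirming which PortAudio library actually gets picked up. The sketch below is only a rough check under stated assumptions: describe_portaudio is a hypothetical helper, ctypes.util.find_library only reports what the default library search finds (which may differ from what PyAudio was linked against), and the PyAudio version query is skipped when PyAudio cannot be imported.

```python
import ctypes.util


def describe_portaudio():
    """Collect hints about which PortAudio build is visible to Python."""
    info = {
        # Library name found on the default library search path, or None.
        'system_library': ctypes.util.find_library('portaudio'),
        'pyaudio_portaudio_version': None,
    }
    try:
        import pyaudio
    except ImportError:
        return info
    # Version string of the PortAudio build that PyAudio is using.
    info['pyaudio_portaudio_version'] = pyaudio.get_portaudio_version_text()
    return info
```

Calling print(describe_portaudio()) inside the same Conda environment that runs the application shows both values side by side.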
Various patched versions for that issue are floating around the Internet, and at the time of writing I could not find any indication that the underlying recording issue has been fixed in the official PortAudio code repository.
Here's a publicly available example wav file for which the issues also reproduce on my machine: https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav
Source: https://stackoverflow.com/questions/62268931/pyaudio-playback-problems-with-patched-portaudio-on-ubuntu