The situation
I am using VAD (Voice Activity Detection) from WebRTC by using WebRTC-VAD, a Python adapter. The example implementation from the GitHub re
It seems that WebRTC-VAD, and the Python wrapper, py-webrtcvad, expects the audio data to be 16bit PCM little-endian - as is the most common storage format in WAV files.
librosa
and its underlying I/O library pysoundfile
however always returns floating point arrays in the range [-1.0, 1.0]
. To convertt this to bytes containing 16bit PCM you can use the following float_to_pcm16
function.
def float_to_pcm16(audio):
import numpy
ints = (audio * 32767).astype(numpy.int16)
little_endian = ints.astype('<u2')
buf = little_endian.tostring()
return buf
def read_pcm16(path):
import soundfile
audio, sample_rate = soundfile.read(path)
assert sample_rate in (8000, 16000, 32000, 48000)
pcm_data = float_to_pcm16(audio)
return pcm_data, sample_rate