I receive base64 data from the frontend (I can convert it into bytes), which contains a human voice recorded through a microphone in real time
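
For reference, the conversion step mentioned above can be sketched roughly as follows. This is only a minimal sketch: the function name `decode_audio_chunk` is hypothetical, and the handling of a `data:...;base64,` prefix is an assumption about how a browser might send the payload, not something stated here. The result is the raw bytes exactly as recorded; the audio container or codec is not known and is not touched.

```python
import base64


def decode_audio_chunk(payload: str) -> bytes:
    """Turn one base64 text chunk from the frontend into raw bytes.

    Hypothetical helper: the name and the data-URL handling are assumptions.
    """
    # Assumption: some frontends send "data:audio/...;base64,<data>".
    # If such a prefix is present, keep only the part after the first comma.
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]

    # validate=True makes b64decode raise binascii.Error on characters
    # outside the base64 alphabet instead of silently discarding them.
    return base64.b64decode(payload, validate=True)


# Usage with a made-up payload standing in for a microphone chunk
encoded = base64.b64encode(b"\x00\x01\x02\x03").decode("ascii")
audio_bytes = decode_audio_chunk(encoded)
print(len(audio_bytes))  # 4
```

With `validate=True`, a corrupted chunk fails loudly, which tends to be easier to debug in a streaming setup than audio that is silently damaged.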
base64
bytes