I was trying to use a voice emotion detecton model on github HERE. Based on their examples, I was able to implement the following code to predict the final emotion of an aud