I'm trying to predict a word from the corresponding audio slices. I thought a seq2seq model might be appropriate, so I tried that out. I tried both this - blog.keras.io/a-