I'm trying to train a sound classification model in CoreML. I have ~9000 .mp3 files split into three classes. The goal is to detect the tone of the spoken word.