I'm trying to reproduce "Towards end-to-end speech recognition with deep convolutional neural networks" (https://arxiv.org/abs/1701.02720). In the paper, the authors