Goodfellow writes in his book Deep Learning that if we want to train a softmax output with maximum log-likelihood, we have to maximize

$$\log P(y = i; z) = \log \operatorname{softmax}(z)_i$$
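To make the quantity being maximized concrete, here is a minimal sketch in plain Python. It uses the standard identity log softmax(z)_i = z_i - log sum_j exp(z_j). The function name `log_softmax`, the logits `z`, and the class index `i` are illustrative assumptions and do not come from the book.

```python
import math

def log_softmax(z):
    """Return log softmax(z) for a list of logits z.

    Uses log softmax(z)_i = z_i - log(sum_j exp(z_j)), with the maximum
    subtracted before exponentiating so that exp() cannot overflow.
    """
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_sum_exp for v in z]

# Hypothetical example: three logits, true class i = 0.
z = [2.0, 1.0, 0.1]
i = 0

# This is the log-likelihood of the true class, the quantity to maximize.
log_p = log_softmax(z)[i]
print(log_p)
```

Maximizing `log_p` pushes `z[i]` up and the log-sum-exp term down, which is the same as minimizing the usual cross-entropy loss `-log_p`.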