In the cs231 Computing the Analytic Gradient with Backpropagation which is first implementing a Softmax Classifier, the gradient from (softmax + log loss)