I am watching some of the videos for Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but I do not quite understand how to calculate the analytical gradient for the softmax loss function.
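For reference, the answers below point at expressions like `(j == y[i])` and `X[:,i]` "in the code". Since the original code is not shown here, the following is a minimal sketch of the kind of cs231n-style naive implementation those expressions would come from; the function name `softmax_loss_naive` and the (D, N) column layout of `X` are assumptions based on the assignment conventions, not the asker's exact code:

```python
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """Softmax loss and analytic gradient, computed with explicit loops.

    Assumed shapes (early cs231n convention):
      W: (C, D) weights, X: (D, N) data as columns, y: (N,) integer labels.
    """
    num_classes = W.shape[0]
    num_train = X.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)

    for i in range(num_train):
        scores = W.dot(X[:, i])                       # f = W x_i, shape (C,)
        scores -= np.max(scores)                      # shift for numerical stability
        p = np.exp(scores) / np.sum(np.exp(scores))   # softmax probabilities
        loss += -np.log(p[y[i]])
        for j in range(num_classes):
            # (j == y[i]) is the indicator 1{j = y_i} discussed in the answers
            dW[j, :] += (p[j] - (j == y[i])) * X[:, i]

    # average over the batch and add L2 regularization
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)
    dW = dW / num_train + reg * W
    return loss, dW
```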
Not sure if this helps, but: in the gradient

$$\frac{\partial L_i}{\partial f_j} = p_j - \mathbb{1}\{j = y_i\},$$

where $p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}$ is the softmax probability of class $j$, the term $\mathbb{1}\{j = y_i\}$ is really the indicator function, as described here: it equals 1 when $j = y_i$ and 0 otherwise. This forms the expression `(j == y[i])` in the code.
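Concretely, the Python boolean coerces to 1/0 in arithmetic, so the indicator needs no special handling. A tiny self-contained sketch with made-up probabilities:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])    # softmax probabilities for one example (assumed values)
y_i = 1                          # correct class for this example
for j in range(3):
    # the boolean (j == y_i) acts as the indicator 1{j = y_i}
    dscore_j = p[j] - (j == y_i)
    print(j, dscore_j)           # -> 0.2, -0.5, 0.3
```

Note the score gradients sum to zero, since the probabilities sum to one.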
Also, the gradient of the loss with respect to the weights is:

$$\nabla_{w_j} L_i = \frac{\partial L_i}{\partial f_j}\,\frac{\partial f_j}{\partial w_j} = \left(p_j - \mathbb{1}\{j = y_i\}\right) x_i$$

where $f_j = w_j^\top x_i$, so $\frac{\partial f_j}{\partial w_j} = x_i$, which is the origin of the `X[:,i]` in the code.
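Equivalently, the per-example weight gradient is an outer product of the score gradient with the input column. A self-contained sketch with made-up shapes (the (C, D) weight layout and variable names here are assumptions following the snippet in the question):

```python
import numpy as np

D, C = 4, 3
rng = np.random.default_rng(0)
x = rng.standard_normal(D)        # stand-in for X[:, i]
scores = rng.standard_normal(C)   # stand-in for W.dot(X[:, i])
p = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()
y_i = 2

dscores = p.copy()
dscores[y_i] -= 1.0               # p_j - 1{j = y_i} for every class at once
dW_i = np.outer(dscores, x)       # (C, D): row j is (p_j - 1{j = y_i}) * x_i
```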
I know this is late, but here's my answer:

I'm assuming you are familiar with the cs231n Softmax loss function. We know that:

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$$

So, just as we did with the SVM loss function, the gradients are as follows:

$$\frac{\partial L_i}{\partial f_j} = p_j - \mathbb{1}\{j = y_i\}, \qquad \nabla_{w_j} L_i = \left(p_j - \mathbb{1}\{j = y_i\}\right) x_i$$

where $p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}$.

Hope that helped.
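A quick way to convince yourself these formulas are right is a centered-difference numerical gradient check. This is a sketch, assuming a loss function with the `(W, X, y, reg) -> (loss, dW)` signature used in the earlier `softmax_loss_naive` snippet:

```python
import numpy as np

def check_gradient(loss_fn, W, X, y, reg=0.0, h=1e-5, num_checks=5):
    """Compare analytic dW from loss_fn against centered finite differences."""
    _, dW = loss_fn(W, X, y, reg)
    rng = np.random.default_rng(0)
    for _ in range(num_checks):
        ix = tuple(rng.integers(0, s) for s in W.shape)  # random weight entry
        old = W[ix]
        W[ix] = old + h
        loss_plus, _ = loss_fn(W, X, y, reg)
        W[ix] = old - h
        loss_minus, _ = loss_fn(W, X, y, reg)
        W[ix] = old                                      # restore the perturbed weight
        numeric = (loss_plus - loss_minus) / (2 * h)
        print(f"numerical: {numeric: .6f}  analytic: {dW[ix]: .6f}")

# e.g. check_gradient(softmax_loss_naive, W, X, y) with the sketch from the question
```

If the two columns agree to several decimal places, the analytic gradient is almost certainly correct.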