I am watching some videos for Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but I do not quite understand how to calculate the analytical gradient for the softmax loss function.
Not sure if this helps, but:
The term $\mathbb{1}\{y_i = j\}$ in the score gradient

$$\frac{\partial L_i}{\partial f_j} = p_j - \mathbb{1}\{y_i = j\}$$

is really the indicator function, as described here. This forms the expression `(j == y[i])` in the code.
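As a small numerical sketch (the probabilities and label below are made up for illustration), the whole vector $p_j - \mathbb{1}\{y_i = j\}$ can be formed in one step, which is exactly what `(j == y[i])` does inside the loop:

```python
import numpy as np

# Hypothetical single example with 3 classes:
p = np.array([0.2, 0.7, 0.1])   # softmax probabilities for example i
y_i = 1                          # true class label of example i

# dL_i/df_j = p_j - 1{y_i = j}; the boolean comparison plays the indicator
dscores = p - (np.arange(3) == y_i)   # -> [ 0.2, -0.3,  0.1]
```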
Also, the gradient of the loss with respect to the weights is:

$$\nabla_{w_j} L_i = \frac{\partial L_i}{\partial f_j}\,\nabla_{w_j} f_j = \left(p_j - \mathbb{1}\{y_i = j\}\right) x_i,$$

where

$$f_j = w_j^\top x_i \quad\Longrightarrow\quad \nabla_{w_j} f_j = x_i,$$

which is the origin of the `X[:,i]` in the code.
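Putting the two pieces together, here is a minimal naive-loop sketch of the softmax loss and gradient. The shapes are assumptions chosen to match the `X[:, i]` indexing above (`W` is `(C, D)`, `X` is `(D, N)` with examples as columns, `y` holds integer labels); the function name is just for illustration:

```python
import numpy as np

def softmax_loss_naive(W, X, y):
    """Softmax loss and gradient via explicit loops.

    Assumed shapes (hypothetical): W is (C, D), X is (D, N) with one
    example per column, y is (N,) with integer labels in [0, C).
    """
    C = W.shape[0]
    N = X.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)

    for i in range(N):
        f = W.dot(X[:, i])                 # class scores f_j = w_j . x_i, shape (C,)
        f -= f.max()                       # shift scores for numerical stability
        p = np.exp(f) / np.exp(f).sum()    # softmax probabilities, shape (C,)
        loss += -np.log(p[y[i]])

        for j in range(C):
            # dL_i/dw_j = (p_j - 1{y_i = j}) * x_i
            dW[j, :] += (p[j] - (j == y[i])) * X[:, i]

    loss /= N
    dW /= N
    return loss, dW
```

The inner line `dW[j, :] += (p[j] - (j == y[i])) * X[:, i]` is where both expressions from the derivation show up: the boolean `(j == y[i])` is the indicator, and `X[:, i]` is the $x_i$ coming from $\nabla_{w_j} f_j$.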