I am watching some videos for Stanford CS231n: Convolutional Neural Networks for Visual Recognition but do not quite understand how to calculate the analytical gradient for softmax l
I know this is late but here's my answer:
I'm assuming you are familiar with the cs231n Softmax loss function. We know that:
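For reference, here is the per-example softmax loss as I remember it from the cs231n notes, where $f_j$ is the score for class $j$ and $y_i$ is the correct class of example $x_i$. The layout $f = W x_i$, with one row $w_j$ of $W$ per class, is an assumption on my part (the assignment code uses the transposed layout):

$$
L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_k e^{f_k}}\right), \qquad f_j = w_j^\top x_i
$$

If you write $p_j = e^{f_j} / \sum_k e^{f_k}$ for the softmax probability of class $j$, this is just $L_i = -\log p_{y_i}$.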
So, just as we did with the SVM loss function, the gradients are as follows:
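In symbols, with $p_j = e^{f_j} / \sum_k e^{f_k}$ and scores $f_j = w_j^\top x_i$, the chain rule gives $\partial L_i / \partial f_j = p_j - \mathbb{1}[j = y_i]$ and therefore $\partial L_i / \partial w_j = (p_j - \mathbb{1}[j = y_i])\, x_i$. In words: every row of $W$ gets $p_j x_i$, and the row of the correct class additionally gets $-x_i$. Below is a minimal sketch of that for a single example in plain Python. The function name `softmax_loss_and_grad` and the one-row-per-class layout of `W` are my own choices for illustration, not the assignment's code, and I left out regularization and averaging over a batch.

```python
import math

def softmax_loss_and_grad(W, x, y):
    """Softmax loss and its gradient w.r.t. W for ONE example.

    W: list of C rows, each a list of D weights (one row per class; assumed layout)
    x: list of D input features
    y: index of the correct class
    Returns (loss, dW) where dW has the same shape as W.
    """
    # Class scores f_j = w_j . x
    scores = [sum(w * xk for w, xk in zip(w_j, x)) for w_j in W]

    # Subtract the max score for numerical stability; the probabilities are unchanged
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # L_i = -log p_{y_i}
    loss = -math.log(probs[y])

    # dL_i/dw_j = (p_j - 1[j == y]) * x
    dW = [
        [(p_j - (1.0 if j == y else 0.0)) * xk for xk in x]
        for j, p_j in enumerate(probs)
    ]
    return loss, dW
```

A quick sanity check is to compare one entry of `dW` against a centered finite difference of the loss, the same way the course does its gradient checks. For a batch you would sum (or average) the per-example `dW` and add the regularization gradient.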
Hope that helped.