Gradient Descent vs Stochastic Gradient Descent algorithms
Question: I tried to train a feedforward neural network on the MNIST handwritten digits dataset (60K training samples). On every epoch I iterated over all the training samples, performing backpropagation for each sample. The runtime is, of course, far too long. Is the algorithm I ran called Gradient Descent? I read that for large datasets, Stochastic Gradient Descent can improve the runtime dramatically. What should I do in order to use Stochastic Gradient Descent? Should I update the weights after each randomly chosen sample (or mini-batch) instead of after a full pass over the training set?
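For concreteness, here is a minimal sketch (not from the original post) contrasting the two update rules. It uses a linear model with a squared-error gradient as a stand-in for the actual network and backpropagation; the names X, y, w, grad, lr, and batch_size are all hypothetical placeholders:

    import numpy as np

    # Hypothetical toy data standing in for MNIST (60K samples, 784 features).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60_000, 784))
    y = rng.normal(size=(60_000, 1))
    w = np.zeros((784, 1))   # model parameters
    lr = 0.01                # learning rate

    def grad(w, X, y):
        # Mean-squared-error gradient for a linear model
        # (a stand-in for backpropagation through a real network).
        return 2 * X.T @ (X @ w - y) / len(X)

    # (Batch) gradient descent: ONE update per epoch, computed on ALL samples.
    for epoch in range(3):
        w -= lr * grad(w, X, y)

    # Stochastic (mini-batch) gradient descent: many updates per epoch,
    # each computed on a small, freshly shuffled subset of the data.
    batch_size = 64
    for epoch in range(3):
        perm = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = perm[start:start + batch_size]
            w -= lr * grad(w, X[idx], y[idx])

Updating on shuffled mini-batches rather than single samples is the usual middle ground: it keeps the variance of the gradient estimate manageable while still giving many cheap updates per epoch.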