Is the batch size for stochastic gradient descent the length of the training data, and not 1?
Question: I am trying to plot the different learning outcomes when using batch gradient descent, stochastic gradient descent, and mini-batch stochastic gradient descent. Everywhere I look, I read that a batch_size=1 is the same as having plain SGD, and a batch_size=len(train_data) is the same as having batch gradient descent. I know that stochastic gradient descent is when you use only one single data sample for every update, and batch gradient descent uses the entire training data set to compute the gradient for each update.
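To make the mapping concrete, here is a minimal sketch (with a hypothetical helper `run_gradient_descent` on toy linear-regression data, not from the original question) showing how one update loop covers all three variants purely through the batch size:

```python
import numpy as np

def run_gradient_descent(X, y, batch_size, lr=0.01, epochs=100, seed=0):
    """Mini-batch gradient descent for linear regression (MSE loss).

    batch_size=1            -> stochastic gradient descent
    batch_size=len(X)       -> (full) batch gradient descent
    1 < batch_size < len(X) -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of the mean squared error over this batch only
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)
            w -= lr * grad
    return w

# Toy data: y = 3*x + noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=200)

w_sgd   = run_gradient_descent(X, y, batch_size=1)       # plain SGD
w_mini  = run_gradient_descent(X, y, batch_size=32)      # mini-batch SGD
w_batch = run_gradient_descent(X, y, batch_size=len(X))  # batch gradient descent
print(w_sgd, w_mini, w_batch)
```

With batch_size=1 the inner loop performs one update per sample (SGD); with batch_size=len(X) it performs a single update per epoch using the whole training set (batch gradient descent); anything in between is mini-batch.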