Optimization Algorithms
1. Mini-batch gradient descent

Batch gradient descent: each iteration processes the entire training set.
Mini-batch gradient descent: each iteration processes a single mini-batch (X^{t}, Y^{t}).

Choosing your mini-batch size: if the training set is small (m < 2000), use batch gradient descent; otherwise choose a mini-batch size of 64–512 (a power of 2). Several sizes usually need to be tried to find one that works well. (See the sketch below.)

A variant of this is stochastic gradient descent (SGD), which is equivalent to mini-batch gradient descent where each mini-batch has just one example. The update rule does not change; what changes is that the gradients are computed on just one training example at a time, rather than on the whole training set.
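A minimal sketch of one epoch of mini-batch gradient descent, assuming the column-per-example convention (X of shape (n_features, m), Y of shape (1, m)) and a simple linear model whose gradients are computed by the hypothetical helper compute_gradients; the names random_mini_batches, one_epoch, W, and b are illustrative, not from the original notes.

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Shuffle (X, Y) and split them into mini-batches of size mini_batch_size."""
    np.random.seed(seed)
    m = X.shape[1]
    permutation = np.random.permutation(m)
    X_shuffled = X[:, permutation]
    Y_shuffled = Y[:, permutation]

    mini_batches = []
    for k in range(0, m, mini_batch_size):
        X_batch = X_shuffled[:, k:k + mini_batch_size]
        Y_batch = Y_shuffled[:, k:k + mini_batch_size]
        mini_batches.append((X_batch, Y_batch))
    return mini_batches

def compute_gradients(W, b, X_batch, Y_batch):
    """Hypothetical gradient step for a linear model y_hat = W @ X + b with squared-error loss."""
    m_batch = X_batch.shape[1]
    Y_hat = W @ X_batch + b
    dW = (Y_hat - Y_batch) @ X_batch.T / m_batch
    db = np.mean(Y_hat - Y_batch, axis=1, keepdims=True)
    return dW, db

def one_epoch(W, b, X, Y, learning_rate=0.01, mini_batch_size=64):
    """One pass over the training set, updating the parameters after every mini-batch."""
    for X_batch, Y_batch in random_mini_batches(X, Y, mini_batch_size):
        dW, db = compute_gradients(W, b, X_batch, Y_batch)
        W = W - learning_rate * dW   # update happens once per mini-batch,
        b = b - learning_rate * db   # not once per full pass over the data
    return W, b
```

With mini_batch_size = m this reduces to batch gradient descent, and with mini_batch_size = 1 it becomes stochastic gradient descent, so the same loop covers all three cases.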