Andrew Ng's Deep Learning Course 2, Week 2 programming assignment
Goal: use mini-batches to speed up learning; compare the results of gradient descent, momentum, and Adam
Core idea: the computation and meaning of the exponentially weighted average, which is the foundation of the momentum, RMSProp, and Adam algorithms
Limitations: this exercise does not apply learning-rate decay, and the code only handles a 3-layer binary-classification network
Points to remember:
1. Bias correction divides by 1 - beta^t (note the minus sign), with t starting from 1;
2. L = len(parameters) // 2; this L is not the total number of network layers, and range(1, L + 1) == range(1, len(layers_dims))
3. When computing s in Adam, square the gradients (np.square), so that the denominator can later take the square root (np.sqrt)
4. np.random.permutation(m) returns a random reordering of range(m), used to shuffle the samples; reshuffle once every epoch
5. arr[:, :]: the index before the comma selects rows, the index after the comma selects columns
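To make point 1 concrete, here is a small standalone sketch of the exponentially weighted average with bias correction (the function name `ewa` and the `correct_bias` flag are mine, not from the assignment). With v initialized to 0, the raw average starts badly biased toward 0; dividing by 1 - beta^t (t starting from 1) removes that bias:

```python
def ewa(data, beta=0.9, correct_bias=True):
    """Exponentially weighted average v_t = beta*v_{t-1} + (1-beta)*x_t."""
    v = 0.0
    out = []
    for t, x in enumerate(data, start=1):  # t starts from 1, as in point 1
        v = beta * v + (1 - beta) * x
        # bias correction: divide by (1 - beta**t) to offset the zero init
        out.append(v / (1 - beta ** t) if correct_bias else v)
    return out

data = [10.0, 10.0, 10.0, 10.0]
raw = ewa(data, beta=0.9, correct_bias=False)
corrected = ewa(data, beta=0.9, correct_bias=True)
print(raw[0])        # 1.0  -- badly biased toward 0 at the start
print(corrected[0])  # 10.0 -- correction recovers the true level
```

For constant input the corrected average equals the input at every t, which is exactly why Adam applies the same correction to both v and s.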
'''
This exercise implements and compares:
1. (batch) gradient descent
2. mini-batch gradient descent
3. momentum
4. Adam
'''
import numpy as np
import matplotlib.pyplot as plt
import scipy.io
import math
import sklearn
import sklearn.datasets
import opt_utils   # helper module shipped with the assignment (forward/backward prop, cost, dataset)
import testCase    # helper module shipped with the assignment (test inputs)

plt.rcParams['figure.figsize'] = (7.0, 4.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# (batch) gradient descent: update every layer's parameters W and b
def update_parameters_with_gd(parameters, grads, learning_rate):
    L = len(parameters) // 2   # number of parameterized layers (see point 2 above)
    for l in range(1, L + 1):  # l runs from 1 to L
        parameters['W' + str(l)] = parameters['W' + str(l)] - learning_rate * grads['dW' + str(l)]
        parameters['b' + str(l)] = parameters['b' + str(l)] - learning_rate * grads['db' + str(l)]
    return parameters
'''
mini-batch: shuffle the samples, then partition them into mini-batches
'''
def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    np.random.seed(seed)
    m = X.shape[1]   # samples are columns
    mini_batches = []
    # shuffle X and Y with the SAME column permutation so they stay aligned
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation].reshape((1, m))
    # complete mini-batches of size mini_batch_size
    num_complete = math.floor(m / mini_batch_size)
    for k in range(num_complete):
        mini_batches.append((shuffled_X[:, k * mini_batch_size:(k + 1) * mini_batch_size],
                             shuffled_Y[:, k * mini_batch_size:(k + 1) * mini_batch_size]))
    # a last, smaller mini-batch if mini_batch_size does not divide m
    if m % mini_batch_size != 0:
        mini_batches.append((shuffled_X[:, num_complete * mini_batch_size:],
                             shuffled_Y[:, num_complete * mini_batch_size:]))
    return mini_batches
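To see the shuffle-and-partition idea (points 4 and 5 above) in isolation, here is a toy run with made-up sizes (m = 11, batch size 4 are arbitrary choices of mine). The key detail is that X and Y are permuted with the same index list, so sample/label pairs stay aligned:

```python
import numpy as np

np.random.seed(0)
m = 11                               # toy sample count (hypothetical)
X = np.arange(2 * m).reshape(2, m)   # 2 features, m samples; columns are samples
Y = np.arange(m).reshape(1, m)

# shuffle columns with the SAME permutation so X and Y stay aligned
permutation = list(np.random.permutation(m))  # random reordering of range(m)
shuffled_X = X[:, permutation]   # before the comma: rows; after: columns
shuffled_Y = Y[:, permutation]

# partition into batches of 4; the last batch holds the remainder
size = 4
batches = [(shuffled_X[:, k:k + size], shuffled_Y[:, k:k + size])
           for k in range(0, m, size)]
print([bx.shape[1] for bx, _ in batches])  # [4, 4, 3]
```

Since 11 is not divisible by 4, the last mini-batch has only 3 samples, matching the `if m % mini_batch_size != 0` branch above.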
'''
momentum
'''
# initialize the velocity v to zeros, one entry per gradient
def initialize_velocity(parameters):
    L = len(parameters) // 2
    v = {}
    for l in range(1, L + 1):
        v['dW' + str(l)] = np.zeros_like(parameters['W' + str(l)])
        v['db' + str(l)] = np.zeros_like(parameters['b' + str(l)])
    return v

# momentum update: move along the exponentially weighted average of the gradients
def update_parameters_with_momentum(parameters, grads, v, beta, learning_rate):
    L = len(parameters) // 2
    for l in range(1, L + 1):
        v['dW' + str(l)] = beta * v['dW' + str(l)] + (1 - beta) * grads['dW' + str(l)]
        v['db' + str(l)] = beta * v['db' + str(l)] + (1 - beta) * grads['db' + str(l)]
        parameters['W' + str(l)] = parameters['W' + str(l)] - learning_rate * v['dW' + str(l)]
        parameters['b' + str(l)] = parameters['b' + str(l)] - learning_rate * v['db' + str(l)]
    return parameters, v
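As a quick sanity check on why momentum helps, here is a scalar sketch (the oscillating gradient and the constants are made up for illustration). Plain gradient descent with these gradients would swing back and forth by the full learning rate each step; the velocity average nearly cancels the oscillation:

```python
# velocity update: v = beta*v + (1-beta)*g, then w -= lr*v
beta, lr = 0.9, 0.1
v, w = 0.0, 0.0
path = []
for step in range(10):
    g = 1.0 if step % 2 == 0 else -1.0   # gradient that flips sign every step
    v = beta * v + (1 - beta) * g
    w -= lr * v
    path.append(w)
# plain GD with the same gradients swings by lr = 0.1 each step;
# with momentum the averaged velocity keeps w within a few hundredths of 0
print(max(abs(p) for p in path))
```

Along directions where the gradient is consistent rather than oscillating, the same averaging builds up speed instead, which is the other half of the benefit.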
'''
Adam
'''
# initialize v (first moment) and s (second moment) to zeros
def initialize_adam(parameters):
    L = len(parameters) // 2
    v = {}
    s = {}
    for l in range(1, L + 1):
        v['dW' + str(l)] = np.zeros_like(parameters['W' + str(l)])
        v['db' + str(l)] = np.zeros_like(parameters['b' + str(l)])
        s['dW' + str(l)] = np.zeros_like(parameters['W' + str(l)])
        s['db' + str(l)] = np.zeros_like(parameters['b' + str(l)])
    return v, s

# Adam update
def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01,
                                beta1=0.9, beta2=0.999, epsilon=1e-8):
    # t counts Adam update steps and starts from 1 (see point 1 above)
    L = len(parameters) // 2
    v_corrected = {}
    s_corrected = {}
    for l in range(1, L + 1):
        # moving average of the gradients
        v['dW' + str(l)] = beta1 * v['dW' + str(l)] + (1 - beta1) * grads['dW' + str(l)]
        v['db' + str(l)] = beta1 * v['db' + str(l)] + (1 - beta1) * grads['db' + str(l)]
        # bias correction for v
        v_corrected['dW' + str(l)] = v['dW' + str(l)] / (1 - np.power(beta1, t))
        v_corrected['db' + str(l)] = v['db' + str(l)] / (1 - np.power(beta1, t))
        # moving average of the squared gradients (np.square, see point 3)
        s['dW' + str(l)] = beta2 * s['dW' + str(l)] + (1 - beta2) * np.square(grads['dW' + str(l)])
        s['db' + str(l)] = beta2 * s['db' + str(l)] + (1 - beta2) * np.square(grads['db' + str(l)])
        # bias correction for s
        s_corrected['dW' + str(l)] = s['dW' + str(l)] / (1 - np.power(beta2, t))
        s_corrected['db' + str(l)] = s['db' + str(l)] / (1 - np.power(beta2, t))
        # update; epsilon guards the division since v and s start at 0
        parameters['W' + str(l)] = parameters['W' + str(l)] - learning_rate * (v_corrected['dW' + str(l)] / (np.sqrt(s_corrected['dW' + str(l)]) + epsilon))
        parameters['b' + str(l)] = parameters['b' + str(l)] - learning_rate * (v_corrected['db' + str(l)] / (np.sqrt(s_corrected['db' + str(l)]) + epsilon))
    return parameters, v, s
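A useful consequence of the two bias corrections is that the very first Adam step has magnitude close to the learning rate no matter how large or small the gradient is. The scalar helper below (`adam_step` is my name, mirroring the update above) demonstrates this at t = 1:

```python
import numpy as np

def adam_step(w, g, v, s, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update, same formulas as update_parameters_with_adam."""
    v = beta1 * v + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * np.square(g)
    v_hat = v / (1 - beta1 ** t)   # bias-corrected first moment
    s_hat = s / (1 - beta2 ** t)   # bias-corrected second moment
    w = w - lr * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s

# the first step (t = 1) has size about lr regardless of the gradient scale:
for g in (1e-4, 1.0, 1e4):
    w, _, _ = adam_step(0.0, g, 0.0, 0.0, t=1)
    print(g, w)   # w is about -0.01 in every case
```

At t = 1 the corrections give v_hat = g and s_hat = g^2, so the update is lr * g / (|g| + eps), i.e. roughly lr times the sign of the gradient. This per-coordinate rescaling is the RMSProp half of Adam.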
'''
train the 3-layer network with the chosen optimizer
'''
def model(X, Y, layers_dims, optimizer, learning_rate=0.0007, mini_batch_size=64,
          beta=0.9, beta1=0.9, beta2=0.999, epsilon=1e-8, num_epochs=10000, print_cost=True):
    costs = []
    t = 0      # Adam step counter
    seed = 10
    parameters = opt_utils.initialize_parameters(layers_dims)
    # initialize the optimizer state (v and/or s)
    if optimizer == 'gd':
        pass   # plain gradient descent keeps no extra state
    elif optimizer == 'momentum':
        v = initialize_velocity(parameters)
    elif optimizer == 'adam':
        v, s = initialize_adam(parameters)
    else:
        print('optimizer parameter error, exiting')
        exit(1)
    # optimization loop
    for i in range(num_epochs):
        # reshuffle into new mini-batches every epoch (point 4 above)
        seed = seed + 1
        minibatches = random_mini_batches(X, Y, mini_batch_size, seed)
        for minibatch in minibatches:
            # select one mini-batch (X, Y)
            (minibatch_X, minibatch_Y) = minibatch
            # forward propagation
            A3, cache = opt_utils.forward_propagation(minibatch_X, parameters)
            # compute cost
            cost = opt_utils.compute_cost(A3, minibatch_Y)
            # backward propagation
            grads = opt_utils.backward_propagation(minibatch_X, minibatch_Y, cache)
            # update parameters
            if optimizer == 'gd':
                parameters = update_parameters_with_gd(parameters, grads, learning_rate)
            elif optimizer == 'momentum':
                parameters, v = update_parameters_with_momentum(parameters, grads, v, beta, learning_rate)
            elif optimizer == 'adam':
                t = t + 1   # t starts from 1, as bias correction requires
                parameters, v, s = update_parameters_with_adam(parameters, grads, v, s, t,
                                                               learning_rate, beta1, beta2, epsilon)
        if i % 100 == 0:
            costs.append(cost)
            if print_cost and i % 1000 == 0:
                print('cost after epoch ' + str(i) + ': ' + str(cost))
    if print_cost:
        plt.plot(costs)
        plt.ylabel('cost')
        plt.xlabel('epoch (per 100)')
        plt.show()
    return parameters
'''
train the same model with each optimizer and compare
'''
train_X, train_Y = opt_utils.load_dataset()
layers_dims = [train_X.shape[0], 5, 2, 1]
# parameters = model(train_X, train_Y, layers_dims, optimizer="gd")
# parameters = model(train_X, train_Y, layers_dims, optimizer="momentum")
parameters = model(train_X, train_Y, layers_dims, optimizer="adam")
'''
Of the three, Adam converges fastest and gives the best result
'''
Source: https://www.cnblogs.com/sytt3/p/9363326.html