Checking the gradient when doing gradient descent
I'm trying to implement a feed-forward, backpropagating autoencoder (trained with gradient descent) and want to verify that I'm computing the gradient correctly. This tutorial suggests numerically estimating the derivative with respect to each parameter, one at a time:

grad_i(theta) = (J(theta + epsilon * e_i) - J(theta - epsilon * e_i)) / (2 * epsilon)

where e_i is the i-th standard basis vector, so only theta_i is perturbed. I've written a sample piece of Matlab code to do just this, but without much luck: the differences between the gradient calculated from the derivative and the gradient estimated numerically are much larger than they should be.
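For reference, here is a minimal sketch of the kind of check I'm describing (Octave/MATLAB; the names J, theta, and grad are placeholders, not from my actual code):

```matlab
% Centered-difference gradient check: compares an analytic gradient
% against a numerical estimate, one parameter at a time.
%   J     - handle to the cost function; J(theta) returns a scalar
%   theta - parameter vector at which to check
%   grad  - analytically computed gradient at theta
function check_gradient(J, theta, grad)
  epsilon = 1e-4;                    % typical perturbation size
  numgrad = zeros(size(theta));
  for i = 1:numel(theta)
    e = zeros(size(theta));
    e(i) = epsilon;                  % perturb only the i-th parameter
    numgrad(i) = (J(theta + e) - J(theta - e)) / (2 * epsilon);
  end
  % Relative difference; with a correct analytic gradient this is
  % usually on the order of 1e-9 or smaller.
  fprintf('Relative difference: %g\n', ...
          norm(numgrad - grad) / norm(numgrad + grad));
end
```

As a sanity test, calling it with a simple cost such as J = @(t) 0.5 * (t' * t), whose exact gradient is t, should report a relative difference near machine precision.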