Extremely small or NaN values appear when training a neural network

感动是毒 2021-01-29 18:38

I'm trying to implement a neural network architecture in Haskell and use it on MNIST.

I'm using the hmatrix package for linear algebra. My training framework…

1 Answer
不思量自难忘° 2021-01-29 19:00

Do you know about "vanishing" and "exploding" gradients in backpropagation? I'm not too familiar with Haskell so I can't easily see what exactly your backprop is doing, but it does look like you are using a logistic curve as your activation function.

If you look at a plot of this function, you'll see that its gradient is nearly 0 at the ends: as the input gets very large or very small, the slope of the curve is almost flat. Backpropagation multiplies these per-layer derivatives together via the chain rule, so a string of near-zero (or, in other setups, very large) factors makes the gradient shrink towards zero or blow up as it passes back through the layers. Since the weight updates are built from these gradients, you end up with a lot of zeros, infinities, or NaNs in your network.
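Here is a minimal sketch of that saturation effect, assuming hmatrix's Numeric.LinearAlgebra; the function names are illustrative, not taken from your code:

```haskell
import Numeric.LinearAlgebra

-- Logistic activation and its derivative, applied elementwise to a vector
-- of pre-activations.
sigmoid, sigmoid' :: Vector Double -> Vector Double
sigmoid  = cmap (\x -> 1 / (1 + exp (-x)))
sigmoid' = cmap (\x -> let s = 1 / (1 + exp (-x)) in s * (1 - s))

main :: IO ()
main = do
  let zs = vector [-10, -5, 0, 5, 10]   -- sample pre-activations
  print (sigmoid  zs)   -- outputs saturate towards 0 and 1 at the extremes
  print (sigmoid' zs)   -- derivatives at +/-10 are ~4.5e-5, essentially zero
```

Note that the logistic derivative peaks at 0.25 (at input 0), so even in the best case each such layer shrinks the gradient by a factor of at least 4.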

Solution: there are loads of methods out there that you can search for to solve the vanishing gradient problem, but one easy thing to try is to change the type of activation function you are using to a non-saturating one. ReLU is a popular choice as it mitigates this particular problem (but might introduce others).
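If your framework just threads an activation function and its derivative through each layer, swapping in ReLU could look roughly like this (again a sketch against hmatrix, not your actual code):

```haskell
import Numeric.LinearAlgebra

-- ReLU and its (sub)derivative, elementwise. The gradient is 1 for every
-- positive input, so it does not shrink as pre-activations grow, though
-- units stuck on the negative side stop learning ("dying ReLU").
relu, relu' :: Vector Double -> Vector Double
relu  = cmap (max 0)
relu' = cmap (\x -> if x > 0 then 1 else 0)
```

You would typically switch only the hidden-layer activations; the output layer usually keeps a sigmoid or softmax so it still produces class probabilities for MNIST.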
