I'm writing some basic neural network methods - specifically the activation functions - and have hit the limits of my rubbish knowledge of math. I understand the respective ranges (tanh: (-1, 1), logistic: (0, 1)).
The word is (and I've tested) that in some cases it might be better to use tanh than the logistic, since an output of y = 0 from the logistic, multiplied by a weight w, yields a value near 0, which doesn't have much effect on the upper layers it feeds into (although absence also has an effect). However, an output near y = -1 from tanh, multiplied by a weight w, may yield a large number with more numeric effect.
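To make that concrete, here's a minimal sketch; the weight w = 0.7 and pre-activation z = -4.0 are arbitrary values picked purely for illustration:

```python
import math

def logistic(z):
    # Logistic sigmoid: squashes z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

w = 0.7   # hypothetical weight, chosen only for illustration
z = -4.0  # strongly negative pre-activation, also just for illustration

# The logistic squashes z toward 0, so the weighted signal nearly vanishes;
# tanh squashes z toward -1, so the weighted signal keeps a large magnitude.
print(logistic(z) * w)   # ~0.0126  -> barely affects the next layer
print(math.tanh(z) * w)  # ~-0.6995 -> a value with real numeric effect
```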
Also, the derivative of tanh (y' = 1 - y^2) yields values greater than that of the logistic (y' = y(1 - y) = y - y^2). For example, when z = 0 the logistic function yields y = 0.5 and y' = 0.25, while tanh yields y = 0 but y' = 1 (you can see this in general just by looking at the graphs). Meaning that a tanh layer might learn faster than a logistic layer because of the magnitude of the gradient.
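Here's a quick sketch that checks those values at z = 0, using the same y' formulas as above:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_prime(z):
    y = logistic(z)
    return y * (1.0 - y)  # y(1 - y): peaks at 0.25 when z = 0

def tanh_prime(z):
    y = math.tanh(z)
    return 1.0 - y * y    # 1 - y^2: peaks at 1 when z = 0

print(logistic(0.0), logistic_prime(0.0))  # 0.5 0.25
print(math.tanh(0.0), tanh_prime(0.0))     # 0.0 1.0
```

Since backpropagation scales the error at each layer by y', the larger tanh derivative translates directly into larger weight updates near z = 0.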