If we primarily use LSTMs over RNNs to solve the vanishing gradient problem, why can't we just use ReLUs/leaky ReLUs with RNNs instead?

Asked by 忘掉有多难 on 2021-01-18 13:46

We all know that the vanishing gradient problem occurs when training a deep neural network with sigmoid activations. If we use ReLU instead, it solves this problem, but it creates dead neurons.
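To make the premise concrete, here is a minimal NumPy sketch (my own illustration, not part of the original question) of both effects: backprop multiplies the gradient by the activation's derivative at every layer, so a sigmoid chain shrinks it geometrically (sigmoid' ≤ 0.25), while a ReLU chain keeps it at 1 on the active side but zeroes it out entirely once a unit goes dead.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth = 50

# Sigmoid chain: even at its best case, sigmoid'(0) = 0.25,
# so the gradient decays like 0.25**depth.
grad = 1.0
for _ in range(depth):
    s = sigmoid(0.0)
    grad *= s * (1.0 - s)          # sigmoid'(x) = s(x) * (1 - s(x))
print(f"sigmoid chain, depth {depth}: grad ~ {grad:.3e}")  # ~7.9e-31

# ReLU chain with all units active: relu'(x) = 1 for x > 0,
# so the gradient passes through unchanged.
grad = 1.0
for _ in range(depth):
    grad *= 1.0
print(f"relu chain (active units): grad = {grad}")

# A single "dead" unit (stuck at x < 0) has relu'(x) = 0,
# which kills the gradient for everything behind it.
grad = 1.0
grad *= 0.0
print(f"relu chain with one dead unit: grad = {grad}")
```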
