Regarding the MNIST tutorial on the TensorFlow website, I ran an experiment (gist) to see what the effect of different weight initializations would be on learning. I noticed that …
Logistic (sigmoid) activations are more prone to the vanishing gradient problem, because their derivative is always less than 1 (at most 0.25, since σ'(x) = σ(x)(1 − σ(x))). The more of these derivatives you multiply together during back-propagation, the smaller the gradient becomes, and it shrinks quite quickly. ReLU, on the other hand, has a gradient of exactly 1 on its positive side, so it does not have this problem.
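To make that concrete, here is a minimal NumPy sketch (not the gist from the question) that multiplies one activation derivative per layer along a single back-propagation path through 10 layers, assuming the pre-activations on that path happen to be positive:

```python
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)          # always <= 0.25

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for positive pre-activations

np.random.seed(0)
# Hypothetical pre-activations along one path; taken positive for illustration.
pre_activations = np.abs(np.random.randn(10))

# Back-propagation multiplies one derivative per layer along the path.
sigmoid_factor = np.prod(sigmoid_grad(pre_activations))
relu_factor = np.prod(relu_grad(pre_activations))

print(f"sigmoid scales the gradient by {sigmoid_factor:.1e}")  # many orders of magnitude below 1
print(f"ReLU leaves it at {relu_factor:.1f}")                  # exactly 1.0
```

After only 10 layers the sigmoid path has already shrunk the gradient by a factor well below 0.25¹⁰, while the ReLU path passes it through unchanged (as long as the units stay in their positive region).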
Also, your network is not nearly deep enough to suffer from that.