TensorFlow weight initialization

萌比男神i · 2021-02-01 07:38

Regarding the MNIST tutorial on the TensorFlow website, I ran an experiment (gist) to see what the effect of different weight initializations would be on learning. I noticed tha…
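The linked gist isn't shown here, but a minimal sketch of this kind of experiment might look like the following. The network shape, initializer names, and training settings are my own assumptions for illustration, not the author's setup: the same small dense MNIST classifier is trained once per initializer so their learning curves can be compared.

```python
# Hypothetical sketch of comparing weight initializations on MNIST with tf.keras.
# Model architecture, initializers, and hyperparameters are illustrative choices.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

for init in ["zeros", "glorot_uniform", "random_normal"]:
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu",
                              kernel_initializer=init, input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax",
                              kernel_initializer=init),
    ])
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
    print(init, hist.history["accuracy"][-1])
```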

2 Answers
  •  余生分开走 · 2021-02-01 08:19

    Logistic (sigmoid) functions are more prone to vanishing gradients because their derivatives are always less than 1, so the more of them you multiply together during back-propagation, the smaller your gradient becomes, and quite quickly. ReLU, by contrast, has a gradient of 1 on its positive side, so it does not have this problem.
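    A quick numerical illustration of that point (my own sketch, with an arbitrary pre-activation value and layer count): the sigmoid derivative is at most 0.25, so a chain of them shrinks the gradient geometrically, while ReLU contributes a factor of 1 wherever its input is positive.

    ```python
    # Illustrative only: product of activation derivatives across stacked layers.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)   # peaks at 0.25 when x == 0

    x = 0.5       # assumed pre-activation value at every layer, for simplicity
    depth = 20    # assumed number of stacked activations

    sigmoid_factor = np.prod([sigmoid_grad(x)] * depth)  # ~0.235**20, vanishes
    relu_factor = np.prod([1.0] * depth)                 # stays 1 for x > 0

    print(f"product of {depth} sigmoid gradients: {sigmoid_factor:.3e}")
    print(f"product of {depth} ReLU gradients:    {relu_factor:.1f}")
    ```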

    Also, your network is not nearly deep enough to suffer from that.
